This article explores how re-ranking methods enhance retrieval recall across anatomical structures in AI models. By applying re-ranking, all evaluated models (DreamSim, DINOv1, and SwinTransformer) show improved performance. While DreamSim consistently achieves the best results in region-based and localized retrieval, DINOv1 and SwinTransformer also excel in specific conditions. The findings highlight how re-ranking not only raises recall rates but also strengthens localization, proving its critical role in medical imaging and anatomical AI systems.

Boosting Anatomical Retrieval Accuracy with Re-Ranking Methods

Abstract and 1. Introduction

  2. Materials and Methods

    2.1 Vector Database and Indexing

    2.2 Feature Extractors

    2.3 Dataset and Pre-processing

    2.4 Search and Retrieval

    2.5 Re-ranking retrieval and evaluation

  3. Evaluation and 3.1 Search and Retrieval

    3.2 Re-ranking

  4. Discussion

    4.1 Dataset and 4.2 Re-ranking

    4.3 Embeddings

    4.4 Volume-based, Region-based and Localized Retrieval and 4.5 Localization-ratio

  5. Conclusion, Acknowledgement, and References

3.2 Re-ranking

This section presents the retrieval recalls after applying the re-ranking method of Section 2.5.
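The re-ranking method of Section 2.5 is not reproduced in this excerpt, so the following is only a generic retrieve-then-re-rank sketch rather than the authors' procedure. It assumes a first stage that shortlists candidates by cosine similarity and a second stage that re-sorts the shortlist with a finer score. The function names, the volume identifiers, and the `fine_scores` values are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve_then_rerank(query, database, rerank_score, k=3):
    """Shortlist the k nearest entries, then re-sort them with a finer score.

    database: dict mapping an id to its embedding (hypothetical layout).
    rerank_score: callable taking an id and returning a float (assumption).
    """
    # Stage 1: coarse shortlist by cosine similarity to the query embedding.
    shortlist = sorted(
        database, key=lambda i: cosine(query, database[i]), reverse=True
    )[:k]
    # Stage 2: re-order only the shortlisted ids by the finer score.
    return sorted(shortlist, key=rerank_score, reverse=True)

# Toy data, invented for illustration only.
database = {
    "vol_a": [1.0, 0.0],
    "vol_b": [0.9, 0.1],
    "vol_c": [0.0, 1.0],
    "vol_d": [0.7, 0.7],
}
fine_scores = {"vol_a": 0.2, "vol_b": 0.9, "vol_c": 1.0, "vol_d": 0.5}

reranked = retrieve_then_rerank([1.0, 0.0], database, fine_scores.get, k=3)
print(reranked)
```

In this toy run, `vol_c` has the highest fine score but never reaches the second stage because it is not in the shortlist. Re-ranking of this kind can only reorder what the first stage returns.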

3.2.1 Volume-based

Table 12 and Table 13 show the retrieval recalls for 29 coarse anatomical structures and 104 original TS anatomical structures using the proposed re-ranking method. Re-ranking improves all the recalls. For 29 classes, the models perform similarly, with only slight differences. DINOv1 and DreamSim have a slightly better recall, each with an average of .967, but the standard deviation of DINOv1 is slightly lower (.040 vs. .045). For 104 anatomical regions, SwinTransformer performs better than the other models, with an average recall of .924; its standard deviation (.072) is also the lowest.
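The comparison above rests on two summary statistics per model: the average recall across classes and its standard deviation. A minimal sketch of that aggregation follows, including a tie-break on the lower standard deviation when averages are equal. The per-class values and model names are hypothetical and are not taken from Tables 12 and 13. The use of the population standard deviation is also an assumption, since the excerpt does not say which variant was used.

```python
from statistics import mean, pstdev

# Hypothetical per-class recalls, NOT values from the paper's tables.
per_class_recall = {
    "model_a": [1.0, 0.9, 0.8],
    "model_b": [0.95, 0.9, 0.85],
}

def summarize(recalls):
    """Return (average, standard deviation) rounded to three decimals.

    pstdev is the population standard deviation; this choice is an assumption.
    """
    return round(mean(recalls), 3), round(pstdev(recalls), 3)

summary = {name: summarize(vals) for name, vals in per_class_recall.items()}

# Prefer the highest average recall; break ties with the lowest spread.
best = min(summary, key=lambda m: (-summary[m][0], summary[m][1]))

for name, (avg, std) in summary.items():
    print(f"{name}: {avg:.3f} ± {std:.3f}")
print("best:", best)
```

Both toy models average .900, so the lower spread decides the ranking. This mirrors how the text separates two models with the same average recall.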

Table 12: Volume-based retrieval recall of coarse anatomical regions (29 classes) using HNSW Indexing and re-ranking. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

Table 13: Volume-based retrieval recall of all TS anatomical regions (104 classes) using HNSW Indexing and re-ranking. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

3.2.2 Region-based

Table 14 and Table 15 show the retrieval recall for 29 coarse anatomical structures and 104 original TS anatomical structures using the proposed re-ranking method. With re-ranking, the overall performance of all the models improves. DreamSim performs best, with an average retrieval recall of .987 ± .027 and .987 ± .024 for 29 and 104 classes, respectively. Performance varies only slightly between the coarse classes and the full set of original TS classes. As with the count-based method, many classes are retrieved perfectly in region-based retrieval (recall of 1.0). Variation among models and between classes is low: the highest standard deviations are .064 and .042.

Table 14: Region-based retrieval recall of coarse anatomical regions (29 classes) using HNSW Indexing and re-ranking. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

Table 15: Region-based retrieval recall of all TS anatomical regions (104 classes) using HNSW Indexing and re-ranking. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

3.2.3 Localized

Localized Retrieval Recall

Table 16 and Table 17 show the retrieval recall for 29 coarse anatomical structures and 104 original TS anatomical structures after re-ranking for L = 15. Re-ranking improved the localization for all the models. DreamSim is the best-performing model, with an average recall of .955 ± .062 for coarse anatomical structures and .956 ± .055 for original TS classes. Although the retrieval recall is lower than in the region-based or volume-based evaluation, it is still high, which shows that the pretrained vision embeddings can not only retrieve similar cases but also localize the corresponding region of interest.

Table 17: Localized retrieval recall of all TS anatomical regions (104 classes) using HNSW Indexing and re-ranking, L = 15. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

Localization-ratio

The localization-ratio is calculated based on Equation (7). Table 18 and Table 19 show the localization-ratio for 29 coarse and 104 original TS anatomical regions after re-ranking for L = 15. The best-performing embedding is still DreamSim, with a localization-ratio of .837 ± .159 and .790 ± .142 for 29 coarse and 104 original TS classes, respectively. After re-ranking, the overall localization-ratio is reduced (previously .864 ± .0145 and .803 ± .130, respectively).

Table 18: Localization-ratio of coarse anatomical regions (29 classes) using HNSW Indexing and re-ranking, L = 15. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

Table 19: Localization-ratio of all TS anatomical regions (104 classes) using HNSW Indexing and re-ranking, L = 15. In each row, bold numbers represent the best-performing values, while italicized numbers indicate the worst-performing. The separate average and standard deviation (STD) columns are color-coded, with blue indicating the best-performing values and yellow indicating the worst-performing values across different models. Additionally, bold numbers in colored columns represent the best classes in terms of average and standard deviation, while italicized values represent the worst-performing class across the models.

:::info Authors:

(1) Farnaz Khun Jush, Bayer AG, Berlin, Germany (farnaz.khunjush@bayer.com);

(2) Steffen Vogler, Bayer AG, Berlin, Germany (steffen.vogler@bayer.com);

(3) Tuan Truong, Bayer AG, Berlin, Germany (tuan.truong@bayer.com);

(4) Matthias Lenga, Bayer AG, Berlin, Germany (matthias.lenga@bayer.com).

:::


:::info This paper is available on arXiv under the CC BY 4.0 DEED license.

:::
