Patch-wise Retrieval: An Interpretable Instance-Level Image Matching

Wonseok Choi¹, Sohwi Lim², Nam Hyeon-Woo¹, Moon Ye-Bin¹,
Dong-Ju Jeong³, Jinyoung Hwang³, Tae-Hyun Oh²
¹POSTECH    ²KAIST    ³Samsung Research
WACV 2026, Tucson, AZ
Also presented at an ICCV 2025 workshop (eXCV: Explainable Computer Vision: Quo Vadis?)


Abstract

Instance-level image retrieval aims to find images containing the same object as a given query, despite variations in size, position, or appearance. To address this challenging task, we propose Patchify, a simple yet effective patch-wise retrieval framework that offers high performance, scalability, and interpretability without requiring fine-tuning. Patchify divides each database image into a small number of structured patches and performs retrieval by comparing these local features with a global query descriptor, enabling accurate and spatially grounded matching. To assess not just retrieval accuracy but also spatial correctness, we introduce LocScore, a localization-aware metric that quantifies whether the retrieved region aligns with the target object. This makes LocScore a valuable diagnostic tool for understanding and improving retrieval behavior. We conduct extensive experiments across multiple benchmarks, backbones, and region selection strategies, showing that Patchify outperforms global methods and complements state-of-the-art reranking pipelines. Furthermore, we apply Product Quantization for efficient large-scale retrieval and highlight the importance of using informative features during compression, which significantly boosts performance.

Motivation

Teaser figure showing interpretability, performance, and scalability gains.

Patchify provides interpretable matching by localizing where retrieval evidence comes from, improves performance via local cues, and scales efficiently with product quantization.

  • Interpretability: explicitly reveals where the match occurs.
  • Performance: local representations provide stronger instance matching.
  • Scalability: product quantization supports efficient large-scale retrieval.

Method: Patchify

Patchify pipeline: patch extraction, encoding, product quantization, and similarity search.
  • Each database image is split into structured multi-scale patches and encoded independently.
  • Retrieval compares the query's global descriptor with every patch representation, and each database image is ranked by its maximum patch similarity.
  • Patch-level indexing with product quantization keeps memory and retrieval latency practical.
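The retrieval step described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the grid layout (a 1×1 plus a 2×2 grid), the box format `(y0, x0, y1, x1)`, and the helper names `multiscale_boxes`, `index_database`, and `search` are hypothetical, and `encode_fn` stands in for whatever pretrained backbone produces the descriptors. Each database image is scored by its best-matching patch, and that patch's box is returned as the spatial evidence for the match.

```python
import numpy as np

def multiscale_boxes(height, width, grids=((1, 1), (2, 2))):
    """Return (y0, x0, y1, x1) boxes for every cell of every grid.

    The default grids are an illustrative assumption, not the paper's layout.
    """
    boxes = []
    for rows, cols in grids:
        ys = np.linspace(0, height, rows + 1).astype(int)
        xs = np.linspace(0, width, cols + 1).astype(int)
        for r in range(rows):
            for c in range(cols):
                boxes.append((int(ys[r]), int(xs[c]), int(ys[r + 1]), int(xs[c + 1])))
    return boxes

def l2_normalize(v):
    return v / (np.linalg.norm(v) + 1e-12)

def index_database(images, encode_fn, grids=((1, 1), (2, 2))):
    """Encode every patch of every database image independently.

    Returns the stacked patch features, the owning image id of each patch,
    and each patch's box (used later to show where a match occurred).
    """
    feats, owners, boxes = [], [], []
    for img_id, img in enumerate(images):
        h, w = img.shape[:2]
        for y0, x0, y1, x1 in multiscale_boxes(h, w, grids):
            feats.append(l2_normalize(encode_fn(img[y0:y1, x0:x1])))
            owners.append(img_id)
            boxes.append((y0, x0, y1, x1))
    return np.stack(feats), np.array(owners), boxes

def search(query_feat, feats, owners, boxes, top_k=5):
    """Rank database images by the maximum cosine similarity over their patches."""
    sims = feats @ l2_normalize(query_feat)
    results = []
    for img_id in np.unique(owners):
        idx = np.flatnonzero(owners == img_id)
        best = idx[np.argmax(sims[idx])]  # best-matching patch of this image
        results.append((int(img_id), float(sims[best]), boxes[best]))
    results.sort(key=lambda r: r[1], reverse=True)
    return results[:top_k]

# Toy demo with a hypothetical mean-color "encoder" in place of a real backbone.
mean_color = lambda crop: crop.reshape(-1, crop.shape[-1]).mean(axis=0)
db = [np.zeros((64, 64, 3)), np.zeros((64, 64, 3))]
db[0][..., 1] = 1.0            # image 0: entirely green
db[1][..., 2] = 1.0            # image 1: blue background ...
db[1][32:, 32:] = [1.0, 0, 0]  # ... with a red object in the bottom-right quadrant
query = np.zeros((16, 16, 3))
query[..., 0] = 1.0            # query: a red object

feats, owners, boxes = index_database(db, mean_color)
for img_id, score, box in search(mean_color(query), feats, owners, boxes):
    print(img_id, round(score, 3), box)
```

Even this toy encoder shows the effect: the red object covers only one quadrant of image 1, so a single whole-image descriptor would be diluted by the blue background, while taking the maximum over patches lets the matching quadrant score highly and report its own box.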

LocScore: Localization-Aware Metric
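The exact LocScore formula is not given on this page. As a hedged illustration of the idea described in the abstract (checking whether the retrieved region aligns with the target object, on top of ranking quality), the sketch below computes an average precision in which a retrieved image counts as correct only if its predicted box also overlaps the ground-truth object box. The IoU threshold, the `(y0, x0, y1, x1)` box format, and the names `box_iou` and `localization_aware_ap` are assumptions; this is not the paper's LocScore definition.

```python
def box_iou(a, b):
    """Intersection-over-union of two (y0, x0, y1, x1) boxes."""
    inter_h = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_w = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_h * inter_w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def localization_aware_ap(ranked, gt_boxes, iou_thresh=0.5):
    """Hypothetical localization-aware AP (NOT the paper's LocScore formula).

    ranked:   list of (image_id, predicted_box) in retrieval order.
    gt_boxes: dict mapping each relevant image_id to its ground-truth box.
    A retrieved image is a hit only if it is relevant AND its predicted box
    has IoU >= iou_thresh with the ground truth. With iou_thresh=0.0 this
    reduces to ordinary average precision over the ranked list.
    """
    if not gt_boxes:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, (img_id, box) in enumerate(ranked, start=1):
        gt = gt_boxes.get(img_id)
        if gt is not None and box_iou(box, gt) >= iou_thresh:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gt_boxes)

gt = {1: (32, 32, 64, 64), 2: (0, 0, 32, 32)}
ranked = [(1, (32, 32, 64, 64)), (0, (0, 0, 64, 64)), (2, (0, 0, 64, 64))]
print(localization_aware_ap(ranked, gt, iou_thresh=0.0))  # ranking only
print(localization_aware_ap(ranked, gt, iou_thresh=0.5))  # ranking + localization
```

In this example image 2 is retrieved but attributed to the whole frame rather than to the object, so a ranking-only score counts it as correct while the localization-aware variant does not; that gap is the kind of behavior a localization-aware metric is meant to expose.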

Experimental Results

Comparison with SOTA Methods

Comparison with SOTA methods across global and reranking settings.
Patchify is effective as both a standalone method and a strong first-stage representation for reranking.

Memory Efficiency

Memory usage comparison among baseline and Patchify variants.
Patchify achieves higher retrieval quality with lower memory overhead.
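To make the memory argument concrete, here is a from-scratch product quantization (PQ) sketch in NumPy, written for illustration rather than taken from the paper. Each D-dimensional patch feature is split into `m` sub-vectors, each sub-vector is replaced by the index of its nearest centroid in a small per-subspace k-means codebook, and similarities to a full-precision query are computed with per-subspace lookup tables (asymmetric distance computation). The names `ProductQuantizer`, `m`, and `k` are hypothetical. The codebooks here are trained on the indexed features themselves; which features to train and encode on is the design choice the abstract flags as important, and this sketch does not reproduce the paper's choice.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns a (k, dim) centroid array."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    return centroids

class ProductQuantizer:
    """Minimal PQ with k <= 256 centroids per subspace, so each code fits in uint8."""

    def __init__(self, m, k=256):
        assert k <= 256
        self.m, self.k = m, k

    def _sub(self, x, i):
        # Slice out the i-th sub-vector; works on (n, d) and (d,) arrays.
        return x[..., i * self.ds:(i + 1) * self.ds]

    def fit(self, x):
        assert x.shape[1] % self.m == 0, "dimension must be divisible by m"
        self.ds = x.shape[1] // self.m
        self.codebooks = [kmeans(self._sub(x, i), self.k, seed=i) for i in range(self.m)]
        return self

    def encode(self, x):
        codes = np.empty((len(x), self.m), dtype=np.uint8)
        for i, cb in enumerate(self.codebooks):
            d = ((self._sub(x, i)[:, None, :] - cb[None]) ** 2).sum(-1)
            codes[:, i] = d.argmin(1)  # nearest centroid per subspace
        return codes

    def decode(self, codes):
        return np.hstack([cb[codes[:, i]] for i, cb in enumerate(self.codebooks)])

    def inner_products(self, q, codes):
        """Asymmetric scoring: exact query vs. quantized database vectors."""
        tables = np.stack([cb @ self._sub(q, i) for i, cb in enumerate(self.codebooks)])  # (m, k)
        return tables[np.arange(self.m), codes].sum(1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(500, 32)).astype(np.float32)  # stand-in patch features
pq = ProductQuantizer(m=4, k=16).fit(feats)
codes = pq.encode(feats)
print("bytes per vector:", feats[0].nbytes, "->", codes[0].nbytes)
scores = pq.inner_products(feats[0], codes)
print("scored", scores.shape[0], "quantized vectors")
```

With up to 256 centroids per subspace each code is one byte, so a float32 feature of dimension D shrinks from 4D bytes to m bytes; for example, D = 768 and m = 64 gives 3072 versus 64 bytes, a 48× reduction (illustrative numbers, not the paper's settings). This saving matters more for a patch-level index than for a global one, since every database image contributes several vectors. In practice an optimized library such as Faiss (`IndexPQ`) would replace this sketch.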

Qualitative Results

Qualitative retrieval examples from the presentation.

Patchify retrieves correct instances even when targets are small or off-centered and clearly indicates which regions trigger the match.

Takeaways

Patchify

  • Interpretable instance retrieval with explicit spatial grounding.
  • Training-free, plug-and-play design that can leverage stronger backbones.
  • Scales efficiently with low memory overhead.

LocScore

  • Localization-aware metric that evaluates both ranking quality and spatial alignment.
  • Complements ranking-only metrics by revealing where the match occurs.

BibTeX

@inproceedings{choi2026PatchwiseRetrieval,
  title={Patch-wise Retrieval: A Bag of Practical Techniques for Instance-level Matching},
  author={Wonseok Choi and Sohwi Lim and Nam Hyeon-Woo and Moon Ye-Bin and Dong-Ju Jeong and Jinyoung Hwang and Tae-Hyun Oh},
  booktitle={Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
  year={2026}
}