Abstract:The purpose of this paper is to introduce a new approach for the free-hand sketch representation in the sketch-based image retrieval (SBIR), where the sketches are treated as the queries to search for the natural photos in the natural image dataset. This task is known as an extremely challenging work for 3 main reasons:(1) Sketches show a highly abstract visual appearance versus natural photos, fewer context can be extracted as descriptors using the existing methods. (2) For the same object, different people provide widely different sketches, making sketch-photo matching harder. (3) Mapping the sketches and photos into a common domain is also a challenging task. In this study, the cross-domain question is addressed using a strategy of mapping sketches and natural photos in multiple layers. For the first time, a multi-layer deep CNN framework is introduced to train the multi-layer representation of free hand sketches and natural photos. Flickr15k dataset is used as the benchmark for the retrieval and it is shown that the learned representation significantly outperforms both hand-crafted features as well as deep features trained by sketches or photos.