Abstract: Semistructured data has an irregular or incomplete structure. In recent research on semistructured data and on the integration of heterogeneous data sources, models for semistructured data are based on rooted directed graphs, so querying semistructured data is equivalent to searching a graph. Path expressions that contain wildcard characters add further complexity to query processing. This paper presents the strategies used for querying and optimizing OIM (a model for object integration) data in Versatile, a system for integrating heterogeneous data sources. Algorithms for generating query plans and extending paths are discussed in detail, and three optimization methods are introduced: a path index (Pindex), a level index (Lvindex), and knowledge of the data source. The approach can also be readily applied to other graph-based semistructured data.
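To illustrate why querying a rooted directed graph amounts to graph search, and why wildcards complicate it, the following is a minimal sketch in Python. It is not the OIM query algorithm or the Versatile implementation. The graph `GRAPH`, the function `evaluate`, and the wildcard syntax (`?` for exactly one edge, `*` for zero or more edges) are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical rooted, directed, edge-labeled graph: node -> [(label, child), ...].
GRAPH = {
    "root": [("book", "b1"), ("book", "b2"), ("article", "a1")],
    "b1": [("title", "t1"), ("author", "p1")],
    "b2": [("title", "t2"), ("chapter", "c1")],
    "c1": [("title", "t3")],
    "a1": [("title", "t4")],
    "p1": [("name", "n1")],
}


def evaluate(graph, root, path):
    """Return the nodes reachable from root along edges that match path.

    Assumed wildcard syntax (not taken from the paper):
    "?" matches exactly one edge with any label,
    "*" matches any sequence of zero or more edges.
    """
    results = set()
    start = (root, 0)      # state = (current node, position in path)
    seen = {start}         # visited states, so cyclic graphs terminate
    queue = deque([start])
    while queue:
        node, i = queue.popleft()
        if i == len(path):
            results.add(node)
            continue
        step = path[i]
        successors = []
        if step == "*":
            # "*" matches zero edges: move on to the next path step.
            successors.append((node, i + 1))
            # "*" consumes one edge and stays active.
            for _, child in graph.get(node, []):
                successors.append((child, i))
        else:
            for label, child in graph.get(node, []):
                if step == "?" or step == label:
                    successors.append((child, i + 1))
        for state in successors:
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return results
```

With this sample graph, `evaluate(GRAPH, "root", ["book", "title"])` yields `{"t1", "t2"}`, while `evaluate(GRAPH, "root", ["book", "*", "title"])` also reaches `"t3"` through the `chapter` edge. The `*` step forces the search to explore every descendant, which is the kind of cost that indexes over paths or levels are meant to reduce.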