Abstract:In recent years, with the rise of the mobile Internet, underground mobile applications primarily involved in scams, gambling, and pornography have become more rampant, requiring effective control measures. Currently, there is a lack of research on underground applications by researchers. Due to the continuous crackdown by law enforcement agencies on traditional distribution channels for these applications, the existing collection methods based on search engines and app stores have proven to be ineffective. The lack of large-scale and representative datasets of real-world underground applications has become a major constraint for in-depth research. Therefore, this study aims to address the challenge of collection of large-scale real-world underground applications, providing data support for a comprehensive in-depth analysis of these applications and their ecosystem. A method is proposed to capture underground applications based on traffic analysis. By focusing on the key distribution channels of underground applications and leveraging their characteristics of mutation and accompanying traffic, underground applications can be discovered in the propagation stage. In the test, the proposed method successfully obtained 3 439 application download links and 3 303 distinct applications. Among these apps, 91.61% of the samples were labeled as malware by antivirus engine, while 98.14% of the samples were zero-days. The results demonstrate the effectiveness of the proposed method in the collection of underground applications.