Abstract: Fuzz testing plays a significant role in software quality assurance and security testing. However, for systems with complex input semantics, such as compilers, existing fuzzers often perform poorly because their mutation strategies lack semantic awareness: many of the programs they generate are rejected by the compiler frontend's checks. This paper proposes a semantics-aware greybox fuzzing method that improves the efficiency of fuzz testing for compilers. We designed and implemented a set of mutation operators that preserve the semantic validity of inputs while exploring contextual diversity, and we developed efficient selection strategies tailored to these operators. By integrating these operators and strategies into a conventional greybox fuzzer, we built the greybox fuzzing tool SemaAFL. Experimental results show that, with these mutation operators, SemaAFL achieved approximately 14.5% and 11.2% higher code coverage on the GCC and Clang compilers than AFL++ and comparable tools such as GrayC. During a one-week experiment, SemaAFL discovered and reported six previously unknown bugs in GCC and Clang.