Abstract: Long methods, like other categories of code smells, prevent software applications from reaching their maximal readability, reusability, and maintainability. Consequently, the automated detection and decomposition of long methods have been extensively studied. Although existing approaches have significantly facilitated decomposition, their solutions often differ substantially from the optimal ones. To this end, in this paper, we investigate the automatable portion of a publicly available dataset of real-world long methods. Based on the findings of this investigation, we propose an approach called Lsplitter, which uses large language models to decompose long methods automatically. For a given long method, Lsplitter employs heuristic rules and large language models to decompose the method into a series of shorter methods. However, large language models often produce decompositions that contain highly similar methods. To address this, Lsplitter uses a location-based algorithm to merge physically contiguous and highly similar methods into a longer method. Finally, it ranks the candidate results. We conducted experiments on 2849 long methods from real-world Java projects. The experimental results show that Lsplitter improves the hit rate by 142% compared to traditional methods combined with a modularity matrix, and by 7.6% compared to methods based purely on large language models.
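
The merging step described above can be illustrated with a minimal sketch. It is not Lsplitter's actual implementation: the abstract does not specify how similarity is measured, so the sketch assumes Jaccard similarity over identifier sets, a fixed threshold of 0.5, and a greedy left-to-right pass. The names `Fragment`, `jaccard`, and `merge_contiguous` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A candidate extracted method: an inclusive line range plus its identifiers."""
    start: int
    end: int
    tokens: frozenset


def jaccard(a, b):
    """Jaccard similarity of two token sets (an assumed measure, not from the paper)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def merge_contiguous(fragments, threshold=0.5):
    """Greedily merge neighbours that are physically adjacent and similar enough."""
    merged = []
    for frag in sorted(fragments, key=lambda f: f.start):
        if merged:
            prev = merged[-1]
            # "Physically contiguous" is taken to mean the line ranges touch.
            adjacent = frag.start == prev.end + 1
            if adjacent and jaccard(prev.tokens, frag.tokens) >= threshold:
                # Replace the previous fragment with the longer, merged one.
                merged[-1] = Fragment(prev.start, frag.end, prev.tokens | frag.tokens)
                continue
        merged.append(frag)
    return merged


# Hypothetical fragments of one long method (line ranges and identifiers are made up).
fragments = [
    Fragment(1, 5, frozenset({"conn", "url", "open"})),
    Fragment(6, 9, frozenset({"conn", "url", "timeout"})),
    Fragment(10, 20, frozenset({"rows", "parse", "csv"})),
]
result = merge_contiguous(fragments)
print([(f.start, f.end) for f in result])  # [(1, 9), (10, 20)]
```

In this toy input the first two fragments are adjacent and share half of their identifiers, so they are merged into a single candidate covering lines 1 to 9, while the third fragment shares no identifiers with it and stays separate. Raising the threshold makes merging more conservative.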