An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript

A.P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A.M. Mir, S. Reis, E. Bodden, Empirical Software Engineering 30 (2025).

Journal Article | English
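Author
Ashwin Prasad Shivarpatna Venkatesh; Rose Sunil; Samkutty Sabu; Amir M. Mir; Sofia Reis; Eric Bodden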
Abstract
Large Language Models (LLMs) are increasingly being explored for their potential in software engineering, particularly in static analysis tasks. In this study, we investigate the potential of current LLMs to enhance call-graph analysis and type inference for Python and JavaScript programs. We empirically evaluated 24 LLMs, including OpenAI's GPT series and open-source models like LLaMA and Mistral, using existing and newly developed benchmarks. Specifically, we enhanced TypeEvalPy, a micro-benchmarking framework for type inference in Python, with auto-generation capabilities, expanding its scope from 860 to 77,268 type annotations for Python. Additionally, we introduced SWARM-CG and SWARM-JS, comprehensive benchmarking suites for evaluating call-graph construction tools across multiple programming languages. Our findings reveal a contrasting performance of LLMs in static analysis tasks. For call-graph generation, traditional static analysis tools such as PyCG for Python and Jelly for JavaScript consistently outperform LLMs. While advanced models like mistral-large-it-2407-123b and gpt-4o show promise, they still struggle with completeness and soundness in call-graph analysis across both languages. In contrast, LLMs demonstrate a clear advantage in type inference for Python, surpassing traditional tools like HeaderGen and hybrid approaches such as HiTyper. These results suggest that, while LLMs hold promise in type inference, their limitations in call-graph analysis highlight the need for further research. Our study provides a foundation for integrating LLMs into static analysis workflows, offering insights into their strengths and current limitations.
Publishing Year
2025
Journal Title
Empirical Software Engineering
Volume
30
Issue
6

Cite this

Shivarpatna Venkatesh AP, Sunil R, Sabu S, Mir AM, Reis S, Bodden E. An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript. Empirical Software Engineering. 2025;30(6). doi:10.48550/ARXIV.2410.00603
Shivarpatna Venkatesh, A. P., Sunil, R., Sabu, S., Mir, A. M., Reis, S., & Bodden, E. (2025). An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript. Empirical Software Engineering, 30(6). https://doi.org/10.48550/ARXIV.2410.00603
@article{ShivarpatnaVenkatesh_Sunil_Sabu_Mir_Reis_Bodden_2025, title={An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript}, volume={30}, DOI={10.48550/ARXIV.2410.00603}, number={6}, journal={Empirical Software Engineering}, publisher={Springer}, author={Shivarpatna Venkatesh, Ashwin Prasad and Sunil, Rose and Sabu, Samkutty and Mir, Amir M. and Reis, Sofia and Bodden, Eric}, year={2025} }
Shivarpatna Venkatesh, Ashwin Prasad, Rose Sunil, Samkutty Sabu, Amir M. Mir, Sofia Reis, and Eric Bodden. “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript.” Empirical Software Engineering 30, no. 6 (2025). https://doi.org/10.48550/ARXIV.2410.00603.
A. P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A. M. Mir, S. Reis, and E. Bodden, “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript,” Empirical Software Engineering, vol. 30, no. 6, 2025, doi: 10.48550/ARXIV.2410.00603.
Shivarpatna Venkatesh, Ashwin Prasad, et al. “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript.” Empirical Software Engineering, vol. 30, no. 6, Springer, 2025, doi:10.48550/ARXIV.2410.00603.
