An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript

Shivarpatna Venkatesh, Ashwin Prasad; Sunil, Rose; Sabu, Samkutty; Mir, Amir M.; Reis, Sofia; Bodden, Eric

A.P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A.M. Mir, S. Reis, E. Bodden, Empirical Software Engineering 30 (2025).

DOI

10.48550/ARXIV.2410.00603

Journal Article | English

Author

Shivarpatna Venkatesh, Ashwin Prasad; Sunil, Rose; Sabu, Samkutty; Mir, Amir M.; Reis, Sofia; Bodden, Eric

Department

Secure Software Engineering / Heinz Nixdorf Institut

Abstract

Large Language Models (LLMs) are increasingly being explored for their potential in software engineering, particularly in static analysis tasks. In this study, we investigate the potential of current LLMs to enhance call-graph analysis and type inference for Python and JavaScript programs. We empirically evaluated 24 LLMs, including OpenAI's GPT series and open-source models like LLaMA and Mistral, using existing and newly developed benchmarks. Specifically, we enhanced TypeEvalPy, a micro-benchmarking framework for type inference in Python, with auto-generation capabilities, expanding its scope from 860 to 77,268 type annotations for Python. Additionally, we introduced SWARM-CG and SWARM-JS, comprehensive benchmarking suites for evaluating call-graph construction tools across multiple programming languages. Our findings reveal a contrasting performance of LLMs in static analysis tasks. For call-graph generation, traditional static analysis tools such as PyCG for Python and Jelly for JavaScript consistently outperform LLMs. While advanced models like mistral-large-it-2407-123b and gpt-4o show promise, they still struggle with completeness and soundness in call-graph analysis across both languages. In contrast, LLMs demonstrate a clear advantage in type inference for Python, surpassing traditional tools like HeaderGen and hybrid approaches such as HiTyper. These results suggest that, while LLMs hold promise in type inference, their limitations in call-graph analysis highlight the need for further research. Our study provides a foundation for integrating LLMs into static analysis workflows, offering insights into their strengths and current limitations.

Publishing Year

2025

Journal Title

Empirical Software Engineering

Volume

30

Issue

6

LibreCat-ID

62973

Cite this

Shivarpatna Venkatesh AP, Sunil R, Sabu S, Mir AM, Reis S, Bodden E. An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript. Empirical Software Engineering. 2025;30(6). doi:10.48550/ARXIV.2410.00603

Shivarpatna Venkatesh, A. P., Sunil, R., Sabu, S., Mir, A. M., Reis, S., & Bodden, E. (2025). An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript. Empirical Software Engineering, 30(6). https://doi.org/10.48550/ARXIV.2410.00603

@article{ShivarpatnaVenkatesh_Sunil_Sabu_Mir_Reis_Bodden_2025,
  title={An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript},
  volume={30},
  DOI={10.48550/ARXIV.2410.00603},
  number={6},
  journal={Empirical Software Engineering},
  publisher={Springer},
  author={Shivarpatna Venkatesh, Ashwin Prasad and Sunil, Rose and Sabu, Samkutty and Mir, Amir M. and Reis, Sofia and Bodden, Eric},
  year={2025}
}

Shivarpatna Venkatesh, Ashwin Prasad, Rose Sunil, Samkutty Sabu, Amir M. Mir, Sofia Reis, and Eric Bodden. “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript.” Empirical Software Engineering 30, no. 6 (2025). https://doi.org/10.48550/ARXIV.2410.00603.

A. P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A. M. Mir, S. Reis, and E. Bodden, “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript,” Empirical Software Engineering, vol. 30, no. 6, 2025, doi: 10.48550/ARXIV.2410.00603.

Shivarpatna Venkatesh, Ashwin Prasad, et al. “An Empirical Study of Large Language Models for Type and Call Graph Analysis in Python and JavaScript.” Empirical Software Engineering, vol. 30, no. 6, Springer, 2025, doi:10.48550/ARXIV.2410.00603.
