---
_id: '62973'
abstract:
- lang: eng
  text: "Large Language Models (LLMs) are increasingly being explored for their potential
    in software engineering, particularly in static analysis tasks. In this study,
    we investigate how well current LLMs can enhance call-graph analysis and
    type inference for Python and JavaScript programs. We empirically evaluated 24
    LLMs, including OpenAI's GPT series and open-source models like LLaMA and Mistral,
    using existing and newly developed benchmarks. Specifically, we enhanced TypeEvalPy,
    a micro-benchmarking framework for type inference in Python, with auto-generation
    capabilities, expanding its scope from 860 to 77,268 type annotations for Python.
    Additionally, we introduced SWARM-CG and SWARM-JS, comprehensive benchmarking
    suites for evaluating call-graph construction tools across multiple programming
    languages.\nOur findings reveal contrasting performance of LLMs across static
    analysis tasks. For call-graph generation, traditional static analysis tools such
    as PyCG for Python and Jelly for JavaScript consistently outperform LLMs. While
    advanced models like mistral-large-it-2407-123b and gpt-4o show promise, they
    still struggle with completeness and soundness in call-graph analysis across both
    languages. In contrast, LLMs demonstrate a clear advantage in type inference for
    Python, surpassing traditional tools like HeaderGen and hybrid approaches such
    as HiTyper. These results suggest that, while LLMs hold promise in type inference,
    their limitations in call-graph analysis highlight the need for further research.
    Our study provides a foundation for integrating LLMs into static analysis workflows,
    offering insights into their strengths and current limitations."
author:
- first_name: Ashwin Prasad
  full_name: Shivarpatna Venkatesh, Ashwin Prasad
  id: '66637'
  last_name: Shivarpatna Venkatesh
- first_name: Rose
  full_name: Sunil, Rose
  id: '97670'
  last_name: Sunil
- first_name: Samkutty
  full_name: Sabu, Samkutty
  last_name: Sabu
- first_name: Amir M.
  full_name: Mir, Amir M.
  last_name: Mir
- first_name: Sofia
  full_name: Reis, Sofia
  last_name: Reis
- first_name: Eric
  full_name: Bodden, Eric
  id: '59256'
  last_name: Bodden
  orcid: 0000-0003-3470-3647
citation:
  ama: Shivarpatna Venkatesh AP, Sunil R, Sabu S, Mir AM, Reis S, Bodden E. An Empirical
    Study of Large Language Models for Type and Call Graph Analysis in Python and
    JavaScript. <i>Empirical Software Engineering</i>. 2025;30(6). doi:<a href="https://doi.org/10.48550/ARXIV.2410.00603">10.48550/ARXIV.2410.00603</a>
  apa: Shivarpatna Venkatesh, A. P., Sunil, R., Sabu, S., Mir, A. M., Reis, S., &#38;
    Bodden, E. (2025). An Empirical Study of Large Language Models for Type and Call
    Graph Analysis in Python and JavaScript. <i>Empirical Software Engineering</i>,
    <i>30</i>(6). <a href="https://doi.org/10.48550/ARXIV.2410.00603">https://doi.org/10.48550/ARXIV.2410.00603</a>
  bibtex: '@article{Shivarpatna_Venkatesh_Sunil_Sabu_Mir_Reis_Bodden_2025, title={An
    Empirical Study of Large Language Models for Type and Call Graph Analysis in Python
    and JavaScript}, volume={30}, DOI={10.48550/ARXIV.2410.00603},
    number={6}, journal={Empirical Software Engineering}, publisher={Springer}, author={Shivarpatna
    Venkatesh, Ashwin Prasad and Sunil, Rose and Sabu, Samkutty and Mir, Amir M. and
    Reis, Sofia and Bodden, Eric}, year={2025} }'
  chicago: Shivarpatna Venkatesh, Ashwin Prasad, Rose Sunil, Samkutty Sabu, Amir M.
    Mir, Sofia Reis, and Eric Bodden. “An Empirical Study of Large Language Models
    for Type and Call Graph Analysis in Python and JavaScript.” <i>Empirical Software
    Engineering</i> 30, no. 6 (2025). <a href="https://doi.org/10.48550/ARXIV.2410.00603">https://doi.org/10.48550/ARXIV.2410.00603</a>.
  ieee: 'A. P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A. M. Mir, S. Reis, and E.
    Bodden, “An Empirical Study of Large Language Models for Type and Call Graph Analysis
    in Python and JavaScript,” <i>Empirical Software Engineering</i>, vol. 30, no.
    6, 2025, doi: <a href="https://doi.org/10.48550/ARXIV.2410.00603">10.48550/ARXIV.2410.00603</a>.'
  mla: Shivarpatna Venkatesh, Ashwin Prasad, et al. “An Empirical Study of Large Language
    Models for Type and Call Graph Analysis in Python and JavaScript.” <i>Empirical
    Software Engineering</i>, vol. 30, no. 6, Springer, 2025, doi:<a href="https://doi.org/10.48550/ARXIV.2410.00603">10.48550/ARXIV.2410.00603</a>.
  short: A.P. Shivarpatna Venkatesh, R. Sunil, S. Sabu, A.M. Mir, S. Reis, E. Bodden,
    Empirical Software Engineering 30 (2025).
date_created: 2025-12-08T13:20:30Z
date_updated: 2025-12-08T13:25:49Z
department:
- _id: '76'
doi: 10.48550/ARXIV.2410.00603
intvolume: '        30'
issue: '6'
language:
- iso: eng
publication: Empirical Software Engineering
publisher: Springer
status: public
title: An Empirical Study of Large Language Models for Type and Call Graph Analysis
  in Python and JavaScript
type: journal_article
user_id: '15249'
volume: 30
year: '2025'
...