LLMCoVa: LLM-Generated Code Vulnerability Analysis

This project investigates using LLMs for static code analysis and vulnerability detection.

Relevant Papers:

A. Diagnostic Classifier and Representation Probing

T.-D. Bui, T. T. Vu, T.-T. Nguyen, S. Nguyen, and H. D. Vo, “Correctness Assessment of Code Generated by Large Language Models Using Internal Representations,” J. Syst. Softw., vol. 230, 2025, Art. no. 112570. [Online]. Available: https://doi.org/10.1016/j.jss.2025.112570
M. Starace, A. Brunato, and A. Lavelli, “Do Related Linguistic Categories Share Similar Internal Representations in Multilingual LMs?” in Proc. EMNLP Findings, Dec. 2023.
M. Liu, K. H. Chow, and D. Roth, “Exploring Lexical Semantic Representations in Encoder and Decoder Transformers,” in Proc. ACL Findings, Jul. 2024.
D. Nikolaev and S. Padó, “Layerwise Probing of Sentence Meaning Across Transformer Architectures,” in Proc. BlackboxNLP (EMNLP Workshop), Dec. 2023.
S. Nadipalli and A. B. Smith, “Layerwise Evolution of Representations in Fine-Tuned Language Models,” arXiv:2403.05123, Mar. 2024.
I. Tenney et al., “What Do You Learn From Context? Probing for Sentence Structure in Contextualized Word Representations,” in Proc. ICLR, May 2019. (Background reference)
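The probing papers above train a small classifier on frozen hidden states to test whether a property is linearly decodable from a given layer. Below is a minimal sketch of that idea, assuming per-snippet activation vectors have already been extracted from one layer (extraction is not shown). The synthetic `acts` and `labels` arrays are hypothetical stand-ins for real activations and vulnerable/safe labels, and the helper names are illustrative, not taken from any cited paper.

```python
import numpy as np

def train_linear_probe(h, y, lr=0.5, epochs=500, l2=1e-3):
    """Fit a logistic-regression probe by full-batch gradient descent.

    h: (n_samples, d_model) frozen activations from a single layer
    y: (n_samples,) 0/1 labels (hypothetically, 1 = vulnerable snippet)
    """
    n, d = h.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(h @ w + b)))  # sigmoid of probe logits
        err = p - y                              # gradient of BCE w.r.t. logits
        w -= lr * (h.T @ err / n + l2 * w)       # L2-regularized weight update
        b -= lr * err.mean()
    return w, b

def probe_accuracy(h, y, w, b):
    preds = (h @ w + b > 0).astype(int)
    return float((preds == y).mean())

# Synthetic stand-in: labels depend on one hidden direction in activation space.
rng = np.random.default_rng(0)
d_model = 16
true_direction = rng.normal(size=d_model)
acts = rng.normal(size=(500, d_model))
labels = (acts @ true_direction > 0).astype(int)

w, b = train_linear_probe(acts[:400], labels[:400])
acc = probe_accuracy(acts[400:], labels[400:], w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

In a layerwise study, one such probe would be trained per layer and the held-out accuracies compared; keeping the probe linear makes high accuracy more attributable to the representation than to the classifier.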

B. Attention Head Analysis

Y. Nam, J. S. Park, D. K. Das, and D. Krueger, “Causal Head Gating: Quantifying Attention Head Importance in Transformers,” arXiv:2401.12875, Jan. 2024.
Z. Chen, M. Kharitonov, and R. Schwartz, “Syntactic Attention Structure Emerges in Transformer Language Models,” in Proc. ACL, Jul. 2024.
L. Elhelo and M. Geva, “Mapping Attention Parameters to Their Semantics,” arXiv:2404.06511, Apr. 2024.
J. Vig, “A Multiscale Visualization of Attention in the Transformer Model,” in Proc. ACL: System Demonstrations, Jul. 2019. (Classic visual interpretation reference)
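A common baseline for the head-importance question these papers study is to scale each head's output by a gate and measure how much the layer output changes when one gate is set to zero. The sketch below does this for a single causal multi-head attention layer with random weights. It is plain zero-ablation, not the learned gating procedure of any cited paper; all shapes and names are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def gated_attention(x, Wq, Wk, Wv, Wo, gates):
    """Causal multi-head self-attention where head h's output is scaled by gates[h].

    x: (seq_len, d_model); Wq/Wk/Wv: (n_heads, d_model, d_head);
    Wo: (n_heads, d_head, d_model); gates: (n_heads,)
    """
    seq_len = x.shape[0]
    d_head = Wq.shape[-1]
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)  # mask future positions
    out = np.zeros((seq_len, Wo.shape[-1]))
    for h in range(Wq.shape[0]):
        q, k, v = x @ Wq[h], x @ Wk[h], x @ Wv[h]
        scores = np.where(future, -np.inf, q @ k.T / np.sqrt(d_head))
        out += gates[h] * (softmax(scores) @ v @ Wo[h])
    return out

def head_ablation_effects(x, Wq, Wk, Wv, Wo):
    """L2 change in the layer output when each head is zeroed out in turn."""
    n_heads = Wq.shape[0]
    base = gated_attention(x, Wq, Wk, Wv, Wo, np.ones(n_heads))
    effects = np.zeros(n_heads)
    for h in range(n_heads):
        gates = np.ones(n_heads)
        gates[h] = 0.0
        effects[h] = np.linalg.norm(base - gated_attention(x, Wq, Wk, Wv, Wo, gates))
    return effects

rng = np.random.default_rng(0)
n_heads, d_model, d_head, seq_len = 4, 16, 4, 6
Wq = rng.normal(size=(n_heads, d_model, d_head))
Wk = rng.normal(size=(n_heads, d_model, d_head))
Wv = rng.normal(size=(n_heads, d_model, d_head))
Wo = rng.normal(size=(n_heads, d_head, d_model))
x = rng.normal(size=(seq_len, d_model))

effects = head_ablation_effects(x, Wq, Wk, Wv, Wo)
print("heads ranked by ablation effect:", np.argsort(effects)[::-1])
```

In a single layer the heads add linearly, so each effect is just the norm of that head's contribution; in a full model, ablating a head also changes downstream layers, which is where ablation becomes informative and where learned or causal gating methods go further.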

C. Neuron Probing and Causal Interventions

X. Duan, Z. Yao, Y. Zhang, S. Wang, and Z. G. Cai, “How Syntax Specialization Emerges in Language Models,” Preprint, 2025.
M. Mueller, N. Dankers, and M. Keller, “Discovering Subject–Verb Agreement Neurons in Multilingual Transformers,” in Proc. CoNLL, Dec. 2022.
K. Meng, D. Bau, A. Andonian, and Y. Belinkov, “Locating and Editing Factual Associations in GPT,” in Proc. NeurIPS, Dec. 2022.
M. Duan, S. Ravichander, A. Drozdov, and J. Eisenstein, “Syntax Emerges Early in Pretraining,” in Proc. ACL, Jul. 2025.
M. Geva, R. Schuster, J. Berant, and O. Levy, “Transformer Feed-Forward Layers Are Key-Value Memories,” in Proc. EMNLP, Nov. 2021. (Prior causal probing of factual neurons)
J. Vig et al., “Investigating Gender Bias in Language Models Using Causal Mediation Analysis,” in Proc. NeurIPS, Dec. 2020. (Methodological foundation)
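Several of the intervention papers above rely on activation patching: run the model on a "clean" and a "corrupted" input, copy a hidden activation from the clean run into the corrupted run, and measure how much of the output gap is recovered. The sketch below applies this to a toy two-layer MLP with random weights. The clean/corrupted inputs are hypothetical stand-ins (for example, a safe versus a vulnerable version of a snippet), and the function names are illustrative only.

```python
import numpy as np

def mlp_forward(x, W1, W2, patch=None):
    """Toy MLP with a scalar readout; `patch` = (unit_indices, values) overwrites hidden units."""
    hidden = np.maximum(0.0, x @ W1)
    if patch is not None:
        idx, values = patch
        hidden = hidden.copy()
        hidden[idx] = values  # the causal intervention
    return float(hidden @ W2), hidden

rng = np.random.default_rng(1)
d_in, d_hidden = 8, 12
W1 = rng.normal(size=(d_in, d_hidden))
W2 = rng.normal(size=d_hidden)
x_clean = rng.normal(size=d_in)
x_corrupt = rng.normal(size=d_in)

out_clean, h_clean = mlp_forward(x_clean, W1, W2)
out_corrupt, _ = mlp_forward(x_corrupt, W1, W2)

def recovery(units):
    """Fraction of the clean-vs-corrupt output gap restored by patching `units`."""
    units = np.asarray(units)
    out_patched, _ = mlp_forward(x_corrupt, W1, W2, patch=(units, h_clean[units]))
    return (out_patched - out_corrupt) / (out_clean - out_corrupt)

per_unit = np.array([recovery([i]) for i in range(d_hidden)])
print(np.round(per_unit, 3))
print(recovery(np.arange(d_hidden)))  # 1.0: patching every unit reproduces the clean run
```

Because the readout here is linear in the hidden layer, the single-unit recoveries sum to 1; in a real transformer, effects interact across layers, which is why studies patch one site at a time and compare.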

D. Sparse Autoencoder (SAE) and Feature Decomposition

N. Cunningham, C. Olsson, and A. H. Miller, “Interpreting Transformer Models with Sparse Feature Sets,” in Proc. NeurIPS, Dec. 2023.
J. Heap, S. Lyu, and K. Chugunova, “SAEs Extract Interpretability from Random Networks: Rethinking Sparsity and Meaning,” in Proc. ICLR, May 2025.
M. Kantamneni, S. Wu, and D. Kiela, “Are Sparse Autoencoders Useful Probes?” arXiv:2403.04070, Mar. 2024.
N. Templeton, L. M. Smith, and R. Boix-Adsera, “Dark Matter in Language Models: Extracting Rare Concepts with Specialized Autoencoders,” arXiv:2402.01928, Feb. 2024.
A. Bricken, L. Lilienfeld, C. Olsson, and J. Hilton, “Monosemanticity: Training Interpretable Models by Encouraging Sparse Feature Representations,” in Proc. ICML, Jul. 2023.
N. Elhage et al., “Toy Models of Superposition,” arXiv:2209.10652, Sep. 2022. (Foundational work)
M. Muhamed, S. Salehi, and J. Pujara, “Learning Specialized Representations with Tail-Aware Sparse Autoencoders,” in Proc. ACL Findings, Aug. 2025.
Goodfire AI, “SAE Editing and Live Model Steering with LLaMA-3,” Technical Report, Apr. 2025. [Online]. Available: https://www.goodfire.ai/sae-llama3
Anthropic, “Superposition, Sparse Features, and Mechanistic Interpretability,” Research Blog, Nov. 2023. [Online]. Available: https://www.anthropic.com/research
OpenAI, “Feature Vectors in GPT-4: Building a Dictionary of Internal Concepts,” Technical Report, Mar. 2024. [Online]. Available: https://openai.com/research/feature-vectors-gpt4
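The SAE work above decomposes activations into a larger, sparsely active dictionary of features. Below is a minimal sketch of one widely used formulation: a ReLU encoder applied after subtracting a decoder bias, a linear decoder, and a loss of squared reconstruction error plus an L1 penalty on feature activations. The dimensions, initialization, and `l1_coeff` value are assumptions, the random `acts` array stands in for real model activations, and the training loop (gradient descent on this loss) is omitted.

```python
import numpy as np

def init_sae(d_model, d_dict, rng):
    """Random unit-norm dictionary; encoder initialized as the decoder transpose."""
    W_dec = rng.normal(size=(d_dict, d_model))
    W_dec /= np.linalg.norm(W_dec, axis=1, keepdims=True)  # each feature direction has norm 1
    return {
        "W_enc": W_dec.T.copy(),
        "b_enc": np.zeros(d_dict),
        "W_dec": W_dec,
        "b_dec": np.zeros(d_model),
    }

def sae_forward(params, x):
    """x: (batch, d_model) -> sparse features (batch, d_dict) and reconstruction."""
    f = np.maximum(0.0, (x - params["b_dec"]) @ params["W_enc"] + params["b_enc"])
    x_hat = f @ params["W_dec"] + params["b_dec"]
    return f, x_hat

def sae_loss(params, x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus L1 sparsity penalty on features."""
    f, x_hat = sae_forward(params, x)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=-1))
    sparsity = np.mean(np.sum(np.abs(f), axis=-1))
    return recon + l1_coeff * sparsity, recon, sparsity

rng = np.random.default_rng(0)
acts = rng.normal(size=(32, 16))  # hypothetical stand-in for residual-stream activations
params = init_sae(d_model=16, d_dict=64, rng=rng)
features, recon_acts = sae_forward(params, acts)
total, recon, sparsity = sae_loss(params, acts)
print(features.shape, f"mean active features per input: {(features > 0).sum(axis=1).mean():.1f}")
```

The dictionary is deliberately wider than the model dimension (64 versus 16 here) so that features can be more specific than individual neurons; `l1_coeff` trades reconstruction quality against how few features fire per input.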


Saif Mahmud
