Research Radar · cs.SE · Jun 8, 2026 · classified

FASE: Fast Adaptive Semantic Entropy for Code Quality

Shizhe Lin, Ladan Tahvildari · arXiv
cs.SE, cs.AI, cs.MA

Paper Guide Brief

Reading Brief

The paper introduces Fast Adaptive Semantic Entropy (FASE), a lightweight metric that estimates the functional correctness of LLM-generated code using embedding models and minimum spanning tree-based adaptive clustering in place of expensive LLM-driven equivalence checks. Evaluated on HumanEval and BigCodeBench with the Qwen3-Embedding-8B model, FASE achieves a 25% average improvement in Spearman correlation and a 19% increase in ROCAUC against Pass@1 over state-of-the-art LLM-entailment-based semantic entropy, while requiring only approximately 0.3% of the runtime cost.

Central Claim

FASE is a novel metric that approximates the functional correctness of LLM-generated code from embedding-based semantic distances, minimum spanning tree extraction, and adaptive density-based clustering, eliminating costly LLM-driven equivalence checks.
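
The three ingredients named above (embedding distance, minimum spanning tree, adaptive clustering) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy 2-D "embeddings", the edge-cutting rule (mean plus `k` standard deviations of the MST edge weights), and the `min_cut` floor are all assumptions made for this example. The paper's actual adaptive density-based clustering step is not specified in this brief.

```python
import math

def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def minimum_spanning_tree(dist):
    """Prim's algorithm on a dense distance matrix; returns (i, j, weight) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def cluster_by_cutting(n, edges, k=1.0, min_cut=0.05):
    """ASSUMED rule: cut MST edges longer than max(mean + k * std, min_cut),
    then label each sample by its connected component."""
    weights = [w for _, _, w in edges]
    mean = sum(weights) / len(weights)
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / len(weights))
    threshold = max(mean + k * std, min_cut)
    root = list(range(n))

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, j, w in edges:
        if w <= threshold:
            root[find(i)] = find(j)
    return [find(i) for i in range(n)]

def cluster_entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution."""
    n = len(labels)
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return sum(-(c / n) * math.log(c / n) for c in counts.values())

def fase_like_score(embeddings, k=1.0):
    """Entropy over clusters of code samples; higher means more disagreement."""
    n = len(embeddings)
    if n < 2:
        return 0.0
    dist = [[cosine_distance(embeddings[i], embeddings[j]) for j in range(n)]
            for i in range(n)]
    edges = minimum_spanning_tree(dist)
    return cluster_entropy(cluster_by_cutting(n, edges, k))

# Toy stand-ins for embeddings of five generated solutions to one task:
# four point in nearly the same direction, one is an outlier.
samples = [[1.0, 0.0], [0.99, 0.05], [0.98, 0.1], [1.0, 0.02], [0.0, 1.0]]
score = fase_like_score(samples)
```

In this toy case the single long MST edge is cut, leaving one cluster of four samples and one singleton, so the score is the entropy of a 0.8/0.2 split. In practice the vectors would come from a code embedding model rather than hand-written lists.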

Why It Matters

FASE provides a practical, cost-effective alternative to LLM-based semantic entropy for code quality estimation, achieving superior correlation with functional correctness at a fraction of the computational cost.

Prerequisites

semantic entropy, minimum spanning tree, adaptive density-based clustering, embedding models, pairwise semantic distance

Atlas Placement

Software Engineering (subfield)

Read If

You care about semantic entropy, minimum spanning trees, or adaptive density-based clustering.

Skip If

You only care about Pass@1 or Spearman correlation.

Methods
semantic entropy, minimum spanning tree, adaptive density-based clustering, embedding models, pairwise semantic distance, cosine distance, DBSCAN, HDBSCAN
Tasks
code quality estimation, uncertainty quantification, functional correctness prediction, multi-agent code generation
Datasets
HumanEval, BigCodeBench
Benchmarks
Pass@1, Spearman correlation, ROCAUC

Noosaga Placements

  • The paper addresses code quality estimation in multi-agent code generation, a core software engineering concern. The primary arXiv category is cs.SE, and the paper is situated within software engineering workflows and evaluation.
    Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle.
    CCS Concepts: • Software and its engineering → Automatic programming;
  • Large Language Models · framework · 90%
    The paper is situated within the Large Language Models framework, as it addresses uncertainty and hallucination in LLM-generated code, a core concern of LLM-based systems.
    Recent advances in large language models (LLMs) have accelerated the emergence of autonomous multi-agent systems for software engineering tasks
    LLM hallucinations and error propagation across interacting agents
  • The paper heavily relies on large language models (LLMs) for code generation and uses embedding models (e.g., Qwen3-Embedding) to capture semantic representations of code. The work is situated within LLM-based code generation and uncertainty estimation.
    Recent advances in large language models (LLMs) have accelerated the emergence of autonomous multi-agent systems for software engineering tasks
    FASE leverages the semantic signals captured by code embedding models
  • Representation Learning · framework · 80%
    The paper uses embedding models (e.g., Qwen3-Embedding) to generate semantic representations of code, which is a form of representation learning.
    FASE leverages the semantic signals captured by code embedding models
    Pairwise Semantic Distance via Encoder-Only Embedding Models
  • The paper explicitly addresses multi-agent code generation workflows and error propagation across agents. The evaluation includes both coder-only and analyst+coder workflows.
    Multi-agent code generation offers a promising paradigm for autonomous software development
    error propagation across interacting agents
    practical, cost-effective solution for optimizing uncertainty quantification in real-world multi-agent workflows
  • Neural NLP · framework · 70%
    The paper uses neural embedding models (e.g., All-MiniLM, Qwen3-Embedding) to compute semantic distances, situating it within neural NLP approaches.
    four encoder-only embedding models of different sizes are selected with high performance in text embedding, searching, ranking and clustering tasks

Abstract

Multi-agent code generation offers a promising paradigm for autonomous software development by simulating the human software engineering lifecycle. However, system reliability remains hindered by LLM hallucinations and error propagation across interacting agents. While semantic entropy provides a principled way to quantify uncertainty without ground-truth answers, current methods often rely on costly LLM-driven equivalence checks. In this work, we introduce Fast Adaptive Semantic Entropy (FASE), a novel metric that approximates functional correctness based on the minimum spanning tree of structural and semantic dissimilarity graphs. Evaluations on HumanEval and BigCodeBench demonstrate that FASE outperforms state-of-the-art semantic entropy by LLM entailment, achieving a 25% average improvement in Spearman correlation and a 19% increase in ROCAUC score against Pass@1 from ground-truth test cases when using the Qwen3-Embedding-8B model. Furthermore, by eliminating costly LLM-driven equivalence evaluation, FASE incurs negligible computational overhead, requiring only approximately 0.3% of the runtime cost of traditional semantic entropy approaches. These results position FASE as a practical, cost-effective solution for optimizing uncertainty quantification in real-world multi-agent workflows.
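
The evaluation described in the abstract compares an uncertainty score against Pass@1 using Spearman correlation and ROCAUC. The sketch below shows one way such a comparison could be computed with only the standard library. The per-task numbers are invented for illustration, and two choices are assumptions of this example rather than details from the paper: negated entropy is used as the confidence score, and a task is labeled "passing" when its Pass@1 exceeds 0.5.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

def roc_auc(scores, labels):
    """Probability that a random positive outscores a random negative
    (ties count as half a win)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical per-task values (not taken from the paper).
entropy = [0.0, 0.2, 0.5, 0.9, 1.3]
pass_at_1 = [1.0, 1.0, 0.5, 0.0, 0.0]

confidence = [-e for e in entropy]       # ASSUMPTION: low entropy = high confidence
solved = [p > 0.5 for p in pass_at_1]    # ASSUMPTION: binarization threshold

rho = spearman(confidence, pass_at_1)
auc = roc_auc(confidence, solved)
```

Rank-based measures fit this setting because an uncertainty score only needs to order tasks correctly, not match Pass@1 on an absolute scale.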

Paper Context

Source Context: Whole paper
Budget: 100,000 tokens
Coverage: 62,501 chars

Classified from the full extracted paper text (62,501 characters). The Paper Guide brief above is the user-facing synthesis; raw context is kept out of the page.

Full-paper context sent 62,501 of 62,501 extracted characters to classification.