A preprint posted to arXiv on June 18, 2026 describes AutoPass, a system that places a large language model inside the workflow of an optimizing compiler. The paper, authored by Zepeng Li, Jie Ren, Zhanyong Tang, Jie Zheng, and Zheng Wang, is filed under the software-engineering and artificial-intelligence categories and targets a specific problem: getting language models to help with runtime performance tuning, as opposed to the code-generation and compilation tasks where the authors say LLMs have already shown promise.

The authors frame the difficulty in concrete terms. Runtime tuning, they write, is hard "due to complex microarchitectural effects and noisy runtime measurements." A model that proposes a sequence of compiler flags cannot reliably predict how those flags will interact with the underlying hardware, and the measured runtime of any given configuration carries noise that can obscure whether a change actually helped. AutoPass is presented as an attempt to give the model enough evidence to reason about those effects rather than guess at them.

"Rather than treating the compiler as a black box like prior auto-tuning schemes, AutoPass opens up the compiler to the LLM, enabling it to query compiler-internal optimization states and analyze the intermediate representation to orchestrate compiler options." (arXiv abstract, AutoPass, arXiv:2606.20373)

What the paper says the system does

According to the abstract, AutoPass is structured as a multi-agent framework. The distinguishing design choice the authors emphasize is that the compiler is not opaque to the model. Where prior auto-tuning approaches treat the compiler as a black box and search over its inputs, AutoPass is described as exposing the compiler's internal optimization states and its intermediate representation to the LLM, so the model can inspect what the compiler is doing and orchestrate its options on that basis. The paper positions this access to internal state as the mechanism by which the model's optimization decisions are guided by evidence.
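The abstract does not say which interface AutoPass uses to read compiler state. As one illustration of what "compiler evidence" can look like, Clang can report optimization decisions as remarks through its -Rpass, -Rpass-missed, and -Rpass-analysis options. The sketch below, in Python, tallies such remarks by pass so that a tool (or a model) could see which optimizations fired and which did not. The sample text is hand-written in the general shape of Clang's remark diagnostics, and the function name summarize_remarks is hypothetical; none of this is taken from the paper.

```python
import re
from collections import Counter

# Matches lines shaped like Clang optimization remarks, for example:
#   file.c:12:5: remark: <message> [-Rpass=loop-vectorize]
REMARK = re.compile(
    r"^(?P<file>[^:\n]+):(?P<line>\d+):(?P<col>\d+): remark: (?P<msg>.*) "
    r"\[-R(?P<kind>pass|pass-missed|pass-analysis)=(?P<name>[\w-]+)\]$"
)

def summarize_remarks(text):
    """Count remarks by (kind, pass name); lines that do not match are ignored."""
    counts = Counter()
    for line in text.splitlines():
        match = REMARK.match(line.strip())
        if match:
            counts[(match.group("kind"), match.group("name"))] += 1
    return counts

# Hand-written sample, for illustration only (not output from a real build).
SAMPLE = """\
kernel.c:12:5: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
kernel.c:30:5: remark: loop not vectorized [-Rpass-missed=loop-vectorize]
kernel.c:44:9: warning: unused variable 'tmp' [-Wunused-variable]
"""

summary = summarize_remarks(SAMPLE)
for (kind, name), count in sorted(summary.items()):
    print(f"{kind:14s} {name:20s} {count}")
```

A summary of this kind is one plausible input to a tuning decision: a missed vectorization on a hot loop points to a different next step than a loop that was already vectorized.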

The search itself is iterative. The authors write that the process "iteratively refines optimization configurations using measured runtime feedback to diagnose regressions and guide latency-improving edits." In the terms the abstract uses, the system measures how a candidate configuration actually performs, uses that measurement to identify where performance regressed, and then directs the next round of edits toward reducing latency. The runtime feedback is therefore part of the loop, not just a final scoring step, which is consistent with the authors' stated goal of working through the noise in runtime measurements rather than around it.
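The general shape of a measurement-driven refinement loop can be sketched in a few lines of Python. This is a plain greedy search under stated assumptions, not the AutoPass algorithm: the propose_edit callback stands in for whatever the LLM agents do, the flag names are placeholders rather than real LLVM options, and runtimes are simulated with a seeded random generator. It shows two ideas the abstract mentions: repeated measurements to damp noise, and a history of accepted and rejected edits that the next proposal can consult.

```python
import random
import statistics

def measure(config, run_once, repeats=5):
    """Median of repeated timings, to damp run-to-run noise."""
    return statistics.median(run_once(config) for _ in range(repeats))

def refine(initial, propose_edit, run_once, rounds=20, min_gain=0.01):
    """Keep an edit only if measured latency drops by more than min_gain (a fraction)."""
    best, best_time = initial, measure(initial, run_once)
    history = []  # (candidate, measured time, accepted?) for every round
    for _ in range(rounds):
        candidate = propose_edit(best, history)
        elapsed = measure(candidate, run_once)
        improved = elapsed < best_time * (1.0 - min_gain)
        history.append((candidate, elapsed, improved))
        if improved:
            best, best_time = candidate, elapsed
    return best, best_time, history

# Toy setup: placeholder flags with invented effects on runtime.
FLAGS = ["flag_a", "flag_b", "flag_c"]
EFFECT = {"flag_a": -0.10, "flag_b": 0.05, "flag_c": -0.02}
rng = random.Random(0)

def simulated_run(config):
    """Simulated runtime in seconds with about 0.5 percent Gaussian noise."""
    base = 1.0 + sum(EFFECT[flag] for flag in config)
    return base * (1.0 + rng.gauss(0.0, 0.005))

def random_toggle(config, history):
    """Stand-in for the proposal step: flip one flag at random, ignoring history."""
    return config ^ frozenset([rng.choice(FLAGS)])

best, best_time, history = refine(frozenset(), random_toggle, simulated_run)
print(sorted(best), round(best_time, 3))
```

The min_gain threshold is the design choice that matters here: without it, a candidate that is only faster by chance would be accepted, and the search would drift on noise rather than on real improvements.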

One claim the paper makes about deployment cost is that AutoPass requires no model training. The abstract states the system "operates in an inference-only, training-free setting and requires no offline training or task-specific fine-tuning, making it readily applicable to new benchmarks and platforms." Read literally, that describes a system that uses an existing model as-is and adapts to a new workload through its search procedure and the compiler evidence it gathers, rather than through any gradient updates to the model. The authors connect that property directly to portability: because nothing has to be trained for a particular benchmark or platform, the same approach can, in their framing, be pointed at a new target without a setup phase.

The reported numbers

The authors say they implemented AutoPass on the LLVM compiler and evaluated it on two classes of hardware: server-grade x86-64 systems and embedded ARM64 systems. Against LLVM's standard high-optimization setting, -O3, the abstract reports geometric-mean speedups of 1.043x on x86-64 and 1.117x on ARM64. The paper also states that AutoPass "outperforms expert-tuned heuristics and classical autotuning methods," placing the comparison not only against the compiler's default optimization level but against hand-tuned baselines and established autotuning techniques.

The choice of -O3 as the reference point is worth stating plainly, because it determines what the reported figures mean. -O3 is the aggressive optimization level that LLVM applies when a developer asks for maximum standard optimization, so a speedup over -O3 is a speedup over code that the compiler has already optimized heavily. The two platforms also produce different numbers in the paper: the reported gain is larger on ARM64 (1.117x) than on x86-64 (1.043x). The abstract, in the text available here, does not break those figures down by individual benchmark; the geometric mean is the aggregate the authors chose to report.
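For readers unfamiliar with the aggregate, a geometric-mean speedup is the n-th root of the product of per-benchmark speedups, where each speedup is the baseline runtime divided by the tuned runtime. The Python sketch below computes it from two lists of runtimes. The numbers are invented for illustration and are not results from the paper.

```python
import math

def geometric_mean_speedup(baseline_times, tuned_times):
    """Geometric mean of per-benchmark speedups (baseline / tuned)."""
    if len(baseline_times) != len(tuned_times) or not baseline_times:
        raise ValueError("need two equal-length, non-empty lists")
    # Average the logs, then exponentiate: equivalent to the n-th root of the product.
    log_sum = 0.0
    for base, tuned in zip(baseline_times, tuned_times):
        log_sum += math.log(base / tuned)
    return math.exp(log_sum / len(baseline_times))

# Hypothetical runtimes in seconds: one benchmark twice as fast,
# one unchanged, one twice as slow.
o3_times = [2.0, 4.0, 1.0]
tuned_times = [1.0, 4.0, 2.0]

example = geometric_mean_speedup(o3_times, tuned_times)
print(round(example, 6))
```

In this made-up case the 2x gain and the 2x slowdown cancel, giving a geometric mean of 1.0, whereas the arithmetic mean of the same three speedups would be about 1.17. That symmetry is a common reason to report ratios this way, and it is also why an aggregate figure alone cannot show how gains are distributed across individual benchmarks.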

Where this sits

Compiler autotuning is an established field, and the paper situates itself against it directly by naming "classical autotuning methods" and "expert-tuned heuristics" as the baselines AutoPass is measured against. The contribution the authors describe is not the idea of searching over compiler options, which long predates language models, but the combination of an LLM-driven search with direct access to the compiler's internal state and intermediate representation, run in a training-free configuration. The abstract's recurring contrast is the black-box-versus-open-box distinction: prior schemes tune the inputs to a compiler they cannot see inside, while AutoPass is described as reading the compiler's internals as part of how it decides what to change.

The preprint is a single-version arXiv posting (v1) dated June 18, 2026, and the figures cited here come from its abstract. The full methodology, the specific LLM used, the benchmark suites, and the per-benchmark results would appear in the body of the paper rather than the summary. What the record shows at this stage is a stated design, a stated set of baselines, and two stated aggregate speedup figures over -O3 on the two hardware targets the authors evaluated. Readers can consult the arXiv abstract page (2606.20373) for the authors' own framing and for access to the full text as the preprint is updated.