Working AI-Enabled
Becoming the Human in the Loop
Makeljana Shkurti
Business Strategy
Makeljana leads business strategy at VRULL and chairs the RISC-V International AI & ML Market Development Committee. A frequent speaker on technology strategy, she connects the commercial reality of AI silicon with the ecosystem decisions that make it viable.
We build the systems that run AI — compilers, ISA extensions, performance frameworks. But AI is also changing how we build them. We’ve been working AI-enabled across our compiler and performance engineering, and the gap between teams that adopt it well and teams that don’t is already visible.
The acceleration is real
When our engineers started integrating AI into their daily work on GCC and LLVM — across RISC‑V and AArch64 targets — the throughput change was immediate and measurable. Not in some abstract “productivity metric” sense, but in the tangible compression of iteration cycles that used to define the pace of compiler engineering.
Implementation turnaround. Exploring design alternatives for a new optimisation pass — prototyping matching strategies, generating structural code, iterating on edge cases — used to take days of focused work. AI compresses the mechanical part of that cycle. The engineer still decides what to build and why, but the time from “I want to try this approach” to “I can evaluate whether it works” drops dramatically. The same applies to triage: point AI at a failing test, a regression trace, or a set of compiler dumps, and it produces a structured analysis of the likely root cause. It doesn’t always get it right — but it gets the engineer to the right neighbourhood faster than manual bisection.
Test generation. Every edge case in type widths, alignment, calling conventions, and target-specific behaviour needs coverage. AI can generate test scaffolding from a specification or from the transformation itself — hundreds of variants that would have taken a week of manual effort. The engineer reviews, curates, and adds the cases that exercise real boundaries. The coverage is broader and it arrives faster.
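A minimal sketch of what that scaffolding step can look like. The names, the template, and the transformation are all hypothetical, and a real version would target the testsuite's actual conventions — the point is only that varying a template over type widths is mechanical, which is exactly why it can be generated and then curated:

```python
# Illustrative only: emit one C test function per integer type width
# for a single (hypothetical) fold, the kind of repetitive scaffolding
# an engineer reviews rather than types.

TYPES = ["int8_t", "int16_t", "int32_t", "int64_t",
         "uint8_t", "uint16_t", "uint32_t", "uint64_t"]

TEMPLATE = """\
{ty} test_{name}_{ty}({ty} a, {ty} b)
{{
    /* expected to fold to a single subtract after optimisation */
    return a + (-b);
}}
"""

def generate_tests(name="fold_add_neg"):
    """Return one test function per type width as a single C source string."""
    return "\n".join(TEMPLATE.format(ty=ty, name=name) for ty in TYPES)

if __name__ == "__main__":
    print("#include <stdint.h>\n")
    print(generate_tests())
```

The curation step the article describes happens after this: the engineer deletes the variants that test nothing new and adds the hand-written cases that exercise real boundaries.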
Design-space exploration. Evaluating ISA encoding alternatives, cost-model parameters, or scheduling heuristics involves generating and comparing many variants. AI turns this from a serial “try one, evaluate, try another” process into a parallel exploration where the engineer can evaluate a dozen options in the time it used to take to generate three.
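As a rough illustration of the parallel version of that loop, a sweep over two hypothetical cost-model parameters might be driven by a small harness like this. `run_benchmark` here is a stand-in for the real pipeline — rebuilding with the candidate values and timing a workload — and the parameter names are invented:

```python
# Hypothetical sketch of parallel design-space exploration: enumerate
# candidate parameter combinations, score each, keep the best.
from concurrent.futures import ThreadPoolExecutor
import itertools

def run_benchmark(unroll_factor, vector_cost):
    """Placeholder scoring function (lower is better). A real harness
    would build the compiler with these values and run a benchmark."""
    return abs(unroll_factor - 4) + abs(vector_cost - 2)

def explore(unroll_factors, vector_costs):
    """Evaluate all combinations concurrently and return the best one."""
    candidates = list(itertools.product(unroll_factors, vector_costs))
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(lambda c: run_benchmark(*c), candidates))
    best = min(zip(candidates, scores), key=lambda cs: cs[1])
    return best[0]

print(explore([1, 2, 4, 8], [1, 2, 3]))  # → (4, 2)
```

The structure is the point: once evaluation is a function over a candidate, breadth costs wall-clock time rather than engineer time, and the engineer's attention moves to choosing the space and judging the winner.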
The human in the loop is the whole point
The “tasks” in compiler engineering are not repetitive — they require judgement at every step. Is this the right optimisation to pursue? Does this test actually exercise the boundary condition I care about? Is this root-cause analysis pointing at the real problem or a correlated symptom?
AI without that judgement is — to borrow a phrase we use internally — an unguided missile. It’s fast, it’s confident, and it’s perfectly happy to head somewhere wrong. It will generate a plausible-looking compiler pass that silently miscompiles edge cases. It will explore a design space enthusiastically in a direction that violates a constraint the engineer knows about but didn’t state explicitly.
The value isn’t in the AI’s output. The value is in the engineer’s ability to evaluate, redirect, and refine that output. The human in the loop isn’t a safety net — the human in the loop is the reason the loop produces anything worth shipping.
This means that the quality of the engineer matters more in an AI-assisted workflow, not less. The engineer who can look at AI-generated code and immediately spot that it’s using the wrong calling convention, or that the cost model doesn’t account for pipeline forwarding, or that the test is passing for the wrong reason — that engineer gets an enormous throughput multiplier. The engineer who can’t evaluate the output is just shipping mistakes faster.
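"Passing for the wrong reason" is worth making concrete. A deliberately toy sketch, in plain Python rather than a compiler testsuite, of the difference between a test that merely goes green and a test that actually exercises a boundary:

```python
# Illustrative only: a stand-in for code produced by an optimisation
# pass under test.
def optimised_add(a, b):
    return a + b

def test_weak():
    # Passes for the wrong reason: identity inputs would pass even if
    # the transformation were broken or never fired.
    assert optimised_add(0, 0) == 0

def test_strong():
    # Exercises a real boundary: inputs where a careless 32-bit fold
    # would wrap instead of producing the correct wide result.
    assert optimised_add(2**31 - 1, 1) == 2**31

test_weak()
test_strong()
```

Spotting that the first test proves nothing is exactly the evaluation skill the paragraph above describes; AI will happily generate both kinds in equal volume.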
A conversation with Konstantinos
Konstantinos works on compiler and performance engineering across RISC‑V and AArch64, spending most of his time in GCC and LLVM. We asked him what it’s like to work AI-enabled — as the human in the loop.
How has your day-to-day workflow changed since you started working with AI tools?
The biggest change is how much faster I can get to the point where I’m evaluating a real result. Before, a significant part of my day was writing scaffolding — setting up test harnesses, writing boilerplate for a new pass, generating the repetitive parts of a patch series. Now I describe what I need, get a first version, review and correct it, and move on. The time I save on scaffolding goes directly into thinking about the actual problem.
What’s the single biggest throughput gain?
Test generation, without question. For a pattern-matching transformation in GCC's match.pd, I need tests that cover every commutative variant, every type width, every edge case in operand ordering. AI can produce a comprehensive set from a description of the transformation, and I review them to make sure they're actually testing what I care about. What used to take two or three days of test writing now takes an afternoon of test review.
When does the AI get it wrong?
Regularly. And that’s the part people underestimate. It gets the structure right and the details wrong — wrong register classes, wrong sign extension behaviour, wrong assumptions about what the ABI guarantees. The output looks correct if you’re reading it casually. You have to read it the way you’d review a patch from a colleague who doesn’t know the target well — line by line, checking every assumption against what you know about the architecture.
The dangerous case is when it generates something that passes the tests you already have but fails on an edge case you haven't written a test for yet. That's where your own knowledge of the architecture is the only safety net. If you don't know that AArch64 has different behaviour for w registers vs x registers in certain contexts, you won't catch the AI getting it wrong.
Has this changed what skills matter in your work?
It’s shifted the balance. The ability to generate code is less differentiating now — AI can generate code. What matters more is the ability to evaluate code. Can you look at a compiler pass and tell whether it’s correct? Can you read a test and tell whether it’s actually exercising the condition it claims to test? Can you spot when an optimisation is valid for RV64 but not RV32? Those are the skills that make AI useful instead of dangerous.
In a way, it’s made the job more like being a senior reviewer full-time. I spend more time reading and evaluating, less time typing. That’s a better use of my time.
What would you tell an engineer who’s sceptical?
Try it on something you know well. Pick a task where you already know what the correct output should look like — a pattern you’ve implemented before, a test suite you’ve written by hand. Use AI to generate a first version, and then review it critically. You’ll immediately see both the speed and the failure modes. Once you know where it’s reliable and where it needs watching, you can start applying it to new problems.
A conversation with Dr. Philipp Tomsich
Philipp is VRULL’s Chief Technologist. We asked him what AI-enabled engineering means for companies like ours — and for the broader semiconductor services industry.
VRULL has always positioned itself around architect-level expertise rather than headcount. How does AI change that equation?
It amplifies it. AI changes the throughput constraint without changing the quality constraint. Our engineers produce more iterations per day, explore more design alternatives, generate more comprehensive test coverage — but the quality bar is the same, because the same people are evaluating every result.
AI amplifies senior engineers disproportionately. A junior engineer using AI gets faster at generating code, but they still can’t evaluate the output reliably — they don’t have the domain knowledge to catch subtle errors. A senior engineer using AI gets faster at generating code and can evaluate it effectively, because they have twenty years of experience telling them what to look for. The multiplier effect scales with expertise.
The conventional outsourcing model is to use large teams to cover more ground. What happens to that model when AI enters the picture?
It gets squeezed from both sides. The cost advantage of large, lower-cost teams was always about person-hours — more people means more tasks completed per week. But AI makes the marginal cost of generating code and tests close to zero. The bottleneck shifts entirely from “how many person-hours can you throw at this” to “how good is the judgement applied to each result.”
A team of ten architect-level engineers working AI-enabled can now match the raw throughput of a team of fifty — while maintaining a quality bar that the larger team structurally cannot match, because quality in compiler engineering comes from domain depth, not from process or headcount.
Some customers see AI-enabled engineering and expect either lower prices or the ability to bring this work in-house. What’s your response?
Those are two versions of the same misunderstanding — that AI commoditises the work. It doesn’t. It commoditises the typing. The judgement is what the customer is paying for, and that judgement is applied to more output per unit time than it ever was before. The customer isn’t getting the same thing cheaper — they’re getting more alternatives explored, more edge cases caught, broader test coverage, and faster delivery. A surgeon with a better instrument doesn’t charge less. The patient gets a better outcome.
As for doing it in-house: the tools are available to everyone, absolutely. The question is what happens when you point them at a RISC‑V vector extension and your team hasn’t spent a decade working on the target. AI doesn’t know that your RVV implementation has a microarchitectural quirk in tail-agnostic masking. It doesn’t know which GCC pass ordering will cause a phase-ordering problem three optimisation levels later. You’ll generate output at full speed, and you won’t know which of it is wrong until it’s in silicon. The gap isn’t access to AI. The gap is knowing what to do with what it gives you.
What does “human in the loop” mean at the company level, not just the individual level?
It means the company’s value is its people’s judgement, full stop. The AI tools are available to everyone — we don’t have a proprietary AI advantage. What we have is a team that knows how to use those tools on problems that require deep architectural knowledge. We know what good compiler output looks like for a RISC‑V vector core. We know what a correct cost model accounts for. We know which GCC passes interact badly and why.
That knowledge is what makes the loop productive. Without it, you’re just generating plausible-looking output at high speed — which is worse than generating nothing, because the debugging bill lands on the customer.
Is there a risk that companies adopt AI without the engineering depth to steer it?
Absolutely, and we’re already seeing it. There’s a temptation to treat AI as a way to skip the expertise step — to hire less experienced engineers, give them AI tools, and assume the tools compensate for the experience gap. In our domain, that’s genuinely dangerous. A miscompilation that passes all existing tests but fails on a customer’s workload six months later is an expensive problem.
The human in the loop has to be good enough to catch those problems. If they’re not, you’ve built an unguided missile — fast, confident, and pointed in a direction nobody verified.
Where do you see this heading in the next two to three years?
The workflows will mature. Right now, we’re in the early phase where engineers are figuring out which tasks benefit from AI and which don’t, calibrating trust levels, developing review habits. In two years, this will be table stakes — every serious engineering team will be working AI-enabled, and the differentiator will be the depth of the expertise being amplified.
For companies like ours, that’s a structural advantage. We’ve always hired for depth. AI doesn’t change that strategy — it vindicates it.
The loop
The economics are simple. When throughput is nearly free, the only differentiator is the depth of the judgement applied to it.
The human in the loop is not a safety net. It is not the person who clicks “approve” at the end of an automated pipeline.
The human in the loop is the engineer who knows that a register allocation looks wrong before running the test. The architect who knows that a cost-model parameter is off by a factor of two because they’ve seen what that microarchitecture actually does. The reviewer who reads AI-generated code the same way they’d review a patch from a talented but inexperienced colleague — carefully, critically, and with the expectation that the important mistakes will be subtle.
AI makes that engineer faster. It does not make that engineer unnecessary. If anything, it makes the gap between a good engineer and a great one wider than it has ever been — because the great one gets a bigger multiplier.
AI has not reduced the demand for deep engineering — it has raised the bar. The skills have shifted from generation to evaluation, but the required depth of expertise hasn’t decreased. If anything, it’s increased: the engineer who thrives in an AI-enabled workflow is the one with enough architectural knowledge to verify output that arrives faster than ever before.
Working AI-enabled means becoming the human in the loop. It means the technology works for you, at the pace you set, in the direction you choose. The speed is the tool’s contribution. The direction is yours.