Why most AI-in-AEC pilots die in month two

Adoption isn't the problem. Workflow fit is. Lessons from shipping an AI knowledge platform across 19 offices.

I've watched four AI rollouts inside architecture and engineering firms. Three of them died in month two. Same shape every time: launched with a senior-partner email, week-one adoption of around 40%, then a slow, embarrassing decline through weeks five to eight, then a quiet retirement.

The fourth survived: the Knowledge Hub at Bollinger Grohmann, running across 19 offices in 9 countries, with 62% adoption six months in and roughly 4 hours per employee per week reclaimed. Not because the model was smarter. Because the workflow fit was better.

This essay is what the dead pilots taught me, and what the survivor did differently.

How do you deploy AI inside an engineering firm?

Deploying AI inside an engineering firm requires moving past generic chatbot interfaces and integrating directly into existing workflows. The most successful approach is a Retrieval-Augmented Generation (RAG) platform that indexes your firm's historical project data, embeds within tools like Microsoft Teams, and mandates strict inline citations so engineers can instantly verify the source of the generated answer.

How long does it take to deploy a RAG system across multiple offices?

A working multi-office RAG deployment—including production retrieval, internal UI, basic guardrails, and onboarding documentation—typically takes 10 to 14 weeks for a firm of 100 to 600 employees. The bottleneck is rarely the model itself; it is data preparation, authentication, and adoption design. Deploying faster usually skips critical trust-building features like citation UI.

The story everyone wants to tell

The story most firms want to tell about an AI pilot is the press release. "We deployed [LLM] across our practice to [reduce / enable / accelerate] [vague verb]." A partner emails the firm. The platform team gets a logo. Adoption is reported in week one. Everyone moves on.

This story is fine — for the press release.

It is not what an AI pilot actually is.

An AI pilot is a behaviour change, deployed inside an organisation that has a working set of behaviours, almost all of which were optimised for the absence of this new tool. Treating it like a software rollout is the first mistake. Treating it like a model demo is the second.

What kills pilots in month two

Three things, in ascending order of frequency.

1. The system answers the wrong question

The model is fine. The model is always fine — that's the trap. The system was scoped against the question someone in a strategy session decided was important. The actual question the team asks at 11am on Tuesday is something else. So the team asks once, gets a plausible-sounding wrong answer, and the tool gets quietly delisted from the bookmark bar.

In Bollinger Grohmann's case, the first answerable question wasn't "design me a façade" — it was "what was the loading assumption on the Project X retaining wall we did in 2019, and who's left in the firm who'd remember." That's a retrieval problem dressed as an engineering problem. Once we scoped against the retrieval shape of the work — what we called the re-asking problem — adoption compounded.

2. The trust gap closes on the second wrong answer

Engineers don't trust black boxes. Neither do architects with their professional indemnity at stake. The first hallucinated structural recommendation cancels out the next hundred correct ones.

The fix isn't "better model." The fix is citation UI on day one. Every answer footnoted to source documents. Click-through to the original. A confidence-weighted banner when the retrieval set is thin. The model can be wrong — the system has to make wrongness legible.
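To make "wrongness legible" concrete, here is a minimal sketch of the rendering step, not the Knowledge Hub's actual code. The `Source` fields, the `MIN_SOURCES` and `MIN_SCORE` thresholds, and the banner wording are all assumptions for illustration, and a simple threshold stands in for a properly confidence-weighted banner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One retrieved document chunk (all fields are illustrative)."""
    title: str
    url: str
    score: float  # retrieval similarity in [0, 1], higher is better


# Hypothetical thresholds; tune them against your own retrieval data.
MIN_SOURCES = 2
MIN_SCORE = 0.5


def render_answer(answer: str, sources: list[Source]) -> str:
    """Return the answer with numbered footnotes and, if needed, a banner."""
    strong = [s for s in sources if s.score >= MIN_SCORE]
    lines = []
    # Thin retrieval set: warn the reader before they see the answer.
    if len(strong) < MIN_SOURCES:
        lines.append(
            f"[Low confidence: only {len(strong)} strong source(s) found. "
            "Verify before use.]"
        )
    lines.append(answer)
    # Footnote every source, strongest first, so the answer can be traced.
    ranked = sorted(sources, key=lambda s: s.score, reverse=True)
    for i, s in enumerate(ranked, start=1):
        lines.append(f"[{i}] {s.title} ({s.url}) score={s.score:.2f}")
    return "\n".join(lines)
```

The point of the design is that the banner and footnotes are produced by the system around the model, so they appear even when the model's answer reads as confident.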

We learned this the slow way. The Knowledge Hub ran its first six weeks without inline citations; adoption stalled through weeks three to six. The citation UI shipped in week seven. Adoption compounded from week eight forward. It is the cleanest correlation in the deployment data.

3. The tool sits outside the workflow

This is the most common failure and the most boring. The pilot ships as a standalone web app. To use it, the team has to remember it exists, switch context, log in, ask, and paste the answer back into the document they were writing.

That round trip is the killer. By week four the team has reverted to asking the partner directly, because the partner is one Slack message away and the tool is six clicks away.

The Knowledge Hub eventually shipped inside the tools the team already used — surfaced in the project management UI, surfaced in Teams, surfaced as a contextual sidebar in the document review flow. Same model, same retrieval, same data. Adoption tripled in eight weeks.

What the survivor did differently

Three things, also in order.

1. We scoped against the re-asking problem, not the AI problem

We didn't start with "what can the LLM do." We started with a transcript audit — what questions had been asked in Slack, in Teams, in email, in partner meetings, over the previous three months. The vast majority were retrieval problems: someone asks something a colleague three offices away knows, or that an archive from four years ago contains. The model wasn't there to generate new engineering. It was there to surface old engineering.

This sounds obvious. It is not. Half the failed pilots I've seen scoped against "AI for design" or "AI for analysis" — categories that sound ambitious in a partner meeting and produce nothing operationally useful in week six.

2. We instrumented adoption from day one

PostHog from week one. Per-user, per-office, per-question-type funnels. Time-to-first-answer. Time-from-answer-to-next-action. Re-query rates (a leading indicator of trust loss).

When adoption stalled in week three we could see which office, which question type, which user cohort. The fix was specific, not vibes-based.

Most failed pilots can't tell you who their power users are at month two. The survivor could.
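As a rough illustration of one of these metrics, the sketch below computes a per-office re-query rate from a plain list of question events. It deliberately avoids any analytics vendor's API. The event shape, the office labels, and the five-minute window that defines a "re-query" are assumptions, not the definitions used in the deployment.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: a follow-up question from the same user within this window
# counts as a re-query. The window length is illustrative.
REQUERY_WINDOW = timedelta(minutes=5)


def requery_rate_by_office(events):
    """events: iterable of (user_id, office, timestamp), one per question.

    Returns {office: re-queries / total questions}.
    """
    by_user = defaultdict(list)
    for user_id, office, ts in events:
        by_user[user_id].append((ts, office))

    totals = defaultdict(int)
    requeries = defaultdict(int)
    for asked in by_user.values():
        asked.sort()  # chronological order per user
        previous_ts = None
        for ts, office in asked:
            totals[office] += 1
            if previous_ts is not None and ts - previous_ts <= REQUERY_WINDOW:
                requeries[office] += 1
            previous_ts = ts
    return {office: requeries[office] / totals[office] for office in totals}


# Hypothetical sample data.
events = [
    ("u1", "office_a", datetime(2024, 1, 8, 11, 0)),
    ("u1", "office_a", datetime(2024, 1, 8, 11, 2)),  # 2 min later: re-query
    ("u2", "office_a", datetime(2024, 1, 8, 14, 0)),
    ("u3", "office_b", datetime(2024, 1, 8, 9, 0)),
    ("u3", "office_b", datetime(2024, 1, 8, 15, 0)),  # hours later: not one
]
print(requery_rate_by_office(events))
```

In this sample, one of the three `office_a` questions is a re-query and neither `office_b` question is. The same grouping can be repeated per question type or per user cohort to find where a stall is coming from.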

3. We made the model less impressive on purpose

The platform refused to answer specification or load-bearing questions. Hard-coded. The refusal copy was specific: "This is outside my confidence range — ask the structural lead on the project." A small, dull, surgical refusal pattern.

This bored the partners. They wanted the model to do more. But every refusal that prevented a wrong engineering answer bought a quarter of compounding adoption. The model that admits its limits is the model an engineer will actually use.
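A hard-coded refusal of this kind can be very small. The sketch below is one possible shape, assuming a regex pre-check that runs before the model is called. The trigger patterns and the `guard` and `fake_model` names are hypothetical; only the refusal copy comes from the deployment described above.

```python
import re

# Refusal copy as quoted above (\u2014 is the dash in the original wording).
REFUSAL = (
    "This is outside my confidence range \u2014 "
    "ask the structural lead on the project."
)

# Hypothetical trigger patterns; a real deployment would curate these with
# the firm's engineers rather than rely on this short list.
BLOCKED_PATTERNS = [
    r"\bload[- ]bearing\b",
    r"\bspecif(y|ication|ications)\b",
]
_BLOCKED = re.compile("|".join(BLOCKED_PATTERNS), re.IGNORECASE)


def guard(question, answer_fn):
    """Refuse blocked question types before the model is ever called."""
    if _BLOCKED.search(question):
        return REFUSAL
    return answer_fn(question)


def fake_model(question):
    """Stand-in for the real retrieval pipeline."""
    return f"(retrieved answer for: {question})"


print(guard("Is this wall load-bearing?", fake_model))
print(guard("Who led the 2019 retaining wall project?", fake_model))
```

Because the check sits in front of the model, the refusal does not depend on the model judging its own limits. The cost is that a short pattern list will miss some phrasings and catch some harmless ones.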

The interesting question

The interesting question isn't whether firms will adopt AI. It's what the firms that get it wrong in month two will look like in year three.

My bet: a long, quiet pile of internal pilots, each with a logo and a senior-partner email, each retired by month eight. Meanwhile the two or three firms that figured out the workflow-fit problem will have compounding institutional knowledge — the same model, but four years of evaluation data, integration depth, and trust capital behind it.

That gap is hard to close from a standing start. The architecture practice that ships its third dead pilot in 2026 will be three years behind in 2029.

The fix is not buying a better model. The fix is shipping a working system, instrumented from day one, scoped against the re-asking problem, with citation UI on the day of launch.

The model is fine. The model has always been fine.

Tags: Agentic AI, AEC, Knowledge Systems, Adoption
