Why most AI-in-AEC pilots die in month two

I've watched four AI rollouts inside architecture and engineering firms. Three of them died in month two. Same shape every time: launched with a senior-partner email, week-one adoption of around 40%, then a slow, embarrassing decline through weeks five to eight, then a quiet retirement.

The fourth survived: the Knowledge Hub at Bollinger Grohmann, spanning 19 offices in 9 countries, with 62% adoption six months in and roughly 4 hours per employee per week reclaimed. Not because the model was smarter. Because the workflow fit was better.

This essay is what the dead pilots taught me, and what the survivor did differently.

How do you deploy AI inside an engineering firm?

Deploying AI inside an engineering firm requires moving past generic chatbot interfaces and integrating directly into existing workflows. The approach that actually stuck was unglamorous: enterprise semantic search over the firm's historical project data — documents, drawings, specifications — embedded inside the tools people already used, surfacing the source document rather than synthesising an answer. The win was finding the thing, not generating a new thing.
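To make "surface the source, don't synthesise" concrete, here is a minimal sketch of a search call that returns ranked pointers to documents and nothing else. It is not the Knowledge Hub's code. It uses plain TF-IDF with cosine similarity as a stand-in for a real embedding model, and the file paths, document text, and names such as Hit and search are all made up for illustration.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    path: str     # where the original document lives (hypothetical path)
    score: float  # cosine similarity between the query and the document


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]):
    """Return per-document TF-IDF vectors plus the IDF table."""
    counts = {path: Counter(tokenize(body)) for path, body in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = {
        path: {t: tf * idf[t] for t, tf in c.items()} for path, c in counts.items()
    }
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, vectors, idf, top_k: int = 3) -> list[Hit]:
    """Rank documents against the query; return pointers, never prose."""
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    hits = [Hit(path, cosine(q_vec, vec)) for path, vec in vectors.items()]
    hits = [h for h in hits if h.score > 0.0]
    return sorted(hits, key=lambda h: h.score, reverse=True)[:top_k]


# Made-up sample archive, for illustration only.
DOCS = {
    "archive/2019/retaining-wall/calc-report.txt":
        "Retaining wall loading assumption: surcharge load and earth pressure "
        "for the 2019 basement scheme.",
    "archive/2021/facade/spec.txt":
        "Facade specification covering glazing, brackets and wind load on "
        "cladding panels.",
    "archive/2017/bridge/drawing-notes.txt":
        "Footbridge drawing notes: bearing layout and expansion joints.",
}

vectors, idf = build_index(DOCS)
hits = search("retaining wall loading assumption", vectors, idf)
for hit in hits:
    print(f"{hit.score:.2f}  {hit.path}")
```

The return type is the point: the caller gets a path to click through to, not a paragraph to trust.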

How long does it take to deploy across multiple offices?

A working multi-office deployment (production search, document ingestion, internal UI, authentication, and onboarding) typically takes 10 to 14 weeks for a firm of 100 to 600 employees. The bottleneck is rarely the model. It is data preparation: getting clean, searchable, correctly tagged content out of thirty years of multilingual, half-scanned archives. Search quality is a function of ingestion quality.

The story everyone wants to tell

The story most firms want to tell about an AI pilot is the press release. "We deployed [LLM] across our practice to [reduce / enable / accelerate] [vague verb]." A partner emails the firm. The platform team gets a logo. Adoption is reported in week one. Everyone moves on.

This story is fine — for the press release.

It is not what an AI pilot actually is.

An AI pilot is a behaviour change, deployed inside an organisation that has a working set of behaviours, almost all of which were optimised for the absence of this new tool. Treating it like a software rollout is the first mistake. Treating it like a model demo is the second.

What kills pilots in month two

Three things, saving the most common for last.

1. The system answers the wrong question

The model is fine. The model is always fine — that's the trap. The system was scoped against the question someone in a strategy session decided was important. The actual question the team asks at 11am on Tuesday is something else. So the team tries it once, doesn't find what they need, and the tool gets quietly delisted from the bookmark bar.

In Bollinger Grohmann's case, the first question worth solving wasn't "design me a façade". It was "what was the loading assumption on the Project X retaining wall we did in 2019, and who's left in the firm who'd remember?" That's a retrieval problem dressed as an engineering problem. Once we scoped against the retrieval shape of the work, which we called the re-asking problem, adoption compounded.

2. The trust gap, and the decision not to generate

Engineers don't trust black boxes. Neither do architects with their professional indemnity at stake. A confidently worded but wrong answer about a load case or a code clause is a liability, not a convenience.

This is exactly why the Knowledge Hub deliberately did not generate written answers. The obvious "v2" — type a question, get a paragraph back — was scoped and shelved on purpose. The cost of one hallucinated structural recommendation outweighs the convenience of a thousand typed answers. So the system retrieves and surfaces the source: it points you at the authoritative document and lets you read it. Nothing is synthesised that you can't trace back to an original.

That made the tool less impressive in a demo and far more trustworthy in production. An engineer will use a search box that always lands them on the real document. They will abandon a chatbot that's wrong once.
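One way to hold that line in code is to make traceability an invariant rather than a convention. The sketch below is an assumption about how such a guard could look, not the Knowledge Hub's implementation: an excerpt is only surfaced if it appears verbatim on the page it claims to come from. The names SourcedExcerpt, UntraceableExcerpt, and surface, along with the sample page text, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourcedExcerpt:
    path: str  # original document (hypothetical path)
    page: int  # page the excerpt was read from
    text: str  # verbatim text, never a paraphrase


class UntraceableExcerpt(ValueError):
    """Raised when text cannot be found word for word in its claimed source."""


def surface(archive: dict[tuple[str, int], str], path: str, page: int, text: str) -> SourcedExcerpt:
    """Return the excerpt only if it is a verbatim substring of the source page."""
    original = archive.get((path, page))
    if original is None or not text.strip() or text not in original:
        raise UntraceableExcerpt(f"not found verbatim in {path} p.{page}")
    return SourcedExcerpt(path, page, text)


# Made-up sample page, for illustration only.
PATH = "archive/2019/retaining-wall/calc-report.pdf"
ARCHIVE = {
    (PATH, 12): "Surcharge load taken as stated in the geotechnical report.",
}

ok = surface(ARCHIVE, PATH, 12, "Surcharge load taken as stated")
print(ok.path, ok.page)

try:
    surface(ARCHIVE, PATH, 12, "The surcharge load is probably fine")
except UntraceableExcerpt as err:
    print("rejected:", err)
```

Anything a generator might reword fails the check by construction, which is the behaviour you want when the reader's indemnity is on the line.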

3. The tool sits outside the workflow

This is the most common failure and the most boring. The pilot ships as a standalone web app. To use it, the team has to remember it exists, switch context, log in, search, and paste the result back into the document they were writing.

That round trip is the killer. By week four the team has reverted to asking the partner directly, because the partner is one message away and the tool is six clicks.

The Knowledge Hub shipped inside the tools the team already used — surfaced where the project work happened, with click-through straight to the source. Same search, same data, no context switch. That's where the adoption came from.

What the survivor did differently

Three things.

1. We scoped against the re-asking problem, not the AI problem

We didn't start with "what can the model do." We started with the shape of the questions the firm actually asked — the ones that pinged around Teams, email, and partner meetings, where someone needed something a colleague three offices away knew, or that an archive from four years ago contained. The vast majority were retrieval problems. The system wasn't there to generate new engineering. It was there to surface old engineering.

This sounds obvious. It is not. Half the failed pilots I've seen scoped against "AI for design" or "AI for analysis" — categories that sound ambitious in a partner meeting and produce nothing operationally useful in week six.

2. We invested upstream, in ingestion

The raw material was thirty years of documents in multiple languages, half of them scanned PDFs and drawings with no extractable text, with inconsistent naming and domain-specific structure. OCR, layout parsing, language detection, and the extraction of structured metadata and standards references were the bulk of the engineering. The search box gets the credit; the ingestion pipeline does the work. Every hour spent on clean extraction paid for itself ten times over in result relevance, and relevance is what earns trust.
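A rough sketch of what the first triage step in such a pipeline might look like, under heavy assumptions: it decides whether a file needs OCR from how much text could be extracted, guesses the language from stopword counts, and pulls out standards references with a regex. The threshold, the stopword lists, the regex, and the names IngestRecord and triage are all illustrative choices of mine; a production pipeline would use a real OCR engine and a proper language-identification library.

```python
import re
from dataclasses import dataclass, field

MIN_TEXT_CHARS = 40  # assumed cut-off below which a file is treated as a scan

# Tiny illustrative stopword lists, not a real language-ID model.
STOPWORDS = {
    "en": {"the", "and", "of", "for", "with", "is"},
    "de": {"der", "die", "das", "und", "für", "mit", "ist"},
    "fr": {"le", "la", "les", "et", "pour", "avec", "est"},
}

# Assumed pattern for references such as "DIN EN 1992-1-1" or "ISO 9001".
STANDARD_REF = re.compile(r"\b(?:DIN EN|EN|DIN|ISO)\s+\d+(?:-\d+)*\b")


@dataclass
class IngestRecord:
    path: str
    needs_ocr: bool
    language: str
    standards: list[str] = field(default_factory=list)


def detect_language(text: str) -> str:
    """Pick the language whose stopwords appear most often, else 'unknown'."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(1 for t in tokens if t in words)
        for lang, words in STOPWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def triage(path: str, extracted_text: str) -> IngestRecord:
    """Route a file: send it to OCR, or tag it with language and standards."""
    text = extracted_text.strip()
    if len(text) < MIN_TEXT_CHARS:
        # Almost no extractable text: likely a scanned PDF or drawing.
        return IngestRecord(path, needs_ocr=True, language="unknown")
    standards = sorted(set(STANDARD_REF.findall(text)))
    return IngestRecord(path, False, detect_language(text), standards)


# Made-up inputs, for illustration only.
scanned = triage("archive/1998/scan-0042.pdf", "   ")
tagged = triage(
    "archive/2019/stuetzwand/bericht.pdf",
    "Die Bemessung der Stützwand erfolgt nach DIN EN 1992-1-1 und DIN 1045-2 "
    "mit den Lasten für das Untergeschoss.",
)
print(scanned)
print(tagged)
```

Even a crude router like this makes the cost visible early: the share of files that land in the OCR queue is a decent first estimate of how much ingestion work sits ahead of the search box.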

3. We made the system less impressive on purpose

No generated answers. No "AI that designs." A search tool that surfaces sources and gets out of the way.

This bored the partners. They wanted the system to do more. But every generated answer we didn't ship was a wrong engineering answer that never happened. The system that admits the boundary of what it should do is the system an engineer will actually use.

The interesting question

The interesting question isn't whether firms will adopt AI. It's what the firms that get it wrong in month two will look like in year three.

My bet: a long, quiet pile of internal pilots, each with a logo and a senior-partner email, each retired by month eight. Meanwhile the two or three firms that figured out the workflow-fit problem will have compounding institutional knowledge — clean data, integration depth, and trust capital behind it. Generation can come later, once that foundation has earned the right to it.

That gap is hard to close from a standing start. The architecture practice that ships its third dead pilot in 2026 will be three years behind in 2029.

The fix is not buying a better model. The fix is shipping a working system, scoped against the re-asking problem, built on clean ingestion, embedded where the work happens, and honest about what it should and shouldn't do.

The model is fine. The model has always been fine.