why ai makes things up, and exactly where.

elisabeth hitz · june 18, 2026 · 6 min read

the most useful thing you can know about AI is what it actually is under the hood. it is closer to a vastly sophisticated autocomplete than to a search engine. it writes an answer one word at a time, based on what tends to follow what. that single fact gives you both things at once: the fluency, and the made-up parts. they are not two separate features. they are the same machine.
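to make "what tends to follow what" concrete, here is a minimal sketch of a toy next-word predictor. it is an illustration under loud assumptions, not how a real model is built: the three-sentence corpus is made up, the names `build_bigrams` and `generate` are hypothetical, and it only looks at the single previous word. every sentence in the corpus is true, yet the generator can still write a fluent false one, because it only knows which words tend to follow which.

```python
import random
from collections import Counter, defaultdict

# Made-up toy corpus for illustration. Every statement in it is true.
CORPUS = (
    "the capital of france is paris . "
    "the capital of italy is rome . "
    "the capital of spain is madrid ."
)

def build_bigrams(text):
    """Count, for each word, which words follow it and how often."""
    follows = defaultdict(Counter)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows, start, max_words=8, seed=0):
    """Write one word at a time, sampling each next word by frequency."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < max_words and out[-1] in follows:
        options = follows[out[-1]]
        words, weights = zip(*options.items())
        nxt = rng.choices(words, weights=weights)[0]
        out.append(nxt)
        if nxt == ".":
            break  # stop at the end of the sentence
    return " ".join(out)

follows = build_bigrams(CORPUS)
for seed in range(3):
    print(generate(follows, "the", seed=seed))
```

every output has the shape "the capital of X is Y .", but nothing ties X to Y, so a sentence like "the capital of italy is paris ." is as available to the toy as a correct one. the fluency and the made-up part come from the same counts.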

once you see that, you stop asking "is AI reliable" and start asking the better question: reliable at what, and where does it break.

two zones: the well-worn path and the thin ice

picture everything you could ask as a line. on one end are the patterns the model has seen millions of times. on the other end is territory it has barely seen, along with any task that requires telling "true" from "sounds true."

the capability zone is the well-worn path: summarize this, reformat that, explain a common concept. the model has done this shape of thing endlessly, so it usually lands. spot-check it and move on.

the limitation zone is the thin ice: novel or sparse topics, and anything where a plausible-sounding answer is not the same as a correct one. here you verify carefully, because this is exactly where smooth prose wraps a guess.

where fabrication actually concentrates

this is the part worth tattooing on the inside of your eyelids: fabrication concentrates in specificity. names, dates, statistics, citations, urls, direct quotes. the more precise a claim is, the more it warrants a check. a fluent paragraph of general explanation is usually fine. the confident "according to a 2024 study, 73 percent of..." is where you stop and verify.
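if you want a mechanical nudge toward that habit, a rough sketch like the one below can pull the specifics out of a draft so you have a checklist. the patterns and the `flag_specifics` name are assumptions for illustration. they only catch urls, years, statistics, and long quotes, they will miss names and citations, and they check nothing themselves. the verifying is still yours.

```python
import re

# Hypothetical, deliberately rough patterns for specifics worth checking.
# The goal is a checklist for a human, not an automatic fact-checker.
PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "year": re.compile(r"\b(?:19|20)\d{2}\b"),
    "statistic": re.compile(r"\b\d+(?:\.\d+)?\s*(?:%|percent\b)"),
    "quote": re.compile(r'"[^"]{10,}"'),
}

def flag_specifics(text):
    """Return (kind, matched_text) pairs that a human should verify."""
    found = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((kind, match.group(0)))
    return found

# Example draft sentence; the url is a placeholder, not a real source.
draft = (
    "according to a 2024 study, 73 percent of teams agreed. "
    "see https://example.com/report"
)
for kind, snippet in flag_specifics(draft):
    print(f"verify {kind}: {snippet}")
```

a paragraph of general explanation returns an empty list, which matches the rule of thumb: the more precise the claim, the more it shows up on the list.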

the failures all trace back to the autocomplete nature:

hallucination. the most plausible continuation is not always the true one.
confabulation. it fills a gap with believable material instead of flagging that it does not know.
inconsistency. because it samples, the same prompt can give you different answers on different runs.
misplaced confidence. the tone stays smooth whether the fact is solid or invented.

prove it to yourself in five minutes

pick the domain where you are the expert, the one place you can actually catch a wrong answer. then run two quick probes. first, ask it to explain a well-known concept in that domain and notice how smooth and largely right it is. that is the capability zone. second, ask it for five checkable specifics: a mix of cited sources, author names, exact figures, and urls. verify every one and score it out of five. then run that same request again in a fresh chat and compare. what drifts between the two runs is the sampling, live.
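if it helps to keep the tally honest, here is a small sketch for recording the probe. everything in it is hypothetical: the `Specific` record, the `score` and `drift` helpers, and the placeholder entries, which stand in for whatever you actually checked by hand.

```python
from dataclasses import dataclass

@dataclass
class Specific:
    claim: str       # what the model asserted, e.g. a citation or a figure
    verified: bool   # True only if you checked it against a real source

def score(run):
    """Return (number verified, total) for one run."""
    return sum(s.verified for s in run), len(run)

def drift(run_a, run_b):
    """Claims that appear in one run but not in the other."""
    a = {s.claim for s in run_a}
    b = {s.claim for s in run_b}
    return sorted(a ^ b)

# Placeholder entries, not real results: fill these in after checking.
run_1 = [
    Specific("source A", True),
    Specific("source B", False),
    Specific("author X", True),
    Specific("figure 1", False),
    Specific("url 1", False),
]
run_2 = [
    Specific("source A", True),
    Specific("source C", False),
    Specific("author X", True),
    Specific("figure 2", False),
    Specific("url 2", True),
]

for name, run in (("run 1", run_1), ("run 2", run_2)):
    ok, total = score(run)
    print(f"{name}: {ok} out of {total} verified")
print("drifted between runs:", drift(run_1, run_2))
```

the score tells you how much of the specific material held up. the drift list shows which claims changed when nothing but the sampling did.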

if you would not have caught the fabrications in a domain you do not know well, that is the whole lesson. fluency is not accuracy. i went deeper on why the confident tone is unreliable in "why your ai agrees with you, rambles, and hedges."

what actually pushes the edge out

the fixes are not "use a smarter model." they narrow where fabrication can sneak in:

citations and source grounding. when the answer traces back to a source you can open, you can tell what is backed from what is generated. this is what research mode is for.
uncertainty signaling. ask it directly what it is least sure about, so the shaky parts get flagged instead of smoothed over.
constrained generation and skills. a skill file that pins the format, the sources, and the rules shrinks the open space where the model would otherwise improvise.

notice the throughline with everything else in this series. grounding the model in your real tools and constraining it with a skill are not just about quality. they are how you move work out of the fabrication zone. that is the operator's version of the lesson: give it real sources, pin the rules in a skill, and verify every specific before it goes out the door.

want the verify-zones mapped for your work?

the systems diagnostic is $500, and the price is on the page. you get a map of which of your processes are safe to automate and which sit in the verify zone, plus the plan to build the safe ones with grounding and guardrails in place. you decide on your own schedule.


next-token-prediction, fabrication-in-specificity, and mitigation framework: anthropic academy (AI capabilities and limitations, next token prediction lesson), building on the AI fluency framework (Dakan, Feller), CC BY-NC-SA 4.0.