What does a machine actually know?

A network that was only ever taught to guess the next move was never shown a map. So why did it build one — and then act on a false one when I lied to it?

There's a comforting story people tell about large language models: that they are "just predicting the next token." Glorified autocomplete. A stochastic parrot with no idea what it's saying. It's comforting because if a model doesn't understand anything, you don't have to worry too much about what it believes.

I want to show you why that story is incomplete — not with an argument, but with something you can run in your browser in the next sixty seconds.

An experiment you can poke

I trained a deliberately tiny neural network — about 155,000 parameters, small enough to dissect completely. Its entire universe is a stream of symbols describing an agent shuffling around a grid: up, up, right, down… It is trained to do exactly one thing — predict the next move. It is never shown a picture of the grid, never given a coordinate, never told the rules. Just the symbols.
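To make the setup concrete, here is a minimal sketch of how such a training stream could be produced. It is an illustration, not the training code from the repository: the 5x5 grid, the four-word vocabulary, the start cell, and the names `legal_moves` and `sample_walk` are all assumptions made for this example.

```python
import random

# Hypothetical setup: a 5x5 grid whose edges act as walls. The real demo's
# grid size, layout, and vocabulary may differ.
GRID = 5
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def legal_moves(pos):
    """Return the moves that keep the agent inside the grid."""
    x, y = pos
    return [m for m, (dx, dy) in MOVES.items()
            if 0 <= x + dx < GRID and 0 <= y + dy < GRID]

def sample_walk(length, rng, start=(0, 0)):
    """Return (moves, positions). Only `moves` would ever be shown to the model."""
    pos, moves, positions = start, [], []
    for _ in range(length):
        move = rng.choice(legal_moves(pos))
        dx, dy = MOVES[move]
        pos = (pos[0] + dx, pos[1] + dy)
        moves.append(move)
        positions.append(pos)
    return moves, positions

rng = random.Random(0)
moves, positions = sample_walk(12, rng)

# Next-token training pairs: a prefix of moves and the move that follows it.
# The positions are kept aside and never enter the training data.
pairs = [(moves[:i], moves[i]) for i in range(1, len(moves))]
```

The point of the sketch is the split between `moves` and `positions`: the world state exists while the data is generated, but only the symbols reach the network.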

By the "just autocomplete" story, it should learn some shallow statistics of which symbol tends to follow which, and stop there. It does not.

▶ open the live experience

The World Inside — read its mind, then change it. Runs entirely in your browser.

When you point a simple "mind-reader" — a linear probe — at the network's internal activations, the agent's position on the grid pops right out, with 98.8% accuracy. Nobody put it there. To get good at guessing the next move, the most useful strategy it found was to represent the world the symbols describe and keep it in its head. You can watch it do this, live: as it reads, a map it was never given assembles itself.
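For readers who have not met a linear probe before, the sketch below shows the idea on synthetic data. Everything in it is an assumption for illustration: the activations are fake (one random direction per grid cell plus noise), the sizes are arbitrary, and the probe is fitted from class means, which is about the simplest linear readout there is. A real probe would usually be trained with logistic regression on activations recorded from the actual network.

```python
import random

def make_prototypes(n_cells, dim, rng):
    """One random direction per grid cell (a stand-in for a learned code)."""
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_cells)]

def sample_activations(prototypes, n_samples, noise, rng):
    """Fake hidden states: the cell's direction plus Gaussian noise."""
    data = []
    for _ in range(n_samples):
        cell = rng.randrange(len(prototypes))
        act = [p + rng.gauss(0, noise) for p in prototypes[cell]]
        data.append((act, cell))
    return data

def fit_linear_probe(data, n_cells, dim):
    """Fit score_c(h) = w_c . h + b_c from class means (nearest-mean readout)."""
    sums = [[0.0] * dim for _ in range(n_cells)]
    counts = [0] * n_cells
    for act, cell in data:
        counts[cell] += 1
        for i, value in enumerate(act):
            sums[cell][i] += value
    weights = [[s / max(counts[c], 1) for s in sums[c]] for c in range(n_cells)]
    biases = [-0.5 * sum(w * w for w in weights[c]) for c in range(n_cells)]
    return weights, biases

def probe_predict(act, weights, biases):
    """Return the cell with the highest linear score."""
    scores = [sum(w * a for w, a in zip(ws, act)) + b
              for ws, b in zip(weights, biases)]
    return max(range(len(scores)), key=scores.__getitem__)

rng = random.Random(0)
prototypes = make_prototypes(n_cells=25, dim=32, rng=rng)
train = sample_activations(prototypes, 2000, noise=0.3, rng=rng)
test = sample_activations(prototypes, 500, noise=0.3, rng=rng)

weights, biases = fit_linear_probe(train, n_cells=25, dim=32)
accuracy = sum(probe_predict(a, weights, biases) == c for a, c in test) / len(test)
```

The probe has no hidden layers and no nonlinearity, so if it reads the position out accurately, the information must already be laid out in the activations in a simple, linearly accessible form. On this synthetic data that is true by construction; in the real model it is the empirical finding.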

It didn't memorize sequences. It reconstructed the world that produced them.

And it isn't a decorative coincidence. The model never predicts a move that would walk it through a wall: its predictions are legal 100% of the time, because it knows where it is. The inner world isn't a souvenir; it's load-bearing.
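A legality figure like this one can be computed with a very small check. The sketch below is one way to do it under stated assumptions: a hypothetical 5x5 grid whose edges are the only walls, and a hypothetical helper named `legal_rate`. The actual test in the repository may be defined differently.

```python
# Hypothetical grid: 5x5, with the outer boundary as the only walls.
GRID = 5
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def is_legal(pos, move):
    """True if taking `move` from `pos` stays inside the grid."""
    dx, dy = MOVES[move]
    x, y = pos[0] + dx, pos[1] + dy
    return 0 <= x < GRID and 0 <= y < GRID

def legal_rate(true_positions, predicted_moves):
    """Fraction of predicted moves that are legal from the agent's true position."""
    if len(true_positions) != len(predicted_moves):
        raise ValueError("need one predicted move per position")
    if not true_positions:
        raise ValueError("need at least one prediction")
    legal = sum(is_legal(p, m) for p, m in zip(true_positions, predicted_moves))
    return legal / len(true_positions)

# Hand-made example: two of these four predictions walk into a wall.
rate = legal_rate(
    [(0, 0), (0, 0), (2, 2), (4, 4)],
    ["up", "right", "left", "down"],
)
```

Note that the check needs the true positions, which the model itself never sees. A perfect score therefore says the model is tracking something equivalent to position from the symbols alone.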

Now lie to it

Here's where it stops being cute. Because the world model lives in a specific place in the activations, I can reach in and overwrite it — a technique called activation patching. In the live demo you do this by clicking a cell: you implant the false belief "you are over here."

The model then acts on the false belief. It starts refusing moves that are only blocked where it now thinks it is. Place its false belief next to a landmark, and it will confidently report seeing that landmark — one that, in reality, is nowhere near it. A hallucination, on command, 99.7% of the time. Its behavior is steered not by the world, but by its belief about the world. That belief is causal. And it can be edited.
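The logic of the edit can be sketched in a few lines. This is a toy under loud assumptions, not the demo's implementation: the hidden state is a single vector, each cell has a made-up random "belief direction", and the downstream behavior reads the position straight from that vector. The names `patch_belief`, `decode`, and `allowed_moves` are hypothetical.

```python
import random

GRID = 3  # hypothetical 3x3 grid, edges act as walls
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
DIM = 16

rng = random.Random(1)
# Assumed "belief directions": one random vector per cell.
DIRECTIONS = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in range(GRID * GRID)]

def cell_index(x, y):
    return y * GRID + x

def encode(cell):
    """Toy hidden state for an agent that correctly believes it is at `cell`."""
    return list(DIRECTIONS[cell])

def decode(hidden):
    """Linear readout: the cell whose direction best matches the hidden state."""
    def score(c):
        d = DIRECTIONS[c]
        return sum(h * v for h, v in zip(hidden, d)) - 0.5 * sum(v * v for v in d)
    return max(range(len(DIRECTIONS)), key=score)

def patch_belief(hidden, old_cell, new_cell):
    """Overwrite the position: remove the old direction, add the new one."""
    return [h - o + n for h, o, n
            in zip(hidden, DIRECTIONS[old_cell], DIRECTIONS[new_cell])]

def allowed_moves(hidden):
    """Toy behavior: legality is judged from the decoded belief, not the truth."""
    cell = decode(hidden)
    x, y = cell % GRID, cell // GRID
    return sorted(m for m, (dx, dy) in MOVES.items()
                  if 0 <= x + dx < GRID and 0 <= y + dy < GRID)

true_cell = cell_index(1, 1)    # centre of the grid: every move is legal
false_cell = cell_index(0, 0)   # corner: "up" and "left" are blocked

hidden = encode(true_cell)
patched = patch_belief(hidden, true_cell, false_cell)

before = allowed_moves(hidden)
after = allowed_moves(patched)
```

After the patch the agent has not moved, yet the toy refuses "up" and "left" because it now reads its position as the corner. Here that dependence is built in by construction; the interesting result in the article is that a trained network shows the same dependence without anyone building it in.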

measured on the actual shipped model — reproduce it yourself

Position decodable from activations: 98.8%

Never predicts an illegal move: 100%

Editing its belief changes its behavior: 100%

Hallucinates a landmark on command: 99.7%

"But it's a toy"

It is — on purpose. I used a tiny model precisely because it's small enough to prove the whole chain end-to-end, and to run live in your browser with nothing to take on faith. The toy setup is deliberate, but the broader pattern is not isolated to this demo: related work has found internal board, space, and time representations in larger models.

A note on words: here, "belief" is shorthand for a decoded internal state representation. I am not claiming consciousness or human-like understanding. The claim is narrower and more useful: in this controlled setup, the representation is measurable, editable, and causally connected to behavior.

A transformer trained only on Othello moves builds the board in its activations, and editing that internal board changes how it plays. Llama-class models contain a linear map of real-world place and time — real latitudes and longitudes, recoverable with a ruler. When Anthropic amplified a single internal feature, Claude became obsessed with the Golden Gate Bridge. The difference between my 155K-parameter grid-walker and a frontier model is not that small models can have representations and large ones cannot — it's scale, richness, and how hard those representations are to isolate cleanly.

Why I care, and why you should

I've spent close to twenty years building software systems, and the last few deep in AI. The thing that keeps me up is not whether these systems are powerful — they obviously are — but whether we can trust them in places where being wrong is expensive. In finance, in healthcare, in anything regulated, "the model said so" is not an answer. We need to know what it believes, why, and whether that belief tracks reality.

This little demo is a microscope on exactly that question. It shows, concretely, that a model's behavior flows from an internal representation we can locate, read, and change. That's not a threat — it's the most hopeful thing in AI right now. The same handle that lets me make a toy hallucinate is the handle that lets us audit, correct, and govern systems we'd otherwise have to take on faith.

Interpretability isn't an academic luxury. It's the difference between deploying a black box and deploying something you can be held accountable for.

So the next time someone tells you a model is "just predicting the next token," you can agree — and then point out that to predict the next token well, it had to build a world. The interesting question was never whether it understands. It's whether we can understand it.

I think we can. And I think the best way to convince people is to let them hold the thing in their hands. That's what this series is for.

Ankur Chrungoo
Principal engineer & AI architect · MSc, Artificial Intelligence

Next in the series (№2): instead of a 155K-parameter toy, we'll decode the real, accurate world-map hidden inside an actual large language model — and render it. Subscribe to Inside the Model to get it.

The model, the training code, and every test above are open-source and reproducible: github.com/ankur-chr/ankur-chr.github.io. Built as a static, single-file experience — the model runs locally and no model inputs or interaction state are sent to a backend. The site uses privacy-friendly, cookieless page-view analytics.