1.
Link to video essay version at the end of the post.
This is a discussion about language. More specifically, it is about the language systems NPCs use to communicate, how we communicate with them, and how, over time, those systems have evolved to the point where NPCs have started to sound frighteningly human.
For me, Oblivion was the first game where NPCs started to sound this way. I am fully aware that Oblivion’s dialogue is broken, unnatural, and often absurd, and that the voice acting is repetitive and chock-full of comical errors.

But that was part of what made it feel so real. Oblivion marked Bethesda’s first real step toward creating a fully voiced world. Morrowind’s characters were only partially voiced, and most of the audio was limited to greetings like “Wealth beyond measure, outlander,” or “Wake up, we’ve reached Morrowind.”
In contrast, Oblivion had nearly 50,000 voice lines, shared between just 15 voice actors. To manage the sheer volume of work, Bethesda made the strange decision to organise the voice lines alphabetically. The voice actors attended sessions where they recorded hundreds of variations of “hi,” “hello,” and “how are you?”
Audio lead Mark Lampert later admitted that they simply “didn’t have a good way to organise it at the time.”

No wonder the dialogue sounds disjointed. The fragmented conversations, clumsy exposition, and stiff delivery were, in fact, a result of Bethesda being in over its head. Their methodology created a breeding ground for mistakes, and those mistakes were where the characters started to sound human. After all, there is nothing more human than making a mistake.
2.
In 1977, a group of MIT students created Zork, a text adventure that promised more freedom, more possibility, and fewer dead ends than the games that came before it.

The setup is simple. You are an “adventurer,” standing in a field beside a white house and a mailbox. The mailbox contains nothing more than the introduction to the game. Inside the house are a few curious objects: a brass lantern, an empty trophy case, an engraved sword, and an old rug concealing a trap door. Beneath that house lies the Great Underground Empire, a sprawling dungeon filled with puzzles, thieves, trolls, and treasure. At least, that is what Zork tells you is there.

By modern standards, Zork looks almost comically bare. It has no discernible visual elements. The game describes a situation. You type a command. The game responds. It sounds simple until you realise how obtuse the system can be. You can’t scroll back to review information, so missing details bring your progress to a painful halt. Worse still, the game only understands a very small slice of language.
Typing the wrong word causes the game to immediately snap back: “I don’t know the word ‘there’,” or “You used the word ‘east’ in a way I don’t understand.” So, the challenge shifts from simply solving puzzles to learning how to communicate in a way the machine can process. In Zork, progress depends on shrinking your vocabulary until it fits inside the game’s.
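
To make that shrinking vocabulary concrete, here is a minimal sketch of a verb-noun parser that fails in the two ways quoted above. It is not Zork's actual parser, which was considerably more capable; the word lists, the `parse` function, and the "OK: ..." replies are all hypothetical, invented purely for illustration.

```python
# A toy verb-noun parser. The vocabulary below is a made-up stand-in;
# the real game's word list and grammar were far larger.
VERBS = {"open", "take", "read", "go"}
NOUNS = {"mailbox", "leaflet", "lantern", "sword", "rug"}
DIRECTIONS = {"north", "south", "east", "west"}
FILLER = {"the", "a", "an"}  # words the parser silently drops
KNOWN = VERBS | NOUNS | DIRECTIONS


def parse(command: str) -> str:
    """Return either an accepted action or a complaint about the input."""
    words = [w for w in command.lower().split() if w not in FILLER]
    if not words:
        return "Please type a command."

    # Failure mode 1: a word that is simply outside the vocabulary.
    for word in words:
        if word not in KNOWN:
            return f"I don't know the word '{word}'."

    verb, rest = words[0], words[1:]
    if verb in DIRECTIONS and not rest:
        return f"OK: go {verb}"
    if verb == "go" and len(rest) == 1 and rest[0] in DIRECTIONS:
        return f"OK: go {rest[0]}"
    if verb in VERBS and verb != "go" and len(rest) == 1 and rest[0] in NOUNS:
        return f"OK: {verb} {rest[0]}"

    # Failure mode 2: every word is known, but the combination is not.
    odd = next((w for w in words if w in DIRECTIONS), words[0])
    return f"You used the word '{odd}' in a way I don't understand."
```

With this sketch, `parse("open the mailbox")` is accepted, `parse("go there")` is rejected because "there" is not in the vocabulary, and `parse("take east")` is rejected even though both words are known. The player's job is to learn which of those three situations they are in.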

3.
Playing Zork now feels less like playing an RPG and more like trying to crack a code. As such, its closest descendants are not fantasy adventures at all, but language games. Games like Chants of Sennaar, 7 Days to End with You, and Totem, built around the slow and uncertain work of deciphering alien symbols.

Each game provides an explicit goal (reach the top of the tower, understand an intimate conversation, or convince an alien race to halt its invasion), but the primary obstacle remains, in a sense, the same. Before you can progress, you have to learn how the NPCs make sense of reality.
Chants of Sennaar, like Zork, casts you as a nameless figure moving through an unfamiliar, Babel-inspired world. But instead of one parser, it has you interacting with multiple cultures, each with its own symbolic language. The Devotees, Warriors, Bards, Alchemists, and Anchorites do not simply speak different languages; they lead different lives. They hold different ideals. Different beliefs. And their language directly reflects those differences.
The Warriors’ script is composed of sharp lines and hard angles. Their language is devoid of clear pronouns, as if individuality matters less than action or hierarchy. They separate themselves from those who are both physically and figuratively below them. The Bards, in contrast, write in flowing, decorative symbols that feel almost indulgent, mirroring a culture built on performance, beauty, and refinement. In both cases, language is not purely a means of communication, but a record of their values.

Unlike in Zork, you are forced to investigate not simply what the NPC says but how it says it: how it addresses you, how it refers to others, and what kinds of values are encoded in its language. Now, the way the NPCs speak no longer feels scripted, nor does it feel limited by the hardware and code it was built on. Increasingly, it resembles the messy, contextual, and expressive way that people actually speak.
4.
To understand why it is so difficult to create believable conversations with NPCs, it helps to start with a simple fact: human language is absurdly complicated.
Even speech, in the narrowest physical sense, is a coordinated act involving the lungs, throat, tongue, mouth, nasal cavity, and a whole orchestra of muscles and structures working together to produce sound. And even that’s before you account for tone, pacing, rhythm, or hesitation.

But speech itself is only one small part of communication. Humans do not just exchange words. We read faces, posture, timing, emphasis, silence, emotional distance, and context. Meaning does not live exclusively in language, but in its delivery, in the tiny signals that surround a sentence and tell us whether someone is sincere, bored, embarrassed, nervous, flirtatious, threatened, or… lying.
In 1970, Paul Ekman and Wallace Friesen began developing the Facial Action Coding System, a framework that attempted to catalogue, organise, and understand facial expressions and all the information tied to them. It treats emotion not as a vague performance, but as a readable and, more importantly, predictable combination of physical actions.

Take the Pan Am smile, for instance: the strained, performative smile of someone who is technically being polite, but would clearly rather be anywhere else. It is the kind of smile a bartender might give when you order five complicated drinks moments before the bar closes. In FACS terms it is AU12, the lip corner puller, acting on its own. What matters is not just that the mouth curves upward at the edges, but what the rest of the face is doing around it, or what it is not doing: the cheeks do not lift and the eyes do not narrow, as they would (AU6) in a genuine smile.
Fittingly, Cyberpunk 2077’s Panam often seems to wear exactly this kind of smile during the player’s increasingly doomed attempts to flirt with her.
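
The Pan Am smile shows how FACS turns a reading of someone's face into something a program could check. As a rough sketch: AU12 is the lip corner puller and AU6 is the cheek raiser, and a smile with both is usually called a Duchenne (genuine) smile. The function name and the returned labels below are my own illustrative choices, not part of any official FACS tooling, and real coding also scores intensity and timing, which this ignores.

```python
from typing import Iterable

# FACS action unit numbers used in this sketch.
LIP_CORNER_PULLER = 12  # AU12: the corners of the mouth pull upward
CHEEK_RAISER = 6        # AU6: the cheeks lift and the eyes narrow


def classify_smile(active_units: Iterable[int]) -> str:
    """Label a face from the set of action units currently active.

    Hypothetical helper: it only distinguishes the two smiles
    discussed here and treats each unit as simply on or off.
    """
    units = set(active_units)
    if LIP_CORNER_PULLER not in units:
        return "no smile"
    if CHEEK_RAISER in units:
        return "Duchenne smile"
    return "Pan Am smile"
```

Calling `classify_smile({12})` gives the Pan Am smile, while `classify_smile({6, 12})` gives the genuine one. The difference between politeness and warmth comes down to one missing number.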

This is the next stage of NPC communication. Not simply better dialogue. Not better voice delivery. But the simulation of everything around language: simulated expressions, reactions, hesitations, context. The moment NPCs stop simply delivering lines and begin to seem as though they mean what they are saying is the moment games cross into something much closer to conversation.
5.
Who’s Lila? was released in 2022. It begins with the disappearance of a local girl. Tanya Kennedy is missing, and every thread of suspicion leads back to William Clarke, the last person known to have seen her alive. That’s you.
Will struggles to express emotion in ways other people can easily read, which is difficult enough under ordinary circumstances. Under suspicion of murder, it becomes a catastrophic problem. A typical playthrough sets up and unravels a series of mysteries. What happened to Tanya? Who keeps calling you? And why do they keep calling you “Lila?”

The answers are horrifying. You killed Tanya while under the control of an unknown entity, then disposed of her body. But the real horror of Who’s Lila? is not in its plot. It’s in its interface.
You do not select a line from a dialogue tree, wheel, or menu. You speak by dragging pieces of Will’s face into something that resembles emotion. You need to physically construct a smile, and fear, and confusion, and sincerity, and indifference. Every exchange becomes an uneasy act of puppetry. You are not choosing what to say so much as trying to manufacture a believable rendition of feeling.
Garage Heathen, the developer of Who’s Lila?, said the mechanic was inspired by L.A. Noire’s facial performances and by interrogation analysis from the JCS Criminal Psychology YouTube channel. But where L.A. Noire asks you to catch NPC deception, Who’s Lila? is built around hiding the lie instead. Heathen and a group of friends even built an “emotion library”: a collection of manually posed smirks, scowls, grimaces, and other expressions used to train the game’s system. It is a rough, game-specific equivalent of the Facial Action Coding System. The game was given a loose (and I mean very loose) guide to human emotion.

The result is unsettling not just because the faces are grotesque, but because they are recognisably human in the worst possible way. They capture something that games often miss about human communication: expressions are unreliable. Humans can weaponise affect. We can smile while panicking, look calm while lying, and appear detached while trying desperately to hide what we know. Who’s Lila? is a game that understands communication is never just the words we say. It is the performance that surrounds them.
6.
In Zork, the machine could barely understand what you were saying. In Oblivion, it could speak, but only in that strange, dislocated way produced by a computer straining to imitate conversation. But Who’s Lila? feels like a pivotal moment, one in which communication between man and machine became more organic: not a fixed line delivered from a script, but a system that can be manipulated to produce a desired impression, regardless of whether that impression is true.
The 2015 indie game Emily is Away took that concept in a different direction. On the surface, it is barely even a game: just another text interface styled like an old AOL chat room, one you use to talk to Emily across several stages of your adolescence. In practice, it turns instant messaging into a study of how artificial our most casual communication already is.

In it, you do not freely compose your messages to Emily. You choose from a small set of predetermined responses, each loosely corresponding to a broad emotional direction, and then the game lets you “type” out the response. However you press the keys, the message arrives exactly the same. It is a performance of spontaneity, rather than spontaneity itself.
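
The trick described above is simple enough to sketch. In this minimal version, each keypress reveals the next character of a pre-written message and the key itself is thrown away. This is my own reconstruction of the idea, not the game's code: the class and function names are hypothetical, and the real game may reveal text at a different rate.

```python
class ScriptedMessage:
    """A pre-written message that 'types itself' as keys are pressed."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.revealed = 0  # how many characters are visible so far

    def press(self, key: str) -> str:
        # The key is deliberately ignored: only the fact of a
        # keypress matters, never which key it was.
        if self.revealed < len(self.text):
            self.revealed += 1
        return self.text[: self.revealed]


def type_out(text: str, keys: str) -> str:
    """Feed a sequence of keypresses in and return what ends up on screen."""
    message = ScriptedMessage(text)
    shown = ""
    for key in keys:
        shown = message.press(key)
    return shown
```

Mashing `"asdfghjkl"` and carefully typing nine deliberate letters produce the same nine characters on screen. The player supplies the rhythm of typing, and the script supplies everything else.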
Each message is carefully decorated with human messiness — typos, redundant phrasing, and anxious jokes, the kind you could only expect from a teenager who is confused and scared by the prospect of their changing emotions. But before they are sent, the computer censors them. Anything that might resemble a genuine error or a truly unscripted moment of vulnerability is removed just before the finish line. What remains is not conversation, but a curated simulation of it.
7.
It was not that long ago that the internet still felt unruly. From the mid-1990s into the late 2000s, the web was often an ugly, overdesigned, deeply embarrassing place, full of flashing GIFs, broken pages, malware, obsessive hobbies, bad taste, and very little of the frictionless polish that now defines internet life. It was chaotic. It was unsafe. It was also unmistakably human.

Hypnospace Outlaw captures that feeling better than almost anything else. Its version of the web is cluttered, amateur, sincere, and only loosely governed: a space where personality leaks into every bad background, every self-important rant, every aggressively customised home page. It recalls an internet from before platforms standardised expression into the same clean boxes and monetisable feeds.
Heaven’s Gate was a real UFO religious group whose official website remains online decades after the group’s mass suicide in 1997. It feels less like a webpage than a time capsule the modern internet forgot how to erase. It is also a webpage that would feel perfectly at home in Hypnospace Outlaw.

In the game, you are essentially a moderator. You comb the fake ’90s internet for violations, including copyright infringement, bullying, malware, harassment, and any illegal activity. There is more to the story, and Hypnospace Outlaw is well worth experiencing yourself if you haven’t already. But, for me, what was most fascinating about the experience was sitting down and being responsible for dulling and standardising this wild version of the internet into something much like our own.
Nowadays, our communication is full of shortcuts, abbreviations, and buffers — interfaces that sit between us and other human beings. These systems have not made language worthless. If anything, research has shown that our use of shorthand does not erode literacy, and in some cases may even sharpen certain aspects of the language we use. But something else may have been lost in the process.
Not just privacy. Not just attention. But the visible mess of communication. The awkwardness of encountering another person’s mind in a form no system has yet sanded smooth. The old web was ugly, yes, much like many of the AOL chat rooms of the late ’90s. But it still looked like people made it.

8.
New platforms like Convai promise NPCs that can listen, interpret, and respond with startling fluency while remaining anchored to the world around them. It is, in many ways, the dream that games like Zork and Oblivion were always circling: the dream of an artificial character that can really talk back.

During my time writing this script I have been trying to understand why Oblivion’s characters felt so alive, and so real. I have played games which have more technically impressive conversations. Games with branching storylines which react and evolve depending on your decisions. And what I landed on was this.
Oblivion was a catalogue of errors. An actor stumbled over a confusing, disjointed line. Bethesda provided those lines in no sensible order and then, after all that, left the mistake in the final release. But maybe that is why it felt so human: because real communication cannot be predicted, autocorrected, or made perfect on the first try. The more convincingly games try to simulate real human communication, the more they remind us how much of it is made up of mistakes. For years, I thought NPCs were learning to speak more like us. It is only now that I am finally realising that maybe, all this time, we have been learning to speak more like NPCs.

