The problem with this sort of thing is that, for some reason, AI-generated conversation just feels hollow compared to human-written conversation. It’s a weird thing, because I honestly can’t really articulate why that is, but hearing an NPC talk about some event as written by a human has a heightened feeling of importance compared to hearing an AI-generated text about that same event. Maybe it’s because when it’s human-written, we subconsciously know that someone cared enough to spend the time to write it specifically, so it must be important in some way.
It’ll be interesting to see how this plays out when it’s actually implemented in real games. If the AI NPCs are given specific plot points that they’re supposed to hit to move the story along, how do developers prevent situations where the AI just never gets to them, or where the moment doesn’t carry any feeling of importance and the player just glosses over it? If it’s only used for random NPCs whose dialogue isn’t really important to the narrative, how do they avoid it just becoming background noise that we don’t pay attention to or care about?
There’s also the issue of transferring emotions to the words when doing text to speech. Imagine how dry it’ll be listening to an NPC react to someone dying or something.
The other angle no one seems to mention is that while the NPC can have dialogue that feels natural, there is no way to program them to be able to act on that dialogue.
This was something very obvious in some of the Skyrim LLM integration videos I’ve seen. They would “convince” the NPC to do something, like join their quest, but there is no logic behind the scenes to actually enable that interaction in-game.
For visual novels that may not matter (I’m not convinced, but maybe), but for RPGs it will. Just look at what happened with Fallout. They didn’t limit dialogue to a wheel of effectively yes/no choices just for voice acting reasons. They do things like that to limit how many paths the story and characters can take, because you can’t program it all in.
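To make the gap concrete, here’s a rough sketch of what “logic behind the scenes” could look like. This is not how any of the Skyrim mods actually work, as far as I know; the `NPC` class, the `ACTIONS` table and the intent names are all made up for illustration. The idea is that the generated dialogue gets reduced to a short intent label, and only intents that someone has actually programmed can change the game state:

```python
from dataclasses import dataclass


@dataclass
class NPC:
    # Hypothetical NPC state; a real game would track far more than this.
    name: str
    in_party: bool = False


def join_party(npc: NPC) -> str:
    # The only place where "joining the quest" actually changes game state.
    npc.in_party = True
    return f"{npc.name} joins your party."


def refuse(npc: NPC) -> str:
    return f"{npc.name} stays put."


# Hand-written whitelist: every entry is behavior a developer had to build.
ACTIONS = {
    "join_party": join_party,
    "refuse": refuse,
}


def apply_intent(npc: NPC, intent: str):
    """Run the handler for an intent, or return None if nobody programmed it."""
    handler = ACTIONS.get(intent)
    if handler is None:
        return None  # the NPC can talk about it, but nothing happens in-game
    return handler(npc)


guard = NPC("Guard")
print(apply_intent(guard, "join_party"))     # a programmed action
print(apply_intent(guard, "open_a_bakery"))  # dialogue-only: no handler exists
```

The table is the same limit the dialogue wheel puts on things: the model can say anything, but the game can only do what’s in the list.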
A theory of mine about this problem is that an AI knows what it should do (because of its training data) but not whether it was effective, since it doesn’t have a metric to test if its output is engaging to humans. When an AI generates dialogue, it is essentially recombining patterns from a huge amount of existing text, but without a clear set of goals in doing so. When a human writes dialogue, they have a specific atmosphere in mind, a set of goals, foreshadowing, the tone shifting throughout the sentences, etc. AIs might accidentally get it right from time to time, but more often than not they mess this part up.
It’ll take a few years before this technology is really viable. Until then, we get stuff like this.