This is a big problem, and most people don't realize that this is what happens: in the end, when the developers have extra time or find something that doesn't fit, they can only change the programming, because the recorded dialog can't be changed. The problem, though, is that computer-generated voices aren't that good yet, developers don't have the time to create all that programming, and the only mainstream voice-synthesis product (for both English and Japanese) is Vocaloid 2 (with, I think, eleven voices in all; not enough, don't you think?). Other than that, they would have to build one from scratch, and that is an endeavor best left for another time.
Yes, I know computer-generated voices are still immature, as I have described here: http://www.gamesas.com/index.php?/topic/1108155-xzzz/page__view__findpost__p__16236688. But in the first post of that thread, I suggested that a team of dedicated and exceptionally talented people could gather together and form a company that creates middleware to procedurally generate voice-overs for in-game text, with the help of voice actors' pre-recorded voices: the small elements of their voices would be categorized into banks, and those elements would be combined, with effects added, to generate emotional voices on the fly.
In that post I also suggested how the dedicated team could gradually fine-tune their general formula and each voice set, until each voice set becomes acceptable for mainstream game media; then they could add new voice sets and start to grow. And since each voice set can produce several different voices (with added effects and pitch changes), they could supply a lot of voices per voice set.
And different local accents are possible, so we could have voice sets that imitate Imperial Isle pronunciation, Nibenay pronunciation, Khajiit pronunciation, and so on.
Other languages are possible as well, and can be worked on.
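Just to make the bank-and-combine idea concrete, here is a very rough Python sketch; the bank layout, element names, and the effect step are all my own invention, and real synthesis would of course be far more involved:

# Very rough sketch of the "banks of voice elements" idea: look up an actor's
# recorded elements, string them together, and tag them with an effect.
# The bank layout, unit names, and the effect step are all hypothetical.
VOICE_BANK = {
    "actor_A": {
        "can": "can_neutral.wav", "i": "i_neutral.wav", "borrow": "borrow_neutral.wav",
        "your": "your_neutral.wav", "ladder": "ladder_neutral.wav",
    },
}

def synthesize(actor, words, emotion, pitch):
    bank = VOICE_BANK[actor]
    units = [bank[w] for w in words if w in bank]          # pick the recorded elements
    # A real engine would mix the audio and apply the emotion/pitch effects here;
    # this sketch only returns the plan, to show the combine-and-add-effects step.
    return [f"{unit} [{emotion}, pitch x{pitch}]" for unit in units]

print(synthesize("actor_A", ["can", "i", "borrow", "your", "ladder"], "polite", 1.1))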
And if such a system were developed, it would be such a leap forward that the team that makes this middleware would be millionaires in no time.
The current trend of fully voice-acted dialog puts a great deal of limitation on the future evolution of NPC AI and quest systems. For instance:
In future computer games, every object could be known to NPCs: they would know its purpose and could generate dialog about items on the fly. So if an NPC sees a ladder on the ground and needs to get up to the roof of their shack, they can understand that the ladder can serve that purpose, and so they ask its owner for permission to use it.
Each generated sentence is produced as a three-part line, like this:
- The actual text, to show on the screen, in the logs and the like
- The meaning part, which helps the AI (more on that later).
- The code part, which helps with tone, pronunciation, facial emotions, gestures, mid-dialog actions, and so on...
The voice-over engine looks at the code part to generate the voice; it supplies the effects that infuse emotion into the voice, as well as the commands that drive facial expressions and mid-dialog gestures and actions.
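As a minimal sketch of what such a line could look like as data, here is one possible shape (the structure and field names below are my own, not any existing engine's format):

# Minimal sketch of the three-part line; all names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class GeneratedLine:
    text: str                                    # what is shown on screen and in logs
    meaning: list = field(default_factory=list)  # tags the AI reasons about
    code: list = field(default_factory=list)     # tone / gesture / action commands

line = GeneratedLine(
    text="Can I borrow your ladder for a bit?",
    meaning=["CAN?", "_I", "BORROW", "YOUR", "LADDER", "FOR", "A BIT"],
    code=["TONE:polite", "FACE:hopeful", "GESTURE:point(ladder)"],
)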
The meaning part is something like this:
Text: Can I borrow your ladder for a bit?
Meaning:
- [CAN?]: a question that can result in (Yes, for permission / No, for denial / Further investigation, to clarify additional parameters the AI needs before deciding).
- [_I]: A pointer to an actor, currently the speaker, so the AI can look at that actor to gain additional information about it.
- [BORROW]: The AI knows that it means temporarily taking an item and returning it later.
- [YOUR]: A pointer to another actor, which is the target of the conversation, with additional meaning of changing the subject of the conversation to one of the actor's possessions.
- [LADDER]: A pointer to the subject item, so the AI can check its characteristics.
- [FOR]: An additional sub-meaning that could clarify another aspect of the interaction.
- [A BIT]: Clarifies the duration of the action, to be a short period of time.
Those pointers to objects give the AI references to sources of additional information to help with decisions.
The code part does not need to be ordered the way natural language is, and it can have a standard structure for sentences.
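Here is a rough sketch of how those meaning tokens and their pointers might be stored so the AI can follow them; the tags match the example above, but the fields and IDs are invented for illustration:

# Rough sketch of meaning tokens carrying references the AI can follow.
# The ref/note fields and the actor/item IDs are hypothetical.
from dataclasses import dataclass
from typing import Optional

@dataclass
class MeaningToken:
    tag: str                      # e.g. "CAN?", "BORROW", "A BIT"
    ref: Optional[str] = None     # pointer to an actor or item, if any
    note: Optional[str] = None    # extra qualifier, e.g. a duration hint

meaning = [
    MeaningToken("CAN?"),                          # yes / no / ask for more details
    MeaningToken("_I",     ref="actor:speaker"),   # who is asking
    MeaningToken("BORROW"),                        # take temporarily, return later
    MeaningToken("YOUR",   ref="actor:listener"),  # owner of the subject
    MeaningToken("LADDER", ref="item:ladder_01"),  # the subject item and its characteristics
    MeaningToken("FOR"),
    MeaningToken("A BIT",  note="duration:short"), # how long the action lasts
]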
When the first NPC asks the ladder's owner that question, the AI can look at the meaning of the question, gather additional information from the reference pointers, and decide on an answer, like one of these:
- Do I know you, my dear lady?
- Hey, pal, no problem, but you should know that it is unstable.
- Get lost, before I kill you right here, you cheating partner.
- You know, cousin, I know how you borrow, but I'll give you another chance, as I like you; go ahead.
All of these answers can be generated on the fly, and would result in more responses from the dialog originator, again on the fly, and so on...
The AI can look at the reference objects, the current NPC's previous memories, other local conditions supplied by the local event manager, hidden local guide objects, and so on...
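As a loose sketch of how that decision could be wired up (the disposition values, memory flags, and item flags are all made up, and in the real system the chosen reply would itself come back as a generated three-part line rather than a canned string):

# Hypothetical sketch of the answer-selection step for the ladder question.
def answer_borrow_request(owner, asker_id, item):
    memories = owner.get("memories", {}).get(asker_id)
    if memories is None:                               # never met this actor before
        return "Do I know you, my dear lady?"
    if memories.get("broke_a_promise"):                # remembers a loan gone bad
        return "Get lost, before I kill you right here, you cheating partner."
    if memories.get("disposition", 0) >= 50:
        reply = "Hey, pal, no problem"
        if "unstable" in item.get("flags", []):        # pulled from the item reference
            reply += ", but you should know that it is unstable."
        return reply
    return ("You know, cousin, I know how you borrow, "
            "but I'll give you another chance, as I like you; go ahead.")

owner = {"memories": {"player": {"disposition": 60, "broke_a_promise": False}}}
print(answer_borrow_request(owner, "player", {"flags": ["unstable"]}))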
As another example, we could have a lot of guide objects scattered throughout the landscape, scripted to help the local NPC AI; some of them could serve as landmarks that help NPCs give the player character correct addresses and routes to a destination.
So when you ask an NPC whether there is a monastery around, the AI can query the local search engine, supply it with the subject of your conversation (currently a monastery), and wait for the result while the NPC stands still, holding his chin in his hand and saying, "Hmm, let me see...".
Then the search engine returns the list of landmarks along the way until it reaches the subject of the search, i.e. the monastery, and the NPC responds:
"Yes, there is one, but you are for a bit of trekking. You should go along this alley, until you reach the fork, then turn left and continue until you reach "The Village Center", then you should follow the road toward the river, until you reach a big building with a fishing boat sign.
Go inside and ask for a ferry, and tell the ferryman you want to go to "Bubbling Bay"; when you reach it, look around for a nearby bridge and cross it, then continue toward the forest, and at the edge of the forest is the nearest monastery to us."
All of this can be generated on the fly with the help of the local search engine, the local landmarks, and the AI.
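A toy sketch of that direction-building step, assuming the local search engine hands back an ordered list of landmark records (the record format and names are invented for illustration):

# Toy sketch: turning the landmark list returned by the local "search engine"
# into spoken directions. The landmark records and fields are hypothetical.
def directions_from_landmarks(landmarks):
    parts = ["Yes, there is one, but you are in for a bit of trekking."]
    for mark in landmarks:
        parts.append(f"{mark['move']} until you reach {mark['name']}.")
    parts.append("At the edge of the forest is the nearest monastery to us.")
    return " ".join(parts)

route = [
    {"move": "Go along this alley",                 "name": "the fork"},
    {"move": "Turn left and continue",              "name": '"The Village Center"'},
    {"move": "Follow the road toward the river",    "name": "the building with the fishing-boat sign"},
    {"move": "Take the ferry",                      "name": '"Bubbling Bay"'},
    {"move": "Cross the nearby bridge and head on", "name": "the forest"},
]
print(directions_from_landmarks(route))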
But none of this is possible if we do not free ourselves from this trend of recording a voice actor for every spoken sentence.
I wanted to go into the topic of procedurally generated events and quests, but I do not have time for it now.
So this is a reason why this trend is a bad idea.