This might not seem so, but this is the relatively brief version of what I want to say about dialogs and voice acting. If anyone wants me to go into more detail about any part of the following text, please let me know, as I have thought about these matters for years now, and I have been flexible enough to start afresh whenever needed.
The more I think about it, the more I am convinced that the root of a lot of Oblivion's problems was the decision to make it fully voice acted without the proper technology to implement it the right way. It handicapped the quest builders, wasted a lot of time, filled a lot of disc space, and consumed a lot of budget that could have been used better.
The voice actors had to record every line of dialog one or more times, and for each race the same line had to be voiced again by a different actor. If a quest builder then decided to change the line, the actors had to repeat the whole procedure, and a crew was also needed to check the results and decide whether to re-record or to fix them with sound editors, and so on...
Those voice files filled more than half of the DVD, leaving less space for models, textures, and so on...
Those voice actors, sound editors, and the like spent a lot of time recording voices, checking and editing sound files, and repeating the procedure when needed.
Those voice actors, sound editors, celebrities, and the rest needed a lot of budget to do their jobs (budget that could have been used to hire more writers, quest builders, dungeon designers, artists, and the like).
Those voiced dialogs made life a lot harder for the official quest designers, not to mention the modders, who have no access to the official voice actors at all.
The end result was a shallower game that could have had the same wealth of story, lore, and quests as its predecessor, and one that was quite hard for modders to add such content to.
OK, so how can we overcome such a problem?
One solution is to revert to the old method: voiced greetings and special lines, with the bulk of the text silent. It's the easy way out of the problem.
Another option is to develop a new specialized voice synthesis system, one focused on synthesizing in-game voices and dialogs, which performs better than the general-purpose voice synthesis software available.
So, instead of voice acting, voice actors, and voiced lines, we would have voice building, voice donors, voice samples, voice sample sets, voice elements, text lines, text meanings, phonetic notation (http://www.nuspel.org/chart.gif), emotion symbols (or codes), and so on...
We could still have celebrity voice actors and voiced lines for important dialogs and key moments, but for the rest of the dialogs, we can let the system build the voice output.
I will go into the details of implementing such a system later, but for now, let's take a general look at what it could be and see what happens if we have such a system at our service:
First of all, every text line could be made of one or more threads: the written text shown on the screen; a code thread that might contain phonetics, lip movements, emotions, and other character behavior, which helps the system build the voiced line and control the speaker's behavior; a voice thread that stores a voice-acted line for special cases; and a meaning thread understood by the NPC AI (I will go into more detail about that in another place).
Text: Written text shown on the screen.
Code: Phonetics, emotions, lip motions, and other behavior.
Voice: Voice acted line.
Meaning: For NPC-AI.
All those parts are optional: if a line will not be voiced, or the default voice engine's output is good enough, there is no need for a "Code" thread.
Likewise, if a voiced line does not need to be shown on the screen, the "Text" part can be omitted, and if no NPC needs to understand the meaning of the line, we can skip the "Meaning" thread.
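The four optional threads above can be sketched as a simple record, something like this (all names here, such as DialogLine and voice_clip, are hypothetical, just to make the idea concrete):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DialogLine:
    text: Optional[str] = None        # written text shown on screen
    code: Optional[str] = None        # phonetics, emotions, lip motion, other behavior
    voice_clip: Optional[str] = None  # path to a voice-acted recording, for special cases
    meaning: Optional[str] = None     # symbolic meaning consumed by the NPC AI

    def needs_synthesis(self) -> bool:
        """A line with no prerecorded clip must be built by the voice engine."""
        return self.voice_clip is None

# A celebrity-voiced greeting needs no Code thread:
greet = DialogLine(text="Welcome, traveler.", voice_clip="greet_01.wav")
# A synthesized rumor line skips the Voice thread entirely:
rumor = DialogLine(text="I heard wolves near the bridge.",
                   code="[calm][glance_left]", meaning="RUMOR(wolves, bridge)")
```

Any combination of the four threads is valid, which is the whole point: the engine just uses whatever is there.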
OK, I will write more about it later, but for now let's see what we gain if we have a functional voice building system and a related NPC AI. I do not want to go into more detail about AI, events, and quests here, so I will be brief:
There can be an application that helps people gather voice samples and make voice sets. It can have some prebuilt voice sets for generic Imperials, Bretons, Khajiit, Argonians, mer, Redguards, and so on...
Bethesda can distribute that application and announce that anybody who sends them a usable voice set gets his name in the credits as a voice-set donor and receives the special add-on when the game goes gold, and that anyone who creates a new, distinct, usable sub-race pronunciation is gifted a free copy of the final game with the add-on.
As for sub-races, we can have a variety of pronunciations across different areas of the land, so Imperials of the Colovian Highlands can have a different pronunciation than Imperials of the Niben Bay. People can try to make distinct new pronunciations, or mimic and add to a currently available sub-race, with a better chance of being accepted.
Those sub-race samples form a hierarchy of relations like this:
Men (Imperials, Bretons, Redguards...)
Mer (Altmer, Dunmer, Bosmer, Orsimer, Dwemer, Ayleids...)
Beast (Khajiit, Argonians, Goblins...)
Daedra (Daedra Lords, Golden Saints, Seducers, Dremora, Xivilai, Scamps...)
Then each race can still have sub-races, representing the different pronunciations of different parts of the province.
And each voice set would be tied to a sub-race, or to the generic race.
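The hierarchy above is just a tree of parent links, so a voice set attached to any node can be traced back to its root group. A minimal sketch (the table below is illustrative, not an actual data format):

```python
# Each entry maps a race or sub-race to its parent group in the hierarchy.
PARENT = {
    "Imperial": "Men", "Breton": "Men", "Redguard": "Men",
    "Altmer": "Mer", "Dunmer": "Mer", "Bosmer": "Mer",
    "Khajiit": "Beast", "Argonian": "Beast",
    # Hypothetical sub-races, as described for the Colovian Highlands / Niben Bay:
    "Colovian Imperial": "Imperial", "Nibenese Imperial": "Imperial",
}

def ancestry(node: str) -> list[str]:
    """Return the chain from a sub-race up to its root group."""
    chain = [node]
    while node in PARENT:
        node = PARENT[node]
        chain.append(node)
    return chain

print(ancestry("Colovian Imperial"))  # ['Colovian Imperial', 'Imperial', 'Men']
```

This chain is what makes the later fallback behavior possible: anything a sub-race set lacks can be looked up in its ancestors.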
It works like this: you start the application, select the gender, race, and sub-race (if available), or create a new sub-race, and then copy the voice set closest to your choice under a new name (or create a new one from scratch).
If you have already been working on a voice set, you open that instead, and all the settings are already selected.
After your current voice set is created or loaded, you are shown a line of text; if you like, you can listen to the current recording (if available) or the nearest generic voice for that line, and then pronounce the line as you like.
The text lines are selected so as to produce the voice elements needed to build future voiced text. The engine tries to trim your recorded line and extract the voice elements from it, and you can listen to those elements to see if they need tweaks, like moving the beginning or end of the selected section by a few milliseconds either way.
After that you can listen to a few lines produced from those samples to see if it worked, and if you find a glitch, you can tweak the problematic samples until the result is good.
This procedure continues with new text lines until all the required voice elements of the voice set are created; then you can start on voice effects like Pain, Attack, Power Attack, and the like.
The engine might also require you to repeat the procedure in "Whisper" and "Shout" modes.
If any voice element or voice effect is missing from a voice set, the ones from its ancestor voice set are used. We can also supply a recording of a combination of elements, a whole word, or even a whole sentence for the most commonly used cases; whenever the engine finds a place where it can use those words or sentences, it selects them instead of assembling the individual elements. Each such word or sentence gets a code that can be added to or overridden in a descendant voice set.
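The fallback rule just described can be sketched as a walk up the ancestor chain: the engine first looks for a whole-sentence or whole-word recording in the NPC's own set, then falls back to raw elements, and anything missing at one level is resolved from the parent set. The structures and file names below are assumptions for illustration only:

```python
# Each voice set stores its own recordings plus a link to its ancestor.
VOICE_SETS = {
    "Men":      {"elements": {"he": "men_he.wav", "lo": "men_lo.wav"},
                 "words": {"hello": "men_hello.wav"}, "parent": None},
    "Imperial": {"elements": {"lo": "imp_lo.wav"},
                 "words": {}, "parent": "Men"},
}

def resolve(set_name, kind, key):
    """Walk up the ancestor chain until a recording for `key` is found."""
    while set_name is not None:
        vs = VOICE_SETS[set_name]
        if key in vs[kind]:
            return vs[kind][key]
        set_name = vs["parent"]
    return None

# The Imperial set overrides one element and inherits everything else:
assert resolve("Imperial", "elements", "lo") == "imp_lo.wav"     # own recording
assert resolve("Imperial", "elements", "he") == "men_he.wav"     # inherited
assert resolve("Imperial", "words", "hello") == "men_hello.wav"  # whole word, inherited
```

A real engine would of course also match longer recordings first (sentence before word before element), but the inheritance mechanism is the same.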
After a while, Bethesda can gather those voice sets as they start to come in, select the more usable ones, tweak them if needed, and add them to the available pool for each race and sub-race.
In the Construction Set, those voice sets can be assigned to any NPC with pitch modulation and other effects, so each voice set can be used for more than one NPC without sounding exactly the same.
When a designer types the text of a dialog line (or the like) in the editor, he could open the editor's behavior window to define how the character behaves while speaking the line: his emotions, looking down, shaking his head, examining the item in his hands, glancing at a nearby object, and so on...
All of those can be selected from lists of actions, check boxes, radio buttons, and so on... But in the end, all of it is compiled into phonetic and behavior codes, possibly interlaced with lip motion data as well, and the designer can look at the codes and change them manually if he likes.
Those behavior codes might shift the pitch of a section of the resulting voice a bit to simulate a change of emotion, and other effects can be achieved the same way.
All of this might seem a bit complicated to implement, which is why it is better suited to a middleware tool set: an independent developer could put all of its manpower into this single project and make an efficient tool set for other developers to use within their own projects.
Those middleware developers could make the application that collects voice sets and the API that uses them in dialogs.
The end result would be an efficient system that lets us have voiced dialog lines without needing voice actors available and on call while dialogs and quests are being developed. They do their job at the beginning of the project and produce voice sets to be used later in the dialogs and quests, and after that you can have a voiced line for anything you like, just by typing the text and defining some behaviors.
After that we would be limited only by our imagination when creating quests and dialogs, and we could put our resources and manpower where they give the best results.
We could develop an event manager that creates semi-random events and quests on the fly, and the lines of text for those events and quests could be voiced by any actor, without needing to know beforehand who the speakers are. Think of the possibilities.
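The payoff can be sketched in a few lines: an event manager generates a line at runtime, and any NPC's voice set can deliver it. The speak() function here is a stand-in for the whole voice-building pipeline sketched earlier, and every name is hypothetical:

```python
import random

def make_rumor(place, creature):
    """Generated quest/event text; no actor needs to exist when this is written."""
    return f"Watch yourself near {place}; {creature} have been sighted."

def speak(npc_voice_set, text):
    # Placeholder for: text -> phonetics/behavior codes -> elements
    # resolved from npc_voice_set -> assembled audio.
    return f"<{npc_voice_set} voices: '{text}'>"

place = random.choice(["the Niben Bay", "Chorrol"])
line = make_rumor(place, "goblins")
# Any bystander can deliver the line, whichever voice set they happen to use:
print(speak("Nibenese Imperial (female)", line))
```

The point is the decoupling: the event manager only ever produces text, and the voice building system turns it into speech for whoever ends up saying it.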
I have just scratched the surface of what is needed for this, and of what such an effort could yield, as I wanted to post this as soon as possible and lift the burden from my mind, but I can go deeper into any of these ideas and details if needed.