Grah. I had a lengthy reply all written out and my browser went and crashed when I tried to preview it.
Perhaps you have lots of experience with professional-level TTS tools. Forgive me if so, but I think you may not... and the average gamer's extremely limited experience with TTS can't be compared with the current state-of-the-art. We can all admit that even the modern pro-level TTS engines & SDKs still don't sound "exactly" like real speech, but the tech's getting closer every year. Cepstral's VoiceForge and AT&T's Natural Voices were both huge improvements when they were new, and there are still-better engines yet to be released. Further still, though many of these engines support full SSML, you almost never hear it skillfully employed. Bethesda was already including cues for the Voice Actors in the dialog; TTS is better because the writer can preview how the SSML affected the prosody and change as needed. After all - isn't it the writer who best knows how their dialog is supposed to sound?
Indeed I don't have much experience with professional-level TTS tools.
I can only work off what I hear, and even these days you can still hear when something is TTS compared to a real voice.
True, the writer knows how they want their dialog to sound, but that doesn't necessarily mean they know how to make it sound its best. Knowing what words to write is a different talent than knowing how to speak those words to an audience, if you get my meaning. A big part of being a voice actor, as I understand it, is knowing how to speak to an audience. Sometimes an actor will do something the writer didn't expect, and the director finds it works better than what was originally intended.
Hence my qualifiers, friend. Having even just plain SSML in-dialog tags controlling when & where which "voice" is used, along with pitch, speed, and emphasis will make a huge difference. Per phrase, per word, etc. And Bethesda is great at taking a good middleware and pumping it full of new features & flexibility. I believe they can pull this off very well. Eventually this space-saving & flexible approach will be industry-standard for extensive-dialog games. Games like Oblivion, but better. Isn't that what we want?
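For what it's worth, here is a rough sketch of what per-line, per-word tagging could look like. It is only an illustration, not how Bethesda or any particular engine does it: the helper `ssml_line` and the voice name `ordinator_male` are made up, and a real engine would likely also want the `version` and `xmlns` attributes on `<speak>`. The `<voice>`, `<prosody>`, and `<emphasis>` elements themselves are standard SSML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def ssml_line(text, voice, pitch="medium", rate="medium", emphasized=()):
    """Wrap one dialog line in SSML (hypothetical helper).

    Words listed in `emphasized` (lowercase, no punctuation) are wrapped
    in <emphasis>; the whole line gets one voice and one prosody setting.
    """
    parts = []
    for word in text.split():
        safe = escape(word)  # escape &, <, > so the markup stays well-formed
        if word.strip(".,!?").lower() in emphasized:
            safe = f'<emphasis level="strong">{safe}</emphasis>'
        parts.append(safe)
    body = " ".join(parts)
    return (
        f"<speak><voice name={quoteattr(voice)}>"
        f"<prosody pitch={quoteattr(pitch)} rate={quoteattr(rate)}>"
        f"{body}</prosody></voice></speak>"
    )


# "ordinator_male" is an assumed voice name, not one from a real engine.
line = ssml_line(
    "We're watching you... scum.",
    voice="ordinator_male",
    pitch="low",
    rate="slow",
    emphasized={"scum"},
)
ET.fromstring(line)  # raises ParseError if the markup is not well-formed
print(line)
```

The point is that the writer can tweak `pitch`, `rate`, or the emphasized words, regenerate the line, and listen again without booking studio time.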
I guess you could say I'm cautiously pessimistic.
If they can make it work with comparable quality, great. I just don't think we're at that point yet. And I'd be concerned about what is lost in the transition (i.e. input from the voice actors themselves).
High quality TTS would be great for modders, or people that don't have the budget for voice actors. But I think a sizable production studio like Bethesda would benefit more from having live voice actors. Although I could perhaps see them using live voice actors for a good number of lines and TTS'ing the rest, if they really wanted to go for diversity in dialog, but it would depend on the TTS being good enough to not stick out like a sore thumb.
Except that the way it happened in Oblivion was not even consistent per-character. In Oblivion you can get the same guy saying "D-a-e-dra" like four different ways - two of which are obviously a product of the Voice Actor's dyslexia (and sometimes typos in dialog). This is immersion-breaking because a speaker, speaking from their own mind, does not usually mix pronunciations. E.g., I have a friend who always mispronounces "ask" as "axe". That's how he says it; he does not mix it up. Actually, to get the consistent, character-specific effect you reference, TTS is more reliable.
And besides, who the heck is this "D e a dra" woman, anyway? Is there a Naked Nord quest involved?
True enough. Though something as blatant as a mispronunciation is a failing on multiple levels. A voice actor will usually do lines multiple times to try to do their best. The director tries to help them understand the original intention, and the producer helps select which take to use.
We agree more than you know. But not entirely. I don't know what makes you think going to TTS means you can't keep these voices; it's comments like these that give me the sense that you don't have current S-o-A TTS experience. Modern engines can have many different voices (even ones sampled off the Voice Actors you know & love) and express each voice profile in many different ways.
Part of the voice is how it's spoken, not just how it sounds. Sure, you could sample Jonathan Bryce, Jeff Baker, and the rest for the iconic voices.. but it would be performed by the computer, not the actors. It would sound like the actors, but it wouldn't be spoken like them.
I mean.. you have the way Ordinators say "We're watching you... scum." which has become rather popular among fans, as well as other lines ("All I ask for is a pair of boots. How hard could it be?" being one of my favorites), and I don't really see how you can get that without an actor being behind it.
Not only could each race have its own unique voice, but extra parameters can make it a "younger-" or "older-sounding" voice - as dictated per the character's own "age" setting. Imagine FaceGen, but for voices. Move the slider, and not only do wrinkles appear or disappear, but also the TTS engine sounds accordingly aged or young.
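To make the slider idea concrete, here is a toy sketch that maps a character's age onto SSML-style prosody values. Everything in it is an assumption for illustration: the function name `age_to_prosody`, the age range, and the percentages are invented, and a convincing aged voice would need far more than pitch and rate changes (the percentage forms assume SSML 1.1-style values).

```python
def age_to_prosody(age, min_age=18, max_age=90):
    """Map a character age to hypothetical prosody settings.

    Linear interpolation: the youngest character gets +10% pitch and
    105% rate, the oldest gets -10% pitch and 85% rate. The numbers
    are made up for this sketch.
    """
    clamped = min(max(age, min_age), max_age)
    t = (clamped - min_age) / (max_age - min_age)  # 0.0 young, 1.0 old
    pitch_pct = round(10 - 20 * t)
    rate_pct = round(105 - 20 * t)
    return {"pitch": f"{pitch_pct:+d}%", "rate": f"{rate_pct}%"}


for age in (18, 54, 90):
    print(age, age_to_prosody(age))
```

The returned values could then be dropped into a `<prosody>` tag around the character's lines, so the same slider that drives the face also drives the voice.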
I'd be lying if I said that didn't sound interesting. I'm just not sure TTS can do the original lines justice, before modifying the sound.
Take Morrowind's storyline for example again: you were simply given a letter and a mission to make your way to Balmora, but there was no real urgency behind it. You didn't know what the letter was about or why there should be a hurry. For all you know you've just been released from prison, so breathe in some freedom first.
Not only that, but there were points in the main quest where you're basically told "I got nothing for you to do right now. Go do other quests, level up, and come see me later." I thought that was great, as it let me relax and explore. By contrast, in Oblivion you start right in the main quest, and each point needs you urgently to go to the next, so it doesn't feel like there's a good time to do any other quests.. and there's no option to say you don't want to do the main quest (let alone politely).