Just found this interview through Google that mentions instances where Radiant AI went horribly wrong. Thought it might be an interesting read for anyone who hasn't already seen it:
"1. One character was given a rake and the goal “rake leaves”; another was given a broom and the goal “sweep paths,” and this worked smoothly. Then they swapped the items, so that the raker was given a broom and the sweeper was given the rake. In the end, one of them killed the other so he could get the proper item.
2. In another test, a minotaur was given a task of protecting a unicorn. However, the Minotaur repeatedly tried to kill the unicorn because he was set to be an aggressive creature.
3. In one Dark Brotherhood quest, the player can meet up with a shady merchant who sells skooma, an in-game drug. During testing, the NPC would be dead when the player got to him. The reason was that NPCs from the local skooma den were trying to get their fix, did not have any money, and so were killing the merchant to get it.
4. While testing to confirm that the physics models for a magical item known as the “Skull of Corruption,” which creates an evil copy of the character/monster it is used on, were working properly, a tester dropped the item on the ground. An NPC immediately picked it up and used it on the player character, creating a copy of him that proceeded to kill every NPC in sight.
5. In one test, after a guard became hungry and left his post in search of food, the other guards followed to arrest him. The town people looted the town shops, due to lack of guards.
Bethesda worked to fix these issues, balancing an NPC’s needs against his penchant for destruction so that the game world still functions in a usable fashion. In-game there are over 1,000 different NPCs, not including randomly spawned monsters and bandits. The result is that the AI in the release version is much reduced, only featuring NPC schedules."
If that's really what happened, all of those incidents could easily have been fixed by raising (or adding) a threshold that keeps the relative priority of a "law abiding" character's "task" from ever exceeding the bar for murder (or theft, in most cases). It should be "fixable". The fact that the system was dropped or never implemented leads me to believe that these were "scripted" test cases, with very specific "goals" and routines set up between two specific test NPCs, rather than random incidents out of a larger crowd of NPCs, and that those test cases turned out to be nearly impossible to link together into an "overall" or more universally applicable NPC AI package.
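To show what I mean by a threshold, here's a rough Python sketch. Everything in it is made up for illustration (the names NPC, CRIME_BAR and crime_allowed, and all of the numbers); none of it comes from Bethesda's actual engine, which I haven't seen.

```python
from dataclasses import dataclass

# Hypothetical bar that a task's urgency must reach before an NPC will
# consider each kind of act. The numbers are invented for illustration.
CRIME_BAR = {"none": 0, "theft": 60, "murder": 100}


@dataclass
class NPC:
    name: str
    urgency_cap: int  # the most urgent any ordinary task may ever become


def crime_allowed(npc: NPC, task_urgency: int, crime: str) -> bool:
    """True if the (capped) task urgency reaches the bar for this crime."""
    capped = min(task_urgency, npc.urgency_cap)
    return capped >= CRIME_BAR[crime]


# A law-abiding gardener is capped below the theft bar, so no task can
# ever push him into stealing or killing, however urgent it gets.
raker = NPC("raker", urgency_cap=50)
# A desperate addict has a cap above the murder bar.
addict = NPC("skooma addict", urgency_cap=150)
```

With these made-up numbers, the raker's "rake leaves" task can run away to an urgency of 500 and he still won't steal or kill for the rake, while the addict at urgency 120 would. The point is only that the cap lives on the character, not on the task.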
If you've only got two NPCs, and you set things up so that NPC #1 is told to do a task which involves getting an item carried by NPC #2, then you've got to set up routines for that first AI to search for the item, determine whether it's in the possession of another NPC, and if so, do something about it. If it's given a further list of options for the case where the item is in another NPC's possession, to either (1) ask for it, (2) steal it, or (3) kill for it, with no reason to choose one over the other, why would you be surprised if NPC #1 kills #2 for the item? I'd call the test case "working so far, but not finished". The task is going according to the options provided; the only problem is that the list has no priorities or "threshold" checks included to prevent the less desirable action from occurring. The example also doesn't give any information about whether there were any "backup" options in case the overall task, or at least the one option, proved impossible.
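The "finished" version of that test case might look something like the Python sketch below. It's my own guess at the shape of it, not anything from the interview: the option order, the "desperation" score, the thresholds, and the function name acquire_item are all hypothetical.

```python
# Options listed from most to least desirable. Each carries a hypothetical
# minimum "desperation" the NPC must reach before it will even try it.
OPTIONS = [("ask", 0), ("steal", 70), ("kill", 95)]


def acquire_item(desperation: int, owner_agrees: bool, can_steal: bool = True) -> str:
    """Try each option in priority order; fall back to giving up."""
    for action, threshold in OPTIONS:
        if desperation < threshold:
            # Not desperate enough for this option, and everything
            # further down the list is worse, so stop looking.
            break
        if action == "ask" and owner_agrees:
            return "ask"
        if action == "steal" and can_steal:
            return "steal"
        if action == "kill":
            return "kill"
    # Backup option: abandon the task instead of escalating.
    return "give_up"
```

With these invented thresholds, a mildly motivated raker (desperation 10) whose request is refused just gives up, one at 80 steals the rake, and only one at 95 or above would kill for it. Without the ordering and the thresholds you're back to the original test, where all three options are equally acceptable.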
One way or another, it's all "scripting", but in a multi-layered and interconnected web that would be a nightmare to test and debug. Calling it "Radiant AI" is just a fancy label to put on the box, so you can say "WE have Radiant AI and you don't", until next year when everyone's got it listed on the box.