Yup, as usual Bethesda forgot to give PC users a decent, functional UI for keyboard and mouse, so we're forced to fight with a shoddy, impractical console UI.
Only a fraction of the speedup comes from using SSE2 code. The original exe also uses SSE2 code, just not in the places where it is truly needed. This could've been prevented by using automatic SSE2 vectorization and/or a different math library. Ironically, the function that has been rewritten here is the dot product, which is the #1 textbook example for automatic vectorization in compiler demos.
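For illustration, here are two hypothetical dot product functions (not taken from the game): a plain loop that an auto-vectorizer can turn into SIMD code, and a hand-written SSE2 version using the standard emmintrin.h intrinsics. The sketch uses double precision, which SSE2 brought to the SIMD registers; a float version would use the corresponding _ps intrinsics. One caveat: compilers usually only vectorize a floating-point reduction like this when they are allowed to reorder the additions (e.g. -ffast-math on GCC/Clang or /fp:fast on MSVC), because reordering can change rounding. On targets without SSE2 the sketch simply falls back to the scalar loop.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define HAVE_SSE2 1
#endif

// Plain scalar loop: the textbook candidate for automatic vectorization.
double dot_scalar(const double* a, const double* b, std::size_t n) {
    assert((a && b) || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    return sum;
}

// Hand-written SSE2 version: processes two doubles per iteration.
double dot_sse2(const double* a, const double* b, std::size_t n) {
#ifdef HAVE_SSE2
    __m128d acc = _mm_setzero_pd();
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        const __m128d va = _mm_loadu_pd(a + i);  // unaligned loads are fine here
        const __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }
    double lanes[2];
    _mm_storeu_pd(lanes, acc);
    double sum = lanes[0] + lanes[1];        // horizontal add of the two lanes
    for (; i < n; ++i) sum += a[i] * b[i];   // leftover element when n is odd
    return sum;
#else
    return dot_scalar(a, b, n);              // no SSE2: reuse the scalar loop
#endif
}
```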
Much of the speedup is gained by manually eliminating calls along the critical code paths (only possible if the entire function can be reduced to 5 bytes or less, the size of the call instruction it replaces), or at least simplifying them as far as possible. Even so, this doesn't come close to what an optimizing compiler could achieve, because a compiler doesn't have to deal with the many restrictions of patching an existing binary in the first place; every optimizing compiler can do this, and usually does an excellent job of it, if told to. Skyrim would probably see an execution speed gain of over 100% from this single optimization alone, since it drastically increases the amount of code that can be detected as redundant and thus eliminated completely. I know that sounds exaggerated, and normally it would be, but it isn't once you've read and profiled enough of the code to know just how bad the compiled code is.
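As a minimal sketch of why inlining matters so much (hypothetical functions, not code from the game): when a tiny helper is inlined, constant arguments propagate into it and its dead parts disappear, which is the redundancy elimination described above; when the same helper is marked noinline, the call stays. The NOINLINE macro maps to the real __declspec(noinline) on MSVC and __attribute__((noinline)) on GCC/Clang. Compiling with optimizations on (e.g. /O2 or -O2) and comparing the assembly of the two loops shows the difference.

```cpp
#include <cassert>

// Portable spelling of "never inline this function".
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical tiny helper: optionally doubles its argument.
inline int scale(int x, bool doubled) { return doubled ? x * 2 : x; }

// Identical helper that the compiler is told not to inline.
NOINLINE int scale_noinline(int x, bool doubled) { return doubled ? x * 2 : x; }

// With optimizations enabled, scale() is inlined: the constant 'false'
// propagates into it, the branch and multiplication become dead code, and
// the loop body reduces to "sum += v[i]".
int sum_inlined(const int* v, int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += scale(v[i], false);
    return sum;
}

// Not inlined: every iteration still pays for a call, and the optimizer can
// generally no longer simplify or vectorize the loop across that call.
int sum_not_inlined(const int* v, int n) {
    assert(n >= 0);
    int sum = 0;
    for (int i = 0; i < n; ++i) sum += scale_noinline(v[i], false);
    return sum;
}
```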
Only 3 functions have truly been rewritten; everything else is either a variant of those or an instruction-level simplification of functions consisting of things like "return *this;", which sit at the very top of the profile because the compiler was obviously told not to inline them. So every time a certain kind of pointer needs to be dereferenced, the game calls a lengthy function to do what can be (and is) replaced by a single instruction. Fixing this manually stops being feasible after a certain point, but the compiler can do it for the whole binary at the cost of just a few extra seconds of compile time, and much better than any human ever could (at least at this code size).
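To make the 5-byte limit concrete: on 32-bit x86 a direct call is the 5-byte "call rel32" instruction (opcode E8 plus a 32-bit displacement measured from the end of the instruction), so any inline replacement has to fit in those 5 bytes. The sketch below is a hypothetical illustration, not the patch's actual code: it assumes a 32-bit MSVC __thiscall accessor whose body is just "return *this;" (so this arrives in ECX and the result is returned in EAX), and the names encode_call, inline_return_this and patch_call_site are made up. It operates on a copied byte buffer rather than live process memory.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Five bytes: the size of a 32-bit x86 "call rel32" instruction.
using Patch5 = std::array<std::uint8_t, 5>;

// Encode "call rel32" (opcode E8) at address 'site' targeting 'target'.
// The displacement is relative to the next instruction (site + 5) and is
// stored little-endian; unsigned wraparound gives the correct
// two's-complement value for backward calls.
Patch5 encode_call(std::uint32_t site, std::uint32_t target) {
    const std::uint32_t rel = target - (site + 5u);
    return Patch5{0xE8,
                  static_cast<std::uint8_t>(rel),
                  static_cast<std::uint8_t>(rel >> 8),
                  static_cast<std::uint8_t>(rel >> 16),
                  static_cast<std::uint8_t>(rel >> 24)};
}

// Hypothetical replacement for a call to a 32-bit MSVC __thiscall accessor
// whose body is just "return *this;": 'this' is already in ECX, so
// "mov eax, ecx" (8B C1) yields the same return value in EAX.
// Three single-byte NOPs (90) pad the replacement to 5 bytes.
Patch5 inline_return_this() {
    return Patch5{0x8B, 0xC1, 0x90, 0x90, 0x90};
}

// Overwrite the 5 bytes at 'site' in a copied code buffer whose first byte
// corresponds to address 'base', but only if they really encode a call to
// 'expected_target'. Returns false (and changes nothing) otherwise.
bool patch_call_site(std::uint8_t* code, std::uint32_t base, std::uint32_t site,
                     std::uint32_t expected_target, const Patch5& replacement) {
    assert(site >= base);
    std::uint8_t* p = code + (site - base);
    const Patch5 current{p[0], p[1], p[2], p[3], p[4]};
    if (current != encode_call(site, expected_target)) return false;
    std::memcpy(p, replacement.data(), replacement.size());
    return true;
}
```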
In general, the TESV code has pretty high register pressure. A huge part of this is simply due to the completely missing optimizations that would otherwise eliminate the unneeded allocations, but an x86_64 build, with 16 general-purpose registers instead of 8, would also definitely help.
Jump targets are completely unaligned, including the so-called hot targets, which are hit millions of times within short periods. This stresses the cache, because multiple fetches are required to execute a jump, whether it is correctly predicted or not. Optimizing compilers can align them properly automatically.
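The cost of an unaligned hot target can be estimated with simple arithmetic. The sketch below assumes a 16-byte instruction fetch window (typical for x86 cores of this generation, but an assumption; details vary by microarchitecture) and counts how many aligned fetch blocks the first few bytes after a jump target touch. fetch_blocks and align_up are hypothetical helpers; on GCC, the real options for letting the compiler fix this are -falign-functions, -falign-jumps, -falign-labels and -falign-loops.

```cpp
#include <cassert>
#include <cstdint>

// Number of aligned fetch blocks touched by 'len' bytes of code starting at
// 'addr'. The 16-byte default window is an assumption, not a universal rule.
unsigned fetch_blocks(std::uint64_t addr, std::uint64_t len, std::uint64_t window = 16) {
    assert(len > 0 && window > 0);
    const std::uint64_t first = addr / window;
    const std::uint64_t last = (addr + len - 1) / window;
    return static_cast<unsigned>(last - first + 1);
}

// Round an address up to the next multiple of 'align' (a power of two),
// which is what a compiler does when it pads a hot jump target with NOPs.
std::uint64_t align_up(std::uint64_t addr, std::uint64_t align = 16) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}
```

For example, 8 bytes of code starting at 0x100E straddle two 16-byte blocks, while the same 8 bytes starting at the aligned address 0x1010 fit in one.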
I guess I don't have to mention how bad the threading is, though this one isn't trivial to fix. It's just sad that it's almost 2012 and this thing can't even properly use two threads. Besides all the other obvious flaws, this is the main reason why the game is so strongly limited by the CPU. Single-core speed hasn't grown nearly as much as the number of cores has. Everyone knew this 10 years ago, but back then developers could still just wait for the hardware to provide the additional power needed to run sloppy code; that trick doesn't work anymore.
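The easy case of multithreading, fully independent work split across two threads with std::thread, looks roughly like this. It is only a sketch: the summing workload is a hypothetical stand-in, and real game systems share state and depend on each other's results, which is exactly why retrofitting threads onto an existing engine isn't trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Split fully independent work (here: a hypothetical sum over a frame's data)
// across two threads. Each thread writes only its own partial result, so no
// locking is needed; join() makes the worker's result visible to this thread.
long long parallel_sum(const std::vector<int>& values) {
    const auto mid = static_cast<std::ptrdiff_t>(values.size() / 2);
    long long left = 0;
    std::thread worker([&values, &left, mid] {
        left = std::accumulate(values.begin(), values.begin() + mid, 0LL);
    });
    // The calling thread handles the second half instead of sitting idle.
    const long long right = std::accumulate(values.begin() + mid, values.end(), 0LL);
    worker.join();
    return left + right;
}
```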
Yeah, Skyrim is a nice game, but many of the obstacles here have trivial fixes compared to the size of their respective payoffs (little to no extra coding required). Especially with over 10 million copies already sold, I'd at least expect it to run on recent hardware without sub-30 framerates.
Question: Thanks for explaining (and of course for making the awesome mod). Interesting to read. Do you think you will be able to do more optimization fixes like this?
Theoretically, yes, of course. I'd currently estimate that an additional 10% could be achieved in the game executable itself using the same procedure as before. However, to fix a problematic function using only a normal person's tools, it has to be detected and isolated from the call graph; then its assembly code has to be reverse-engineered, rewritten, and tested until it can be replaced by a better-performing variant.
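The "tested until it can be replaced" step is essentially differential testing: run the original and the rewrite on the same inputs and compare the results. In this sketch both functions are hypothetical stand-ins (a naive bit count and a branch-free SWAR bit count); in practice the "original" would be the reverse-engineered routine. Random testing can only fail to find a mismatch, it can't prove the two are equivalent, so known edge cases are checked explicitly as well.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <random>

// Hypothetical "original": naive loop counting set bits.
std::uint32_t original_fn(std::uint32_t x) {
    std::uint32_t count = 0;
    for (int bit = 0; bit < 32; ++bit) count += (x >> bit) & 1u;
    return count;
}

// Hypothetical "replacement": classic SWAR bit count, same result, no loop.
std::uint32_t replacement_fn(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    return (x * 0x01010101u) >> 24;
}

// Compare two implementations on edge cases plus many seeded random inputs.
bool equivalent(const std::function<std::uint32_t(std::uint32_t)>& a,
                const std::function<std::uint32_t(std::uint32_t)>& b,
                int random_trials = 100000, std::uint32_t seed = 12345) {
    assert(random_trials >= 0);
    const std::uint32_t edges[] = {0u, 1u, 0x80000000u, 0xFFFFFFFFu, 0x55555555u};
    for (std::uint32_t e : edges)
        if (a(e) != b(e)) return false;
    std::mt19937 rng(seed);  // fixed seed keeps failures reproducible
    for (int i = 0; i < random_trials; ++i) {
        const std::uint32_t x = static_cast<std::uint32_t>(rng());
        if (a(x) != b(x)) return false;
    }
    return true;
}
```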
Now that many of the biggest CPU hogs have been softened, there are still a lot of potential gains ahead, but acquiring them would require several times the work already invested, for a smaller amount of additional performance. It all has to be hand-crafted; there are no tools to automate this. Worse yet, some of the work has to be redone when porting the changes to a different game version. This way, only very simple optimizations are even possible at all, due to the sheer number of places where a call must be inlined. Most function calls suitable for inlining come with a varying amount of setup and cleanup code around the call, and I just can't go through each of the (tens of!) thousands of references to figure out the optimal replacement at each point. A compiler does all this for free, within seconds, and not only for the dozen hottest targets but for the whole game, which consists of millions of lines of assembly code.
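Even just finding the references is a chore without proper tools. A hypothetical helper like the one below scans a copied code buffer for 5-byte "call rel32" instructions (opcode E8) that land on a given target, assuming 32-bit x86 with little-endian displacements. Each hit would still need its surrounding setup and cleanup code examined by hand, which is where the real work is.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Find every "call rel32" (E8 xx xx xx xx) in 'code' whose destination is
// 'target'; 'base' is the address of code[0]. This is a naive byte scan:
// x86 instructions have variable length, so an E8 byte inside another
// instruction can yield false positives, and a real tool would need a
// disassembler to confirm each hit.
std::vector<std::uint32_t> find_call_sites(const std::uint8_t* code, std::size_t size,
                                           std::uint32_t base, std::uint32_t target) {
    assert(code != nullptr || size == 0);
    std::vector<std::uint32_t> sites;
    for (std::size_t i = 0; i + 5 <= size; ++i) {
        if (code[i] != 0xE8) continue;
        const std::uint32_t rel = static_cast<std::uint32_t>(code[i + 1]) |
                                  (static_cast<std::uint32_t>(code[i + 2]) << 8) |
                                  (static_cast<std::uint32_t>(code[i + 3]) << 16) |
                                  (static_cast<std::uint32_t>(code[i + 4]) << 24);
        const std::uint32_t site = base + static_cast<std::uint32_t>(i);
        // Displacement is relative to site + 5; unsigned wraparound handles backward calls.
        if (site + 5u + rel == target) sites.push_back(site);
    }
    return sites;
}
```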
My real hope, therefore, is that this little demo will create demand that Bethesda has to answer, much like what happened with the LAA patch. This would be the best outcome for everyone and also the best basis for any further optimizations that can only be hand-crafted.
Question: Did I understand you correctly that if Bethesda had used an optimizing compiler from the start, we would have experienced over 100% better performance?
Yeah, pretty sure about that. The current patch doesn't modify even 1% of the code, but it still manages to cut the cycles per frame by 30-40%. The whole binary is full of redundant code; it's just way too much to ever remove manually, but the savings add up quickly when an optimizing compiler eliminates it.
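As a rough check on these numbers, assuming frame time is entirely CPU-bound (an assumption; GPU-bound scenes won't scale this way), cutting a fraction r of the cycles per frame gives a frame-rate speedup S of:

```latex
S = \frac{T_{\text{before}}}{T_{\text{after}}} = \frac{1}{1 - r}, \qquad
r = 0.30 \Rightarrow S \approx 1.43, \quad
r = 0.40 \Rightarrow S \approx 1.67, \quad
S > 2 \iff r > 0.5
```

So a 30-40% cycle cut corresponds to roughly 43-67% more frames per second, and a gain of over 100% would mean removing more than half of the cycles per frame.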
(...) turning on optimizations takes this much effort:
http://www.netrostar.com/Admin/Files/Images/Tutorials/1vsoptimization.jpg
http://img822.imageshack.us/img822/5635/aiosh.png
One of my friends found this here somewhere and posted it on the other Elder Scrolls forums and other game forums I go to.