Some of the technical stuff in that release that will probably just confuse you:
If you set CriticalSections\bEnableProfiling to 1 and LightCriticalSections\bEnableProfiling to 1 in the ini file then the game will periodically print summaries of critical section performance to the log file. A summary might look like this:
Critical Sections summary(PD3): time 409 tt: 2.4s tc:17k
dynamic LCS 035B0044 1241ms 1082 1147us 20ms -99 86D7EA 86D94C
static CS 011792B4 218ms 244 893us 5ms 1 BD3B65 BD1DA2 BD3F32 BD3D78 ...
static CS 01179690 110ms 440 251us 110ms 1 BECEA1 BEE098 BF35DD
dynamic CS 1808444C 161ms 164 981us 27ms 1 8284F7 878F95 876E9D 657FE5 ...
dynamic LCS 035B10AC 116ms 58 1988us 20ms -99 86D7EA 86D94C
dynamic LCS 035B2114 98ms 58 1692us 20ms -99 86D94C 86D7EA
static LCS 0116FCA0 68ms 126 536us 23ms -99 AE428D AE6710
dynamic CS 1808454C 67ms 29 2290us 10ms 1 828509 873172 88B93C
dynamic LCS 035B3A4C 39ms 7 5495us 20ms -99 86D7EA 86D94C
dynamic LCS 035BCA5C 38ms 18 2113us 19ms -99 86D7EA 86D94C
dynamic LCS 035B297C 38ms 13 2901us 19ms -99 86D7EA 86D94C
static LCS 0106E5A0 34ms 12k 3us 10ms -99 42364D 58791B AE7D33 ADE578 ...
static LCS 0108EDE0 21ms 29 704us 19ms -99 8268B8 82735D 84B6D1
dynamic LCS 035B7924 19ms 3 6061us 19ms -99 86D7EA 86D94C
dynamic LCS 035B531C 19ms 14 1329us 19ms -99 86D94C 86D7EA
dynamic CS 035A6250 27ms 1459 18us 6ms 4 86E55A 86E23A
static CS 01073AB4 19ms 43 444us 10ms 1 58A85B 591488 58D262 58D104 ...
dynamic CS 18FA3CDC 25ms 286 89us 3ms 1 BCF421 BCF1B8
dynamic CS 181467CC 17ms 4 4056us 11ms 0 BCB86B BC9DBE
dynamic LCS 035C8CDC 3ms 43 64us 3ms -99 86C9D5 86C877
The form is:
A: "static" or "dynamic"
B: "CS" or "LCS"
C: object address in hexadecimal
D: total time spent blocking on it, in milliseconds
E: number of times it was blocked on
F: average time spent blocking on it, in microseconds (i.e. D divided by E)
G: the single longest time spent blocking on it, in milliseconds
H: the highest number of threads blocking on it at once (currently misreported as -99 by LCSes)
I: a list of caller addresses in hexadecimal, in the order that they were detected; if more than a few are detected it will print the first few then a "..." to indicate that too many to print were found
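If you want to sort or filter these entries outside the game, the A to I layout above can be read with a small script. The following is only a sketch in Python, not part of Stutter Remover: the names Entry and parse_entry are made up for illustration, it assumes each entry sits on its own line in the log, that D and G always carry an "ms" suffix and F a "us" suffix, and that a count such as "12k" means roughly twelve thousand.

```python
from dataclasses import dataclass
from typing import List


def _strip_unit(text: str, unit: str) -> int:
    """Turn '218ms' into 218; raise if the expected unit suffix is missing."""
    if not text.endswith(unit):
        raise ValueError(f"expected a value ending in {unit!r}, got {text!r}")
    return int(text[: -len(unit)])


def _to_count(text: str) -> int:
    """Turn '1082' into 1082 and '12k' into 12000 (the 'k' meaning is an assumption)."""
    if text.endswith("k"):
        return int(text[:-1]) * 1000
    return int(text)


@dataclass
class Entry:
    lifetime: str        # A: "static" or "dynamic"
    kind: str            # B: "CS" or "LCS"
    address: int         # C: object address
    total_ms: int        # D: total time spent blocking
    count: int           # E: number of times it was blocked on
    avg_us: int          # F: average time per block
    longest_ms: int      # G: single longest block
    max_threads: int     # H: most threads blocking at once (-99 for LCSes)
    callers: List[int]   # I: caller addresses, in the order printed
    truncated: bool      # True if the caller list ended with "..."


def parse_entry(line: str) -> Entry:
    """Parse one summary entry (not the 'Critical Sections summary' header line)."""
    parts = line.split()
    callers = parts[8:]
    truncated = bool(callers) and callers[-1] == "..."
    if truncated:
        callers = callers[:-1]
    return Entry(
        lifetime=parts[0],
        kind=parts[1],
        address=int(parts[2], 16),
        total_ms=_strip_unit(parts[3], "ms"),
        count=_to_count(parts[4]),
        avg_us=_strip_unit(parts[5], "us"),
        longest_ms=_strip_unit(parts[6], "ms"),
        max_threads=int(parts[7]),
        callers=[int(c, 16) for c in callers],
        truncated=truncated,
    )
```

Sorting a list of parsed entries by total_ms or longest_ms then gives the two rankings discussed in the following paragraphs.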
Generally, the total time a specific CS or LCS spent blocking ("D" in the list above) is the biggest indicator that it may have been responsible for performance issues in general.
If the single longest time blocked on a specific CS/LCS ("G" in the list above) is high, that indicates that the CS/LCS in question may have caused the game to freeze for brief periods (i.e. stutter). If you suspect this to be the case, you can try searching the log for that object's address: in addition to the summaries, the profiling code also prints a line for each specific time Oblivion/Fallout blocked, though only if the blocking lasted for a while. By searching the log for those lines you can find out how many times a critical section blocked for a long time, and also which thread blocked on it (thread 1 blocking for more than 33 milliseconds will always result in a stutter at 30 FPS; other threads blocking for a long time will often cause problems, but not always, and it may take them longer before it qualifies as a problem). Currently those lines may misidentify LCSes as CSes; that will get fixed in the next version.
If a CS/LCS blocked many times, with a large total time blocked ("D") but a smaller longest time blocked ("G"), then that CS/LCS is more likely to cause a general performance reduction (i.e. reduced FPS) than actual stuttering.
The performance summaries printed are cumulative: a summary printed at the end of an hour-long game session includes all blocking events for the entire hour, not just the recent ones.
If a CS/LCS caused problems for a smaller part of the time, but had no issues most of the time, then the information about its problems could get diluted by all the times it wasn't having problems. You can check for this by comparing multiple summaries, or by getting a log of a game session that included mostly just problem times.
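Because the summaries are cumulative, comparing two of them amounts to subtracting the earlier totals from the later ones. Here is a rough Python sketch of that arithmetic; the function name interval_stats and the dictionary layout are made up for illustration, and the "earlier" numbers in the usage example are invented. Only D (total time) and E (count) can be subtracted this way. G (the single longest block) cannot, and a dynamic object address may in principle end up referring to a different object later in the session.

```python
from typing import Dict, Tuple

# Key: object address as printed in the log (e.g. "035B0044").
# Value: (total_ms, count), i.e. fields D and E of a summary entry.
Summary = Dict[str, Tuple[int, int]]


def interval_stats(earlier: Summary, later: Summary) -> Dict[str, Tuple[int, int, float]]:
    """Return (delta_ms, delta_count, avg_us) per object for the time between two summaries."""
    result = {}
    for address, (total_ms, count) in later.items():
        # Objects that were not listed in the earlier summary start from zero.
        prev_ms, prev_count = earlier.get(address, (0, 0))
        delta_ms = total_ms - prev_ms
        delta_count = count - prev_count
        avg_us = delta_ms * 1000 / delta_count if delta_count > 0 else 0.0
        result[address] = (delta_ms, delta_count, avg_us)
    return result


# Usage: the "later" values are taken from the sample summary, the "earlier" ones are invented.
earlier = {"035B0044": (200, 400)}
later = {"035B0044": (1241, 1082), "011792B4": (218, 244)}
stats = interval_stats(earlier, later)
```

An object whose recent average is much worse than its whole-session average is one whose problems were being diluted.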
Once a critical section is suspected of being a problem, what can be done about it?
Well, really, the critical sections largely reflect the structure of Oblivion/Fallout code, which (mostly) nothing can be done about. But... some things can be tried, and in a few cases they have been found to be quite helpful.
One thing that can be done is to suppress a critical section or light critical section. This forces the program to never block on that CS/LCS, usually causing the program to behave incorrectly and/or crash. In some cases, however, there seem to be little or no harmful consequences.
A less-likely-to-crash possibility is to tweak the CS/LCS behavior. Its behavior can be modified to make it give priority to threads under specific circumstances, or to make it more willing to spend CPU time to reduce real time spent blocking. However, Stutter Remover is already applying a set of default tweaks (determined by CriticalSection\iDefaultMode and other such settings) to each CS & LCS. To get a noticeable improvement, a set of alternate settings must be not just noticeably better than vanilla Oblivion/Fallout behavior but noticeably better than default OSR/FSR behavior.
The "OverrideList" section lists specific objects that you want deviations from the default behavior on. For each override, you specify a type of object, a way of finding that object, and what behavior changes you want. Note that overrides of a specific type will be ignored if the appropriate bUseOverrides setting is 0.
For critical sections and light critical sections, the way of finding it is either by object address or by caller address. All addresses must have a "0x" added at the beginning, otherwise Stutter Remover will try to interpret them as decimal and get very confused. If the CS/LCS is listed as "static" in the profiling summary, then object addresses should work fine; just set it to the object address the profiling summary listed. If the CS/LCS is listed as dynamic instead, then the object address has a habit of changing with the phase of the moon or other arcane circumstances, so another method should be used (though you can often get away with testing a change using the object address of a dynamic object, just be aware that it will probably silently fail when you switch computers, reboot, update drivers, or some such thing). Caller address is currently the only alternative. However, for performance reasons caller address currently only works if the first caller is the one you specified. For most critical sections, the first one in the list never changes, so you can simply use that address, but for most LCSes and some rare CSes the order may change from run to run, in which case you may need to create multiple override entries, one for each possible first caller.
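Writing one entry per possible first caller by hand is easy to get wrong (a forgotten "0x", for instance), so here is a small Python sketch that generates the entries. It is not part of Stutter Remover. The helper name override_entries is made up, the key names are the ones used in the default overrides, the one-setting-per-line layout is an assumption about how the ini is laid out, and the Mode value in the usage example is purely illustrative, not a recommendation.

```python
from typing import Iterable


def override_entries(kind: str, key: str, addresses: Iterable[int], mode: int, comment: str) -> str:
    """Build one override entry per address, each with a '0x'-prefixed hexadecimal address."""
    if kind not in ("CriticalSection", "LightCriticalSection"):
        raise ValueError(f"unknown object type: {kind}")
    if key not in ("CallerAddress", "ObjectAddress"):
        raise ValueError(f"unknown way of finding the object: {key}")
    blocks = []
    for address in addresses:
        blocks.append(
            f"    {kind} = {{\n"
            f"        {key} = 0x{address:X}\n"   # always written with the 0x prefix
            f"        Mode = {mode}\n"
            f"        comment = {comment}\n"
            f"    }}"
        )
    return "\n".join(blocks)


# Usage: the last LCS in the sample summary lists callers 86C9D5 and 86C877.
# Since the order of LCS callers may change from run to run, emit one entry for each.
text = override_entries(
    "LightCriticalSection",
    "CallerAddress",
    [0x86C9D5, 0x86C877],
    mode=2,  # illustrative only
    comment="example entry generated by a script",
)
print(text)
```

The output is meant to be pasted inside the braces of the OverrideList section.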
Once you've told it how to find a CS or LCS, then you tell it the behavior differences you want. Currently this means a line like "Spin = 500" or "Mode = 1".
These are the default overrides atm:
OverrideList = {
    CriticalSection = {
        CallerAddress = 0x828509
        Mode = 5
        comment = Renderer+0x180, recommendation=suppress (mode 5)
    }
    LightCriticalSection = {
        CallerAddress = 0x86D7EA
        Mode = 3
        comment = MemoryPool (multiple LCSes), recommendation=stutter (mode 3)
    }
    LightCriticalSection = {
        CallerAddress = 0x86D94C
        Mode = 3
        comment = MemoryPool alternate caller (multiple LCSes), recommendation=stutter (mode 3)
    }
    LightCriticalSection = {
        ObjectAddress = 0x1079FE0
        Mode = 2
        comment = GarbageCollector, recommendation=fair (mode 2)
    }
}
The modes are:
1 = more or less vanilla behavior - often tends to give priority to threads trying to reenter shortly after exiting
2 = increased fairness (attempts to let threads in in the order that they tried to enter)
3 = stutter (vary behavior once in a while)
5 = suppress (will generally crash, but when it works it's the highest performance)
6 = attempts to give priority to the main thread
7 = attempts to reduce priority for the main thread
CSes and LCSes use different implementations of each mode, so the actual meanings may vary, but both attempt to use the same basic concept for a given mode number.
Other "Mode" values will tend to generate warnings or error messages and either have no effect or cause Oblivion/Fallout to exit immediately.
Spincounts:
low = minimize CPU resources used (in theory; practice differs)
high = minimize time blocked at the expense of increased CPU usage (in theory; practice differs)
0 = very low
500 = low
1000 = medium-low
2000 = medium
4000 = high
In practice, I think "low" has tended to work better as the default setting.
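For anyone unfamiliar with what a spin count is in general: a thread that finds the lock taken retries a number of times without going to sleep (burning CPU) before it finally blocks. The Python sketch below only illustrates that general idea using the standard threading module. It is not how Stutter Remover or the game implements it, and the function name acquire_with_spin is made up.

```python
import threading


def acquire_with_spin(lock: threading.Lock, spin_count: int) -> bool:
    """Acquire the lock, retrying up to spin_count times before blocking.

    Returns True if the lock was obtained while spinning, False if the
    thread had to fall back to a blocking acquire.
    """
    for _ in range(spin_count):
        # Non-blocking attempt: costs CPU time but avoids putting the thread to sleep.
        if lock.acquire(blocking=False):
            return True
    # Spin budget used up: block until the lock becomes available.
    lock.acquire()
    return False


# Usage: with an uncontended lock the first spin attempt succeeds.
lock = threading.Lock()
got_it_spinning = acquire_with_spin(lock, 500)
lock.release()
```

A higher spin count means more retries before sleeping, which is the CPU-for-latency trade described above. A spin count of 0 skips the retries entirely and goes straight to blocking.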
Fallout3 CSes and LCSes with known names, purposes, or characteristics:
Renderer+0x180 CS: called from 0x828509; this CS seems to be okay to suppress; this was a major source of performance problems in Oblivion, slightly less so in Fallout.
Heap LCSes: These are dynamic LCSes called from 0x86D7EA or 0x86D94C. They are created and used inside of Fallout's heap, and become irrelevant if the heap is replaced. I think the equivalent CS in Oblivion worked best when set to stuttering, so that's what I've set these to in the default ini of FSR for the moment.
GarbageCollector LCS: static, object address is 0x1079FE0
ExtraData LCS: static, object address is 0x0106C980
TESObjectCell::CellRefLockEnter LCS: static, object address is 0x00DCD0A8
NavMeshObstacleManager+0x100: dynamic, called from 0x005E920D among other addresses