Sound manipulation via script has always been a weakness of the engine, unfortunately. OBSE had some extra commands for changing the different volume levels, but I don't recall seeing them in FOSE yet.
If there isn't too much action you could probably get away with it, as long as the music is dynamic enough to hold attention.
I'd be more concerned about the cut-scene scripting, as it is quite doable but quite laborious. That's assuming you're going to do camera moves rather than just keeping it in POV.
Edit: One perk of scripted camera moves in your case is that you can move the camera away from the action, thus achieving the effect you're after. But if you're not up to scratch on your maths, you're in for a LONG, fun ride.
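
To give a rough idea of the maths involved, here is a minimal sketch in plain Python (not engine script) of the two calculations a scripted camera move tends to need: sliding the camera between two points, and working out the heading that keeps it pointed at the action. The function names are made up for illustration, and the angle convention (degrees, clockwise from the +Y axis) is an assumption you would need to check against the engine before porting any of this.

```python
import math

def lerp(a, b, t):
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t

def camera_position(start, end, t):
    """Point on the straight line from start to end (x, y, z tuples)."""
    return tuple(lerp(s, e, t) for s, e in zip(start, end))

def heading_to(camera, target):
    """Heading in degrees from camera to target on the horizontal plane.

    Assumed convention: 0 points along +Y and angles grow clockwise.
    Swap or negate the arguments if your engine measures it differently.
    """
    dx = target[0] - camera[0]
    dy = target[1] - camera[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0

# Pull the camera halfway along a path while keeping it aimed at the action.
action = (0.0, 0.0, 0.0)
cam = camera_position((0.0, -100.0, 50.0), (200.0, -300.0, 150.0), 0.5)
yaw = heading_to(cam, action)
```

In a real cut-scene you would step `t` a little every frame and apply the resulting position and heading to whatever object the camera is attached to, which is where most of the labour goes.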