Introduction
I am rewriting the Papyrus compiler.
Why? A number of reasons. Firstly, I was looking for a long-term "back burner" project that I could work on when I get burned out on Skyrim and don't have any other mods in the pipeline. Secondly, I was looking for a challenge. Thirdly, I made the mistake of disassembling some pex files and was horrified by the number of superfluous assignments and temporary variables.
As I consider this to be an educational project, I will be writing the compiler entirely by hand, eschewing such tools as ANTLR.
My main programming language is Java, so that is what I will use to create this. I have next to no experience with C++. Although I never got to complete a computer science degree (or any degree, for that matter), I have worked in the IT industry for nearly a quarter of a century. Hopefully all that experience will help me to pull this off.
Note that this is a long-term project. I don't anticipate having anything available before the end of this year.
Objectives
Optimize out a bunch of unnecessary assignments and temporary variables.
Other optimizations (such as dead code removal or common subexpression elimination).
Add prefix/postfix increment/decrement operators (++ and --).
Add a for loop construct.
Add a ternary conditional ( ?: ) operator.
Allow variables in new array creation.
Add directory level configuration files so that psc files can auto-compile no matter where they are located.
All suggestions are welcome.
The Plan
The current plan is to write the compiler in six stages.
Front End
- Lexer - This stage takes the source code and splits it into a list of tokens, each token representing a source code element.
- Parse - This stage takes the list of tokens and generates an abstract syntax tree representing the program.
- Gather - This stage picks out high level definitions such as variables or functions and updates the abstract syntax tree. This stage also retrieves other objects/files that are referenced in the abstract syntax tree and performs front end processing on them.
- Semantics - This stage validates the abstract syntax tree against the additional information from the gather step.
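As a rough illustration of the lexer stage described above, here is a minimal Java sketch that splits source text into a list of tokens. The class name LexerSketch, the three token types, and the rule that any other character is a one-character operator are all assumptions made for this example. Real Papyrus source also has comments, string literals, floats and multi-character operators, none of which are handled here.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal lexer sketch (hypothetical names): source text in, flat token list out. */
final class LexerSketch {

    enum TokenType { IDENTIFIER, NUMBER, OPERATOR }

    /** One source code element, tagged with its kind and its text. */
    static final class Token {
        final TokenType type;
        final String text;

        Token(TokenType type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // whitespace separates tokens but is not a token itself
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i))
                            || source.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(TokenType.IDENTIFIER, source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(TokenType.NUMBER, source.substring(start, i)));
            } else {
                // Assumption for this sketch: anything else is a one-character operator.
                tokens.add(new Token(TokenType.OPERATOR, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token token : tokenize("count = count + 10")) {
            System.out.println(token.type + " " + token.text);
        }
    }
}
```

The parser would then consume this list one token at a time instead of looking at raw characters.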
Back End
- Optimize - This stage performs various transformations on the abstract syntax tree to improve code speed and size.
- Generate - This stage generates a Papyrus assembly file.
The optimize step will not be tackled until I have the rest of the compiler working.
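To make the "superfluous temporary" problem concrete, here is a small Java sketch of one possible clean-up. Note that it works on a flat list of instructions rather than on the abstract syntax tree that the plan above calls for, purely because the pattern is easier to show that way. The Instr class, the opcode names and the "::temp" naming prefix are assumptions for illustration, and the sketch only covers straight-line code with no jumps.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: fold "op temp, a, b; assign x, temp" into "op x, a, b". */
final class TempEliminationSketch {

    /** Hypothetical three-address instruction: op dest arg1 [arg2]. */
    static final class Instr {
        final String op;
        final String dest;
        final String arg1;
        final String arg2; // may be null for single-operand instructions

        Instr(String op, String dest, String arg1, String arg2) {
            this.op = op;
            this.dest = dest;
            this.arg1 = arg1;
            this.arg2 = arg2;
        }

        boolean reads(String name) {
            return name.equals(arg1) || name.equals(arg2);
        }

        @Override
        public String toString() {
            return op + " " + dest + " " + arg1 + (arg2 == null ? "" : " " + arg2);
        }
    }

    // Assumption: compiler-generated temporaries share a recognisable prefix.
    private static boolean isTemp(String name) {
        return name.startsWith("::temp");
    }

    // True if any instruction from index 'from' onwards reads the given name.
    private static boolean readLater(List<Instr> code, int from, String name) {
        for (int j = from; j < code.size(); j++) {
            if (code.get(j).reads(name)) {
                return true;
            }
        }
        return false;
    }

    static List<Instr> foldTemps(List<Instr> code) {
        List<Instr> result = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            Instr current = code.get(i);
            if (i + 1 < code.size() && isTemp(current.dest)) {
                Instr next = code.get(i + 1);
                boolean copiesTemp = next.op.equals("assign")
                        && current.dest.equals(next.arg1);
                // Only fold when the temporary is never read again (conservative check).
                if (copiesTemp && !readLater(code, i + 2, current.dest)) {
                    result.add(new Instr(current.op, next.dest, current.arg1, current.arg2));
                    i += 2;
                    continue;
                }
            }
            result.add(current);
            i++;
        }
        return result;
    }
}
```

For example, passing the two instructions "iadd ::temp0 a b" and "assign x ::temp0" to foldTemps yields the single instruction "iadd x a b". If ::temp0 is read again later, the pair is left alone.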
Current Status
I will add tasks here as I start working on them. Each task is marked with one of the following color codes:
Started but not a major focus -- Actively being worked on -- Completed
Lexer - Build token list from source code.
Lexer - Add JUnit testing to test project.
Parser - Main parser (Recursive descent algorithm).
Parser - Expression parser (Recursive descent algorithm for operands, shunting yard algorithm for operators).
Parser - Main parser - Add JUnit testing to test project.
Parser - Expression parser - add JUnit testing to test project.
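For readers who have not met the shunting yard algorithm mentioned in the expression parser task, here is a minimal Java sketch of the operator-handling part. It converts an already-tokenized infix expression to postfix order. The class name ShuntingYardSketch and the precedence table are assumptions for this example, only four left-associative binary operators and parentheses are covered, and a real parser would most likely build a syntax tree node at the point where this sketch appends an operator to the output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Sketch of the shunting yard algorithm: infix tokens in, postfix tokens out. */
final class ShuntingYardSketch {

    // Assumed precedence table: higher binds tighter, all operators left-associative.
    private static final Map<String, Integer> PRECEDENCE =
            Map.of("+", 1, "-", 1, "*", 2, "/", 2);

    static List<String> toPostfix(List<String> infix) {
        List<String> output = new ArrayList<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String token : infix) {
            if (PRECEDENCE.containsKey(token)) {
                // Pop stacked operators of equal or higher precedence first.
                while (!operators.isEmpty()
                        && PRECEDENCE.containsKey(operators.peek())
                        && PRECEDENCE.get(operators.peek()) >= PRECEDENCE.get(token)) {
                    output.add(operators.pop());
                }
                operators.push(token);
            } else if (token.equals("(")) {
                operators.push(token);
            } else if (token.equals(")")) {
                // Unwind back to the matching opening parenthesis.
                while (!operators.isEmpty() && !operators.peek().equals("(")) {
                    output.add(operators.pop());
                }
                if (operators.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced ')'");
                }
                operators.pop(); // discard the "("
            } else {
                output.add(token); // anything else is treated as an operand
            }
        }
        while (!operators.isEmpty()) {
            String op = operators.pop();
            if (op.equals("(")) {
                throw new IllegalArgumentException("Unbalanced '('");
            }
            output.add(op);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(toPostfix(List.of("a", "+", "b", "*", "c")));
    }
}
```

With the input a + b * c the multiplication is emitted before the addition, giving the postfix order a b c * +.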