[WIP] PIP - Papyrus Improvement Project

Post » Mon Sep 02, 2013 8:32 pm

Introduction

I am rewriting the papyrus compiler.

Why? A number of reasons. Firstly, I was looking for a long-term "back burner" type project that I could work on for those times when I get burned out on Skyrim and don't have any other mods in the pipeline. Secondly, I was looking for a challenge. Thirdly, I made the mistake of disassembling some pex files and was horrified by the number of superfluous assignments and temporary variables.

As I consider this to be an educational project I will be writing the compiler entirely by hand, eschewing such tools as ANTLR.

My main programming language is Java, so that is what I will use to create this. I have next to no experience with C++. Although I never got to complete a computer science degree (or any degree for that matter), I have worked in the IT industry for nearly a quarter of a century; hopefully all that experience will help me to pull this off.

Note that this is a long-term project. I don't anticipate having anything available before the end of this year.

Objectives

Optimize out a bunch of unnecessary assignments and temporary variables.

Other optimizations (such as dead code removal or common subexpression elimination).

Add prefix/postfix increment/decrement operators (++ and --).

Add for loop construct.

Add ternary assignment ( ?: ) operator.

Allow variables in new array creation.

Add directory level configuration files so that psc files can auto-compile no matter where they are located.

All suggestions are welcome.

The Plan

The current plan is to write the compiler in six stages.

Front End

  1. Lexer - This stage takes the source code and splits it into a list of tokens, each token representing a source code element.
  2. Parse - This stage takes the list of tokens and generates an abstract syntax tree representing the program.
  3. Gather - This stage picks out high level definitions such as variables or functions and updates the abstract syntax tree. This stage also retrieves other objects/files that are referenced in the abstract syntax tree and performs front end processing on them.
  4. Semantics - This stage validates the abstract syntax tree against the additional information from the gather step.
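To make the Lexer stage concrete, here is a minimal Java sketch of splitting source text into a list of tokens. It is only an illustration under assumptions: the names SketchLexer, Token and Kind are invented for this example, the three token kinds are a simplification, and none of it is taken from the actual PIP code or from the real Papyrus grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names and token kinds are assumptions, not PIP's design.
final class SketchLexer {
    enum Kind { IDENTIFIER, NUMBER, SYMBOL }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }

        @Override
        public String toString() {
            return kind + "(" + text + ")";
        }
    }

    // Walks the source once, skipping whitespace and emitting one token per element.
    static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            int start = i;
            if (Character.isLetter(c) || c == '_') {
                // Identifier or keyword: letters, digits and underscores.
                while (i < source.length()
                        && (Character.isLetterOrDigit(source.charAt(i)) || source.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, source.substring(start, i)));
            } else if (Character.isDigit(c)) {
                // Integer literal: a run of digits.
                while (i < source.length() && Character.isDigit(source.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, source.substring(start, i)));
            } else {
                // Anything else is treated as a single-character symbol.
                i++;
                tokens.add(new Token(Kind.SYMBOL, String.valueOf(c)));
            }
        }
        return tokens;
    }
}
```

For example, SketchLexer.tokenize("x = x + 1") yields five tokens: two identifiers, two symbols and one number. A real lexer would also need to handle strings, comments, multi-character operators and line tracking for error messages.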

Back End

  1. Optimize - This stage performs various transformations on the abstract syntax tree to improve code speed and size.
  2. Generate - This stage generates a Papyrus assembly file.

The optimize step will not be tackled until I have the rest of the compiler working.
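As an example of the kind of tree transformation the Optimize stage could perform, here is a small Java sketch of constant folding on an abstract syntax tree. This is one well-known optimization chosen purely for illustration; it is not a statement of which optimizations PIP will implement. The node classes (Node, Num, Var, Add) and the FoldSketch wrapper are hypothetical.

```java
// Hypothetical sketch: a tiny expression tree and a constant-folding pass.
final class FoldSketch {
    abstract static class Node {}

    static final class Num extends Node {
        final int value;
        Num(int value) { this.value = value; }
        @Override public String toString() { return Integer.toString(value); }
    }

    static final class Var extends Node {
        final String name;
        Var(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Add extends Node {
        final Node left;
        final Node right;
        Add(Node left, Node right) { this.left = left; this.right = right; }
        @Override public String toString() { return "(" + left + " + " + right + ")"; }
    }

    // Folds children first, then replaces "constant + constant" with a single constant.
    static Node fold(Node node) {
        if (node instanceof Add) {
            Add add = (Add) node;
            Node left = fold(add.left);
            Node right = fold(add.right);
            if (left instanceof Num && right instanceof Num) {
                return new Num(((Num) left).value + ((Num) right).value);
            }
            return new Add(left, right);
        }
        // Literals and variables are already as simple as they can be.
        return node;
    }
}
```

Folding the tree for (2 + 3) + x gives (5 + x): the constant part is computed at compile time, so the generated assembly needs one fewer addition and one fewer temporary.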

Current Status

I will add tasks here as I start working on them, otherwise I will use the following color codes:

Started but not a major focus -- Actively being worked on -- Completed

Lexer - Build token list from source code.

Lexer - Add JUnit testing to test project.

Parser - Main parser (Recursive descent algorithm).

Parser - Expression parser (Recursive descent algorithm for operands, shunting yard algorithm for operators).

Parser - Main parser - Add JUnit testing to test project.

Parser - Expression parser - add JUnit testing to test project.
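For readers unfamiliar with the shunting yard algorithm mentioned for the expression parser, here is a minimal Java sketch that reorders infix tokens into postfix order. It is a generic textbook version under assumptions: only the four left-associative operators + - * / and parentheses are handled, the precedence values are made up for the example, and the name ShuntingYardSketch is hypothetical. It does not reflect how PIP combines the algorithm with recursive descent.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: infix-to-postfix conversion with the shunting yard algorithm.
final class ShuntingYardSketch {
    // Assumed precedences; anything that is not an operator returns -1.
    private static int precedence(String token) {
        switch (token) {
            case "+":
            case "-":
                return 1;
            case "*":
            case "/":
                return 2;
            default:
                return -1;
        }
    }

    static List<String> toPostfix(List<String> tokens) {
        List<String> output = new ArrayList<>();
        Deque<String> operators = new ArrayDeque<>();
        for (String token : tokens) {
            if (precedence(token) > 0) {
                // Left-associative: pop operators of equal or higher precedence first.
                while (!operators.isEmpty() && precedence(operators.peek()) >= precedence(token)) {
                    output.add(operators.pop());
                }
                operators.push(token);
            } else if (token.equals("(")) {
                operators.push(token);
            } else if (token.equals(")")) {
                // Unwind back to the matching opening parenthesis.
                while (!operators.isEmpty() && !operators.peek().equals("(")) {
                    output.add(operators.pop());
                }
                if (operators.isEmpty()) {
                    throw new IllegalArgumentException("Unbalanced parentheses");
                }
                operators.pop(); // discard the "("
            } else {
                // Operand: goes straight to the output.
                output.add(token);
            }
        }
        while (!operators.isEmpty()) {
            String operator = operators.pop();
            if (operator.equals("(")) {
                throw new IllegalArgumentException("Unbalanced parentheses");
            }
            output.add(operator);
        }
        return output;
    }
}
```

Given the tokens a + b * c, the sketch produces a b c * +, so the multiplication is applied before the addition. A parser would typically build tree nodes at the points where this version appends an operator to the output.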

gemma
 
Posts: 3441
Joined: Tue Jul 25, 2006 7:10 am

Post » Mon Sep 02, 2013 10:38 pm

This is excellent news!!

It is obvious you understand the inner workings of a compiler, and you clearly have an established plan of action. I'm sure you will succeed. Best of luck to you!

PS: It'd be great if you would consider supporting SKSE, as many, many mods make use of it.

meg knight
 
Posts: 3463
Joined: Wed Nov 29, 2006 4:20 am

Post » Tue Sep 03, 2013 1:18 am

At the risk of sounding completely naive (and please feel free to answer in layman's terms): what will this allow mod creators to do compared to now? :)
Kelvin Diaz
 
Posts: 3214
Joined: Mon May 14, 2007 5:16 pm

Post » Tue Sep 03, 2013 10:39 am

It sounds like he is simply making a better compiler so that the script binaries (the files actually used at run time) are cleaner and more efficient. A sloppy compiler can create inefficient and possibly bug-prone executables. So the answer would be that mod creators can't do anything new if all he does is improve what the compiler currently does, although it may allow them to use more scripting in mods without bogging down the game engine so much. On the other hand, he may be able to extend the language to do more than it does now. I have no idea what the script assembly language is like, though, or whether more can be done with it. If the improvements to efficiency are significant, this could make a noticeable performance difference for certain types of mods that generate a lot of script activity, such as Player Headtracking or Footprints or Wet & Cold. This may end up having the same kind of impact on Skyrim playability as a mod like ENBoost.

Ashley Tamen
 
Posts: 3477
Joined: Sun Apr 08, 2007 6:17 am

