How to Build a Compiler?
Today, in the Advanced JavaScript Developers group, someone asked the question:
Musing from a Noob: so if you were to write a translator from one language to another, say JS to Java or something arbitrary, would that translator be written almost entirely in like regular expressions or something? My google-fu is solid, but I'm trying to get some professional opinions first.... What y'all think?
I'm no professional in compiler development (I never got a chance to take Brock University's awesome compiler course), but I once wrote a LISP interpreter in Java (takes in some LISP code and executes it).
For this, you might be able to hack your way through it with regular expressions, but I think the proper way is to build an Abstract Syntax Tree.
In all programming languages, there's the concept of a grammar. The grammar is something that defines all legal expressions in a language (much like English does).
For example, in JavaScript, we're allowed to do:
const x = 12
But we can't do:
const x y z = 12
The grammar that formalizes this might look something like:
"[variable type] [variable name] = [value]"
And we didn't follow that rule in the second example.
If languages each have a set of grammar rules, and those rules are all the legal expressions, then there are primarily two parts to building a compiler.
- Establish the language rules (grammar)
- Be able to read a file, parse it, then build an validate an Abstract Syntax Tree from that grammar. If you can't build the syntax tree, it's because some grammar is wrong (this is the concept of a syntax error).
There's also the validation of the tree itself.
For example, in Java, for a single tree, where a tree is a file (a program, like myApp.java), you're only allowed to have the public static void main (String[] args)
once in the program, because that's the entry point.
So I think in order to PROPERLY do this conversion, you need to:
- Build the grammar for JavaScript
- Build the grammar for Java
- Know what the conversion rules from one grammar to the other are
- Build the syntax tree for JS code
- For each node (or a group of nodes) in the tree, run it against the conversion rules for Java and build a new syntax tree (a Java one).
- Convert the syntax tree to Java source code!
- (optional) run it through a prettier thing to make it look clean :p
Definitely leave a comment below if you know a bit about the subject, you've built a compiler before, or you have some useful resources for others.
Resources
I found this page, Introduction to Programming Languages/Grammars. It seems really useful.
There's also Babel, a modern JavaScript compiler that you might have used. I suspect peeking at the source code here would be insightful :)
Other stuff:
- [Free] https://www.craftinginterpreters.com/the-lox-language.html
- [Paid, but the content is really good apparently!] https://coded-classes.dpdcart.com/product/81271
Stay in touch!
Join 20000+ value-creating Software Essentialists getting actionable advice on how to master what matters each week. 🖖
View more in Software Engineering