How to Build a Compiler?
Musing from a Noob: so if you were to write a translator from one language to another, say JS to Java or something arbitrary, would that translator be written almost entirely in like regular expressions or something? My google-fu is solid, but I'm trying to get some professional opinions first.... What y'all think?
I'm no professional in compiler development (I never got a chance to take Brock University's awesome compiler course), but I once wrote a LISP interpreter in Java (takes in some LISP code and executes it).
For this, you might be able to hack your way through it with regular expressions, but I think the proper way is to build an Abstract Syntax Tree.
In all programming languages, there's the concept of a grammar. The grammar is something that defines all legal expressions in a language (much like English does).
const x = 12
But we can't do:
const x y z = 12
The grammar that formalizes this might look something like:
"[variable type] [variable name] = [value]"
And we didn't follow that rule in the second example.
If languages each have a set of grammar rules, and those rules are all the legal expressions, then there are primarily two parts to building a compiler.
- Establish the language rules (grammar)
- Be able to read a file, parse it, then build an validate an Abstract Syntax Tree from that grammar. If you can't build the syntax tree, it's because some grammar is wrong (this is the concept of a syntax error).
There's also the validation of the tree itself.
For example, in Java, for a single tree, where a tree is a file (a program, like myApp.java), you're only allowed to have the
public static void main (String args) once in the program, because that's the entry point.
So I think in order to PROPERLY do this conversion, you need to:
- Build the grammar for Java
- Know what the conversion rules from one grammar to the other are
- Build the syntax tree for JS code
- For each node (or a group of nodes) in the tree, run it against the conversion rules for Java and build a new syntax tree (a Java one).
- Convert the syntax tree to Java source code!
- (optional) run it through a prettier thing to make it look clean :p
Definitely leave a comment below if you know a bit about the subject, you've built a compiler before, or you have some useful resources for others.
I found this page, Introduction to Programming Languages/Grammars. It seems really useful.
- [Free] https://www.craftinginterpreters.com/the-lox-language.html
- [Paid, but the content is really good apparently!] https://coded-classes.dpdcart.com/product/81271
Liked this? Sing it loud and proud 👨🎤.
Stay in touch!
View more in Software Engineering
You may also enjoy...
A few more related articles
Software Design and Architecture is pretty much it's own field of study within the realm of computing, like DevOps or UX Design. Here's a map describing the breadth of software design and architecture, from clean code to microkernels.
Learn how to use DDD and object-oriented programming concepts to model complex Node.js backends.
Want to be notified when new content comes out?
Join 5000+ other developers learning about Domain-Driven Design and Enterprise Node.js.
I won't spam ya. 🖖 Unsubscribe anytime.