How to Build a Compiler?

Last updated Invalid date
A generalization of compiling code involves understanding the programming language grammar, parsing it, organizing it into a syntax tree, and then evaluating expressions.

Today, in the Advanced JavaScript Developers group, someone asked the question:

Musing from a Noob: so if you were to write a translator from one language to another, say JS to Java or something arbitrary, would that translator be written almost entirely in like regular expressions or something? My google-fu is solid, but I'm trying to get some professional opinions first.... What y'all think?

I'm no professional in compiler development (I never got a chance to take Brock University's awesome compiler course), but I once wrote a LISP interpreter in Java (takes in some LISP code and executes it).

For this, you might be able to hack your way through it with regular expressions, but I think the proper way is to build an Abstract Syntax Tree.

In all programming languages, there's the concept of a grammar. The grammar is something that defines all legal expressions in a language (much like English does).

For example, in JavaScript, we're allowed to do:

const x = 12

But we can't do:

const x y z = 12

The grammar that formalizes this might look something like:

"[variable type] [variable name] = [value]"

And we didn't follow that rule in the second example.

If languages each have a set of grammar rules, and those rules are all the legal expressions, then there are primarily two parts to building a compiler.

  • Establish the language rules (grammar)
  • Be able to read a file, parse it, then build an validate an Abstract Syntax Tree from that grammar. If you can't build the syntax tree, it's because some grammar is wrong (this is the concept of a syntax error).

There's also the validation of the tree itself.

For example, in Java, for a single tree, where a tree is a file (a program, like myApp.java), you're only allowed to have the public static void main (String[] args) once in the program, because that's the entry point.

So I think in order to PROPERLY do this conversion, you need to:

  1. Build the grammar for JavaScript
  2. Build the grammar for Java
  3. Know what the conversion rules from one grammar to the other are
  4. Build the syntax tree for JS code
  5. For each node (or a group of nodes) in the tree, run it against the conversion rules for Java and build a new syntax tree (a Java one).
  6. Convert the syntax tree to Java source code!
  7. (optional) run it through a prettier thing to make it look clean :p

Definitely leave a comment below if you know a bit about the subject, you've built a compiler before, or you have some useful resources for others.

Resources

I found this page, Introduction to Programming Languages/Grammars. It seems really useful.

There's also Babel, a modern JavaScript compiler that you might have used. I suspect peeking at the source code here would be insightful :)

Other stuff:



Discussion

Liked this? Sing it loud and proud 👨‍🎤.



Stay in touch!



About the author

Khalil Stemmler,
Software Essentialist ⚡

I'm Khalil. I turn code-first developers into confident crafters without having to buy, read & digest hundreds of complex programming books. Using Software Essentialism, my philosophy of software design, I coach developers through boredom, impostor syndrome, and a lack of direction to master software design and architecture. Mastery though, is not the end goal. It is merely a step towards your Inward Pull.



View more in Software Engineering



You may also enjoy...

A few more related articles

GraphQL Schemas vs. RESTful DTOs
GraphQL schemas serve a similar purpose to RESTful DTOs. One of the main differences is tooling. In this post, I aim to strengthen...
Client-Side Architecture Basics [Guide]
Though the tools we use to build client-side web apps have changed substantially over the years, the fundamental principles behind...
Accidental and Essential Complexity
Determining if the complexity of a problem is related to the nature of the problem or related to the way that we approach solving ...
How I Write Testable Code | Khalil's Simple Methodology
The single biggest thing that improved the quality of my designs was understanding how dependencies influence my ability to write ...