Elixir might not seem the obvious first choice for a language to build a compiler in but it might be a better fit than you think. Here are some of the techniques I used to develop a compiler in Elixir.
Compiler
This compiler takes as input
- a textual description of a musical arrangement – the program
- a textual definition of an audio file, detailing many aspects of an audio-source such as tempo, musical meter, alignment marks, audio level calibrations, the degree of rhythmic swing, and the positions and categorisations and characteristics of musical phrases it contains.
These both need to be scanned (lexical analysis), symbol tables built and the code generated to implement an arrangement based on the supplied program. This compiler runs as a component of a web service.
Elixir
Elixir is a modern language, strong on textual processing with comprehensive regular expression and string manipulation libraries. It has fundamental support for textual lists (implemented as linked lists) and maps. Elixir maps are a data structure where a key relates to a value. A key does not have to be a scalar type. It can be a string for instance. In this way, it may be considered similar to a hash in perl or an associative array in awk.
Why Elixir?
Elixir is built on the Erlang VM. It is a purely functional language and integrates with Erlang. The Erlang Open Telecom Platform (OTP) has good infrastructure for process supervision and resilience generally. Importantly, programs which are written in Elixir, being pure functions, scale well. It was a design goal that the underlying web service scales well. Phoenix is a web server written in Elixir. Phoenix / Elixir was chosen as the web-service framework, with the service implemented in Elixir.
Lexical Analysis
No Lex and Yacc style tools were used. Scanners were implemented using regular expressions.
Syntax Analysis
A recursive descent approach was taken for the syntax analysis with progressive decomposition by calling functions. At each level, lexicons were generated with regular expressions, the syntax analysed and then further decomposed by calling appropriate functions to continue the lexical and syntactic analysis.
Example FileName Parser
Here is an example of the lexical and syntactic approach with the parsing of an arrangement name
This function determines the syntactical structure and then calls the appropriate next-level function based on the match. There are four possible matches. All have the “N” part. The “I” and “E” parts are optional. In this case, the next level parsing function is passed the entire string though often it is passed components derived from the higher level. The function has been designed for readability and verification, not performance. The performance of the algorithm can be improved by not performing any further Regex.scan’s once one is successful.
Here is a fragment of one of the next-level functions which continue the parsing further.
The Regex.scan()’s regular expression used may return 4, 5, or 6 parts depending on the supplied string. These are extracted and then the logic continues to parse those components as required. This is a classical recursive decent approach. The final line of the function returns the parsed and converted string. The function of this library is to parse names converting them into an intermediary code that is more convenient for the code generation phase. Effectively it expands macros and converts everything into a sequence of musical section identifiers which indicates a section’s
- length in bars,
- type – e.g. whether or not it is a bridge,
- whether it is an introduction, chorus or ending,
- whether it is a component of a first, middle or last chorus, and
- the position of any breaks (holds) or pushes to the rhythm.