HLA v3.0 Preview and Source Code

Although the release of HLA v3.0 is still far off, many people have requested a sneak preview of the system (in particular, the source code) so they can start learning about HLA v3.0's internal operation while the source code is still manageable (i.e., it's easier to start learning the source code while it's smaller than 100,000 lines). Others have requested the source code to use in their own assembler projects. Though the average HLA user won't find the code here of much use, those who are interested in participating in the future development of HLA may want to start studying this code.

Although HLA v1.x has not (particularly) been an open source development project, I am planning on opening HLA v3.0 up to user participation once I get past the parsing of the declaration sections (which are very complex and require considerable knowledge in the area of compiler theory). If you're interested in working on the HLA project, here are some suggestions and areas where I expect I'll need some help:

1. Read the source code and get comfortable with the algorithms. My intent is to support questions about the HLA v3.0 source code via the HLA/AoA programming group/mailing list at http://groups.yahoo.com/group/aoaprogramming. Definitely consider joining this group if you want to learn about the status of the HLA v3.0 source code or want to ask questions.

2. Read the preliminary documentation on this page!

3. It wouldn't hurt to study a little compiler theory. Check out the comp.compilers newsgroup if you want some pointers to excellent (and practical) primers on compiler theory on the internet.

4. Figure out how you would like to contribute.

Currently, here are some ideas I've had for contributions from the HLA user base:

A. Porting the HLA Standard Library to HLA v3.0 and other OSes (this is a long term goal).

B. Writing native code generators for various OSes. Also work on assembly output from HLA (sort of like HLA v1.x does, but for a wider variety of assemblers).

C. Writing test code sequences to provide regression testing for HLA v3.0.

D. Work on documenting the internal operation of HLA.

E. Work on assemblers for different CPUs using the HLA "front end".

F. Work on optimizers for HLA.

G. Work on 64-bit versions of HLA.

H. Porting HLA v3.0 to other OSes (initial development is for Windows, but the code is *mostly* portable to other OSes like Linux; some work is needed here, however).

Once I finish off the declaration section and HLA compile-time language, I intend to package the source code as an "assembler developer's kit" for those who would like to write their own assembler (using a different syntax than HLA) but don't want to have to write all the (boring) code to handle complex declarations, macros, compile-time language, and other facilities that make HLA so powerful. It will be at this release that I will actively seek help from the user community on HLA v3.0. In the meantime, however, it's a good idea for interested individuals to start studying the HLA source code.

Note that HLA v3.0 is written strictly with HLA v1.x. So it goes without saying that you'll need to know HLA in order to study the source code (OTOH, you don't need to know C, Flex, or Bison to study the source code, as is necessary when looking at the HLA v1.x source code). Do note that the HLA v3.0 source code uses some more advanced facilities of the HLA language, so be ready to dust off the HLA reference manual when reading through the source code. Of course, any questions you have about how the whole system operates can be answered in the aoaprogramming group specified earlier.

HLA v3.0 Source Code and Documentation

HLA v3.0 source code

Zip file containing the HLA v1.x source code for HLA v3.0 in its current state. Note: this code is compiled using MASM as the back end for HLA v1.x. The code currently makes some non-portable calls to Win32 API routines, but this will be corrected in a future version of HLA v3.0.

2/26/2006 - This version contains a large number of defect corrections. Many additional features in the declarations parser were added or cleaned up. Approximately 1,000 assert statements were added to the code to help catch errors that creep into the project in the future. The big addition in this version is the inclusion of an automated regression test suite (with between 700 and 800 test programs that do a code coverage test on the HLA v3.0 source code).

12/31/2004 - This version includes macro facilities (standard macros and HLA's context-free macros). It also improves include file and TEXT object handling, fixes several problems with conditional assembly, and corrects several other defects.

8/12/2004 - This version has most of the declaration stuff completed. Just missing macros and a few other little details.

12/31/2003 - Initial release with lexical analyzer, symbol table code (not totally complete), and some parsing/expression evaluation code in place (total: about 55,000 lines of code, though not all of this is part of the compiler proper; also [for those frightened by this line count] a good number of the lines of code were machine-generated and you wouldn't normally read them).

Discussion of the HLA v3.0 Lexical Analyzer (PDF)

This document describes the HLA "lexical analyzer" that recognizes reserved words and special symbols for use by the assembler/parser. This is a very high-performance lexer/scanner; those writing other assemblers will definitely want to take a look at HLA's new lexical analyzer as there are a lot of really good ideas in this code. It is very easy to add new reserved words to HLA v3.0 (or any other product using the HLA lexer). This documentation explains how to do this.

Discussion of the HLA v3.0 Symbol Table (PDF)

One of the secrets behind HLA's power and speed is the sophistication of the symbol table that it uses. Without question, the symbol table data structure and the routines that manipulate the symbol table are among the most complex in the HLA source code. This document describes the format of the HLA symbol table and how each of the fields in the symbol table data structure is used.

Conceptual Discussion of the HLA Intermediate Code Format (PDF)

Unlike HLA v1.x, HLA v3.0 will be a true assembler, capable of directly emitting object code files. However, to satisfy those who also want assembly code output, HLA v3.0 will provide replaceable code generators that allow a user to plug in a new code generator to create output in any format they desire (well, at least for those formats for which a code generator has been written). To achieve this, HLA v3.0 will assemble the code to an internal intermediate form and then a code generator module will convert this intermediate form to an output format. The ultimate goal is to allow third parties to supply different code generators for different assemblers, object module formats, etc. Although the exact composition of the internal intermediate form is yet to be defined, this document provides a brief "technology" explanation of the plans for HLA.
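To make the "replaceable code generator" idea a little more concrete, here is a minimal sketch in Python. It is not taken from the HLA v3.0 source (which is written in HLA), and since the real intermediate form is not yet defined, everything here is an assumption made for illustration: the IRInstr record, the GENERATORS registry, the register and generate helpers, and the format names "hla-style" and "intel-style" are all hypothetical. The sketch only shows two text back ends; an object-code back end would plug in the same way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class IRInstr:
    """One record of a hypothetical intermediate form (not HLA's real format)."""
    mnemonic: str
    operands: Tuple[str, ...]  # stored source-first, destination-last


# A code generator is any function that turns intermediate records into text.
CodeGenerator = Callable[[List[IRInstr]], str]

# Registry of plug-in generators, keyed by an illustrative format name.
GENERATORS: Dict[str, CodeGenerator] = {}


def register(name: str) -> Callable[[CodeGenerator], CodeGenerator]:
    """Decorator that plugs a generator into the registry under `name`."""
    def decorator(gen: CodeGenerator) -> CodeGenerator:
        GENERATORS[name] = gen
        return gen
    return decorator


@register("hla-style")
def emit_hla_style(program: List[IRInstr]) -> str:
    # HLA-like syntax: mnemonic( source, dest );
    lines = []
    for instr in program:
        ops = ", ".join(instr.operands)
        lines.append(f"{instr.mnemonic}( {ops} );")
    return "\n".join(lines)


@register("intel-style")
def emit_intel_style(program: List[IRInstr]) -> str:
    # Intel-like syntax: mnemonic dest, source (operand order is reversed)
    lines = []
    for instr in program:
        ops = ", ".join(reversed(instr.operands))
        lines.append(f"{instr.mnemonic} {ops}".rstrip())
    return "\n".join(lines)


def generate(program: List[IRInstr], fmt: str) -> str:
    """Run the generator registered for `fmt` over the intermediate form."""
    try:
        gen = GENERATORS[fmt]
    except KeyError:
        raise ValueError(f"no code generator registered for {fmt!r}") from None
    return gen(program)


# The front end would build this list once; any back end can then consume it.
program = [
    IRInstr("mov", ("1", "eax")),
    IRInstr("add", ("ebx", "eax")),
]

if __name__ == "__main__":
    for fmt in GENERATORS:
        print(f"--- {fmt} ---")
        print(generate(program, fmt))
```

The point of the design is that the front end produces the intermediate list once and never needs to know which output formats exist; a third party adds a new format by registering one more function.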

HLA v3.0 Samples and Other Goodies

Performance Comparison Between HLA v1.x and HLA v3.0

Probably the biggest complaint HLA v1.x users have has been the glacial performance of the compiler. Although lots of tricks have been played to improve the compilation speed over the years, the truth is, HLA v1.x is slow when you start working on large projects. Improved performance has always been a promise for HLA v3.0. Although HLA v3.0 is in the early stages of development, the performance improvements over HLA v1.x are quickly becoming apparent. Included with the HLA v3.0 source package is a sample file that demonstrates the difference in speed between the two products: u2.hla3

Here are the times measured on a 2.0 GHz Pentium 4 (PIV) machine:

Assembler                                             u2.hla3 (copied to u2.hla when compiled via HLA v1.x)
HLA v1.60                                             93.5 seconds
HLA v3.0 (with symbol table dump facility disabled)   0.25 seconds

As you can see, there is a small difference between the two: on this file, HLA v3.0 is roughly 370 times faster (93.5 / 0.25 = 374)! Though HLA v3.0 isn't far enough along to determine how fast it will be when processing normal assembly source files, it's pretty obvious that HLA v3.0 is going to be fast! By the way, this was faster than MASM, which tends to handle source files like this one much better than other assemblers (TASM and FASM were unable to process this file because of memory constraints, so no timing is currently available for them).

Try this on your favorite assembler! (Generally, a search and replace operation is all that's needed to convert the HLA source code to a form acceptable to other assemblers.)

No guarantees that HLA v3.0 will be the fastest assembler ever, but it's certainly going to be up there with the fastest assemblers available for the x86.