9.6 Make Files
Although using separate compilation reduces assembly time and promotes code reuse and modularity, it is not without its own drawbacks. Suppose you have a program that consists of two modules: pgma.hla and pgmb.hla. Also suppose that you've already compiled both modules so that the files pgma.obj and pgmb.obj exist. Finally, you make changes to pgma.hla and pgmb.hla and compile the pgma.hla file but forget to compile the pgmb.hla file. Therefore, the pgmb.obj file will be out of date since this object file does not reflect the changes made to the pgmb.hla file. If you link the program's modules together, the resulting executable file will only contain the changes to the pgma.hla file; it will not have the updated object code associated with pgmb.hla. As projects get larger they tend to have more modules associated with them, and as more programmers begin working on the project, it gets very difficult to keep track of which object modules are up to date.
This complexity would normally cause someone to recompile all modules in a project, even if many of the object files are up to date, simply because it might seem too difficult to keep track of which modules are up to date and which are not. Doing so, of course, would eliminate many of the benefits that separate compilation offers. Fortunately, there is a tool that can help you manage large projects: make[1]. The make program, with a little help from you, can figure out which files need to be reassembled and which files have up-to-date .obj files. With a properly defined make file, you can easily assemble only those modules that absolutely must be assembled to generate a consistent program.
A make file is a text file that lists compile-time dependencies between files. An .exe file, for example, is dependent on the source code whose assembly produces the executable. If you make any changes to the source code you will (probably) need to reassemble or recompile the source code to produce a new executable file[2].
Typical dependencies include the following:
- An executable file generally depends only on the set of object files that the linker combines to form the executable.
- A given object code file depends on the assembly language source files that were assembled to produce that object file. This includes the assembly language source files (.hla) and any files included during that assembly (generally .hhf files).
- The source files and include files generally don't depend on anything.
A make file generally consists of a dependency statement followed by a set of commands to handle that dependency. A dependency statement takes the following form:
dependent-file : list of files

For example, under Windows with nmake:

pgm.exe: pgma.obj pgmb.obj

This statement says that "pgm.exe" is dependent upon pgma.obj and pgmb.obj. Any changes that occur to pgma.obj or pgmb.obj will require the generation of a new pgm.exe file. This example is Windows-specific; here's the same makefile statement in a Linux-friendly form:
pgm: pgma.o pgmb.o

The make program uses a time/date stamp to determine if a dependent file is out of date with respect to the files it depends upon. Any time you make a change to a file, the operating system will update a modification time and date associated with the file. The make program compares the modification date/time stamp of the dependent file against the modification date/time stamp of the files it depends upon. If the dependent file's modification date/time is earlier than one or more of the files it depends upon, or one of the files it depends upon is not present, then make assumes that some operation must be necessary to update the dependent file.
When an update is necessary, make executes the set of commands following the dependency statement. Presumably, these commands would do whatever is necessary to produce the updated file.
The dependency statement must begin in column one. Any commands that must execute to resolve the dependency must start on the line immediately following the dependency statement and each command must be indented one tab stop. The pgm.exe statement above (the Windows example) would probably look something like the following:

pgm.exe: pgma.obj pgmb.obj
	hla -opgm.exe pgma.obj pgmb.obj

(The "-opgm.exe" option tells HLA to name the executable file "pgm.exe".) Here's the same example for Linux users:
pgm: pgma.o pgmb.o
	hla -opgm pgma.o pgmb.o

If you need to execute more than one command to resolve the dependencies, you can place several commands after the dependency statement in the appropriate order. Note that you must indent all commands one tab stop. The make program ignores any blank lines in a make file. Therefore, you can add blank lines, as appropriate, to make the file easier to read and understand.
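As a minimal sketch of a rule with more than one command, the following variant of the Linux example runs two commands in order whenever pgm is out of date. The echo line is purely illustrative and is not part of the original example; any command your shell understands could appear there.

```makefile
# Hypothetical variant of the Linux rule above.
# Both command lines must begin with a tab character, not spaces.
pgm: pgma.o pgmb.o
	hla -opgm pgma.o pgmb.o
	echo "pgm has been relinked"
```

The make program executes the commands from top to bottom, and most versions of make stop at the first command that reports an error.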
There can be more than a single dependency statement in a make file. In the example above, the executable (pgm or pgm.exe) depends upon the object files (pgma.obj or pgma.o and pgmb.obj or pgmb.o). Obviously, the object files depend upon the source files that generated them. Therefore, before attempting to resolve the dependencies for the executable, make will first check the rest of the make file to see if the object files depend on anything. If they do, make will resolve those dependencies first. Consider the following (Windows) make file:
pgm.exe: pgma.obj pgmb.obj
	hla -opgm.exe pgma.obj pgmb.obj

pgma.obj: pgma.hla
	hla -c pgma.hla

pgmb.obj: pgmb.hla
	hla -c pgmb.hla

The make program will process the first dependency line it finds in the file. However, the files that pgm.exe depends upon themselves have dependency lines. Therefore, make will first ensure that pgma.obj and pgmb.obj are up to date before attempting to execute HLA to link these files together. Therefore, if the only change you've made has been to pgmb.hla, make takes the following steps (assuming pgma.obj exists and is up to date).
- 1. The make program processes the first dependency statement. It notices that dependency lines for pgma.obj and pgmb.obj (the files on which pgm.exe depends) exist. So it processes those statements first.
- 2. The make program processes the pgma.obj dependency line. It notices that the pgma.obj file is newer than the pgma.hla file, so it does not execute the command following this dependency statement.
- 3. The make program processes the pgmb.obj dependency line. It notes that pgmb.obj is older than pgmb.hla (since we just changed the pgmb.hla source file). Therefore, make executes the command following on the next line. This generates a new pgmb.obj file that is now up to date.
- 4. Having processed the pgma.obj and pgmb.obj dependencies, make now returns its attention to the first dependency line. Since make just created a new pgmb.obj file, its date/time stamp will be newer than pgm.exe's. Therefore, make will execute the HLA command that links pgma.obj and pgmb.obj together to form the new pgm.exe file.
Note that a properly written make file will instruct the make program to assemble only those modules absolutely necessary to produce a consistent executable file. In the example above, make did not bother to assemble pgma.hla since its object file was already up to date.
There is one final thing to emphasize with respect to dependencies. Often, object files are dependent not only on the source file that produces the object file, but any files that the source file includes as well. In the previous example, there (apparently) were no such include files. Often, this is not the case. A more typical make file might look like the following (Linux example):
pgm: pgma.o pgmb.o
	hla -opgm pgma.o pgmb.o

pgma.o: pgma.hla pgm.hhf
	hla -c pgma.hla

pgmb.o: pgmb.hla pgm.hhf
	hla -c pgmb.hla

Note that any changes to the pgm.hhf file will force the make program to recompile both pgma.hla and pgmb.hla since the pgma.o and pgmb.o files both depend upon the pgm.hhf include file. Leaving include files out of a dependency list is a common mistake programmers make that can produce inconsistent executable files.
Note that you would not normally need to specify the HLA Standard Library include files nor the Standard Library ".lib" (Windows) or ".a" (Linux) files in the dependency list. True, your resulting executable file does depend on this code, but the Standard Library rarely changes, so you can safely leave it out of your dependency list. Should you make a modification to the Standard Library, simply delete any old executable and object files to force a reassembly of the entire system.
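One common way to delete the old executable and object files is to give the make file a rule that does it for you. The sketch below assumes the Linux file names from the previous example and a Unix-style rm command; the target name "clean" is a widespread convention rather than something this chapter defines. Under Windows you would substitute the appropriate del command and file names.

```makefile
# Hypothetical "clean" rule (not part of the original example).
# It lists no dependencies, and no file named "clean" normally exists,
# so make runs the command every time you ask for this target.
clean:
	rm -f pgm pgma.o pgmb.o
```

Typing "make clean" and then "make" would then rebuild everything from the source files, provided the rule for pgm remains the first rule in the make file.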
The make program, by default, assumes that it will be processing a make file named "makefile". When you run the make program, it looks for "makefile" in the current directory. If it doesn't find this file, it complains and terminates[3]. Therefore, it is a good idea to collect the files for each project you work on into their own subdirectory and give each project its own makefile. Then to create an executable, you need only change into the appropriate subdirectory and run the make program.
Although this section discusses the make program in sufficient detail to handle most projects you will be working on, keep in mind that the make program provides considerable functionality that this chapter does not discuss. To learn more about the version of make you are using (nmake.exe, for example), consult the appropriate documentation. Note that several versions of MAKE exist. Microsoft produces nmake.exe, Borland has its own MAKE.EXE program, and various versions of MAKE have been ported to Windows from UNIX systems (e.g., GMAKE). Linux users will typically employ the GNU make program. While these various make programs are not equivalent, they all do a pretty good job of handling the simple make syntax that this chapter describes.
9.7 Code Reuse
One of the principal goals of Software Engineering is to reduce program development time. Although the techniques we've studied in this chapter will certainly reduce development effort, there are bigger prizes to be had here. Consider for a moment a simple program that reads an integer from the user and then displays the value of that integer on the standard output device. You can easily write a trivial version of this program with about eight lines of HLA code. That's not too difficult. However, suppose you did not have the HLA Standard Library at your disposal. Now, instead of an eight-line program, you'd be faced with writing a program that is hundreds if not thousands of lines long. Obviously, this program will take a lot longer to write than the original eight-line version. The difference between these two applications is the fact that in the first version of this program you got to reuse some code that was already written; in the second version of the program you had to write everything from scratch. This concept of code reuse is very important when writing large programs - you can get large programs working much more quickly if you reuse code from previous projects.
The idea behind code reuse is that many code sequences you write will be usable in future programs. As time passes and you write more code, progress on your projects will be faster since you can reuse code you've written (or others have written) on previous projects. The HLA Standard Library functions are the classic example; somebody had to write those functions so you could use them. And use them you do. As of this writing, the Standard Library represents about 50,000 lines of HLA source code. Imagine having to write a fair portion of that every time you wanted to write an HLA program!
Although the HLA Standard Library contains lots of very useful routines and functions, this code base cannot possibly predict the type of code you will need in every future project. The HLA Standard Library provides some of the more common routines you'll need when writing programs, but you're certainly going to have need for routines that the HLA Standard Library cannot satisfy. Unless you can find a source for the code you need from some third party, you're probably going to have to write the new routines yourself.
The trick when writing a program is to try and figure out which routines are general purpose and could be used in future programs; once you make this determination, you should write such routines separately from the rest of your application (i.e., put them in a separate source file for compilation). By keeping them separate, you can use them in future projects. If "try and figure out which routines are general purpose..." sounds a bit difficult, well, you're right, it is. Even after 30 years of Software Engineering research, no one has really figured out how to effectively reuse code. There are some obvious routines we can reuse (that's why there are "standard libraries") but it is quite difficult for the practicing engineer to successfully predict which routines s/he will need in the future and write these as separate modules.
Attempting to teach you how to decide which routines are worthy of saving for future programs and which are specific to your current application is well beyond the scope of this text. There are several Software Engineering texts out there that try to explain how to do this, but keep in mind that even after the publication of these texts, practicing engineers still have problems picking the right routines to save. Hopefully, as you gain experience, you will begin to recognize those routines that are worth keeping for future programs and those that aren't worth bothering with. This text will take the easy way out and assume that you know which routines you want to keep and which you don't.
9.8 Creating and Managing Libraries
Imagine that you've created a few hundred routines over the past couple of years and you would like to have the object code ready to link with any new projects you begin. You could move all this code into a single source file, stick in a bunch of EXTERNAL declarations, and then link the resulting object file with any new programs you write that can use the routines in your "library". Unfortunately, there are a couple of problems with this approach. Let's take a look at some of these problems.
Problem number one is that your library will grow to a fairly good size with time; if you put the source code to every routine in a single source file, small additions or changes to the file will require a complete recompilation of the whole library. That's clearly not what we want to do, based on what you've learned from this chapter.
Another problem with this "solution" is that whenever you link this object file to your new applications, you link in the entire library, not just the routines you want to use. This makes your applications unnecessarily large, especially if your library has grown. Were you to link your simple projects with the entire HLA Standard library, for example, the result would be positively huge.
A solution to both of the above problems is to compile each routine in a separate file and produce a unique object file for it. Unfortunately, with hundreds of routines you're going to wind up with hundreds of object files; any time you want to call a dozen or so library routines, you'd have to link your main application with a dozen or so object modules from your library. Clearly, this isn't acceptable either.
You may have noticed by now that when you link your applications with the HLA Standard Library, you only link with a single file: hlalib.lib (Windows) or hlalib.a (Linux). ".lib" (library) and ".a" (archive) files are collections of object files. When the linker processes a library file, it pulls out only the object files it needs; it does not link the entire file with your application. Hence you get to work with a single file and your applications don't grow unnecessarily large.
Linux provides the "ar" (archiver) program to manage library files. To use this program to combine several object files into a single ".a" file, you'd use a command line like the following:
ar -q library.a list_of_.o_files
For more information on this command, check out the man page on the "ar" program ("man ar").
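Since a library is just another file built from object files, you can let make maintain it. The following is only a sketch under stated assumptions: the names mylib.a, func1.hla, func2.hla, and myHeader.hhf are hypothetical, and the rule uses ar's "r" (replace) operation with the "c" and "s" modifiers instead of the -q form shown above, because a quick append may add a second copy of a member each time the rule runs again.

```makefile
# Hypothetical library make file; all file names are illustrative.
# "ar rcs" replaces changed members, creates the archive if it is
# missing, and writes a symbol index for the linker.
mylib.a: func1.o func2.o
	ar rcs mylib.a func1.o func2.o

# Each routine lives in its own source file and object file.
func1.o: func1.hla myHeader.hhf
	hla -c func1.hla

func2.o: func2.hla myHeader.hhf
	hla -c func2.hla
```

With a make file like this, changing func2.hla causes make to reassemble only func2.hla and then update the archive.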
9.9 Name Space Pollution
One problem with creating libraries with lots of different modules is name space pollution. A typical library module will have a #INCLUDE file associated with it that provides external definitions for all the routines, constants, variables, and other symbols provided in the library. Whenever you want to use some routines or other objects from the library, you would typically #INCLUDE the library's header file in your project. As your libraries get larger and you add more declarations in the header file, it becomes more and more likely that the names you've chosen for your library's identifiers will conflict with names you want to use in your current project. This conflict is what is meant by name space pollution: library header files pollute the name space with many names you typically don't need in order to gain easy access to the few routines in the library you actually use. Most of the time those names don't harm anything - unless you want to use those names yourself in your program.
HLA requires that you declare all external symbols at the global (PROGRAM/UNIT) level. You cannot, therefore, include a header file with external declarations within a procedure[4]. As such, there will be no naming conflicts between external library symbols and symbols you declare locally within a procedure; the conflicts will only occur between the external symbols and your global symbols. While this is a good argument for avoiding global symbols as much as possible in your program, the fact remains that most symbols in an assembly language program will have global scope. So another solution is necessary.
HLA's solution, which it certainly uses in the Standard Library, is to put most of the library names in a NAMESPACE declaration section. A NAMESPACE declaration encapsulates all declarations and exposes only a single name (the NAMESPACE identifier) at the global level. You access the names within the NAMESPACE by using the familiar dot-notation (see "Namespaces" on page 432). This reduces the effect of namespace pollution from many dozens or hundreds of names down to a single name.
Of course, one disadvantage of using a NAMESPACE declaration is that you have to type a longer name in order to reference a particular identifier in that name space (i.e., you have to type the NAMESPACE identifier, a period, and then the specific identifier you wish to use). For a few identifiers you use frequently, you might elect to leave those identifiers outside of any NAMESPACE declaration. For example, the HLA Standard Library does not define the symbols malloc, free, or nl (among others) within a NAMESPACE. However, you want to minimize such declarations in your libraries to avoid conflicts with names in your own programs. Often, you can choose a NAMESPACE identifier to complement your routine names. For example, the HLA Standard Library's string copy routine was named after the equivalent C Standard Library function, strcpy. HLA's version is str.cpy. The actual function name is cpy; it happens to be a member of the str NAMESPACE, hence the full name str.cpy, which is very similar to the comparable C function. The HLA Standard Library contains several examples of this convention. The arg.c and arg.v functions are another pair of such identifiers (corresponding to the C identifiers argc and argv).
Using a NAMESPACE in a header file is no different than using a NAMESPACE in a PROGRAM or UNIT. Here's an example of a typical header file containing a NAMESPACE declaration:
// myHeader.hhf -
//
// Routines supported in the myLibrary.lib file.

namespace myLib;

    procedure func1; external;
    procedure func2; external;
    procedure func3; external;

end myLib;

Typically, you would compile each of the functions (func1..func3) as separate units (so each has its own object file and linking in one function doesn't link them all). Here's a sample UNIT declaration for one of these functions:
unit func1Unit;
#includeonce( "myHeader.hhf" )

procedure myLib.func1;
begin func1;

    << code for func1 >>

end func1;

end func1Unit;

You should notice two important things about this unit. First, you do not put the actual func1 procedure code within a NAMESPACE declaration block. By using the identifier myLib.func1 as the procedure's name, HLA automatically realizes that this procedure declaration belongs in a name space. The second thing to note is that you do not preface func1 with "myLib." after the BEGIN and END clauses in the procedure. HLA automatically associates the BEGIN and END identifiers with the PROCEDURE declaration, so it knows that these identifiers are part of the myLib name space and it doesn't make you type the whole name again.
Important note: when you declare external names within a name space, as was done in func1Unit above, HLA uses only the function name (func1 in this example) as the external name. This creates a name space pollution problem in the external name space. For example, if you have two different name spaces, myLib and yourLib and they both define a func1 procedure, the linker will complain about a duplicate definition for func1 if you attempt to use functions from both these library modules. There is an easy work-around to this problem: use the extended form of the EXTERNAL directive to explicitly supply an external name for all external identifiers appearing in a NAMESPACE declaration. For example, you could solve this problem with the following simple modification to the myHeader.hhf file above:
// myHeader.hhf -
//
// Routines supported in the myLibrary.lib file.

namespace myLib;

    procedure func1; external( "myLib_func1" );
    procedure func2; external( "myLib_func2" );
    procedure func3; external( "myLib_func3" );

end myLib;

This example demonstrates an excellent convention you should adopt: when exporting names from a name space, always supply an explicit external name and construct that name by concatenating the NAMESPACE identifier with an underscore and the object's internal name.
The use of NAMESPACE declarations does not completely eliminate the problems of name space pollution (after all, the name space identifier is still a global object, as anyone who has included stdlib.hhf and attempted to define a "cs" variable can attest), but NAMESPACE declarations come pretty close to eliminating this problem. Therefore, you should use NAMESPACE everywhere practical when creating your own libraries.
9.10 Putting It All Together
Managing large projects is considerably easier if you break your program up into separate modules and work on them independently. In this chapter you learned about HLA's UNITs, include files, and the EXTERNAL directive. These provide the tools you need to break a program up into smaller modules. In addition to HLA's facilities, you'll also use a separate tool, make (nmake.exe under Windows), to automatically compile and link only those files that are necessary in a large project.
This chapter provided a very basic introduction to the use of makefiles and the make utility. Note that the MAKE programs are quite sophisticated. The presentation of the make program in this chapter barely scratches the surface. If you're interested in more information about MAKE facilities you should consult one of the excellent texts available on this subject. Lots of good information is also available on the Internet (just use the usual search tools).
In addition to breaking up large HLA projects, UNITs are also the basis for letting you write assembly language functions that you can call from high level languages like C/C++ and Delphi/Kylix. A later volume in this text will describe how you can use UNITs for this purpose.
[1] Under Windows, Microsoft calls this program nmake. This text will use the more generic name "make" when referring to this program. If you are using Microsoft tools under Windows, just substitute "nmake" for "make" throughout this chapter.
[2] Obviously, if you only change comments or other statements in the source file that do not affect the executable file, a recompile or reassembly will not be necessary. To be safe, though, we will assume any change to the source file will require a reassembly.
[3] There is a command line option that lets you specify the name of the makefile. See the nmake documentation in the MASM manuals for more details.
[4] Or within an Iterator or Method, as you will see in later chapters.