1.5 The HLA TYPE Section
Let's say that you simply do not like the names that HLA uses for declaring byte, word, double word, real, and other variables. Let's say that you prefer Pascal's naming convention or, perhaps, C's naming convention. You want to use terms like integer, float, double, or whatever. If this were Pascal, you could redefine the names in the type section of the program. With C you could use a #define or a typedef statement to accomplish the task. Well, HLA, like Pascal, has its own TYPE statement that also lets you create aliases of these names. The following example demonstrates how to set up some C/C++/Pascal compatible names in your HLA programs:
    type
        integer:    int32;
        float:      real32;
        double:     real64;
        colors:     byte;
Now you can declare your variables with more meaningful statements like:
    static
        i:          integer;
        x:          float;
        HouseColor: colors;
If you are an Ada, C/C++, or FORTRAN programmer (or a programmer in any other language, for that matter), you can pick type names you're more comfortable with. Of course, this doesn't change how the 80x86 or HLA reacts to these variables one iota, but it does let you create programs that are easier to read and understand since the type names are more indicative of the actual underlying types. One warning for C/C++ programmers: don't get too excited and go off and define an int data type. Unfortunately, INT is an 80x86 machine instruction (interrupt) and is therefore a reserved word in HLA.
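For comparison, here is the C typedef equivalent mentioned above. It is an illustrative analog, not HLA: C already reserves float and double as keywords, so this sketch mirrors only the integer and colors aliases, and add_integers is a hypothetical helper that shows an alias and its underlying type are interchangeable.

```c
#include <stdint.h>

/* Aliases mirroring the HLA TYPE declarations above.
   A typedef introduces a new name, not a new type. */
typedef int32_t integer;   /* like HLA  integer: int32; */
typedef uint8_t colors;    /* like HLA  colors:  byte;  */

/* Hypothetical helper: takes the alias, returns the underlying type.
   No conversion occurs because integer and int32_t are the same type. */
int32_t add_integers(integer a, integer b)
{
    return a + b;
}
```

Because typedef only renames an existing type, sizeof(integer) is the same as sizeof(int32_t), and the machine code the compiler emits is unchanged.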
The TYPE section is useful for much more than creating type isomorphism (that is, giving a new name to an existing type). The following sections will demonstrate many of the possible things you can do in the TYPE section.
1.6 ENUM and HLA Enumerated Data Types
In a previous section discussing constants and constant expressions, you saw the following example:
    const TapeDAT    := 1;
    const Tape8mm    := TapeDAT + 1;
    const TapeQIC80  := Tape8mm + 1;
    const TapeTravan := TapeQIC80 + 1;
    const TapeDLT    := TapeTravan + 1;
This example demonstrates how to use constant expressions to develop a set of constants that contain unique, consecutive values. There are, however, a couple of problems with this approach. First, it involves a lot of typing (and extra reading when reviewing this program). Second, it's very easy to make a mistake when creating long lists of unique constants and reuse or skip some values. The HLA ENUM type provides a better way to create a list of constants with unique values.
ENUM is an HLA type declaration that lets you associate a list of names with a new type. HLA associates a unique value with each name (that is, it enumerates the list). The ENUM keyword typically appears in the TYPE section and you use it as follows:
    type
        enumTypeID: enum { comma_separated_list_of_names };
The symbol enumTypeID becomes a new type whose values are the names in the list. As a concrete example, consider the data type TapeDrives and a corresponding variable declaration of type TapeDrives:
    type
        TapeDrives: enum{ TapeDAT, Tape8mm, TapeQIC80, TapeTravan, TapeDLT };
    static
        BackupUnit: TapeDrives := TapeDAT;
            .
            .
            .
        mov( BackupUnit, al );
        if( al = Tape8mm ) then
            ...
        endif;
        // etc.
By default, HLA reserves one byte of storage for enumerated data types. So the BackupUnit variable will consume one byte of memory and you would typically use an eight-bit register to access it.[1] As for the constants, HLA associates consecutive uns8 constant values starting at zero with each of the enumerated identifiers. In the TapeDrives example, the tape drive identifiers would have the values TapeDAT=0, Tape8mm=1, TapeQIC80=2, TapeTravan=3, and TapeDLT=4. You may use these constants exactly as though you had defined them with these values in a CONST section.
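A rough C analog of the TapeDrives declaration may help readers coming from C. C also numbers enumeration constants consecutively from zero; however, C compilers usually give enum objects int-sized storage, so this sketch assumes a uint8_t variable to imitate HLA's one-byte default. The is_8mm function is a hypothetical helper mirroring the if( al = Tape8mm ) test.

```c
#include <stdint.h>

/* C counterpart of the HLA TapeDrives enumeration.
   The names receive the values 0, 1, 2, 3, 4 in order. */
enum TapeDrives { TapeDAT, Tape8mm, TapeQIC80, TapeTravan, TapeDLT };

/* Stored in one byte to imitate HLA's default enum size. */
uint8_t BackupUnit = TapeDAT;

/* Hypothetical helper: same comparison as  if( al = Tape8mm ) then */
int is_8mm(uint8_t unit)
{
    return unit == Tape8mm;
}
```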
1.7 Pointer Data Types
Some people refer to pointers as scalar data types, others refer to them as composite data types. This text will treat them as scalar data types even though they exhibit some tendencies of both scalar and composite data types.
Of course, the place to start is with the question "What is a pointer?" Now you've probably experienced pointers first hand in the Pascal, C, or Ada programming languages and you're probably getting worried right now. Almost everyone has a really bad experience when they first encounter pointers in a high level language. Well, fear not! Pointers are actually easier to deal with in assembly language. Besides, most of the problems you had with pointers probably had nothing to do with pointers, but rather with the linked list and tree data structures you were trying to implement with them. Pointers, on the other hand, have lots of uses in assembly language that have nothing to do with linked lists, trees, and other scary data structures. Indeed, simple data structures like arrays and records often involve the use of pointers. So if you've got some deep-rooted fear about pointers, well, forget everything you know about them. You're going to learn how great pointers really are.
Probably the best place to start is with the definition of a pointer. Just exactly what is a pointer, anyway? Unfortunately, high level languages like Pascal tend to hide the simplicity of pointers behind a wall of abstraction. This added complexity (which exists for good reason, by the way) tends to frighten programmers because they don't understand what's going on.
Now if you're afraid of pointers, well, let's just ignore them for the time being and work with an array. Consider the following array declaration in Pascal:
    M: array [0..1023] of integer;
Even if you don't know Pascal, the concept here is pretty easy to understand. M is an array with 1024 integers in it, indexed from M[0] to M[1023]. Each one of these array elements can hold an integer value that is independent of all the others. In other words, this array gives you 1024 different integer variables each of which you refer to by number (the array index) rather than by name.
If you encountered a program that had the statement "M[0]:=100;" you probably wouldn't have to think at all about what is happening with this statement. It is storing the value 100 into the first element of the array M. Now consider the following two statements:
    i := 0;     (* Assume "i" is an integer variable *)
    M [i] := 100;
You should agree, without too much hesitation, that these two statements perform the same exact operation as "M[0]:=100;". Indeed, you're probably willing to agree that you can use any integer expression in the range 0...1023 as an index into this array. The following statements still perform the same operation as our single assignment to index zero:
    i := 5;     (* Assume all variables are integers *)
    j := 10;
    k := 50;
    M [i*j-k] := 100;
"Okay, so what's the point?" you're probably thinking. "Anything that produces an integer in the range 0...1023 is legal. So what?" Okay, how about the following:
    M [1] := 0;
    M [ M [1] ] := 100;
Whoa! Now that takes a few moments to digest. However, if you take it slowly, it makes sense and you'll discover that these two statements perform the exact same operation you've been doing all along. The first statement stores zero into array element M[1]. The second statement fetches the value of M[1], which is an integer so you can use it as an array index into M, and uses that value (zero) to control where it stores the value 100.
If you're willing to accept the above as reasonable, perhaps bizarre, but usable nonetheless, then you'll have no problems with pointers, because M[1] is a pointer! Well, not really, but if you were to change "M" to "memory" and treat this array as all of memory, this is the exact definition of a pointer.
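The M[M[1]] trick can be tried directly in C, whose arrays are zero-based like the Pascal declaration above. In this minimal sketch, store_through_index is a hypothetical helper: M[1] plays the role of a pointer, and changing its value changes which element the second store modifies.

```c
#define MEM_SIZE 1024

/* Stand-in for "all of memory" in the thought experiment. */
int M[MEM_SIZE];

/* Hypothetical helper: M[1] holds an index ("address") into M,
   and the second store lands wherever that index refers. */
void store_through_index(int target, int value)
{
    M[1] = target;      /* M[1] now "points at" element target */
    M[M[1]] = value;    /* with target == 0 this is just M[0] = value */
}
```

Calling store_through_index(0, 100) performs exactly the two Pascal statements above; calling it with a different target redirects the same store to a different element.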
1.7.1 Using Pointers in Assembly Language
A pointer is simply a memory location whose value is the address (or index, if you prefer) of some other memory location. Pointers are very easy to declare and use in an assembly language program. You don't even have to worry about array indices or anything like that.
An HLA pointer is a 32 bit value that may contain the address of some other variable. If you have a dword variable p that contains $1000_0000, then p "points" at memory location $1000_0000. To access the dword that p points at, you could use code like the following:
    mov( p, ebx );      // Load EBX with the value of pointer p.
    mov( [ebx], eax );  // Fetch the data that p points at.
By loading the value of p into EBX this code loads the value $1000_0000 into EBX (assuming p contains $1000_0000 and, therefore, points at memory location $1000_0000). The second instruction above loads the EAX register with the double word starting at the location whose offset appears in EBX. Since EBX now contains $1000_0000, this will load EAX from locations $1000_0000 through $1000_0003.
Why not just load EAX directly from location $1000_0000 using an instruction like "MOV( mem, EAX );" (assuming mem is at address $1000_0000)? Well, there are lots of reasons. But the primary reason is that this single instruction always loads EAX from location mem. You cannot change the location from which it loads EAX. The former instructions, however, always load EAX from the location where p is pointing. This is very easy to change under program control. In fact, the simple instruction "MOV( &mem2, p );" will cause those same two instructions above to load EAX from mem2 the next time they execute. Consider the following instructions:
    mov( &i, p );           // Assume all variables are STATIC variables.
        .
        .
        .
    if( some_expression ) then
        mov( &j, p );       // Assume the code above skips this instruction and
            .               // you get to the next instruction by jumping
            .               // to this point from somewhere else.
            .
    endif;
    mov( p, ebx );          // Assume both of the above code paths wind up
    mov( [ebx], eax );      // down here.
This short example demonstrates two execution paths through the program. The first path loads the variable p with the address of the variable i. The second path through the code loads p with the address of the variable j. Both execution paths converge on the last two MOV instructions that load EAX with i or j depending upon which execution path was taken. In many respects, this is like a parameter to a procedure in a high level language like Pascal. Executing the same instructions accesses different variables depending on whose address (i or j) winds up in p.
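Readers who know C may find it helpful to see the two-path example written with a C pointer. This is a sketch under stated assumptions: i and j are static variables with the arbitrary values 10 and 20, and the hypothetical take_second_path flag stands in for some_expression and the jump from elsewhere.

```c
#include <stdint.h>

/* Static variables, as the HLA fragment assumes (values are arbitrary). */
static int32_t i = 10;
static int32_t j = 20;

/* p starts out pointing at i, one path redirects it to j,
   and both paths share the same final load through p. */
int32_t load_via_pointer(int take_second_path)
{
    int32_t *p = &i;            /* mov( &i, p );                     */
    if (take_second_path) {
        p = &j;                 /* mov( &j, p );                     */
    }
    return *p;                  /* mov( p, ebx ); mov( [ebx], eax ); */
}
```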
1.7.2 Declaring Pointers in HLA
Since pointers are 32 bits long, you could simply use the dword directive to allocate storage for your pointers. However, there is a much better way to do this: HLA provides the POINTER TO phrase specifically for declaring pointer variables. Consider the following example:
    static
        b:          byte;
        d:          dword;
        pByteVar:   pointer to byte := &b;
        pDWordVar:  pointer to dword := &d;
This example demonstrates that it is possible to initialize as well as declare pointer variables in HLA. Note that you may only take addresses of static variables (STATIC, READONLY, and STORAGE objects) with the address-of operator, so you can only initialize pointer variables with the addresses of static objects.
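C has a closely related rule: a pointer with static storage duration may be initialized with the address of a static object. The C sketch below mirrors the declarations above; store_through_pointers is a hypothetical helper that writes through both pointers to show that they really refer to b and d.

```c
#include <stdint.h>

/* C analog of the HLA STATIC declarations above. */
static uint8_t  b;
static uint32_t d;

/* Static pointers initialized with addresses of static objects;
   the addresses are constants known before the program runs. */
static uint8_t  *pByteVar  = &b;
static uint32_t *pDWordVar = &d;

/* Hypothetical helper: write through both pointers. */
void store_through_pointers(uint8_t byte_value, uint32_t dword_value)
{
    *pByteVar  = byte_value;    /* lands in b */
    *pDWordVar = dword_value;   /* lands in d */
}
```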
You can also define your own pointer types in the TYPE section of an HLA program. For example, if you often use pointers to characters, you'll probably want to use a TYPE declaration like the one in the following example:
    type
        ptrChar: pointer to char;
    static
        cString: ptrChar;
1.7.3 Pointer Constants and Pointer Constant Expressions
HLA allows two literal pointer constant forms: the address-of operator followed by the name of a static variable, or the constant zero. In addition to these two literal pointer constants, HLA also supports simple pointer constant expressions.
The constant zero represents the NULL or NIL pointer, that is, an illegal address that does not exist.[2] Programs typically initialize pointers with NULL to indicate that a pointer has explicitly not been initialized. The HLA Standard Library predefines both the "NULL" and "nil" constants in the memory.hhf header file.[3]
In addition to simple address literals and the value zero, HLA allows very simple constant expressions wherever a pointer constant is legal. Pointer constant expressions take one of the two following forms:
    &StaticVarName + PureConstantExpression
    &StaticVarName - PureConstantExpression
The PureConstantExpression term is a numeric constant expression that does not involve any pointer constants. This type of expression produces a memory address that is the specified number of bytes before or after ("-" or "+", respectively) the StaticVarName variable in memory.
Since you can create pointer constant expressions, it should come as no surprise to discover that HLA lets you define manifest pointer constants in the CONST section. The following program demonstrates how you can do this.
    program PtrConstDemo;
    #include( "stdlib.hhf" );
    static
        b:  byte := 0;
            byte 1, 2, 3, 4, 5, 6, 7;
    const
        pb := &b + 1;
    begin PtrConstDemo;
        mov( pb, ebx );
        mov( [ebx], al );
        stdout.put( "Value at address pb = $", al, nl );
    end PtrConstDemo;
Program 1.5 Pointer Constant Expressions in an HLA Program
Upon execution, this program prints the value of the byte just beyond b in memory (which contains the value $01).
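C supports the same kind of "address plus constant" arithmetic in static initializers. One caution applies to the analogy: in C, stepping from one variable into a separately declared neighbour is undefined behaviour, so this sketch assumes all eight bytes live in a single array rather than relying on memory layout the way Program 1.5 does. The value_at_pb function is a hypothetical helper.

```c
/* The eight bytes of Program 1.5, kept in one array so that
   pointing one byte past the first element is well defined in C. */
static const unsigned char b[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/* An address constant of the form &StaticVar + constant,
   analogous to HLA's  pb := &b + 1; */
static const unsigned char *const pb = &b[0] + 1;

/* Hypothetical helper: same as  mov( pb, ebx ); mov( [ebx], al ); */
unsigned char value_at_pb(void)
{
    return *pb;
}
```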
1.7.4 Pointer Variables and Dynamic Memory Allocation
Pointer variables are the perfect place to store the return result from the HLA Standard Library malloc function. The malloc function returns the address of the storage it allocates in the EAX register; therefore, you can store the address directly into a pointer variable with a single MOV instruction immediately after a call to malloc:
    type
        bytePtr: pointer to byte;
    var
        bPtr: bytePtr;
            .
            .
            .
        malloc( 1024 );     // Allocate a block of 1,024 bytes.
        mov( eax, bPtr );   // Store address of block in bPtr.
            .
            .
            .
        free( bPtr );       // Free the allocated block when done using it.
            .
            .
            .
In addition to malloc and free, the HLA Standard Library provides a realloc procedure. The realloc routine takes two parameters: a pointer to a block of storage that malloc (or realloc) previously created, and a new size. If the new size is less than the old size, realloc releases the storage at the end of the allocated block back to the system. If the new size is larger than the current block, then realloc allocates a new block, moves the old data to the start of the new block, and then frees the old block.
Typically, you would use realloc to correct a bad guess about a memory size you'd made earlier. For example, suppose you want to read a set of values from the user but you won't know how many memory locations you'll need to hold the values until after the user has entered the last value. You could make a wild guess and then allocate some storage using malloc based on your estimate. If, during the input, you discover that your estimate was too low, simply call realloc with a larger value. Repeat this as often as required until all the input is read. Once input is complete, you can make a call to realloc to release any unused storage at the end of the memory block.
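The grow-as-needed strategy just described can be sketched in C with the standard realloc. Two details are assumptions of this sketch rather than statements about HLA: C's realloc returns NULL on failure (leaving the old block intact), and the capacity doubles each time it runs out, which is one common policy rather than the only one. The append_value and shrink_to_fit names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: append value to a heap array, growing it with
   realloc when the current guess (*capacity) proves too small.
   Returns 0 on success, -1 if the allocator fails. */
int append_value(int **data, size_t *count, size_t *capacity, int value)
{
    if (*count == *capacity) {
        size_t new_cap = *capacity ? *capacity * 2 : 4;  /* larger guess */
        int *grown = realloc(*data, new_cap * sizeof **data);
        if (grown == NULL)
            return -1;              /* old block is still valid */
        *data = grown;              /* the block may have moved */
        *capacity = new_cap;
    }
    (*data)[(*count)++] = value;
    return 0;
}

/* Hypothetical helper: once input is complete, give back the unused
   tail of the block. Returns 0 on success, -1 if realloc fails. */
int shrink_to_fit(int **data, size_t count, size_t *capacity)
{
    if (count == 0 || count == *capacity)
        return 0;                   /* nothing to trim */
    int *trimmed = realloc(*data, count * sizeof **data);
    if (trimmed == NULL)
        return -1;
    *data = trimmed;
    *capacity = count;
    return 0;
}
```

A caller starts with a NULL pointer and zero capacity (realloc on a NULL pointer behaves like malloc), appends values one at a time, and calls shrink_to_fit once the last value has been read.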
The realloc procedure uses the following calling sequence:
    realloc( ExistingPointer, NewSize );
Realloc returns a pointer to the newly allocated block in the EAX register.
One danger exists when using realloc. If you've made multiple copies of pointers into a block of storage on the heap and then call realloc to resize that block, all the existing pointers are now invalid. Effectively, realloc frees the existing storage and then allocates a new block. That new block may not be in the same memory location as the old block, so any existing pointers (into the block) that you have will be invalid after the realloc call.
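One way to cope with this danger is to remember positions within the block as offsets rather than raw pointers, then rebuild the pointers from whatever address realloc returns. The C sketch below assumes a single interior reference whose offset still lies inside the resized block; resize_keep_ref is a hypothetical helper.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resize a block while keeping one interior
   reference usable. The reference is converted to an offset before
   realloc and rebuilt from the (possibly new) base address after.
   Assumes the offset is smaller than new_size. */
char *resize_keep_ref(char *block, size_t new_size, char **ref)
{
    ptrdiff_t offset = *ref - block;    /* position within the block */
    char *moved = realloc(block, new_size);
    if (moved == NULL)
        return NULL;                    /* block and *ref unchanged */
    *ref = moved + offset;              /* re-derive from the new base */
    return moved;
}
```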
1.7.5 Common Pointer Problems
There are five common problems programmers encounter when using pointers. Some of these errors will cause your programs to stop immediately with a diagnostic message; other problems are more subtle, either yielding incorrect results without reporting an error or simply degrading the performance of your program without displaying an error. These five problems are:
- Using an uninitialized pointer
- Using a pointer that contains an illegal value (e.g., NULL)
- Continuing to use malloc'd storage after that storage has been free'd
- Failing to free storage once the program is done using it
- Accessing indirect data using the wrong data type
The first problem above is using a pointer variable before you have assigned a valid memory address to the pointer. Beginning programmers often don't realize that declaring a pointer variable only reserves storage for the pointer itself; it does not reserve storage for the data that the pointer references. The following short program demonstrates this problem:
    // Program to demonstrate use of
    // an uninitialized pointer. Note
    // that this program should terminate
    // with a Memory Access Violation exception.
    program UninitPtrDemo;
    #include( "stdlib.hhf" );
    static
        // Note: by default, variables in the
        // static section are initialized with
        // zero (NULL); hence the following
        // is actually initialized with NULL,
        // but that will still cause our program
        // to fail because we haven't initialized
        // the pointer with a valid memory address.
        Uninitialized: pointer to byte;
    begin UninitPtrDemo;
        mov( Uninitialized, ebx );
        mov( [ebx], al );
        stdout.put( "Value at address Uninitialized: = $", al, nl );
    end UninitPtrDemo;
Program 1.6 Uninitialized Pointer Demonstration
Although variables you declare in the STATIC section are, technically, initialized, static initialization still doesn't give the pointer in this program a valid address.
Of course, there is no such thing as a truly uninitialized variable on the 80x86. What you really have are variables that you've explicitly given an initial value and variables that just happen to inherit whatever bit pattern was in memory when storage for the variable was allocated. Much of the time, these garbage bit patterns lying around in memory don't correspond to a valid memory address. Attempting to dereference such a pointer (that is, access the data in memory at which it points) raises a Memory Access Violation exception.
Sometimes, however, those random bits in memory just happen to correspond to a valid memory location you can access. In this situation, the CPU will access the specified memory location without aborting the program. Although to a naive programmer this situation may seem preferable to aborting the program, in reality this is far worse because your defective program continues to run without alerting you to the problem. If you store data through an uninitialized pointer, you may very well overwrite the values of other important variables in memory. This defect can produce problems in your program that are very difficult to locate.
The second problem programmers have with pointers is storing invalid address values into a pointer. The first problem, above, is actually a special case of this second problem (with garbage bits in memory, rather than a miscalculation on your part, supplying the invalid address). The effects are the same; if you attempt to dereference a pointer containing an invalid address you will either get a Memory Access Violation exception or you will access an unexpected memory location.
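A pointer can at least be tested against NULL before it is dereferenced. The C sketch below (read_byte_checked is a hypothetical helper) catches the NULL case from Program 1.6, but, as the text explains, no such test can tell a garbage address from a valid one.

```c
#include <stddef.h>

/* Hypothetical helper: dereference p only if it is not NULL.
   Returns 1 and stores the byte in *out on success, 0 otherwise.
   A non-NULL garbage address still passes this check. */
int read_byte_checked(const unsigned char *p, unsigned char *out)
{
    if (p == NULL)
        return 0;
    *out = *p;
    return 1;
}
```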
The third problem listed above is also known as the dangling pointer problem. To understand this problem, consider the following code fragment:
    malloc( 256 );      // Allocate some storage.
    mov( eax, ptr );    // Save address away in a pointer variable.
        .
        .               // Code that uses the pointer variable "ptr".
        .
    free( ptr );        // Free the storage associated with "ptr".
        .
        .               // Code that does not change the value in "ptr".
        .
    mov( ptr, ebx );
    mov( al, [ebx] );
In this example you will note that the program allocates 256 bytes of storage and saves the address of that storage away in the ptr variable. Then the code uses this block of 256 bytes for a while and frees the storage, returning it to the system for other uses. Note that calling free does not change the value of ptr in any way; ptr still points at the block of memory allocated by malloc earlier. Indeed, free does not change any data in this block, so upon return from free, ptr still points at the data stored into the block by this code. However, note that the call to free tells the system that this 256-byte block of memory is no longer needed by the program and the system can use this region of memory for other purposes. The free function cannot enforce the fact that you will never access this data again; you are simply promising that you won't. Of course, the code fragment above breaks this promise; as you can see in the last two instructions above, the program fetches the value in ptr and accesses the data it points at in memory.
The biggest problem with dangling pointers is that you can get away with using them a good part of the time. As long as the system doesn't reuse the storage you've free'd, using a dangling pointer produces no ill effects in your program. However, with each new call to malloc, the system may decide to reuse the memory released by that previous call to free. When this happens, any attempt to dereference the dangling pointer may produce some unintended consequences. The problems range from reading data that has been overwritten (by the new, legal, use of the data storage), to overwriting the new data, to (the worst case) overwriting system heap management pointers (doing so will probably cause your program to crash). The solution is clear: never use a pointer value once you free the storage associated with that pointer.
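A common defensive habit in C, shown here as a sketch rather than a cure, is to clear a pointer immediately after freeing it, so that a later accidental dereference hits NULL (which on typical systems faults at once) instead of quietly using recycled storage. The FREE_AND_NULL macro is hypothetical, and it clears only the one pointer it is given; any other copies still dangle.

```c
#include <stdlib.h>

/* Hypothetical macro: release a block and clear the pointer in one
   step. free(NULL) is a no-op, so applying it twice is harmless. */
#define FREE_AND_NULL(p) \
    do {                 \
        free(p);         \
        (p) = NULL;      \
    } while (0)
```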
Of all the problems, the fourth (failing to free allocated storage) will probably have the least impact on the proper operation of your program. The following code fragment demonstrates this problem:
    malloc( 256 );
    mov( eax, ptr );
        .               // Code that uses the data where ptr is pointing.
        .               // This code does not free up the storage
        .               // associated with ptr.
    malloc( 512 );
    mov( eax, ptr );
    // At this point, there is no way to reference the original
    // block of 256 bytes pointed at by ptr.
In this example the program allocates 256 bytes of storage and references this storage using the ptr variable. At some later time, the program allocates another block of bytes and overwrites the value in ptr with the address of this new block. Note that the former value in ptr is lost. Since this address no longer exists in the program, there is no way to call free to return the storage for later use. As a result, this memory is no longer available to your program. While making 256 bytes of memory inaccessible to your program may not seem like a big deal, imagine now that this code is in a loop that repeats over and over again. With each execution of the loop the program loses another 256 bytes of memory. After a sufficient number of loop iterations, the program will exhaust the memory available on the heap. This problem is often called a memory leak because the effect is the same as though the memory bits were leaking out of your computer (yielding less and less available storage) during program execution.[4]
Memory leaks are far less damaging than using dangling pointers. Indeed, there are only two problems with memory leaks: the danger of running out of heap space (which, ultimately, may cause the program to abort, though this is rare) and performance problems due to virtual memory page swapping. Nevertheless, you should get in the habit of always freeing all storage once you are done using it. Note that when your program quits, the operating system reclaims all storage including the data lost via memory leaks. Therefore, memory lost via a leak is only lost to your program, not the whole system.
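The leak in the fragment above disappears if the old block is freed before its address is overwritten. This C sketch assumes a hypothetical replace_block helper; it allocates the new block first, so that if allocation fails the old block is neither lost nor freed.

```c
#include <stdlib.h>

/* Hypothetical helper: replace the block held in *ptr with a fresh
   block of new_size bytes without leaking the old one.
   Returns 0 on success, -1 if allocation fails (old block kept). */
int replace_block(unsigned char **ptr, size_t new_size)
{
    unsigned char *fresh = malloc(new_size);
    if (fresh == NULL)
        return -1;          /* keep the old block; nothing is lost */
    free(*ptr);             /* release the old block before forgetting it */
    *ptr = fresh;
    return 0;
}
```

Calling replace_block inside a loop, as in the scenario described above, keeps exactly one block live at a time instead of abandoning one on every iteration.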
The last problem with pointers is the lack of type-safe access. HLA cannot and does not enforce pointer type checking. For example, consider the following program:
    // Program to demonstrate the lack of
    // type checking in pointer accesses.
    program BadTypePtrDemo;
    #include( "stdlib.hhf" );
    static
        ptr:    pointer to char;
        cnt:    uns32;
    begin BadTypePtrDemo;
        // Allocate sufficient characters
        // to hold a line of text input
        // by the user:
        malloc( 256 );
        mov( eax, ptr );
        // Okay, read the text a character
        // at a time from the user:
        stdout.put( "Enter a line of text: " );
        stdin.flushInput();
        mov( 0, cnt );
        mov( ptr, ebx );
        repeat
            stdin.getc();       // Read a character from the user.
            mov( al, [ebx] );   // Store the character away.
            inc( cnt );         // Bump up count of characters.
            inc( ebx );         // Point at next position in memory.
        until( stdin.eoln());
        // Okay, we've read a line of text from the user,
        // now display the data:
        mov( ptr, ebx );
        for( mov( cnt, ecx ); ecx > 0; dec( ecx )) do
            mov( [ebx], eax );
            stdout.put( "Current value is $", eax, nl );
            inc( ebx );
        endfor;
        free( ptr );
    end BadTypePtrDemo;
Program 1.7 Type-Unsafe Pointer Access Example
This program reads in data from the user as character values and then displays the data as double word hexadecimal values. While a powerful feature of assembly language is that it lets you ignore data types at will and automatically coerce the data without any effort, this power is a two-edged sword. If you make a mistake and access indirect data using the wrong data type, HLA and the 80x86 may not catch the mistake, and your program may produce inaccurate results. Therefore, when you use pointers and indirection in your programs, take care to access the data consistently with respect to its data type.
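The same trap exists in C. In this sketch (both function names are hypothetical), read_char reads the buffer as the characters it holds, while read_as_dword repeats Program 1.7's mistake and fetches four bytes at once; memcpy is used instead of a pointer cast so the C code itself stays well defined. On a little-endian 80x86, the text "ABCD" read as a dword yields $44434241 rather than the single character $41.

```c
#include <stdint.h>
#include <string.h>

/* Correct access: the buffer holds characters, so read one byte. */
unsigned char read_char(const unsigned char *p)
{
    return *p;
}

/* The mistake from Program 1.7: treat the same address as a dword.
   The result mixes in the next three characters of the buffer. */
uint32_t read_as_dword(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```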
1.8 Putting It All Together
This chapter contains an eclectic combination of subjects. It begins with a discussion of the INTMUL, BOUND, and INTO instructions that will prove useful throughout this text. Then this chapter discusses how to declare constants and data types, including enumerated data types. This chapter also introduces constant expressions and pointers. The following chapters in this text will make extensive use of these concepts.
[1] HLA provides a mechanism by which you can specify that enumerated data types consume two or four bytes of memory. See the HLA documentation for more details.
[2] Actually, address zero does exist, but if you try to access it under Windows or Linux you will get a general protection fault.
[3] NULL is for C/C++ programmers and nil is familiar to Pascal/Delphi programmers.
[4] Note that the storage isn't lost from your computer; once your program quits, it returns all memory (including unfree'd storage) to the O/S. The next time the program runs it will start with a clean slate.