12.4 Programming in C/C++ and HLA
Unlike Delphi, that has only a single vendor, there are many different C/C++ compilers available on the market. Each vendor (Microsoft, Borland, Watcom, GNU, etc.) has their own ideas about how C/C++ should interface to external code. Many vendors have their own extensions to the C/C++ language to aid in the interface to assembly and other languages. For example, Borland provides a special keyword to let Borland C++ (and C++ Builder) programmers call Pascal code (or, conversely, allow Pascal code to call the C/C++ code). Microsoft, who stopped making Pascal compilers years ago, no longer supports this option. This is unfortunate since HLA uses the Pascal calling conventions. Fortunately, HLA provides a special interface to code that C/C++ systems generate.
Before we get started discussing how to write HLA modules for your C/C++ programs, you must understand two very important facts:
- HLA's exception handling facilities are not directly compatible with C/C++'s exception handling facilities. This means that you cannot use the TRY..ENDTRY and RAISE statements in the HLA code you intend to link to a C/C++ program. This also means that you cannot call library functions that contain such statements. Since the HLA Standard Library modules use exception handling statements all over the place, this effectively prevents you from calling HLA Standard Library routines from the code you intend to link with C/C++1.
Given the rich set of language features that C/C++ supports, it should come as no surprise that the interface between the C/C++ language and assembly language is somewhat complex. Fortunately there are two facts that reduce this problem. First, HLA (v1.26 and later) supports C/C++'s calling conventions. Second, the other complex stuff you won't use very often, so you may not have to bother with it.
- Note: the following sections assume you are already familiar with C/C++ programming. They make no attempt to explain C/C++ syntax or features other than as needed to explain the C/C++ assembly language interface. If you're not familiar with C/C++, you will probably want to skip this section.
- Also note: although this text uses the generic term "C/C++" when describing the interface between HLA and various C/C++ compilers, the truth is that you're really interfacing HLA with the C language. There is a fairly standardized interface between C and assembly language that most vendors follow. No such standard exists for the C++ language and every vendor, if they even support an interface between C++ and assembly, uses a different scheme. In this text we will stick to interfacing HLA with the C language. Fortunately, all popular C++ compilers support the C interface to assembly, so this isn't much of a problem.
The examples in this text will use the GNU C++ compiler. There may be some minor adjustments you need to make if you're using some other C/C++ compiler; please see the vendor's documentation for more details.
12.4.1 Linking HLA Modules With C/C++ Programs
One big advantage of C/C++ over Delphi is that (most) C/C++ compiler vendors' products emit standard object files. So, working with object files and a true linker is much nicer than having to deal with Delphi's built-in linker. As nice as the Delphi system is, interfacing with assembly language is much easier in C/C++ than in Delphi.
The difference between a C and a C++ compilation occurs in the external declarations for the functions you intend to write in assembly language. For example, in a C source file you would simply write:
extern char* RetHW( void );However, in a C++ environment, you would need the following external declaration:
extern "C" { extern char* RetHW( void ); };The `extern "C"' clause tells the compiler to use standard C linkage even though the compiler is processing a C++ source file (C++ linkage is different than C and definitely far more complex; this text will not consider pure C++ linkage since it varies so much from vendor to vendor).
The following sample program demonstrates this external linkage mechanism by writing a short HLA program that returns the address of a string ("Hello World") in the EAX register (like Delphi, C/C++ expects functions to return their results in EAX). The main C/C++ program then prints this string to the console device.
#include <stdlib.h> #include "ratc.h" extern "C" { extern char* ReturnHW( void ); }; int main() _begin( main ) printf( "%s\n", ReturnHW() ); _return 0; _end( main ) Program 12.17 Cex1 - A Simple Example of a Call to an Assembly Function from C++ unit ReturnHWUnit; procedure ReturnHW; external( "_ReturnHW" ); procedure ReturnHW; nodisplay; noframe; noalignstk; begin ReturnHW; lea( eax, "Hello World" ); ret(); end ReturnHW; end ReturnHWUnit; Program 12.18 RetHW.hla - Assembly Code that Cex1 CallsThere are several new things in both the C/C++ and HLA code that might confuse you at first glance, so let's discuss these things real quick here.
The first strange thing you will notice in the C++ code is the #include "ratc.h" statement. RatC is a C/C++ macro library that adds several new features to the C++ language. RatC adds several interesting features and capabilities to the C/C++ language, but a primary purpose of RatC is to help make C/C++ programs a little more readable. Of course, if you've never seen RatC before, you'll probably argue that it's not as readable as pure C/C++, but even someone who has never seen RatC before can figure out 80% of Ratc within a minutes. In the example above, the _begin and _end clauses clearly map to the "{" and "}" symbols (notice how the use of _begin and _end make it clear what function or statement associates with the braces; unlike the guesswork you've got in standard C). The _return statement is clearly equivalent to the C return statement. As you'll quickly see, all of the standard C control structures are improved slightly in RatC. You'll have no trouble recognizing them since they use the standard control structure names with an underscore prefix. This text promotes the creation of readable programs, hence the use of RatC in the examples appearing in this chapter2. You can find out more about RatC on Webster at http://webster.cs.ucr.edu.
The C/C++ program isn't the only source file to introduce something new. If you look at the HLA code you'll notice that the LEA instruction appears to be illegal. It takes the following form:
lea( eax, "Hello World" );The LEA instruction is supposed to have a memory and a register operand. This example has a register and a constant; what is the address of a constant, anyway? Well, this is a syntactical extension that HLA provides to 80x86 assembly language. If you supply a constant instead of a memory operand to LEA, HLA will create a static (readonly) object initialized with that constant and the LEA instruction will return the address of that object. In this example, HLA will emit the string to the constants segment and then load EAX with the address of the first character of that string. Since HLA strings always have a zero terminating byte, EAX will contain the address of a zero-terminated string which is exactly what C++ wants. If you look back at the original C++ code, you will see that RetHW returns a char* object and the main C++ program displays this result on the console device.
If you haven't figured it out yet, this is a round-about version of the venerable "Hello World" program.
Microsoft VC++ users can compile this program from the command line by using the following commands3:
hla -c RetHW.hla // Compiles and assembles RetHW.hla to RetHW.obj cl Cex1.cpp RetHW.obj // Compiles C++ code and links it with RetHW.objIf you're a Borland C++ user, you'd use the following command sequence:
hla -o:omf RetHW.hla // Compile HLA file to an OMF file. bcc32i Cex1.cpp RetHW.obj // Compile and link C++ and assembly code. // Could also use the BCC32 compiler.GCC users can compile this program from the command line by using the following commands:
hla -o:omf RetHW.hla // Compile HLA file to an OMF file. bcc32i Cex1.cpp RetHW.obj // Compile and link C++ and assembly code. // Could also use the BCC32 compiler.12.4.2 Register Preservation
Unlike Delphi, a single language with a single vendor, there is no single list of registers that you can freely use as scratchpad values within an assembly language function. The list changes by vendor and even changes between versions from the same vendor. However, you can safely assume that EAX is available for scratchpad use since C functions return their result in the EAX register. You should probably preserve everything else.
12.4.3 Function Results
C/C++ compilers universally seem to return ordinal and pointer function results in AL, AX, or EAX depending on the operand's size. The compilers probably return floating point results on the top of the FPU stack as well. Other than that, check your C/C++ vendor's documentation for more details on function return locations.
12.4.4 Calling Conventions
The standard C/C++ calling convention is probably the biggest area of contention between the C/C++ and HLA languages. VC++ and BCC both support multiple calling conventions. BCC even supports the Pascal calling convention that HLA uses, making it trivial to write HLA functions for BCC programs4. However, before we get into the details of these other calling conventions, it's probably a wise idea to first discuss the standard C/C++ calling convention.
Both VC++ and BCC decorate the function name when you declare an external function. For external "C" functions, the decoration consists of an underscore. If you look back at Program 12.18 you'll notice that the external name the HLA program actually uses is "_RetHW" rather than simply "RetHW". The HLA program itself, of course, uses the symbol "RetHW" to refer to the function, but the external name (as specified by the optional parameter to the EXTERNAL option) is "_RetHW". In the C/C++ program (Program 12.17) there is no explicit indication of this decoration; you simply have to read the compiler documentation to discover that the compiler automatically prepends this character to the function name5. Fortunately, HLA's EXTERNAL option syntax allows us to undecorate the name, so we can refer to the function using the same name as the C/C++ program. Name decoration is a trivial matter, easily fixed by HLA.
A big problem is the fact that C/C++ pushes parameters on the stack in the opposite direction of just about every other (non-C based) language on the planet; specifically, C/C++ pushes actual parameters on the stack from right to left instead of the more common left to right. This means that you cannot declare a C/C++ function with two or more parameters and use a simple translation of the C/C++ external declaration as your HLA procedure declaration, i.e., the following are not equivalent:
external void CToHLA( int p, unsigned q, double r ); procedure CToHLA( p:int32; q:uns32; r:real64 ); external;Were you to call CToHLA from the C/C++ program, the compiler would push the r parameter first, the q parameter second, and the p parameter third - exactly the opposite order that the HLA code expects. As a result, the HLA code would use the L.O. double word of r as p's value, the H.O. double word of r as q's value, and the combination of p and q's values as the value for r. Obviously, you'd most likely get an incorrect result from this calculation. Fortunately, there's an easy solution to this problem: use the @CDECL procedure option in the HLA code to tell it to reverse the parameters:
procedure CToHLA( p:int32; q:uns32; r:real64 ); @cdecl; external;Now when the C/C++ code calls this procedure, it push the parameters on the stack and the HLA code will retrieve them in the proper order.
There is another big difference between the C/C++ calling convention and HLA: HLA procedures automatically clean up after themselves by removing all parameters pass to a procedure prior to returning to the caller. C/C++, on the other hand, requires the caller, not the procedure, to clean up the parameters. This has two important ramifications: (1) if you call a C/C++ function (or one that uses the C/C++ calling sequence), then your code has to remove any parameters it pushed upon return from that function; (2) your HLA code cannot automatically remove parameter data from the stack if C/C++ code calls it. The @CDECL procedure option tells HLA not to generate the code that automatically removes parameters from the stack upon return. Of course, if you use the @NOFRAME option, you must ensure that you don't remove these parameters yourself when your procedures return to their caller.
One thing HLA cannot handle automatically for you is removing parameters from the stack when you call a procedure or function that uses the @CDECL calling convention; for example, you must manually pop these parameters whenever you call a C/C++ function from your HLA code.
Removing parameters from the stack when a C/C++ function returns to your code is very easy, just execute an "add( constant, esp );" instruction where constant is the number of parameter bytes you've pushed on the stack. For example, the CToHLA function has 16 bytes of parameters (two int32 objects and one real64 object) so the calling sequence (in HLA) would look something like the following:
CToHLA( pVal, qVal, rVal ); // Assume this is the macro version. add( 16, esp ); // Remove parameters from the stack.Cleaning up after a call is easy enough. However, if you're writing the function that must leave it up to the caller to remove the parameters from the stack, then you've got a tiny problem - by default, HLA procedures always clean up after themselves. If you use the @CDECL option and don't specify the @NOFRAME option, then HLA automatically handles this for you. However, if you use the @NOFRAME option, then you've got to ensure that you leave the parameter data on the stack when returning from a function/procedure that uses the @CDECL calling convention.
If you want to leave the parameters on the stack for the caller to remove, then you must write the standard entry and exit sequences for the procedure that build and destroy the activation record (see "The Standard Entry Sequence" on page 683 and "The Standard Exit Sequence" on page 684). This means you've got to use the @NOFRAME (and @NODISPLAY) options on your procedures that C/C++ will call. Here's a sample implementation of the CToHLA procedure that builds and destroys the activation record:
procedure _CToHLA( rValue:real64; q:uns32; p:int32 ); @nodisplay; @noframe; begin _CToHLA; push( ebp ); // Standard Entry Sequence mov( esp, ebp ); // sub( _vars_, esp ); // Needed if you have local variables. . . // Code to implement the function's body. . mov( ebp, esp ); // Restore the stack pointer. pop( ebp ); // Restore link to previous activation record. ret(); // Note that we don't remove any parameters. end _CToHLA;12.4.5 Pass by Value and Reference in C/C++
A C/C++ program can pass parameters to a procedure or function using one of two different mechanisms: pass by value and pass by reference. Since pass by reference parameters use pointers, this parameter passing mechanism is completely compatible between HLA and C/C++. The following two lines provide an external declaration in C++ and the corresponding external (public) declaration in HLA for a pass by reference parameter using the calling convention:
extern void HasRefParm( int& refparm ); // C++ procedure HasRefParm( var refparm: int32 ); external; // HLALike HLA, C++ will pass the 32-bit address of whatever actual parameter you specify when calling the HasRefParm procedure. Don't forget, inside the HLA code, that you must dereference this pointer to access the actual parameter data. See the chapter on Intermediate Procedures for more details (see "Pass by Reference" on page 687).
Like HLA, C++ lets you pass untyped parameters by reference. The syntax to achieve this in C++ is the following:
extern void UntypedRefParm( void* parm1 );Actually, this is not a reference parameter, but a value parameter with an untyped pointer.
In HLA, you can use the VAR keyword as the data type to specify that you want an untyped reference parameter. Here's the corresponding prototype for the UntypedRefParm procedure in HLA:
procedure UntypedRefParm( var parm1:var ); external;12.4.6 Scalar Data Type Correspondence Between C/C++ and HLA
When passing parameters between C/C++ and HLA procedures and functions, it's very important that the calling code and the called code agree on the basic data types for the parameters. In this section we will draw a correspondence between the C/C++ scalar data types and the HLA (v1.x) data types.
Assembly language supports any possible data format, so HLA's data type capabilities will always be a superset of C/C++'s. Therefore, there may be some objects you can create in HLA that have no counterpart in C/C++, but the reverse is not true. Since the assembly functions and procedures you write are generally manipulating data that C/C++ provides, you don't have to worry too much about not being able to process some data passed to an HLA procedure by C/C++.
C/C++ provides a wide range of different integer data types. Unfortunately, the exact representation of these types is implementation specific. The following table lists the C/C++ types as currently implemented by Borland C++ and Microsoft VC++. This table may very well change as 64-bit compilers become available.
In addition to the integer values, C/C++ supports several non-integer ordinal types. The following table provides their HLA equivalents:
Like the integer types, C/C++ supports a wide range of real numeric formats. The following table presents these types and their HLA equivalents.
Table 6: Real Types in C/C++ and HLA C/C++ HLA Range Minimum Maximum double real64 5.0 E-324 1.7 E+308 float real32 1.5 E-45 3.4 E+38
This data type is 80 bits only in BCC. VC++ uses 64 bits for the long double type.The last scalar type of interest is the pointer type. Both HLA and C/C++ use a 32-bit address to represent pointers, so these data types are completely equivalent in both languages.
12.4.7 Passing String Data Between C/C++ and HLA Code
C/C++ uses zero terminated strings. Algorithms that manipulate zero-terminated strings are not as efficient as functions that work on length-prefixed strings; on the plus side, however, zero-terminated strings are very easy to work with. HLA's strings are downwards compatible with C/C++ strings since HLA places a zero byte at the end of each HLA string. Since you'll probably not be calling HLA Standard Library string routines, the fact that C/C++ strings are not upwards compatible with HLA strings generally won't be a problem. If you do decide to modify some of the HLA string functions so that they don't raise exceptions, you can always translate the str.cStrToStr function that translates zero-terminated C/C++ strings to HLA strings.
A C/C++ string variable is typically a char* object or an array of characters. In either case, C/C++ will pass the address of the first character of the string to an external procedure whenever you pass a string as a parameter. Within the procedure, you can treat the parameter as an indirect reference and dereference to pointer to access characters within the string.
12.4.8 Passing Record/Structure Data Between HLA and C/C++
Records in HLA are (mostly) compatible with C/C++ structs. You can easily translate a C/C++ struct to an HLA record. In this section we'll explore how to do this and learn about the incompatibilities that exist between HLA records and C/C++ structures.
For the most part, translating C/C++ records to HLA is a no brainer. Just grab the "guts" of a structure declaration and translate the declarations to HLA syntax within a RECORD..ENDRECORD block and you're done.
Consider the following C/C++ structure type declaration:
typedef struct { unsigned char day; unsigned char month; int year; unsigned char dayOfWeek; } dateType;The translation to an HLA record is, for the most part, very straight-forward. Just translate the field types accordingly and use the HLA record syntax (see "Records" on page 419) and you're in business. The translation is the following:
type recType: record day: byte; month: byte; year:int32; dayOfWeek:byte; endrecord;There is one minor problem with this example: data alignment. Depending on your compiler and whatever defaults it uses, C/C++ might not pack the data in the structure as compactly as possible. Some C/C++ compilers will attempt to align the fields on double word or other boundaries. With double word alignment of objects larger than a byte, the previous C/C++ typedef statement is probably better modelled by
type recType: record day: byte; month: byte; padding:byte[2]; // Align year on a four-byte boundary. year:int32; dayOfWeek:byte; morePadding: byte[3]; // Make record an even multiple of four bytes. endrecord;Of course, a better solution is to use HLA's ALIGN directive to automatically align the fields in the record:
type recType: record day: byte; month: byte; align( 4 ); // Align year on a four-byte boundary. year:int32; dayOfWeek:byte; align(4); // Make record an even multiple of four bytes. endrecord;Alignment of the fields is good insofar as access to the fields is faster if they are aligned appropriately. However, aligning records in this fashion does consume extra space (five bytes in the examples above) and that can be expensive if you have a large array of records whose fields need padding for alignment.
You will need to check your compiler vendor's documentation to determine whether it packs or pads structures by default. Most compilers give you several options for packing or padding the fields on various boundaries. Padded structures might be a bit faster while packed structures (i.e., no padding) are going to be more compact. You'll have to decide which is more important to you and then adjust your HLA code accordingly.
Note that by default, C/C++ passes structures by value. A C/C++ program must explicitly take the address of a structure object and pass that address in order to simulate pass by reference. In general, if the size of a structure exceeds about 16 bytes, you should pass the structure by reference rather than by value.
12.4.9 Passing Array Data Between HLA and C/C++
Passing array data between some procedures written in C/C++ and HLA is little different than passing array data between two HLA procedures. First of all, C/C++ can only pass arrays by reference, never by value. Therefore, you must always use pass by reference inside the HLA code. The following code fragments provide a simple example:
int CArray[128][4]; extern void PassedArrray( int array[128][4] );type CArray: int32[ 128, 4]; procedure PassedArray( var ary: CArray ); external;As the above examples demonstrate, C/C++'s array declarations are similar to HLA's insofar as you specify the bounds of each dimension in the array.
C/C++ uses row-major ordering for arrays. So if you're accessing elements of a C/C++ multi-dimensional array in HLA code, be sure to use the row-major order computation (see "Row Major Ordering" on page 405).
12.5 Putting It All Together
Most real-world assembly code that is written consists of small modules that programmers link to programs written in other languages. Most languages provide some scheme for interfacing that language with assembly (HLA) code. Unfortunately, the number of interface mechanisms is sufficiently close to the number of language implementations to make a complete exposition of this subject impossible. In general, you will have to refer to the documentation for your particular compiler in order to learn sufficient details to successfully interface assembly with that language.
Fortunately, nagging details aside, most high level languages do share some common traits with respect to assembly language interface. Parameter passing conventions, stack clean up, register preservation, and several other important topics often apply from one language to the next. Therefore, once you learn how to interface a couple of languages to assembly, you'll quickly be able to figure out how to interface to others (given the documentation for the new language).
This chapter discusses the interface between the Delphi and C/C++ languages and assembly language. Although there are more popular languages out there (e.g., Visual Basic), Delphi and C/C++ introduce most of the concepts you'll need to know in order to interface a high level language with assembly language. Beyond that point, all you need is the documentation for your specific compiler and you'll be interfacing assembly with that language in no time.
1Note that the HLA Standard Library source code is available; feel free to modify the routines you want to use and remove any exception handling statements contained therein.
2If RatC really annoys you, just keep in mind that you've only got to look at a few RatC programs in this chapter. Then you can go back to the old-fashioned C code and hack to your heart's content!
3This text assumes you've executed the VCVARS32.BAT file that sets up the system to allow the use of VC++ from the command line.
4Microsoft used to support the Pascal calling convention, but when they stopped supporting their QuickPascal language, they dropped support for this option.
5Most compilers provide an option to turn this off if you don't want this to occur. We will assume that this option is active in this text since that's the standard for external C names.
|