The "xxxx ptr" coercion operator (where xxxx is byte, word, dword, and so on) is an example of a type operator. MASM expressions possess two major attributes: a value and a type. The arithmetic, logical, and relational operators change an expression's value; the type operators change its type. The previous section demonstrated how the ptr operator could change an expression's type. There are several additional type operators as well.
Operator | Syntax | Description |
---|---|---|
ptr | byte ptr expr | Coerce expr to point at a byte. |
ptr | word ptr expr | Coerce expr to point at a word. |
ptr | dword ptr expr | Coerce expr to point at a dword. |
ptr | qword ptr expr | Coerce expr to point at a qword. |
ptr | tbyte ptr expr | Coerce expr to point at a tbyte. |
ptr | near ptr expr | Coerce expr to a near value. |
ptr | far ptr expr | Coerce expr to a far value. |
short | short expr | expr must be within ±128 bytes of the current instruction (typically a jmp instruction). This operator forces the jmp instruction to be two bytes long (if possible). |
this | this type | Returns an expression of the specified type whose value is the current location counter. |
seg | seg label | Returns the segment address portion of label. |
offset | offset label | Returns the offset address portion of label. |
.type | .type label | Returns a byte that indicates whether this symbol is a variable, statement label, or structure name. Superseded by opattr. |
opattr | opattr label | Returns a 16 bit value that gives information about label. |
length | length variable | Returns the number of array elements for a single dimension array. For a multi-dimension array, this operator returns the number of elements in the first dimension. |
lengthof | lengthof variable | Returns the number of items in array variable. |
type | type symbol | Returns an expression whose type is the same as symbol and whose value is the size, in bytes, of the specified symbol. |
size | size variable | Returns the number of bytes allocated for single dimension array variable. Useless for multi-dimension arrays. Superseded by sizeof. |
sizeof | sizeof variable | Returns the size, in bytes, of array variable. |
low | low expr | Returns the L.O. byte of expr. |
lowword | lowword expr | Returns the L.O. word of expr. |
high | high expr | Returns the H.O. byte of expr. |
highword | highword expr | Returns the H.O. word of expr. |
The short operator works exclusively with the jmp instruction. Remember, there are two direct near jmp instructions: one has a range of 128 bytes around the jmp, the other a range of 32,768 bytes around the current instruction. MASM will automatically generate a short jump if the target address is no more than 128 bytes before the current instruction. This operator is mainly present for compatibility with old MASM (pre-6.0) code.
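To make the "128 bytes" figure precise, here is a small Python model (not MASM code) of the range test an assembler applies before choosing the two-byte form. It assumes the standard 8086 encoding, where a short jmp is the opcode EBh followed by a signed 8-bit displacement measured from the address of the instruction that follows the jmp. The function names are hypothetical.

```python
def fits_short_jump(jmp_address: int, target: int) -> bool:
    """Return True if a two-byte short jmp at jmp_address can reach target.

    Assumes the 8086 encoding: opcode EBh followed by a signed 8-bit
    displacement that is added to the address of the next instruction,
    i.e. jmp_address + 2.
    """
    displacement = target - (jmp_address + 2)
    return -128 <= displacement <= 127


def encode_short_jump(jmp_address: int, target: int) -> bytes:
    """Return the two bytes of a short jmp, or raise if target is out of range."""
    if not fits_short_jump(jmp_address, target):
        raise ValueError("target is out of range for a short jmp")
    displacement = target - (jmp_address + 2)
    # Store the displacement as an 8-bit two's complement value.
    return bytes([0xEB, displacement & 0xFF])
```

For example, a jmp to itself (jmp $) at offset 100h has a displacement of -2 and encodes as EBh FEh, while a target 128 bytes past the end of the jmp no longer fits and needs the longer near form.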
The this operator forms an expression with the specified type whose value is the current location counter. The instruction mov bx, this word, for example, loads the bx register with the first two bytes of that very instruction. Assuming a plain mov bx, mem16 encoding with no segment override prefix, those bytes are 8Bh (the opcode) and 1Eh (the mod-reg-r/m byte), and because the 80x86 stores words low byte first, bx receives 1E8Bh. The address this word is the address of the opcode for this very instruction! You mostly use the this operator with the equ directive to give a symbol some type other than constant. For example, consider the following statement:
HERE equ this near
This statement assigns the current location counter value to HERE and sets the type of HERE to near. This, of course, could have been done more easily by simply placing the label HERE: on a line by itself. However, the this operator with the equ directive does have some useful applications. Consider the following:
```
WArray  equ     this word
BArray  byte    200 dup (?)
```
In this example the symbol BArray is of type byte. Therefore, instructions accessing BArray must contain byte operands throughout, and MASM would flag a mov ax, BArray+8 instruction as an error. However, the symbol WArray lets you access the exact same memory locations (since WArray has the value of the location counter immediately before MASM encounters the byte pseudo-opcode), so mov ax, WArray+8 accesses location BArray+8. Note that the following two instructions are identical:
```
        mov     ax, word ptr BArray+8
        mov     ax, WArray+8
```
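The WArray/BArray trick works because both symbols name the same byte offset; only the size of the access differs. The Python model below (not MASM code) uses a bytearray as a hypothetical stand-in for the data segment and assumes BArray sits at offset 0. A word access reads two consecutive bytes, low byte first, as the 80x86 does.

```python
# Hypothetical model of "BArray byte 200 dup (?)" at offset 0 of a segment.
memory = bytearray(200)

BARRAY = 0        # offset of BArray (assumed for this sketch)
WARRAY = BARRAY   # "WArray equ this word" captures the same location counter


def read_byte(offset: int) -> int:
    """What a byte-sized access such as mov al, BArray+n would load."""
    return memory[offset]


def read_word(offset: int) -> int:
    """What a word-sized access such as mov ax, WArray+n would load.

    80x86 words are little-endian: memory[offset] is the low byte and
    memory[offset + 1] is the high byte.
    """
    return int.from_bytes(memory[offset:offset + 2], "little")


# Store two bytes at BArray+8 and BArray+9 ...
memory[BARRAY + 8] = 0x34
memory[BARRAY + 9] = 0x12

# ... then a word access through WArray+8 sees both of them as 1234h.
value = read_word(WARRAY + 8)
```

Note that MASM address arithmetic such as WArray+8 counts bytes, not words, which is why WArray+8 and BArray+8 coincide.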
The seg operator does two things. First, it extracts the segment portion of the specified address; second, it converts the type of the specified expression from address to constant. An instruction of the form mov ax, seg symbol always loads the accumulator with the constant corresponding to the segment portion of the address of symbol. If the symbol is the name of a segment, MASM will automatically substitute the paragraph address of the segment for the name. However, it is perfectly legal to use the seg operator as well. The following two statements are identical if dseg is the name of a segment:
```
        mov     ax, dseg
        mov     ax, seg dseg
```
Offset works like seg, except it returns the offset portion of the specified expression rather than the segment portion. If VAR1 is a word variable, mov ax, VAR1 will always load the two bytes at the address specified by VAR1 into the ax register. The mov ax, offset VAR1 instruction, on the other hand, loads the offset (address) of VAR1 into the ax register. Note that you can use the lea instruction or the mov instruction with the offset operator to load the address of a scalar variable into a 16 bit register. The following two instructions both load bx with the address of variable J:
```
        mov     bx, offset J
        lea     bx, J
```
The lea instruction is more flexible since you can specify any memory addressing mode; the offset operator only allows a single symbol (i.e., displacement-only addressing). Most programmers use the mov form for scalar variables and the lea instruction for other addressing modes, because the mov instruction was faster on earlier processors.
One very common use for the seg and offset operators is to initialize a segment and pointer register with the segmented address of some object. For example, to load es:di with the address of SomeVar, you could use the following code:
```
        mov     di, seg SomeVar
        mov     es, di
        mov     di, offset SomeVar
```
Since you cannot load a constant directly into a segment register, the code above copies the segment portion of the address into di and then copies di into es before copying the offset into di. This code uses the di register to copy the segment portion of the address into es so that it will affect as few other registers as possible.
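The two constants produced by seg and offset together name one memory location. As a sketch, assuming 8086 real mode (where the segment value is a paragraph number, so the processor shifts it left four bits and adds the offset to form a 20-bit physical address), the Python function below shows how the pair loaded into es:di combines. The function name and sample values are hypothetical.

```python
def physical_address(segment: int, offset: int) -> int:
    """Combine a real-mode segment:offset pair into a 20-bit physical address.

    segment is what "seg SomeVar" yields (a paragraph number), so it is
    multiplied by 16 before the offset ("offset SomeVar") is added.
    The result wraps at 1 MB, as on the original 8086.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF
```

Different segment:offset pairs can name the same byte; for instance, 1234h:0010h and 1235h:0000h both refer to physical address 12350h.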
Opattr returns a 16 bit value providing specific information about the expression that follows it. The .type operator is an older version of opattr that returns the L.O. eight bits of this value. Each bit in the value of these operators has the following meaning:
Bit(s) | Meaning |
---|---|
0 | References a label in the code segment, if set. |
1 | References a memory variable or relocatable data object, if set. |
2 | Is an immediate (absolute/constant) value, if set. |
3 | Uses direct memory addressing, if set. |
4 | Is a register name, if set. |
5 | References no undefined symbols and there is no error, if set. |
6 | Is an SS: relative reference, if set. |
7 | References an external name, if set. |
8-10 | Language type: 000 - none; 001 - C/C++; 010 - SYSCALL; 011 - STDCALL; 100 - Pascal; 101 - FORTRAN; 110 - BASIC. |
The language bits are for programmers writing code that interfaces with high level languages like C++ or Pascal. Such programs use the simplified segment directives and MASM's HLL features.
You would normally use these values with MASM's conditional assembly directives and macros. This allows you to generate different instruction sequences depending on the type of a macro parameter or the current assembly configuration. For more details, see "Conditional Assembly" on page 397 and "Macros" on page 400.
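Decoding an opattr result is plain bit testing. The Python sketch below (a model, not MASM code) transcribes the bit meanings from the opattr table into a lookup and splits a 16-bit value into its set flags and language type; .type corresponds to keeping only the low eight bits. The function names and the sample value 0124h are hypothetical.

```python
# Bit meanings transcribed from the opattr table (bits 0-7).
FLAG_BITS = {
    0: "code label",
    1: "memory variable or relocatable data",
    2: "immediate value",
    3: "direct memory addressing",
    4: "register name",
    5: "no undefined symbols and no error",
    6: "SS-relative reference",
    7: "external name",
}

# Language type held in bits 8-10.
LANGUAGES = {
    0b000: "none", 0b001: "C/C++", 0b010: "SYSCALL", 0b011: "STDCALL",
    0b100: "Pascal", 0b101: "FORTRAN", 0b110: "BASIC",
}


def decode_opattr(value):
    """Split a 16-bit opattr result into (list of set flags, language type)."""
    flags = [name for bit, name in FLAG_BITS.items() if value & (1 << bit)]
    language = LANGUAGES.get((value >> 8) & 0b111, "unknown")
    return flags, language


def dot_type(value):
    """.type reports only the L.O. eight bits of the opattr value."""
    return value & 0xFF
```

For example, a value of 0124h has bits 2, 5, and 8 set, so it decodes as an error-free immediate value with the C/C++ language type.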
The size, sizeof, length, and lengthof operators compute the sizes (or element counts) of variables, including arrays, and return that quantity as their value. You shouldn't normally use size and length; the sizeof and lengthof operators have superseded them. Size and length do not always return reasonable values for arbitrary operands. MASM 6.x includes them to remain compatible with older versions of the assembler. However, you will see an example later in this chapter where you can use these operators.
The sizeof variable operator returns the number of bytes directly allocated to the specified variable. The following examples illustrate the point:
```
a1      byte    ?                       ;SIZEOF(a1) = 1
a2      word    ?                       ;SIZEOF(a2) = 2
a4      dword   ?                       ;SIZEOF(a4) = 4
a8      real8   ?                       ;SIZEOF(a8) = 8
ary0    byte    10 dup (0)              ;SIZEOF(ary0) = 10
ary1    word    10 dup (10 dup (0))     ;SIZEOF(ary1) = 200
```
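The values in those comments follow one rule for declarations of this shape: the size of the base type multiplied by every (possibly nested) dup count. A minimal Python sketch of that arithmetic, modeling the calculation rather than MASM itself (the helper name is hypothetical), reproduces them:

```python
def sizeof_dup(element_size: int, *counts: int) -> int:
    """Bytes allocated by a declaration like  name type c1 dup (c2 dup (...)).

    element_size is the size of the base type (byte = 1, word = 2,
    dword = 4, real8 = 8); counts are the nested dup repetition factors,
    outermost first. With no counts, this is a single scalar variable.
    """
    total = element_size
    for count in counts:
        total *= count
    return total


# ary1 word 10 dup (10 dup (0)): 2 bytes * 10 * 10
ary1_size = sizeof_dup(2, 10, 10)
```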
You can also use the sizeof operator to compute the size, in bytes, of a structure or other data type. This is very useful for computing an index into an array using the formula from Chapter Four:
Element_Address := base_address + index*Element_Size
You may obtain the element size of an array or structure using the sizeof operator. So if you have an array of structures, you can compute an index into the array as follows:
```
        .286                    ;Allow 80286 instructions.
s       struct
        <some number of fields>
s       ends
         .
         .
         .
array   s       16 dup ({})     ;An array of 16 "s" elements
         .
         .
         .
        imul    bx, I, sizeof s ;Compute BX := I * elementsize
        mov     al, array[bx].fieldname
```
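The same computation can be written out in Python as a sketch of the formula, not of MASM. The structure layout here is a hypothetical example (a dword, a word, and a byte with no padding, giving 7 bytes), and the field offsets play the role of the .fieldname displacement that MASM adds to array[bx].

```python
def element_address(base_address: int, index: int, element_size: int) -> int:
    """Element_Address := base_address + index * Element_Size"""
    return base_address + index * element_size


# Hypothetical structure: d dword ?, w word ?, b byte ? (no padding assumed).
SIZEOF_S = 4 + 2 + 1
FIELD_OFFSETS = {"d": 0, "w": 4, "b": 6}


def field_address(base_address: int, index: int, field: str) -> int:
    """Address that array[bx].field touches once bx = index * sizeof s."""
    return element_address(base_address, index, SIZEOF_S) + FIELD_OFFSETS[field]
```

With array at offset 100h, element 3 starts at 100h + 3*7 = 115h, and its w field lies 4 bytes further on, at 119h.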
You can also apply the sizeof operator to other data types to obtain their size in bytes. For example, sizeof byte returns 1, sizeof word returns 2, and sizeof dword returns 4. Of course, applying this operator to MASM's built-in data types is questionable since the size of those objects is fixed. However, if you create your own data types using typedef, it makes perfect sense to compute the size of the object using the sizeof operator:
```
integer typedef word
Array   integer 16 dup (?)
         .
         .
         .
        imul    bx, bx, sizeof integer
         .
         .
         .
```
In the code above, sizeof integer would return two, just like sizeof word. However, if you change the typedef statement so that integer is a dword rather than a word, the sizeof integer operand would automatically change its value to four to reflect the new size of an integer.
The lengthof operator returns the total number of elements in an array. For the Array variable above, lengthof Array would return 16. If you have a two dimensional array, lengthof returns the total number of elements in that array.
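For declarations built from dup, the relationship between these operators is simple arithmetic: lengthof is the product of the dup counts, and sizeof is that count times the size of the element type, which in turn follows any typedef. The Python model below (not MASM code; the dictionaries and names are hypothetical) shows both, including what happens when the integer typedef changes.

```python
BASE_SIZES = {"byte": 1, "word": 2, "dword": 4}


def sizeof_type(typedefs, name):
    """Follow typedef aliases until a built-in type is reached."""
    while name in typedefs:
        name = typedefs[name]
    return BASE_SIZES[name]


def lengthof_dup(*counts):
    """Total number of elements: the product of the (nested) dup counts."""
    total = 1
    for count in counts:
        total *= count
    return total


# integer typedef word / Array integer 16 dup (?)
typedefs = {"integer": "word"}
array_length = lengthof_dup(16)                                # lengthof Array
array_size = array_length * sizeof_type(typedefs, "integer")  # sizeof Array

# Changing only the typedef changes every size derived from it.
typedefs_dword = {"integer": "dword"}
```

A two-dimensional declaration such as word 10 dup (10 dup (0)) gives a lengthof of 100 under this rule.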
When you use the lengthof and sizeof operators with arrays, keep in mind that it is possible to declare arrays in ways that MASM can misinterpret. For example, the following statements all declare arrays containing eight words:
```
A1      word    8 dup (?)
A2      word    1, 2, 3, 4, 5, 6, 7, 8

; Note: the "\" is a "line continuation" symbol. It tells MASM to append
; the next line to the end of the current line.
A3      word    1, 2, 3, 4, \
                5, 6, 7, 8

A4      word    1, 2, 3, 4
        word    5, 6, 7, 8
```
Applying the sizeof and lengthof operators to A1, A2, and A3 produces sixteen (sizeof) and eight (lengthof). However, sizeof A4 produces eight and lengthof A4 produces four. This happens because MASM assumes that an array begins and ends with a single data declaration. Although the A4 declaration sets aside eight consecutive words, just like the other three declarations above, MASM treats the two word directives as two separate arrays rather than a single array. So if you want to initialize the elements of a large array or a multidimensional array and you also want to be able to apply the lengthof and sizeof operators to that array, you should use A3's form of declaration rather than A4's.
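One way to picture this behavior is that lengthof and sizeof only look at the single directive carrying the label. The Python sketch below models that rule under a hypothetical representation (each directive as a label, an element size, and a list of values); it is an illustration of the described behavior, not a reimplementation of MASM.

```python
# Each tuple is one data directive: (label or None, element size, values).
a3 = [("A3", 2, [1, 2, 3, 4, 5, 6, 7, 8])]               # one directive, continued with "\"
a4 = [("A4", 2, [1, 2, 3, 4]), (None, 2, [5, 6, 7, 8])]  # two separate directives


def lengthof_label(directives, label):
    """Model of lengthof: only the directive that carries the label counts."""
    for name, _, values in directives:
        if name == label:
            return len(values)
    raise KeyError(label)


def sizeof_label(directives, label):
    """Model of sizeof: bytes in the labeled directive only."""
    for name, size, values in directives:
        if name == label:
            return size * len(values)
    raise KeyError(label)


def bytes_reserved(directives):
    """Total storage actually set aside, regardless of labels."""
    return sum(size * len(values) for _, size, values in directives)
```

Under this model A4 still reserves 16 bytes, but lengthof and sizeof report only the first directive's four words and eight bytes.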
The type operator returns a constant that is the number of bytes of the specified operand. For example, type(word) returns the value two. This revelation, by itself, isn't particularly interesting since the size and sizeof operators also return this value. However, when you use the type operator with the comparison operators (eq, ne, le, lt, gt, and ge), the comparison produces a true result only if the types of the operands are the same. Consider the following definitions:
```
Integer typedef word
J       word    ?
K       sword   ?
L       integer ?
M       word    ?

        byte    type (J) eq word        ;value = 0FFh
        byte    type (J) eq sword       ;value = 0
        byte    type (J) eq type (L)    ;value = 0FFh
        byte    type (J) eq type (M)    ;value = 0FFh
        byte    type (L) eq integer     ;value = 0FFh
        byte    type (K) eq dword       ;value = 0
```
Since the code above typedef'd Integer to word, MASM treats integers and words as the same type. Note that with the exception of the last example above, the value on either side of the eq operator is two. So when you use the comparison operators with the type operator, MASM compares more than just the value; it also compares the types themselves. Therefore, type and sizeof are not synonymous. E.g.,
```
        byte    type (J) eq type (K)            ;value = 0
        byte    (sizeof J) eq (sizeof K)        ;value = 0FFh
```
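The difference between the two comparisons can be modeled in Python (this is an illustration, not MASM): treat each type as a name plus a size, make a typedef an alias for the very same type object, and let a type comparison test identity of the type while a sizeof comparison tests only the byte counts. The 0FFh/0 results mirror the byte-sized values shown in the examples; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasmType:
    name: str
    size: int


WORD = MasmType("word", 2)
SWORD = MasmType("sword", 2)
DWORD = MasmType("dword", 4)
INTEGER = WORD  # "Integer typedef word": an alias for the same type

# Truth values as they appear when the result is stored in a byte.
MASM_TRUE, MASM_FALSE = 0xFF, 0x00


def type_eq(a, b):
    """Model of  type (x) eq type (y): true only when the types are identical."""
    return MASM_TRUE if a == b else MASM_FALSE


def sizeof_eq(a, b):
    """Model of  (sizeof x) eq (sizeof y): compares plain byte counts."""
    return MASM_TRUE if a.size == b.size else MASM_FALSE


J, K, L = WORD, SWORD, INTEGER  # the types of variables J, K, and L
```

Because word and sword have the same size but different names, type_eq(J, K) is false while sizeof_eq(J, K) is true, matching the two lines above.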
The type
operator is especially useful when using MASM's conditional assembly directives. See "Conditional Assembly" on page 397 for more details.
The examples above also demonstrate another interesting MASM feature. If you use a type name within an expression, MASM treats it as though you'd entered "type(name)" where name is a symbol of the given type. In particular, specifying a type name returns the size, in bytes, of an object of that type. Consider the following examples:
```
Integer typedef word
s       struct
d       dword   ?
w       word    ?
b       byte    ?
s       ends

        byte    word                    ;value = 2
        byte    sword                   ;value = 2
        byte    byte                    ;value = 1
        byte    dword                   ;value = 4
        byte    s                       ;value = 7
        byte    word eq word            ;value = 0FFh
        byte    word eq sword           ;value = 0
        byte    b eq dword              ;value = 0
        byte    s eq byte               ;value = 0
        byte    word eq Integer         ;value = 0FFh
```
The high and low operators, like offset and seg, change the type of an expression from whatever it was to a constant. These operators also affect the value of the expression: they decompose it into a high order byte and a low order byte. The high operator extracts bits eight through fifteen of the expression; the low operator extracts and returns bits zero through seven. Highword and lowword extract the H.O. and L.O. 16 bits of an expression. You can extract bits 16-23 and 24-31 using expressions of the form low(highword(expr)) and high(highword(expr)), respectively.
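The bit ranges these operators select can be expressed with shifts and masks. The Python functions below are a model of the described behavior (not MASM code) for a 32-bit constant; the sample value 12345678h is arbitrary.

```python
def low(expr: int) -> int:
    """Bits 0-7 of expr."""
    return expr & 0xFF


def high(expr: int) -> int:
    """Bits 8-15 of expr."""
    return (expr >> 8) & 0xFF


def lowword(expr: int) -> int:
    """Bits 0-15 of expr."""
    return expr & 0xFFFF


def highword(expr: int) -> int:
    """Bits 16-31 of expr."""
    return (expr >> 16) & 0xFFFF


value = 0x12345678
bits_16_23 = low(highword(value))   # third byte from the bottom
bits_24_31 = high(highword(value))  # most significant byte
```

For 12345678h this yields 78h, 56h, 5678h, and 1234h from the four basic operators, and 34h and 12h from the nested forms.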
Precedence | Operators |
---|---|
1 (highest) | length, lengthof, size, sizeof, ( ), [ ], < > |
2 | . (structure field name operator) |
3 | CS: DS: ES: FS: GS: SS: (segment override prefixes) |
4 | ptr offset seg type opattr this |
5 | high, low, highword, lowword |
6 | + - (unary) |
7 | * / mod shl shr |
8 | + - (binary) |
9 | eq ne lt le gt ge |
10 | not |
11 | and |
12 | or xor |
13 (lowest) | short .type |
Parentheses should only surround expressions. Some operators, like sizeof and lengthof, require a type or variable name, not an expression, and they do not allow you to put parentheses around the name. Therefore, "(sizeof X)" is legal, but "sizeof(X)" is not. Keep this in mind when using parentheses to override operator precedence in an expression. If MASM generates an error, you may need to rearrange the parentheses in your expression.
As is true for expressions in a high level language, it is a good idea to always use parentheses to explicitly state the precedence in all complex address expressions (complex meaning that the expression has more than one operator). This generally makes the expression more readable and helps avoid precedence related bugs.