if
statement, loops, and subroutine invocation (a call). Since compilers reduce all other languages to assembly language, it should come as no surprise that assembly language supports the instructions necessary to implement these control structures. 80x86 program control instructions belong to three groups: unconditional transfers, conditional transfers, and subroutine call and return instructions. The following sections describe these instructions.

The jmp
(jump) instruction unconditionally transfers control to another point in the program. There are six forms of this instruction: an intersegment/direct jump, two intrasegment/direct jumps, an intersegment/indirect jump, and two intrasegment/indirect jumps. Intrasegment jumps are always between statements in the same code segment. Intersegment jumps can transfer control to a statement in a different code segment.
These instructions all share the same general syntax:
jmp target
The assembler differentiates them by their operands:
    jmp     disp8       ;direct intrasegment, 8 bit displacement.
    jmp     disp16      ;direct intrasegment, 16 bit displacement.
    jmp     adrs32      ;direct intersegment, 32 bit segmented address.
    jmp     mem16       ;indirect intrasegment, 16 bit memory operand.
    jmp     reg16       ;register indirect intrasegment.
    jmp     mem32       ;indirect intersegment, 32 bit memory operand.
Intersegment is a synonym for far, intrasegment is a synonym for near.
The two direct intrasegment jumps differ only in their length. The first form consists of an opcode and a single byte displacement. The CPU sign extends this displacement to 16 bits and adds it to the ip
register. This instruction can branch to a location -128..+127 from the beginning of the next instruction following it (i.e., -126..+129 bytes around the current instruction).
The second form of the intrasegment jump is three bytes long with a two byte displacement. This instruction allows an effective range of -32,768..+32,767 bytes and can transfer control to anywhere in the current code segment. The CPU simply adds the two byte displacement to the ip
register.
These first two jumps use a relative addressing scheme. The displacement encoded after the opcode byte is not the target address in the current code segment, but the distance to the target address. Fortunately, MASM will compute the distance for you automatically, so you do not have to compute this displacement value yourself. In many respects, these instructions are really nothing more than add ip, disp
instructions.
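The displacement arithmetic can be sketched in a few lines of Python. This is only an illustrative model, not MASM's actual algorithm: the function names are hypothetical, and the sketch assumes the standard opcodes EBh (short jump) and E9h (near jump). It picks the two byte form when the target is within -128..+127 bytes of the next instruction and falls back to the three byte form otherwise.

```python
def encode_direct_jmp(jmp_offset: int, target_offset: int) -> bytes:
    """Encode a direct intrasegment jmp located at jmp_offset (hypothetical helper)."""
    # Short form: opcode EBh plus a signed 8 bit displacement, measured
    # from the instruction that follows the two byte jump.
    disp8 = target_offset - (jmp_offset + 2)
    if -128 <= disp8 <= 127:
        return bytes([0xEB, disp8 & 0xFF])
    # Near form: opcode E9h plus a 16 bit displacement, measured from the
    # instruction that follows the three byte jump. ip wraps around at 64K.
    disp16 = (target_offset - (jmp_offset + 3)) & 0xFFFF
    return bytes([0xE9, disp16 & 0xFF, disp16 >> 8])


def jump_target(jmp_offset: int, code: bytes) -> int:
    """Recover the offset the CPU would compute, i.e. the 'add ip, disp' view."""
    if code[0] == 0xEB:
        disp = code[1] - 256 if code[1] >= 128 else code[1]  # sign extend to 16 bits
        return (jmp_offset + 2 + disp) & 0xFFFF
    disp = code[1] | (code[2] << 8)
    return (jmp_offset + 3 + disp) & 0xFFFF
```

For instance, a jump at offset 100h that targets itself encodes as EBh, FEh: the displacement is -2 because it is measured from the instruction after the jump.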
The direct intersegment jump is five bytes long, the last four bytes containing a segmented address (the offset in the second and third bytes, the segment in the fourth and fifth bytes). This instruction copies the offset into the ip
register and the segment into the cs
register. Execution of the next instruction continues at the new address in cs:ip
. Unlike the previous two jumps, the address following the opcode is the absolute memory address of the target instruction; this version does not use relative addressing. This instruction loads cs:ip with a 32 bit immediate value.
For the three direct jumps described above, you normally specify the target address using a statement label. A statement label is usually an identifier followed by a colon, usually on the same line as an executable machine instruction. The assembler determines the offset of the statement after the label and automatically computes the distance from the jump instruction to the statement label. Therefore, you do not have to worry about computing displacements manually. For example, the following short little loop continuously reads the parallel printer data port and inverts the L.O. bit. This produces a square wave electrical signal on one of the printer port output lines:
                    mov     dx, 378h        ;Parallel printer port address.
    LoopForever:    in      al, dx          ;Read character from input port.
                    xor     al, 1           ;Invert the L.O. bit.
                    out     dx, al          ;Output data back to port.
                    jmp     LoopForever     ;Repeat forever.
The fourth form of the unconditional jump instruction is the indirect intrasegment jump instruction. It requires a 16 bit memory operand. This form transfers control to the offset within the current code segment given by the two bytes of the memory operand. For example,
    WordVar     word    TargetAddress
                 .
                 .
                 .
                jmp     WordVar
transfers control to the address specified by the value in the 16 bit memory location WordVar
. This does not jump to the statement at address WordVar
, it jumps to the statement at the address held in the WordVar
variable. Note that this form of the jmp instruction is roughly equivalent to:
mov ip, WordVar
Although the example above uses a single word variable containing the indirect address, you can use any valid memory address mode, not just the displacement only addressing mode. You can use memory indirect addressing modes like the following:
    jmp     DispOnly        ;Word variable
    jmp     Disp[bx]        ;Disp is an array of words
    jmp     Disp[bx][si]
    jmp     [bx]            ;Needs a size override such as word ptr before MASM accepts it.
    etc.
Consider the indexed addressing mode above for a moment (disp[bx]
). This addressing mode fetches the word from location disp+bx
and copies this value to the ip
register; this lets you create an array of pointers and jump to a specified pointer using an array index. Consider the following example:
    AdrsArray   word    stmt1, stmt2, stmt3, stmt4
                 .
                 .
                 .
                mov     bx, I           ;I is in the range 0..3
                add     bx, bx          ;Index into an array of words.
                jmp     AdrsArray[bx]   ;Jump to stmt1, stmt2, etc., depending
                                        ; on the value of I.
The important thing to remember is that the near indirect jump fetches a word from memory and copies it into the ip
register; it does not jump to the memory location specified, it jumps indirectly through the 16 bit pointer at the specified memory location.
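To make the "jump through a pointer" idea concrete, here is a small Python model of the jump table. It is a sketch under stated assumptions: the statement offsets, the table's base address, and the helper names are all hypothetical, and memory is just a byte array. The point is that the new ip value is the word stored in the table, not the address of the table entry.

```python
import struct

# Hypothetical offsets of stmt1..stmt4 within the code segment.
STMT_OFFSETS = [0x0120, 0x0134, 0x0150, 0x0162]

# AdrsArray: four little-endian words stored back to back in memory
# (the base address 0010h is an arbitrary choice for this sketch).
ADRS_ARRAY_BASE = 0x0010
memory = bytearray(64)
struct.pack_into("<4H", memory, ADRS_ARRAY_BASE, *STMT_OFFSETS)


def near_indirect_jmp(mem: bytearray, effective_address: int) -> int:
    """Return the new ip: the word stored at effective_address."""
    (new_ip,) = struct.unpack_from("<H", mem, effective_address)
    return new_ip


def jump_via_table(i: int) -> int:
    bx = i      # mov bx, I
    bx += bx    # add bx, bx  (scale the index by the size of a word)
    return near_indirect_jmp(memory, ADRS_ARRAY_BASE + bx)  # jmp AdrsArray[bx]
```

With these values, an index of 2 yields 0150h (the offset stored for stmt3), not 0014h (the address of the table entry itself).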
The fifth jmp
instruction transfers control to the offset given in a 16 bit general purpose register. Note that you can use any general purpose register, not just bx
, si
, di
, or bp
. An instruction of the form
jmp ax
is roughly equivalent to
mov ip, ax
Note that the previous two forms (register or memory indirect) are really the same instruction. The mod and r/m fields of a mod-reg-r/m byte specify a register or memory indirect address. See Appendix D for the details.
The sixth form of the jmp
instruction, the indirect intersegment jump, has a memory operand that contains a double word pointer. The CPU copies the double word at that address into the cs:ip
register pair. For example,
    FarPointer  dword   TargetAddress
                 .
                 .
                 .
                jmp     FarPointer
transfers control to the segmented address specified by the four bytes at address FarPointer
. This instruction is semantically identical to the (mythical) instruction
lcs ip, FarPointer ;load cs, ip from FarPointer
As for the near indirect jump described earlier, this far indirect jump lets you specify any arbitrary (valid) memory addressing mode. You are not limited to the displacement only addressing mode the example above uses.
MASM uses a near indirect or far indirect addressing mode depending upon the type of the memory location you specify. If the variable you specify is a word variable, MASM will automatically generate a near indirect jump; if the variable is a dword, MASM emits the opcode for a far indirect jump. Some forms of memory addressing, unfortunately, do not intrinsically specify a size. For example, [bx]
is definitely a memory operand, but does bx
point at a word variable or a double word variable? It could point at either. Therefore, MASM will reject a statement of the form:
jmp [bx]
MASM cannot tell whether this should be a near indirect or far indirect jump. To resolve the ambiguity, you will need to use a type coercion operator. Chapter Eight describes type coercion operators in full; for now, just use one of the following two instructions for a near or far jump, respectively:
    jmp     word ptr [bx]
    jmp     dword ptr [bx]
The register indirect addressing modes are not the only ones that could be type ambiguous. You could also run into this problem with indexed and base plus index addressing modes:
    jmp     word ptr 5[bx]
    jmp     dword ptr 9[bx][si]
For more information on the type coercion operators, see Chapter Eight.
In theory, you could use the indirect jump instructions and the setcc
instructions to conditionally transfer control to some given location. For example, the following code transfers control to iftrue
if word variable X
is equal to word variable Y
. It transfers control to iffalse
, otherwise.
    JmpTbl      word    iffalse, iftrue
                 .
                 .
                 .
                mov     ax, X
                cmp     ax, Y
                sete    bl
                movzx   ebx, bl
                jmp     JmpTbl[ebx*2]
As you will soon see, there is a much better way to do this using the conditional jump instructions.
The call
and ret
instructions handle subroutine calls and returns. There are five different call instructions and six different forms of the return instruction:
    call    disp16      ;direct intrasegment, 16 bit relative.
    call    adrs32      ;direct intersegment, 32 bit segmented address.
    call    mem16       ;indirect intrasegment, 16 bit memory pointer.
    call    reg16       ;indirect intrasegment, 16 bit register pointer.
    call    mem32       ;indirect intersegment, 32 bit memory pointer.

    ret                 ;near or far return
    retn                ;near return
    retf                ;far return
    ret     disp        ;near or far return and pop
    retn    disp        ;near return and pop
    retf    disp        ;far return and pop
The call
instructions take the same forms as the jmp
instructions except there is no short (two byte) intrasegment call.
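The far call instruction does the following:

- It pushes the cs register onto the stack.
- It pushes the 16 bit offset of the next instruction following the call onto the stack.
- It copies the 32 bit effective address into the cs:ip registers. Since the far call allows the same addressing modes as the far jmp, it can obtain the target address from an immediate segmented address or from a double word memory operand.

The near call instruction does the following:

- It pushes the 16 bit offset of the next instruction following the call onto the stack.
- It copies the 16 bit effective address into the ip register. Since the call instruction allows the same addressing modes as jmp, call can obtain the target address using a relative, memory, or register addressing mode.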
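
The difference between what the two kinds of call leave on the stack can be modeled in a few lines of Python. This is a minimal sketch, assuming a stack represented as a list of 16 bit words; the class and method names are hypothetical and the instruction lengths are passed in by the caller rather than decoded.

```python
class Cpu:
    """Tiny model of cs, ip, and a stack of 16 bit words (hypothetical)."""

    def __init__(self, cs: int, ip: int):
        self.cs, self.ip = cs, ip
        self.stack = []  # the last element is the top of the stack

    def push(self, word: int) -> None:
        self.stack.append(word & 0xFFFF)

    def near_call(self, target_ip: int, instr_len: int) -> None:
        self.push(self.ip + instr_len)  # offset of the next instruction
        self.ip = target_ip             # cs is unchanged

    def far_call(self, target_cs: int, target_ip: int, instr_len: int) -> None:
        self.push(self.cs)              # segment first ...
        self.push(self.ip + instr_len)  # ... then the offset
        self.cs, self.ip = target_cs, target_ip
```

A near call at 1000h:0100h that is three bytes long leaves the single word 0103h on the stack; a five byte far call from the same spot leaves 1000h and then 0105h.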
The call disp16
instruction uses relative addressing. You can compute the effective address of the target by adding this 16 bit displacement to the return address (as with the relative jmp instructions, the displacement is the distance from the instruction following the call to the target address).
The call adrs32
instruction uses the direct addressing mode. A 32 bit segmented address immediately follows the call
opcode. This form of the call instruction copies that value directly into the cs:ip
register pair. In many respects, this is equivalent to the immediate addressing mode since the value this instruction copies into the cs:ip
register pair immediately follows the instruction.
Call mem16
uses the memory indirect addressing mode. Like the jmp
instruction, this form of the call
instruction fetches the word at the specified memory location and uses that word's value as the target address. Remember, you can use any memory addressing mode with this instruction. The displacement-only addressing mode is the most common form, but the others are just as valid:
    call    CallTbl[bx]         ;Index into an array of pointers.
    call    word ptr [bx]       ;BX points at word to use.
    call    WordTbl[bx][si]     ; etc.
Note that the selection of addressing mode only affects the effective address computation for the target subroutine. These call instructions still push the offset of the next instruction following the call onto the stack. Since these are near calls (they obtain their target address from a 16 bit memory location), they all push a 16 bit return address onto the stack.
Call reg16
works just like the memory indirect call above, except it uses the 16 bit value in a register for the target address. This instruction is really the same instruction as the call mem16
instruction. Both forms specify their effective address using a mod-reg-r/m byte. For the call reg16
form, the mod bits contain 11b so the r/m field specifies a register rather than a memory addressing mode. Of course, this instruction also pushes the 16 bit offset of the next instruction onto the stack as the return address.
The call mem32
instruction is a far indirect call. The memory address specified by this instruction must be a double word value. This form of the call instruction fetches the 32 bit segmented address at the computed effective address and copies this double word value into the cs:ip
register pair. This instruction also copies the 32 bit segmented address of the next instruction onto the stack (it pushes the segment value first and the offset portion second). Like the call mem16
instruction, you can use any valid memory addressing mode with this instruction:
    call    DWordVar
    call    DwordTbl[bx]
    call    dword ptr [bx]
    etc.
It is relatively easy to synthesize the call
instruction using two or three other 80x86 instructions. You could create the equivalent of a near call
using a push
and a jmp
instruction:
    push    <offset of instruction after jmp>
    jmp     subroutine
A far call
would be similar; you would add a push cs
instruction before the two instructions above to push a far return address on the stack.
The ret
(return) instruction returns control to the caller of a subroutine. It does so by popping the return address off the stack and transferring control to the instruction at this return address. Intrasegment (near) returns pop a 16 bit return address off the stack into the ip
register. An intersegment (far) return pops a 16 bit offset into the ip register and then a 16 bit segment value into the cs
register. These instructions are effectively equal to the following:
    retn:   pop     ip
    retf:   popd    cs:ip
Clearly, you must match a near subroutine call with a near return and a far subroutine call with a corresponding far return. If you mix near calls with far returns or vice versa, you will leave the stack in an inconsistent state and you probably will not return to the proper instruction after the call. Of course, another important issue when using the call
and ret
instructions is that you must make sure your subroutine doesn't push something onto the stack and then fail to pop it off before trying to return to the caller. Stack problems are a major cause of errors in assembly language subroutines. Consider the following code:
    Subroutine: push    ax
                push    bx
                 .
                 .
                 .
                pop     bx
                ret
                 .
                 .
                 .
                call    Subroutine
The call
instruction pushes the return address onto the stack and then transfers control to the first instruction of subroutine
. The first two push instructions push the ax
and bx
registers onto the stack, presumably in order to preserve their value because subroutine
modifies them. Unfortunately, the code above contains a programming error: subroutine pops only bx
from the stack; it fails to pop ax
as well. This means that when subroutine tries to return to the caller, the value of ax
rather than the return address is sitting on the top of the stack. Therefore, this subroutine returns control to the address specified by the initial value of the ax
register rather than to the true return address. Since there are 65,536 different values ax
can have, there is a 1/65,536th of a chance that your code will return to the real return address. The odds are not in your favor! Most likely, code like this will hang up the machine. Moral of the story - always make sure the return address is sitting on the stack before executing the return instruction.
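The effect of the missing pop is easy to trace with a list standing in for the stack. The following Python sketch is only a model of the example above; the function name, the register values, and the return address are hypothetical.

```python
def run_subroutine(return_address: int, ax: int, bx: int, pops: int) -> int:
    """Return the value that ret loads into ip (hypothetical model)."""
    stack = [return_address]  # pushed by the call instruction
    stack.append(ax)          # push ax
    stack.append(bx)          # push bx
    for _ in range(pops):     # pop bx, and pop ax only when pops == 2
        stack.pop()
    return stack.pop()        # ret: pop the word on top of the stack into ip
```

With one pop, the function returns the saved ax value, so control would transfer to whatever offset ax happened to hold; with two pops it returns the real return address.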
Like the call
instruction, it is very easy to simulate the ret instruction using two 80x86 instructions. All you need to do is pop the return address off the stack and then copy it into the ip
register. For near returns, this is a very simple operation: just pop the near return address off the stack into a register and then jump indirectly through that register:
    pop     ax
    jmp     ax
Simulating a far return is a little more difficult because you must load cs:ip
in a single operation. The only instruction that does this (other than a far return) is the jmp mem32
instruction. See the exercises at the end of this chapter for more details.
There are two other forms of the ret
instruction. They are identical to those above except a 16 bit displacement follows their opcodes. The CPU adds this value to the stack pointer immediately after popping the return address from the stack. This mechanism removes parameters pushed onto the stack before returning to the caller. See Chapter Eleven for more details.
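As a rough illustration of the "return and pop" forms, the Python sketch below models a near return with a displacement. It assumes the stack is a list of 16 bit words and that the displacement is an even byte count; the function name and the sample values are hypothetical.

```python
def near_return_and_pop(stack: list, disp: int) -> int:
    """Model retn disp: returns the new ip and discards disp bytes of parameters."""
    new_ip = stack.pop()        # pop the return address into ip
    for _ in range(disp // 2):  # add sp, disp: each stack entry is one word
        stack.pop()
    return new_ip
```

If the caller pushed two word parameters before the call, then a displacement of 4 removes both of them as part of the return, leaving only what was on the stack before the parameters.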
The assembler allows you to type ret
without the "f" or "n" suffix. If you do so, the assembler will figure out whether it should generate a near return or a far return. See the chapter on procedures and functions for details on this.
The int
(for software interrupt) instruction is a very special form of a call
instruction. Whereas the call
instruction calls subroutines within your program, the int
instruction calls system routines and other special subroutines. The major difference between interrupt service routines and standard procedures is that you can have any number of different procedures in an assembly language program, while the system supports a maximum of 256 different interrupt service routines. A program calls a subroutine by specifying the address of that subroutine; it calls an interrupt service routine by specifying the interrupt number for that particular interrupt service routine. This chapter will only describe how to call an interrupt service routine using the int, into,
and bound
instructions, and how to return from an interrupt service routine using the iret
instruction.
There are four different forms of the int
instruction. The first form is
int nn
(where "nn" is a value between 0 and 255). It allows you to call one of 256 different interrupt routines. This form of the int
instruction is two bytes long. The first byte is the int
opcode. The second byte is immediate data containing the interrupt number.
Although you can use the int
instruction to call procedures (interrupt service routines) you've written, the primary purpose of this instruction is to make a system call. A system call is a subroutine call to a procedure provided by the system, such as DOS, the PC-BIOS, a mouse driver, or some other piece of software resident in the machine before your program began execution. Since you always refer to a specific system call by its interrupt number, rather than its address, your program does not need to know the actual address of the subroutine in memory. The int instruction provides dynamic linking to your program. The CPU determines the actual address of an interrupt service routine at run time by looking up the address in an interrupt vector table. This allows the authors of such system routines to change their code (including the entry point) without fear of breaking any older programs that call their interrupt service routines. As long as the system call uses the same interrupt number, the CPU will automatically call the interrupt service routine at its new address.
The only problem with the int
instruction is that it supports only 256 different interrupt service routines. MS-DOS alone supports well over 100 different calls. BIOS and other system utilities provide thousands more. This is above and beyond all the interrupts reserved by Intel for hardware interrupts and traps. The common solution most of the system calls use is to employ a single interrupt number for a given class of calls and then pass a function number in one of the 80x86 registers (typically the ah
register). For example, MS-DOS uses only a single interrupt number, 21h. To choose a particular DOS function, you load a DOS function code into the ah
register before executing the int 21h
instruction. For example, to terminate a program and return control to MS-DOS, you would normally load ah with 4Ch and call DOS with the int 21h
instruction:
    mov     ah, 4ch     ;DOS terminate opcode.
    int     21h         ;DOS call
The BIOS keyboard interrupt is another good example. Interrupt 16h is responsible for testing the keyboard and reading data from the keyboard. This BIOS routine provides several calls to read a character and scan code from the keyboard, see if any keys are available in the system type ahead buffer, check the status of the keyboard modifier flags, and so on. To choose a particular operation, you load the function number into the ah register before executing int 16h
. The following table lists the possible functions:
Function # (AH) | Input Parameters | Output Parameters | Description |
---|---|---|---|
0 | - | al - ASCII character; ah - scan code | Read character. Reads next available character from the system's type ahead buffer. Waits for a keystroke if the buffer is empty. |
1 | - | ZF - set if no key, clear if key available; al - ASCII code; ah - scan code | Checks to see if a character is available in the type ahead buffer. Sets the zero flag if no key is available, clears the zero flag if a key is available. If there is an available key, this function returns the ASCII and scan code value in ax. The value in ax is undefined if no key is available. |
2 | - | al - shift flags | Returns the current status of the shift flags in al. The shift flags are defined as follows: bit 7: Insert toggle; bit 6: Capslock toggle; bit 5: Numlock toggle; bit 4: Scroll lock toggle; bit 3: Alt key is down; bit 2: Ctrl key is down; bit 1: Left shift key is down; bit 0: Right shift key is down |
3 | al = 5; bh = 0, 1, 2, 3 for 1/4, 1/2, 3/4, or 1 second delay; bl = 0..1Fh for 30/sec to 2/sec | - | Set auto repeat rate. The bh register contains the amount of time to wait before starting the autorepeat operation, the bl register contains the autorepeat rate. |
5 | ch = scan code; cl = ASCII code | - | Store keycode in buffer. This function stores the value in the cx register at the end of the type ahead buffer. Note that the scan code in ch doesn't have to correspond to the ASCII code appearing in cl. This routine will simply insert the data you provide into the system type ahead buffer. |
10h | - | al - ASCII character; ah - scan code | Read extended character. Like the ah=0 call, except this one passes all key codes; the ah=0 call throws away codes that are not PC/XT compatible. |
11h | - | ZF - set if no key, clear if key available; al - ASCII code; ah - scan code | Like the ah=01h call except this one does not throw away keycodes that are not PC/XT compatible (i.e., the extra keys found on the 101 key keyboard). |
12h | - | al - shift flags; ah - extended shift flags | Returns the current status of the shift flags in ax. The shift flags are defined as follows: bit 15: SysReq key pressed; bit 14: Capslock key currently down; bit 13: Numlock key currently down; bit 12: Scroll lock key currently down; bit 11: Right alt key is down; bit 10: Right ctrl key is down; bit 9: Left alt key is down; bit 8: Left ctrl key is down; bit 7: Insert toggle; bit 6: Capslock toggle; bit 5: Numlock toggle; bit 4: Scroll lock toggle; bit 3: Either alt key is down (some machines, left only); bit 2: Either ctrl key is down; bit 1: Left shift key is down; bit 0: Right shift key is down |
For example, to read a character from the system type ahead buffer, leaving the ASCII code in al
, you could use the following code:
    mov     ah, 0           ;Wait for key available, and then
    int     16h             ; read that key.
    mov     character, al   ;Save character read.
Likewise, if you wanted to test the type ahead buffer to see if a key is available, without reading that keystroke, you could use the following code:
    mov     ah, 1           ;Test to see if key is available.
    int     16h             ;Sets the zero flag if a key is not
                            ; available.
The second form of the int instruction is a special case:
int 3
Int
3 is a special form of the interrupt instruction that is only one byte long. CodeView and other debuggers use it as a software breakpoint instruction. Whenever you set a breakpoint on an instruction in your program, the debugger will typically replace the first byte of the instruction's opcode with an int 3
instruction. When your program executes the int 3
instruction, this makes a "system call" to the debugger so the debugger can regain control of the CPU. When this happens, the debugger will replace the int 3
instruction with the original opcode.
While operating inside a debugger, you can explicitly use the int 3
instruction to stop program execution and return control to the debugger. This is not, however, the normal way to terminate a program. If you attempt to execute an int 3
instruction while running under DOS, rather than under the control of a debugger program, you will likely crash the system.
The third form of the int
instruction is into
. Into
will cause a software interrupt (interrupt 4) if the 80x86 overflow flag is set. You can use this instruction to quickly test for arithmetic overflow after executing an arithmetic instruction. Semantically, this instruction is equivalent to
if overflow = 1 then int 4
You should not use this instruction unless you've supplied a corresponding trap handler (interrupt service routine). Executing into without such a handler would probably crash the system.
The fourth software interrupt, provided by 80286 and later processors, is the bound
instruction. This instruction takes the form
bound reg, mem
and executes the following algorithm:
if (reg < [mem]) or (reg > [mem+sizeof(reg)]) then int 5
[mem]
denotes the contents of the memory location mem
and sizeof(reg)
is two or four depending on whether the register is 16 or 32 bits wide. The memory operand must be twice the size of the register operand. The bound
instruction compares the values using a signed integer comparison.
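The range test can be modeled in Python as follows. This is a sketch of the 16 bit case only, assuming memory is a byte array holding the lower bound followed by the upper bound as signed little-endian words; the function names are hypothetical. It returns True when the bound instruction would raise int 5.

```python
import struct


def to_signed16(word: int) -> int:
    """Interpret a 16 bit register value as a signed integer."""
    word &= 0xFFFF
    return word - 0x10000 if word & 0x8000 else word


def bound16_traps(reg: int, mem: bytearray, address: int) -> bool:
    """Return True when bound reg, mem would execute int 5 (hypothetical model)."""
    # [mem] is the lower bound, [mem+2] the upper bound; both are signed words.
    lower, upper = struct.unpack_from("<hh", mem, address)
    value = to_signed16(reg)
    return value < lower or value > upper
```

Because the comparison is signed, a register holding FFFBh is treated as -5, so it passes a bounds pair of -5..10 even though its unsigned value is far above 10.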
Intel's designers added the bound instruction to allow a quick check of the range of a value in a register. This is useful in Pascal, for example, when checking the validity of array bounds and when checking to see if a subrange integer is within an allowable range. There are two problems with this instruction, however. On 80486 and Pentium/586 processors, the bound instruction is generally slower than the sequence of instructions it would replace:
    cmp     reg, LowerBound
    jl      OutOfBounds
    cmp     reg, UpperBound
    jg      OutOfBounds
On the 80486 and Pentium/586 chips, the sequence above only requires four clock cycles assuming you can use the immediate addressing mode and the branches are not taken; the bound
instruction requires 7-8 clock cycles under similar circumstances and also assuming the memory operands are in the cache.
A second problem with the bound
instruction is that it executes an int 5
if the specified register is out of range. IBM, in their infinite wisdom, decided to use the int 5
interrupt handler routine to print the screen. Therefore, if you execute a bound
instruction and the value is out of range, the system will, by default, print a copy of the screen to the printer. If you replace the default int 5
handler with one of your own, pressing the PrtSc key will transfer control to your bound
instruction handler. Although there are ways around this problem, most people don't bother since the bound
instruction is so slow.
Whatever int instruction you execute, the following sequence of events follows:

- The 80x86 pushes the flags register onto the stack;
- The 80x86 pushes cs and then ip onto the stack;
- The 80x86 uses the interrupt number (into is interrupt #4, bound is interrupt #5) times four as an index into the interrupt vector table and copies the double word at that point in the table into cs:ip.
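
The sequence of events can be sketched in Python. This is a simplified real mode model, not a description of any particular system: memory is a byte array with the interrupt vector table at address zero, the stack is a list of words, and the function names and handler address are hypothetical. Each vector holds the offset in its low word and the segment in its high word.

```python
import struct


def software_interrupt(mem, stack, flags, cs, next_ip, number):
    """Model int nn; returns the new (cs, ip) pair (hypothetical helper)."""
    stack.append(flags)    # push the flags register
    stack.append(cs)       # push cs ...
    stack.append(next_ip)  # ... and then ip (offset of the next instruction)
    # Index the vector table with number * 4 and fetch the double word there.
    new_ip, new_cs = struct.unpack_from("<HH", mem, number * 4)
    return new_cs, new_ip


def interrupt_return(stack):
    """Model iret: pop ip, then cs, then the flags."""
    ip = stack.pop()
    cs = stack.pop()
    flags = stack.pop()
    return cs, ip, flags
```

Because the program names only the interrupt number, changing the double word stored in the table is enough to redirect every caller to a new handler address.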
The int
instructions vary from a call
in two major ways. First, call
instructions vary in length from two to six bytes long, whereas int
instructions are generally two bytes long (int 3, into
, and bound
are the exceptions). Second, and most important, the int
instruction pushes the flags and the return address onto the stack while the call
instruction pushes only the return address. Note also that the int
instructions always push a far return address (i.e., a cs
value and an offset within the code segment), whereas among the call instructions only the far call pushes this double word return address.
Since int
pushes the flags onto the stack you must use a special return instruction, iret
(interrupt return), to return from a routine called via the int
instructions. If you return from an interrupt procedure using the ret
instruction, the flags will be left on the stack upon returning to the caller. The iret
instruction is equivalent to the two instruction sequence retf, popf (assuming, of course, that you execute popf before returning control to the address pointed at by the double word on the top of the stack).
The int
instructions clear the trace (T) and interrupt enable (I) flags in the flags register after pushing the original flags onto the stack. They do not affect any other flags. The iret
instruction, by its very nature, can affect all the flags since it pops the flags from the stack.
Although the jmp
, call
, and ret
instructions provide transfer of control, they do not allow you to make any serious decisions. The 80x86's conditional jump instructions handle this task. The conditional jump instructions are the basic tool for creating loops and other conditionally executable statements like the if..then
statement.
The conditional jumps test one or more flags in the flags register to see if they match some particular pattern (just like the setcc
instructions). If the pattern matches, control transfers to the target location. If the match fails, the CPU ignores the conditional jump and execution continues with the next instruction. Some instructions, for example, test the conditions of the sign, carry, overflow, and zero flags. For example, after the execution of a shift left instruction, you could test the carry flag to determine if it shifted a one out of the H.O. bit of its operand. Likewise, you could test the condition of the zero flag after a test
instruction to see if any specified bits were one. Most of the time, however, you will probably execute a conditional jump after a cmp
instruction. The cmp instruction sets the flags so that you can test for less than, greater than, equality, etc.
Note: Intel's documentation defines various synonyms or instruction aliases for many conditional jump instructions. The following tables list all the aliases for a particular instruction. These tables also list out the opposite branches. You'll soon see the purpose of the opposite branches.
Instruction | Description | Condition | Aliases | Opposite |
---|---|---|---|---|
JC | Jump if carry | Carry = 1 | JB, JNAE | JNC |
JNC | Jump if no carry | Carry = 0 | JNB, JAE | JC |
JZ | Jump if zero | Zero = 1 | JE | JNZ |
JNZ | Jump if not zero | Zero = 0 | JNE | JZ |
JS | Jump if sign | Sign = 1 | - | JNS |
JNS | Jump if no sign | Sign = 0 | - | JS |
JO | Jump if overflow | Ovrflw=1 | - | JNO |
JNO | Jump if no Ovrflw | Ovrflw=0 | - | JO |
JP | Jump if parity | Parity = 1 | JPE | JNP |
JPE | Jump if parity even | Parity = 1 | JP | JPO |
JNP | Jump if no parity | Parity = 0 | JPO | JP |
JPO | Jump if parity odd | Parity = 0 | JNP | JPE |
Instruction | Description | Condition | Aliases | Opposite |
---|---|---|---|---|
JA | Jump if above (>) | Carry=0, Zero=0 | JNBE | JNA |
JNBE | Jump if not below or equal (not <=) | Carry=0, Zero=0 | JA | JBE |
JAE | Jump if above or equal (>=) | Carry = 0 | JNC, JNB | JNAE |
JNB | Jump if not below (not <) | Carry = 0 | JNC, JAE | JB |
JB | Jump if below (<) | Carry = 1 | JC, JNAE | JNB |
JNAE | Jump if not above or equal (not >=) | Carry = 1 | JC, JB | JAE |
JBE | Jump if below or equal (<=) | Carry = 1 or Zero = 1 | JNA | JNBE |
JNA | Jump if not above (not >) | Carry = 1 or Zero = 1 | JBE | JA |
JE | Jump if equal (=) | Zero = 1 | JZ | JNE |
JNE | Jump if not equal (<>) | Zero = 0 | JNZ | JE |
Instruction | Description | Condition | Aliases | Opposite |
---|---|---|---|---|
JG | Jump if greater (>) | Sign = Ovrflw and Zero = 0 | JNLE | JNG |
JNLE | Jump if not less than or equal (not <=) | Sign = Ovrflw and Zero = 0 | JG | JLE |
JGE | Jump if greater than or equal (>=) | Sign = Ovrflw | JNL | JNGE |
JNL | Jump if not less than (not <) | Sign = Ovrflw | JGE | JL |
JL | Jump if less than (<) | Sign <> Ovrflw | JNGE | JNL |
JNGE | Jump if not greater or equal (not >=) | Sign <> Ovrflw | JL | JGE |
JLE | Jump if less than or equal (<=) | Sign <> Ovrflw or Zero = 1 | JNG | JNLE |
JNG | Jump if not greater than (not >) | Sign <> Ovrflw or Zero = 1 | JLE | JG |
JE | Jump if equal (=) | Zero = 1 | JZ | JNE |
JNE | Jump if not equal (<>) | Zero = 0 | JNZ | JE |
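
The flag tests in these tables can be checked with a short Python model of a 16 bit cmp. This is a sketch, not a complete flags emulation: the helper names are hypothetical, only the carry, zero, sign, and overflow flags are computed, and only a handful of the jumps are included. Note that jg requires both Sign = Ovrflw and Zero = 0, while jle is taken when either Sign <> Ovrflw or Zero = 1.

```python
def cmp16(a: int, b: int) -> dict:
    """Return the flags a 16 bit 'cmp a, b' would produce (it computes a - b)."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "C": a < b,                     # unsigned borrow
        "Z": result == 0,
        "S": bool(result & 0x8000),
        # Signed overflow: the operands' signs differ and the result's sign
        # differs from the sign of the first operand.
        "O": bool((a ^ b) & (a ^ result) & 0x8000),
    }


CONDITIONS = {
    "ja":  lambda f: not f["C"] and not f["Z"],
    "jae": lambda f: not f["C"],
    "jb":  lambda f: f["C"],
    "jbe": lambda f: f["C"] or f["Z"],
    "jg":  lambda f: f["S"] == f["O"] and not f["Z"],
    "jge": lambda f: f["S"] == f["O"],
    "jl":  lambda f: f["S"] != f["O"],
    "jle": lambda f: f["S"] != f["O"] or f["Z"],
    "je":  lambda f: f["Z"],
    "jne": lambda f: not f["Z"],
}


def taken(mnemonic: str, a: int, b: int) -> bool:
    """Would the conditional jump be taken right after cmp a, b?"""
    return CONDITIONS[mnemonic](cmp16(a, b))
```

Comparing FFFFh with 1 shows why the unsigned and signed jumps are separate instructions: ja is taken because 65,535 is above 1, while jg is not taken because -1 is not greater than 1.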
On the 80286 and earlier, these instructions are all two bytes long: a one byte opcode followed by a one byte displacement. Although this leads to very compact instructions, a single byte displacement only allows a range of about ±128 bytes. There is a simple trick you can use to overcome this limitation on these earlier processors: replace the conditional jump with its opposite, point that opposite jump at a label just past the next instruction, and follow it with a jmp instruction whose target address is the original target address.
For example, to convert:
jc Target
to the long form, use the following sequence of instructions:
                    jnc     SkipJmp
                    jmp     Target
    SkipJmp:
If the carry flag is clear (NC=no carry), then control transfers to label SkipJmp, at the same point you'd be if you were using the jc instruction above. If the carry flag is set when encountering this sequence, control will fall through the jnc instruction to the jmp instruction that will transfer control to Target. Since the jmp instruction allows 16 bit displacement and far operands, you can jump anywhere in memory using this trick.
One brief comment about the "opposites" column is in order. As mentioned above, when you need to manually extend a branch beyond ±128 bytes, you should choose the opposite branch to branch around a jump to the target location. As you can see in the "aliases" column above, many conditional jump instructions have aliases. This means that there will be aliases for the opposite jumps as well. Do not use any aliases when extending branches that are out of range. With only two exceptions, a very simple rule completely describes how to generate an opposite branch:
- If the second letter of the jcc instruction is not an "n", insert an "n" after the "j". E.g., je becomes jne and jl becomes jnl.
- If the second letter of the jcc instruction is an "n", then remove that "n" from the instruction. E.g., jng becomes jg, jne becomes je.
The two exceptions to this rule are jpe (jump parity even) and jpo (jump parity odd). These exceptions cause few problems because (a) you'll hardly ever need to test the parity flag, and (b) you can use the aliases jp and jnp as synonyms for jpe and jpo. The "N/No N" rule applies to jp and jnp.
Though you know that jge is the opposite of jl, get in the habit of using jnl rather than jge. It's too easy in an important situation to start thinking "greater is the opposite of less" and substitute jg instead. You can avoid this confusion by always using the "N/No N" rule.
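The "N/No N" rule and the branch-extension trick are mechanical enough to express in a few lines of code. The following Python sketch is only an illustration: the names opposite and extend_branch are hypothetical helpers, not part of MASM or any other assembler, and the sketch assumes the mnemonic it receives is a valid conditional jump.

```python
# Hypothetical helpers (not part of any assembler) that model the
# "N/No N" rule for picking an opposite conditional jump.

# The two exceptions named in the text: the rule does not work for these.
PARITY_EXCEPTIONS = {"jpe": "jpo", "jpo": "jpe"}


def opposite(jcc):
    """Return the opposite of a conditional jump mnemonic such as 'je'."""
    jcc = jcc.lower()
    if jcc in PARITY_EXCEPTIONS:
        return PARITY_EXCEPTIONS[jcc]
    if len(jcc) < 2 or jcc[0] != "j" or jcc in ("jmp", "jcxz", "jecxz"):
        raise ValueError(f"{jcc!r} has no opposite under the N/No N rule")
    if jcc[1] == "n":
        return "j" + jcc[2:]      # second letter is "n": remove it
    return "jn" + jcc[1:]         # otherwise insert "n" after the "j"


def extend_branch(jcc, target, skip_label="SkipJmp"):
    """Rewrite an out-of-range 'jcc target' as an opposite branch around a jmp."""
    return [
        f"{opposite(jcc)} {skip_label}",
        f"jmp {target}",
        f"{skip_label}:",
    ]


print("\n".join(extend_branch("jc", "Target")))
```

Because the function never consults a table of aliases, it cannot make the "greater is the opposite of less" mistake: jl always maps to jnl, never to jg.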
MASM 6.x and many other modern 80x86 assemblers will automatically convert out of range branches to this sequence for you. There is an option that will allow you to disable this feature. For performance critical code that runs on 80286 and earlier processors, you may want to disable this feature so you can fix the branches yourself. The reason is quite simple: the standard fix always wipes out the pipeline no matter which condition is true, since the CPU jumps in either case. One nice thing about conditional jumps is that you do not flush the pipeline or the prefetch queue if you do not take the branch. If one condition is true far more often than the other, you might want to use the conditional jump to transfer control to a nearby jmp, so you can continue to fall through as before. For example, if you have a je Target instruction and Target is out of range, you could convert it to the following code:
                    je      GotoTarget
                     .
                     .
                     .
    GotoTarget:     jmp     Target
Although a branch to Target now requires executing two jumps, this is much more efficient than the standard conversion if the zero flag is normally clear when executing the je instruction.
The 80386 and later processors provide an extended form of the conditional jump that is four bytes long, with the last two bytes containing a 16 bit displacement. These conditional jumps can transfer control anywhere within the current code segment. Therefore, there is no need to worry about manually extending the range of the jump. If you've told MASM you're using an 80386 or later processor, it will automatically choose the two byte or four byte form, as necessary. See Chapter Eight to learn how to tell MASM you're using an 80386 or later processor.
The 80x86 conditional jump instructions give you the ability to split program flow into one of two paths depending upon some logical condition. Suppose you want to increment the ax register if bx is equal to cx. You can accomplish this with the following code:
                    cmp     bx, cx
                    jne     SkipStmts
                    inc     ax
    SkipStmts:
The trick is to use the opposite branch to skip over the instructions you want to execute if the condition is true. Always use the "opposite branch (N/no N)" rule given earlier to select the opposite branch. You can make the same mistake choosing an opposite branch here as you could when extending out of range jumps.
You can also use the conditional jump instructions to synthesize loops. For example, the following code sequence reads a sequence of characters from the user and stores each character in successive elements of an array until the user presses the Enter key (carriage return):
                    mov     di, 0
    ReadLnLoop:     mov     ah, 0           ;INT 16h read key opcode.
                    int     16h
                    mov     Input[di], al
                    inc     di
                    cmp     al, 0dh         ;Carriage return ASCII code.
                    jne     ReadLnLoop
                    mov     Input[di-1],0   ;Replace carriage return with zero.
For more information concerning the use of the conditional jumps to synthesize IF statements, loops, and other control structures, see Chapter Ten.
Like the setcc instructions, the conditional jump instructions come in two basic categories: those that test specific processor flag values (e.g., jz, jc, jno) and those that test some condition (less than, greater than, etc.). When testing a condition, the conditional jump instructions almost always follow a cmp instruction. The cmp instruction sets the flags so you can use a ja, jae, jb, jbe, je, or jne instruction to test for unsigned greater than, greater than or equal, less than, less than or equal, equality, or inequality. Simultaneously, the cmp instruction sets the flags so you can also do a signed comparison using the jl, jle, je, jne, jg, and jge instructions.
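To see how a single cmp can serve both signed and unsigned tests, it may help to model the flags directly. The Python sketch below is a hypothetical model of a 16 bit cmp, not a description of any real tool; cmp16 and the four predicate functions are names invented for this illustration. It shows the same flag settings answering "above" for the unsigned interpretation and "less" for the signed one.

```python
# A hypothetical model of how a 16-bit "cmp a, b" sets the flags, and how the
# unsigned (ja/jb) and signed (jg/jl) jumps read those same flags.

MASK = 0xFFFF


def cmp16(a, b):
    """Return carry, zero, sign, and overflow as set by cmp a, b (a - b)."""
    a &= MASK
    b &= MASK
    result = (a - b) & MASK
    sign_a, sign_b, sign_r = a >> 15, b >> 15, result >> 15
    return {
        "carry": int(a < b),                      # unsigned borrow
        "zero": int(result == 0),
        "sign": sign_r,
        # Signed overflow: the operands differ in sign and the result's sign
        # differs from the first operand's sign.
        "overflow": int(sign_a != sign_b and sign_r != sign_a),
    }


def ja(f):
    return f["carry"] == 0 and f["zero"] == 0


def jb(f):
    return f["carry"] == 1


def jg(f):
    return f["sign"] == f["overflow"] and f["zero"] == 0


def jl(f):
    return f["sign"] != f["overflow"]


flags = cmp16(0xFFFF, 1)      # 65535 unsigned, -1 signed, compared with 1
print(ja(flags), jl(flags))   # True True: above as unsigned, less as signed
```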
The conditional jump instructions only test flags; they do not affect any of the 80x86 flags.
The jcxz (jump if cx is zero) instruction branches to the target address if cx contains zero. Although you can use it anytime you need to see if cx contains zero, you would normally use it before a loop you've constructed with the loop instructions. The loop instruction can repeat a sequence of operations cx times. If cx equals zero, loop will repeat the operation 65,536 times. You can use jcxz to skip over such a loop when cx is zero.
The jecxz instruction, available only on 80386 and later processors, does essentially the same job as jcxz except it tests the full ecx register. Note that the jcxz instruction only checks cx, even on an 80386 in 32 bit mode.
There are no "opposite" jcxz or jecxz instructions. Therefore, you cannot use the "N/No N" rule to extend the jcxz and jecxz instructions. The easiest way to solve this problem is to break the instruction up into two instructions that accomplish the same task:
                    jcxz    Target
becomes
                    test    cx, cx          ;Sets the zero flag if cx=0
                    je      Target
Now you can easily extend the je instruction using the techniques from the previous section.
The test instruction above will set the zero flag if and only if cx contains zero. After all, if there are any non-zero bits in cx, logically anding them with themselves will produce a non-zero result. This is an efficient way to see if a 16 or 32 bit register contains zero. In fact, this two instruction sequence is faster than the jcxz instruction on the 80486 and later processors. Indeed, Intel recommends the use of this sequence rather than the jcxz instruction if you are concerned with speed. Of course, the jcxz instruction is shorter than the two instruction sequence, but it is not faster. This is a good example of an exception to the rule "shorter is usually faster."
The jcxz instruction does not affect any flags.
The loop instruction decrements the cx register and then branches to the target location if the cx register does not contain zero. Since this instruction decrements cx and then checks for zero, if cx originally contained zero, any loop you create using the loop instruction will repeat 65,536 times. If you do not want to execute the loop when cx contains zero, use jcxz to skip over the loop.
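The 65,536 figure follows directly from 16 bit wraparound: decrementing zero yields 0FFFFh, which is not zero, so the branch is taken. The following Python sketch models that behavior. The function loop_iterations and its guard_with_jcxz parameter are hypothetical names chosen for this illustration.

```python
# Hypothetical model of the 16-bit loop instruction: decrement cx, wrap to
# 16 bits, and branch while the result is non-zero.

def loop_iterations(cx, guard_with_jcxz=False):
    """Count how many times the loop body runs for a starting cx value."""
    cx &= 0xFFFF
    if guard_with_jcxz and cx == 0:
        return 0                      # jcxz skips the loop entirely
    count = 0
    while True:
        count += 1                    # the loop body executes
        cx = (cx - 1) & 0xFFFF        # loop: cx := cx - 1 (0 wraps to 0FFFFh)
        if cx == 0:
            return count              # loop falls through when cx reaches 0


print(loop_iterations(5))                         # 5
print(loop_iterations(0))                         # 65536
print(loop_iterations(0, guard_with_jcxz=True))   # 0
```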
There is no "opposite" form of the loop instruction, and like the jcxz/jecxz instructions the range is limited to ±128 bytes on all processors. If you want to extend the range of this instruction, you will need to break it down into discrete components:
    ; "loop lbl" becomes:
                    dec     cx
                    jne     lbl
You can easily extend this jne to any distance. Keep in mind that, unlike loop, the dec instruction modifies the flags.
There is no eloop instruction that decrements ecx and branches if not zero (there is a loope instruction, but it does something else entirely). The reason is quite simple. As of the 80386, Intel's designers stopped wholeheartedly supporting the loop instruction. Oh, it's there to ensure compatibility with older code, but it turns out that the dec/jne instructions are actually faster on the 32 bit processors. Problems in the decoding of the instruction and the operation of the pipeline are responsible for this strange turn of events.
Although the loop instruction's name suggests that you would normally create loops with it, keep in mind that all it is really doing is decrementing cx and branching to the target address if cx does not contain zero after the decrement. You can use this instruction anywhere you want to decrement cx and then check for a zero result, not just when creating loops. Nonetheless, it is a very convenient instruction to use if you simply want to repeat a sequence of instructions some number of times. For example, the following loop initializes a 256 element array of bytes so that each element holds its own index (0, 1, 2, ..., 255):
                    mov     ecx, 255
    ArrayLp:        mov     Array[ecx], cl
                    loop    ArrayLp
                    mov     Array[0], 0
The last instruction is necessary because the loop does not repeat when cx is zero. Therefore, the last element of the array that this loop processes is Array[1], hence the last instruction.
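A quick model makes it easy to check which elements the loop actually touches. The Python sketch below mirrors the four instructions above one for one; init_array is a hypothetical name, and None stands in for "never written" so the untouched element is visible.

```python
# Hypothetical model of the ArrayLp loop above. None marks an element that
# the code has not written yet.

def init_array():
    """Return (array, value of Array[0] before the final mov)."""
    array = [None] * 256
    cx = 255                        # mov ecx, 255
    while True:
        array[cx] = cx & 0xFF       # ArrayLp: mov Array[ecx], cl
        cx = (cx - 1) & 0xFFFF      # loop ArrayLp: decrement cx ...
        if cx == 0:                 # ... and fall through once it hits zero
            break
    before_fix = array[0]           # the loop body never ran with cx = 0
    array[0] = 0                    # mov Array[0], 0
    return array, before_fix


array, before_fix = init_array()
print(before_fix)                   # None: the loop left Array[0] untouched
print(array[:4], array[-1])         # [0, 1, 2, 3] 255
```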
The loop instruction does not affect any flags.
Loope/loopz (loop while equal/zero; they are synonyms for one another) decrements cx and branches to the target address if cx is not zero and the zero flag is set. This instruction is quite useful after a cmp or cmps instruction, and is marginally faster than the comparable 80386/486 instruction sequence if you use all the features of this instruction. However, this instruction plays havoc with the pipeline and superscalar operation of the Pentium, so you're probably better off sticking with discrete instructions rather than using this instruction. This instruction does the following:
    cx := cx - 1
    if ZeroFlag = 1 and cx ≠ 0, goto target
The loope instruction falls through on one of two conditions. Either the zero flag is clear or the instruction decremented cx to zero. By testing the zero flag after the loope instruction (with a je or jne instruction, for example), you can determine the cause of termination.
This instruction is useful if you need to repeat a loop while some value is equal to another, but there is a maximum number of iterations you want to allow. For example, the following loop scans through an array looking for the first non-zero byte, but it does not scan beyond the end of the array:
                    mov     cx, 16          ;Max 16 array elements.
                    mov     bx, -1          ;Index into the array (note next inc).
    SearchLp:       inc     bx              ;Move on to next array element.
                    cmp     Array[bx], 0    ;See if this element is zero.
                    loope   SearchLp        ;Repeat if it is.
                    je      AllZero         ;Jump if all elements were zero.
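The interplay between the zero flag and cx in this search is easier to follow with a small model. The Python sketch below steps through the same sequence and reports the register and flag state at the point where loope falls through. The function search_first_nonzero is a hypothetical name, and the sketch assumes the array holds at least max_len elements.

```python
# Hypothetical model of the SearchLp loop: loope keeps going while the
# compared byte is zero (ZF = 1) and cx has not run out.

def search_first_nonzero(array, max_len=16):
    """Return (bx, cx, zf) as they stand when the loope falls through."""
    cx = max_len & 0xFFFF             # mov cx, 16
    bx = -1                           # mov bx, -1
    while True:
        bx += 1                       # inc bx
        zf = int(array[bx] == 0)      # cmp Array[bx], 0
        cx = (cx - 1) & 0xFFFF        # loope: cx := cx - 1 (flags untouched)
        if not (zf == 1 and cx != 0):
            return bx, cx, zf         # fall through


data = [0, 0, 0, 7] + [0] * 12
print(search_first_nonzero(data))         # (3, 12, 0): non-zero byte at index 3
print(search_first_nonzero([0] * 16))     # (15, 0, 1): ZF still set, je AllZero is taken
```

Note the case where the only non-zero byte is the last one: cx reaches zero on the same iteration, but the zero flag is clear, so the je after the loop correctly does not branch to AllZero.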
Note that this instruction is not the opposite of loopnz/loopne. If you need to extend this jump beyond ±128 bytes, you will need to synthesize this instruction using discrete instructions. For example, if the target of a loope instruction is out of range, you would need to use an instruction sequence like the following (note that, unlike loope itself, the dec instructions in this sequence modify the flags):
                    jne     Quit
                    dec     cx
                    je      Quit2
                    jmp     Target
    Quit:           dec     cx              ;loope decrements cx, even if ZF=0.
    Quit2:
The loope/loopz instruction does not affect any flags.
The loopne/loopnz (loop while not equal/not zero) instruction is just like the loope/loopz instruction in the previous section, except that it repeats while cx is not zero and the zero flag is clear. The algorithm is
    cx := cx - 1
    if ZeroFlag = 0 and cx ≠ 0, goto target
You can determine if the loopne instruction terminated because cx was zero or if the zero flag was set by testing the zero flag immediately after the loopne instruction. If the zero flag is clear at that point, the loopne instruction fell through because it decremented cx to zero. Otherwise it fell through because the zero flag was set.
This instruction is not the opposite of loope/loopz. If the target address is out of range, you will need to use an instruction sequence like the following:
                    je      Quit
                    dec     cx
                    je      Quit2
                    jmp     Target
    Quit:           dec     cx              ;loopne decrements cx, even if ZF=1.
    Quit2:
You can use the loopne instruction to repeat some maximum number of times while waiting for some other condition to be true. For example, you could scan through an array until you exhaust the number of array elements or until you find a certain byte using a loop like the following:
                    mov     cx, 16          ;Maximum # of array elements.
                    mov     bx, -1          ;Index into array.
    LoopWhlNot0:    inc     bx              ;Move on to next array element.
                    cmp     Array[bx],0     ;Does this element contain zero?
                    loopne  LoopWhlNot0     ;Quit if it does, or more than 16 bytes.
Although the loope/loopz and loopne/loopnz instructions are slower than the individual instructions from which they could be synthesized, there is one main use for these instruction forms where speed is rarely important (indeed, being faster would make them less useful): timeout loops during I/O operations. Suppose bit #7 of input port 379h contains a one if the device is busy and contains a zero if the device is not busy. If you want to output data to the port, you could use code like the following:
                    mov     dx, 379h
    WaitNotBusy:    in      al, dx          ;Get port
                    test    al, 80h         ;See if bit #7 is one
                    jne     WaitNotBusy     ;Wait for "not busy"
The only problem with this loop is that it is conceivable that it would loop forever. In a real system, a cable could come unplugged, someone could shut off the peripheral device, and any number of other things could go wrong that would hang up the system. Robust programs usually apply a timeout to a loop like this. If the device is still busy after some specified amount of time, then the loop exits and raises an error condition. The following code will accomplish this:
                    mov     dx, 379h        ;Input port address
                    mov     cx, 0           ;Loop 65,536 times and then quit.
    WaitNotBusy:    in      al, dx          ;Get data at port.
                    test    al, 80h         ;See if busy
                    loopne  WaitNotBusy     ;Repeat if busy and no time out.
                    jne     TimedOut        ;Branch if CX=0 because we timed out.
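The following Python sketch models the timeout loop so you can check both exits. It is only an illustration under stated assumptions: read_port stands in for the in al, dx instruction and is an assumed callable supplied by the caller, not a real I/O API, and device_ready_after is a hypothetical fake device used to exercise the loop.

```python
# Hypothetical model of the timeout loop. read_port stands in for
# "in al, dx" and is an assumed callable, not a real I/O API.

def wait_not_busy(read_port, cx=0):
    """Poll until bit 7 clears or cx runs out. Returns (ready, polls)."""
    cx &= 0xFFFF                      # mov cx, 0 gives 65,536 iterations
    polls = 0
    while True:
        al = read_port() & 0xFF       # in al, dx
        polls += 1
        zf = int((al & 0x80) == 0)    # test al, 80h: ZF = 1 when not busy
        cx = (cx - 1) & 0xFFFF        # loopne: cx := cx - 1
        if not (zf == 0 and cx != 0):
            break                     # fall through
    return zf == 1, polls             # ZF = 0 here means jne TimedOut is taken


def device_ready_after(n):
    """Return a fake port that reports busy for the first n reads."""
    state = {"reads": 0}

    def read_port():
        state["reads"] += 1
        return 0x80 if state["reads"] <= n else 0x00

    return read_port


print(wait_not_busy(device_ready_after(3)))   # (True, 4)
print(wait_not_busy(lambda: 0x80))            # (False, 65536)
```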
You could use the loope/loopz instruction if a zero bit, rather than a one, indicated that the device is busy.
The loopne/loopnz instruction does not affect any flags.
There are various miscellaneous instructions on the 80x86 that don't fall into any category above. Generally these are instructions that manipulate individual flags, provide special processor services, or handle privileged mode operations.
There are several instructions that directly manipulate flags in the 80x86 flags register. They are:
Instruction | Description |
---|---|
clc | Clears the carry flag |
stc | Sets the carry flag |
cmc | Complements the carry flag |
cld | Clears the direction flag |
std | Sets the direction flag |
cli | Clears the interrupt enable/disable flag |
sti | Sets the interrupt enable/disable flag |
Note: you should be careful when using the cli instruction in your programs. Improper use could lock up your machine until you cycle the power.
The nop instruction doesn't do anything except waste a few processor cycles and take up a byte of memory. Programmers often use it as a place holder or a debugging aid. As it turns out, this isn't a unique instruction; it's just a synonym for the xchg ax, ax instruction.
The hlt instruction halts the processor until a reset, non-maskable interrupt, or other interrupt (assuming interrupts are enabled) comes along. Generally, you shouldn't use this instruction on the IBM PC unless you really know what you are doing. This instruction is not equivalent to the x86 halt instruction. Do not use it to stop your programs.
The 80x86 provides another prefix instruction, lock, that, like the rep instruction, affects the following instruction. However, this instruction has little meaning on most PC systems. Its purpose is to coordinate systems that have multiple CPUs. As systems become available with multiple processors, this prefix may finally become valuable. You need not be too concerned about this here.
The Pentium provides two additional instructions of interest to real-mode DOS programmers. These instructions are cpuid and rdtsc. If you load eax with zero and execute the cpuid instruction, the Pentium (and later processors) will return the maximum value cpuid allows as a parameter in eax. For the Pentium, this value is one. If you load the eax register with one and execute the cpuid instruction, the Pentium will return CPU identification information in eax. Since this instruction is of little value until Intel produces several additional chips in the family, there is no need to consider it further here.
The second Pentium instruction of interest is the rdtsc (read time stamp counter) instruction. The Pentium maintains a 64 bit counter that counts clock cycles starting at reset. The rdtsc instruction copies the current counter value into the edx:eax register pair. You can use this instruction to accurately time sequences of code.
Besides the instructions presented thus far, the 80286 and later processors provide a set of protected mode instructions. This text will not consider those protected mode instructions that are useful only to those who are writing operating systems. You would not even use these instructions in your applications when running under a protected mode operating system like Windows, UNIX, or OS/2. These instructions are reserved for the individuals who write such operating systems and drivers for them.