10.5 Machine and Arithmetic Idioms
An idiom is an idiosyncrasy. Several arithmetic operations and 80x86 instructions have idiosyncrasies that you can take advantage of when writing assembly language code. Some people refer to the use of machine and arithmetic idioms as "tricky programming" that you should always avoid in well written programs. While it is wise to avoid tricks just for the sake of tricks, many machine and arithmetic idioms are well-known and commonly found in assembly language programs. Some of them can be really tricky, but a good number of them are simply "tricks of the trade." This text cannot even begin to present all of the idioms in common use today; they are too numerous and the list is constantly changing. Nevertheless, there are some very important idioms that you will see all the time, so it makes sense to discuss those.
10.5.1 Multiplying without MUL, IMUL, or INTMUL
If you take a quick look at the timing for the multiply instruction, you'll notice that the execution time for this instruction is often long1. When multiplying by a constant, you can sometimes avoid the performance penalty of the MUL, IMUL, and INTMUL instructions by using shifts, additions, and subtractions to perform the multiplication.
Remember, a SHL instruction computes the same result as multiplying the specified operand by two. Shifting to the left two bit positions multiplies the operand by four. Shifting to the left three bit positions multiplies the operand by eight. In general, shifting an operand to the left n bits multiplies it by 2n. Any value can be multiplied by some constant using a series of shifts and adds or shifts and subtractions. For example, to multiply the AX register by ten, you need only multiply it by eight and then add in two times the original value. That is, 10*AX = 8*AX + 2*AX. The code to accomplish this is
shl( 1, ax ); // Multiply AX by two. mov( ax, bx); // Save 2*AX for later. shl( 2, ax ); // Multiply ax by eight (*4 really, but it contains *2). add( bx, ax ); // Add in AX*2 to AX*8 to get AX*10.The AX register (or just about any register, for that matter) can often be multiplied by many constant values much faster using SHL than by using the MUL instruction. This may seem hard to believe since it only takes one instruction to compute this product:
intmul( 10, ax );However, if you look at the timings, the shift and add example above requires fewer clock cycles on many processors in the 80x86 family than the MUL instruction. Of course, the code is somewhat larger (by a few bytes), but the performance improvement is usually worth it. Of course, on the later 80x86 processors, the multiply instructions are quite a bit faster than the earlier processors, but the shift and add scheme is often faster on these processors as well.
You can also use subtraction with shifts to perform a multiplication operation. Consider the following multiplication by seven:
mov( eax, ebx ); // Save EAX * 1 shl( 3, eax ); // EAX = EAX * 8 sub( ebx, eax ); // EAX*8 - EAX*1 is EAX*7This follows directly from the fact that EAX*7 = (EAX*8)-EAX.
A common error made by beginning assembly language students is subtracting or adding one or two rather than EAX*1 or EAX*2. The following does not compute eax*7:
shl( 3, eax ); sub( 1, eax );It computes (8*EAX)-1, something entirely different (unless, of course, EAX = 1). Beware of this pitfall when using shifts, additions, and subtractions to perform multiplication operations.
You can also use the LEA instruction to compute certain products. The trick is to use the scaled index addressing modes. The following examples demonstrate some simple cases:
lea( eax, [ecx][ecx] ); // EAX := ECX * 2 lea( eax, [eax]eax*2] ); // EAX := EAX * 3 lea( eax, [eax*4] ); // EAX := EAX * 4 lea( eax, [ebx][ebx*4] ); // EAX := EBX * 5 lea( eax, [eax*8] ); // EAX := EAX * 8 lea( eax, [edx][edx*8] ); // EAX := EDX * 910.5.2 Division Without DIV or IDIV
Much as the SHL instruction can be used for simulating a multiplication by some power of two, you may use the SHR and SAR instructions to simulate a division by a power of two. Unfortunately, you cannot use shifts, additions, and subtractions to perform a division by an arbitrary constant as easily as you can use these instructions to perform a multiplication operation.
Another way to perform division is to use the multiply instructions. You can divide by some value by multiplying by its reciprocal. Since the multiply instruction is faster than the divide instruction; multiplying by a reciprocal is usually faster than division.
Now you're probably wondering "how does one multiply by a reciprocal when the values we're dealing with are all integers?" The answer, of course, is that we must cheat to do this. If you want to multiply by one tenth, there is no way you can load the value 1/10th into an 80x86 register prior to performing the multiplication. However, we could multiply 1/10th by 10, perform the multiplication, and then divide the result by ten to get the final result. Of course, this wouldn't buy you anything at all, in fact it would make things worse since you're now doing a multiplication by ten as well as a division by ten. However, suppose you multiply 1/10th by 65,536 (6553), perform the multiplication, and then divide by 65,536. This would still perform the correct operation and, as it turns out, if you set up the problem correctly, you can get the division operation for free. Consider the following code that divides AX by ten:
mov( 6554, dx ); // 6554 = round( 65,536/10 ). mul( dx, ax );This code leaves AX/10 in the DX register.
To understand how this works, consider what happens when you multiply AX by 65,536 ($10000). This simply moves AX into DX and sets AX to zero (a multiply by $10000 is equivalent to a shift left by sixteen bits). Multiplying by 6,554 (65,536 divided by ten) puts AX divided by ten into the DX register. Since MUL is faster than DIV , this technique runs a little faster than using a straight division.
Multiplying by the reciprocal works well when you need to divide by a constant. You could even use it to divide by a variable, but the overhead to compute the reciprocal only pays off if you perform the division many, many times (by the same value).
10.5.3 Implementing Modulo-N Counters with AND
If you want to implement a counter variable that counts up to 2n-1 and then resets to zero, simply using the following code:
inc( CounterVar ); and( nBits, CounterVar );where nBits is a binary value containing n one bits right justified in the number. For example, to create a counter that cycles between zero and fifteen, you could use the following:
inc( CounterVar ); and( %00001111, CounterVar );10.5.4 Careless Use of Machine Idioms
One problem with using machine idioms is that the machines change over time. The DOS/16-bit version of this text recommends the use of several machine idioms in addition to those this chapter presents. Unfortunately, as time passed Intel improved the processor and tricks that used to provide a performance benefit are actually slower on the newer processors. Therefore, you should be careful about employing common "tricks" you pick up; they may not actually improve the code.
10.6 The HLA (Pseudo) Random Number Unit
The HLA rand.hhf module provides a set of pseudo-random generators that returns seemingly random values on each call. These pseudo-random number generator functions are great for writing games and other simulations that require a sequence of values that the user can not easily guess. These functions return a 32-bit value in the EAX register. You can treat the result as a signed or unsigned value as appropriate for your application.
The rand.hhf library module includes the following functions:
procedure rand.random; returns( "eax" ); procedure rand.range( startRange:dword; endRange:dword ); returns( "eax" ); procedure rand.uniform; returns( "eax" ); procedure rand.urange( startRange:dword; endRange:dword ); returns( "eax" ); procedure rand.randomize;The rand.random and rand.uniform procedures are both functions that return a 32-bit pseudo-random number in the EAX register. They differ only in the algorithm they use to compute the random number sequence (rand.random uses a standard linear congruential generator, rand.uniform uses an additive generator. See Knuth's "The Art of Computer Programming" Volume Two for details on these two algorithms).
The rand.range and rand.urange functions return a pseudo-random number that falls between two values passed as parameters (inclusive). These routines use better algorithms than the typical "mod the result by the range of values and add the starting value" algorithm that naive users often employ to limit random numbers to a specific range (that naive algorithm generally produces a stream of numbers that is somewhat less than random).
By default, the random number generators in the HLA Standard Library generate the same sequence of numbers every time you run a program. While this may not seem random at all (and statistically, it certainly is not random), this is generally what you want in a random number generator. The numbers should appear to be random but you usually need to be able to generate the same sequence over and over again when testing your program. After all, a defect you encounter with one random sequence may not be apparent when using a different random number sequence. By emitting the same sequence over and over again, your programs become deterministic so you can properly test them across several runs of the program.
Once your program is tested and operational, you might want your random number generator to generate a different sequence every time you run the program. For example, if you write a game and that game uses a pseudo-random sequence to control the action, the end user may detect a pattern and play the game accordingly if the random number generator always returns the same sequence of numbers.
To alleviate this problem, the HLA Standard Library rand module provides the rand.randomize procedure. This procedure reads the current date and time (in milliseconds) and, on processors that support it, reads the CPU's timestamp counter to generate an almost random set of bits as the starting random number generator value. Calling the rand.randomize procedure at the beginning of your program essentially guarantees that different executions of the program will produce a different sequence of random numbers.
Note that you cannot make the sequence "more random" by calling rand.randomize multiple times. In fact, since rand.randomize generates a new seed based on the date and time, calling rand.randomize multiple times in your program will actually generate a less random sequence (since time is an ever increasing value, not a random value). So make at most one call to rand.randomize and leave it up to the random number generators to take it from there.
Note that rand.randomize will randomize both the rand.random and rand.uniform random number generators. You do not need separate calls for the two different generators nor can you randomize one without randomizing the other.
One attribute of a random number generator is "how uniform are the results the generator returns." A uniform random number generator2 that produces a 32-bit result returns a sequence of values that are evenly distributed throughout the 32-bit range of values. That is, any return result is as equally likely as any other return result. Good random number generators don't tend to bunch numbers up in groups.
The following program code provides a simple test of the random number generators by plotting asterisks at random positions on the screen. This program works by choosing two random numbers, one between zero and 79, the other between zero and 23. Then the program uses the console.puts function to print a single asterisk at the (X,Y) coordinate on the screen specified by these two random numbers (therefore, this code runs only under Windows). After 10,000 iterations of this process the program stops and lets you observe the result. Note: since random number generators generate random numbers, you should not expect this program to fill the entire screen with asterisks in only 10,000 iterations.
program testRandom; #include( "stdlib.hhf" ); begin testRandom; console.cls(); mov( 10_000, ecx ); repeat // Generate a random X-coordinate // using rand.range. rand.range( 0, 79 ); mov( eax, ebx ); // Save the X-coordinate for now. // Generate a random Y-coordinate // using rand.urange. rand.urange( 0, 23 ); // Print an asterisk at // the specified coordinate on the screen. console.puts( ax, bx, "*" ); // Repeat this 10,000 times to get // a good distribution of values. dec( ecx ); until( @z ); // Position the cursor at the bottom of the // screen so we can observe the results. console.gotoxy( 24, 0 ); end testRandom; Program 10.1 Screen Plot Test of the HLA Random Number GeneratorsThe rand.hhf module also provides an iterator that generates a random sequence of value in the range 0..n-1. However, a discussion of this function must wait until we cover iterators in a later chapter.
10.7 Putting It All Together
This chapter finished the presentation of the integer arithmetic instructions on the 80x86. Then it demonstrated how to convert expressions from a high level language syntax into assembly language. This chapter concluded by teaching you a few assembly language tricks you will commonly find in programs. By the conclusion of this chapter you are (hopefully) in a position where you can easily evaluate arithmetic expressions in your assembly language programs.
1Actually, this is specific to a given processor. Some processors execute the INTMUL instruction fairly fast.
2Despite their names, both rand.uniform and rand.random generate a uniformly distributed set of pseudo-random numbers.
|