set macro. This macro takes the form:

                set     SetName1, SetName2, ..., SetName8

SetName1..SetName8 represent the names of up to eight set variables. You may have fewer than eight names in the operand field, but doing so will waste some bits in the set array.
The CreateSets routine provides another mechanism for creating set variables. Unlike the set macro, which you would use to create set variables in your data segment, the CreateSets routine allocates storage for up to eight sets dynamically at run time. It returns a pointer to the first set variable in es:di. The remaining seven sets follow at locations es:di+1, es:di+2, ..., es:di+7. A typical program that allocates set variables dynamically might use the following code:

Set0            dword   ?
Set1            dword   ?
Set2            dword   ?
Set3            dword   ?
Set4            dword   ?
Set5            dword   ?
Set6            dword   ?
Set7            dword   ?
                 .
                 .
                 .
                CreateSets
                mov     word ptr Set0+2, es
                mov     word ptr Set1+2, es
                mov     word ptr Set2+2, es
                mov     word ptr Set3+2, es
                mov     word ptr Set4+2, es
                mov     word ptr Set5+2, es
                mov     word ptr Set6+2, es
                mov     word ptr Set7+2, es

                mov     word ptr Set0, di
                inc     di
                mov     word ptr Set1, di
                inc     di
                mov     word ptr Set2, di
                inc     di
                mov     word ptr Set3, di
                inc     di
                mov     word ptr Set4, di
                inc     di
                mov     word ptr Set5, di
                inc     di
                mov     word ptr Set6, di
                inc     di
                mov     word ptr Set7, di
                inc     di

This code segment creates eight different sets on the heap, all empty, and stores pointers to them in the appropriate pointer variables.
The SHELL.ASM file provides a commented-out line of code in the data segment that includes the file STDSETS.A. This include file provides the bit definitions for eight commonly used character sets. They are alpha (upper and lower case alphabetics), lower (lower case alphabetics), upper (upper case alphabetics), digits ("0".."9"), xdigits ("0".."9", "A".."F", and "a".."f"), alphanum (upper and lower case alphabetics plus the digits), whitespace (space, tab, carriage return, and line feed), and delimiters (whitespace plus commas, semicolons, less than, greater than, and vertical bar). If you would like to use these standard character sets in your program, you need to remove the semicolon from the beginning of the include statement in the SHELL.ASM file.
The UCR Standard Library provides 16 character set routines: CreateSets, EmptySet, RangeSet, AddStr, AddStrl, RmvStr, RmvStrl, AddChar, RmvChar, Member, CopySet, SetUnion, SetIntersect, SetDifference, NextItem, and RmvItem. All of these routines except CreateSets require a pointer to a character set variable in the es:di registers. Specific routines may require other parameters as well.
The EmptySet routine clears all the bits in a set, producing the empty set. This routine requires the address of the set variable in es:di. The following example clears the set pointed at by Set1:

                les     di, Set1
                EmptySet

RangeSet unions a range of values into the set variable pointed at by es:di. The al register contains the lower bound of the range of items; ah contains the upper bound. Note that al must be less than or equal to ah. The following example constructs the set of all control characters (ASCII codes 1 through 31; the null character, ASCII code zero, is not allowed in sets):

                les     di, CtrlCharSet         ;Ptr to ctrl char set.
                mov     al, 1
                mov     ah, 31
                RangeSet

AddStr and AddStrl add all the characters in a zero terminated string to a character set. For AddStr, the dx:si register pair points at the zero terminated string. For AddStrl, the zero terminated string follows the call to AddStrl in the code stream. These routines union each character of the specified string into the set. The following examples add the digits and some special characters into the FPDigits set:

Digits          byte    "0123456789",0
                set     FPDigitsSet
FPDigits        dword   FPDigitsSet
                 .
                 .
                 .
                ldxi    Digits                  ;Loads DX:SI with adrs of Digits.
                les     di, FPDigits
                AddStr
                 .
                 .
                 .
                les     di, FPDigits
                AddStrL
                byte    "Ee.+-",0

RmvStr and RmvStrl remove characters from a set. You supply the characters in a zero terminated string. For RmvStr, dx:si points at the string of characters to remove from the set. For RmvStrl, the zero terminated string follows the call. The following example uses RmvStrl to remove the special symbols from FPDigits above:

                les     di, FPDigits
                RmvStrl
                byte    "Ee.+-",0

The AddChar and RmvChar routines let you add or remove individual characters. As usual, es:di points at the set; the al register contains the character you wish to add to the set or remove from the set. The following example adds a space to the set FPDigits and removes the "," character (if present):

                les     di, FPDigits
                mov     al, ' '
                AddChar
                 .
                 .
                 .
                les     di, FPDigits
                mov     al, ','
                RmvChar

The Member function checks to see if a character is in a set. On entry, es:di must point at the set and al must contain the character to check. On exit, the zero flag is set if the character is a member of the set and clear if it is not. The following example reads characters from the keyboard until the user presses a key that is not a whitespace character:

SkipWS:         get                             ;Read char from user into AL.
                lesi    WhiteSpace              ;Address of WS set into es:di.
                member
                je      SkipWS

The CopySet, SetUnion, SetIntersect, and SetDifference routines all operate on two sets of characters. The es:di register pair points at the destination character set; the dx:si register pair points at a source character set. CopySet copies the bits from the source set to the destination set, replacing the original bits in the destination set. SetUnion computes the union of the two sets and stores the result into the destination set. SetIntersect computes the set intersection and stores the result into the destination set. Finally, the SetDifference routine computes DestSet := DestSet - SrcSet; that is, it removes from the destination set every character that also appears in the source set.
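As a rough sketch of how the two-operand routines combine, the fragment below builds a set of identifier characters. It assumes that STDSETS.A has been included (so that alpha and digits exist) and that IdChars is a hypothetical set declared with the set macro; lesi and ldxi are the same macros used in the earlier examples. The pointers are reloaded before each call only because the text does not say whether the routines preserve es:di and dx:si.

```asm
; Data segment. IdChars is a hypothetical name used only in this sketch.
                set     IdChars

; Code segment: IdChars := alpha + digits + {"_"}
                lesi    IdChars                 ;ES:DI -> destination set.
                ldxi    alpha                   ;DX:SI -> source set.
                CopySet                         ;IdChars := alpha

                lesi    IdChars
                ldxi    digits
                SetUnion                        ;IdChars := IdChars + digits

                lesi    IdChars
                mov     al, '_'
                AddChar                         ;Add the underscore too.
```

Replacing SetUnion with SetDifference in the second step would instead strip the digits out of IdChars, following the DestSet := DestSet - SrcSet rule.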
The NextItem and RmvItem routines let you extract elements from a set. NextItem returns in al the ASCII code of the first character it finds in a set. RmvItem does the same thing except it also removes the character from the set. These routines return zero in al if the set is empty (StdLib sets cannot contain the NULL character). You can use the RmvItem routine to build a rudimentary iterator for a character set.
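One possible shape for such an iterator is sketched below. Because RmvItem consumes the set, the fragment first copies the set into a scratch set and iterates over the copy. SrcSet and WorkSet are hypothetical sets declared with the set macro, and the fragment assumes the Standard Library's putc routine (print the character in al) is available, as it is in a program built from SHELL.ASM.

```asm
; Print every character in SrcSet without disturbing SrcSet itself.
; SrcSet and WorkSet are hypothetical names used only in this sketch.
                lesi    WorkSet                 ;ES:DI -> scratch set.
                ldxi    SrcSet                  ;DX:SI -> set to walk.
                CopySet                         ;WorkSet := SrcSet

NextChar:       lesi    WorkSet                 ;Reload the set pointer.
                RmvItem                         ;AL := an element, now removed.
                cmp     al, 0                   ;Zero means the set is empty.
                je      SetDone
                putc                            ;Process the element (print it).
                jmp     NextChar
SetDone:
```

The loop terminates because each pass removes one element, and the zero return value is unambiguous since a set can never contain the NULL character.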
The UCR Standard Library's character set routines are very powerful. With them, you can easily manipulate character string data, especially when searching for different patterns within a string. We will consider these routines again when we study pattern matching later in this text.
The cmps instruction is useful for comparing (very) large integer values. Unlike character strings, we cannot compare integers with cmps from the L.O. byte through the H.O. byte. Instead, we must compare them from the H.O. byte down to the L.O. byte. The following code compares two 12-byte integers:

                lea     di, integer1+10
                lea     si, integer2+10
                mov     cx, 6
                std
        repe    cmpsw

After the execution of the cmpsw instruction, the flags will contain the result of the comparison.
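To make the use of those flags concrete, here is a sketch of the same comparison followed by conditional branches. It assumes ds and es both address the segment holding integer1 and integer2, and it treats the values as unsigned. The labels Int2Bigger and Int1Bigger are hypothetical. Keep in mind that cmps compares the operand at ds:si against the operand at es:di, so the flags describe integer2 relative to integer1.

```asm
                lea     di, integer1+10         ;ES:DI -> H.O. word of integer1.
                lea     si, integer2+10         ;DS:SI -> H.O. word of integer2.
                mov     cx, 6                   ;Six words = 12 bytes.
                std                             ;Work from H.O. down to L.O.
        repe    cmpsw                           ;Stop at first differing word.
                cld                             ;Restore direction; only DF changes.
                ja      Int2Bigger              ;integer2 > integer1 (unsigned).
                jb      Int1Bigger              ;integer2 < integer1 (unsigned).
                                                ;Fall through: the values are equal.
```

Clearing the direction flag immediately after the comparison is a defensive habit, since later string instructions usually expect it to be clear; cld leaves the comparison flags untouched.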
You can easily assign one long integer string to another using the movs instruction. Nothing tricky here, just load up the si, di, and cx registers and have at it. You must do other operations, including arithmetic and logical operations, using the extended precision methods described in the chapter on arithmetic operations.
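A minimal sketch of such an assignment, reusing the 12-byte integer1 and integer2 variables from the comparison example and assuming ds and es both address the segment that holds them:

```asm
                lea     si, integer1            ;DS:SI -> source value.
                lea     di, integer2            ;ES:DI -> destination value.
                mov     cx, 6                   ;Six words = 12 bytes.
                cld                             ;Copy from low addresses upward.
        rep     movsw                           ;integer2 := integer1
```

Unlike the comparison, the copy can run in either direction because the two variables do not overlap; the forward direction is simply the conventional choice.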
movs and cmps instructions for these operations.
lods and stos instructions. The following code shows how you can easily add the value 20 to each element of the integer array A:

                lea     si, A
                mov     di, si
                mov     cx, SizeOfA
                cld
AddLoop:        lodsw
                add     ax, 20
                stosw
                loop    AddLoop

You can implement other operations in a similar fashion.
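For instance, the same lods/stos pattern can read from one array and write to another. The sketch below stores twice the value of each element of A into a second array. B is a hypothetical word array of the same length as A, SizeOfA is taken to be the number of words in A (as the loop above implies), and ds and es are assumed to address the same data segment.

```asm
                lea     si, A                   ;DS:SI -> source array.
                lea     di, B                   ;ES:DI -> destination array.
                mov     cx, SizeOfA             ;Number of words to process.
                cld                             ;Process in ascending order.
DblLoop:        lodsw                           ;AX := next element of A.
                shl     ax, 1                   ;AX := AX * 2.
                stosw                           ;Store into next element of B.
                loop    DblLoop
```

Only the instruction between lodsw and stosw changes from one operation to the next. Note that if cx is zero on entry, the loop instruction repeats 65,536 times, so a caller that might pass an empty array should test cx first.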