es:di - Points at the next character in the input string. You should not look at any characters before this address. Furthermore, you should never look beyond the end of the string (see cx below).
ds:si - Contains the four byte parameter found in the matchparm field.
cx - Contains the last position, plus one, in the input string you're allowed to look at. Note that your pattern matching routine should not look beyond location es:cx or the zero terminating byte, whichever comes first in the input string.
On return, ax must contain the offset into the string (di's value) of the last character matched plus one, if your matching function is successful. It must also set the carry flag to denote success. After your pattern matches, the match routine might call another matching function (the one specified by the next pattern field), and that function begins matching at location es:ax.
If your matching function fails, it must return the original di value in the ax register and return with the carry flag clear. Note that your matching function must preserve all other registers.
Note that ds does not point at your data segment; it contains the H.O. word of the matchparm parameter. Therefore, if you are going to access global variables in your data segment, you will need to push ds, load it with the address of dseg, and pop ds before leaving. Several examples throughout this chapter demonstrate how to do this.
Useful candidates for your own matching functions include matchtoistr, matchichar, and matchtoichar pattern functions. The following example code demonstrates how to add a matchtoistr
(match up to a string, doing a case insensitive comparison) routine.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

TestString      byte    "This is the string 'xyz' in it",cr,lf,0

TestPat         pattern {MatchToiStr,xyz}
xyz             byte    "XYZ",0

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; MatchToiStr-  Matches all characters in a string up to, and including, the
;               specified parameter string. The parameter string must be
;               all upper case characters. This routine matches strings using
;               a case insensitive comparison.
;
; inputs:
;               es:di-  Source string
;               ds:si-  String to match
;               cx-     Maximum match position
;
; outputs:
;               ax-     Points at first character beyond the end of the
;                       matched string if success, contains the initial DI
;                       value if failure occurs.
;               carry-  0 if failure, 1 if success.

MatchToiStr     proc    far
                pushf
                push    di
                push    si
                cld

; Check to see if we're already past the point where we're allowed
; to scan in the input string.

                cmp     di, cx
                jae     MTiSFailure

; If the pattern string is the empty string, always match.
; Only flags, di, and si are on the stack here, so jump to the
; success exit that pops exactly those values.

                cmp     byte ptr ds:[si], 0
                je      MTSSuccess

; The following loop scans through the input string looking for
; the first character in the pattern string.

ScanLoop:       push    si              ;Save start of pattern string.
                lodsb                   ;Get first char of string
                dec     di
FindFirst:      inc     di              ;Move on to next (or 1st) char.
                cmp     di, cx          ;If at cx, then we've got to
                jae     CantFind1st     ; fail.

                mov     ah, es:[di]     ;Get input character.
                cmp     ah, 'a'         ;Convert input character to
                jb      DoCmp           ; upper case if it's a lower
                cmp     ah, 'z'         ; case character.
                ja      DoCmp
                and     ah, 5fh
DoCmp:          cmp     al, ah          ;Compare input character against
                jne     FindFirst       ; pattern string.

; At this point, we've located the first character in the input string
; that matches the first character of the pattern string. See if the
; strings are equal.

                push    di              ;Save restart point.

CmpLoop:        lodsb                   ;Get next pattern character.
                cmp     al, 0           ;At the end of the parameter
                je      MTSSuccess2     ; string? If so, succeed.

                inc     di              ;Move on to next input char.
                cmp     di, cx          ;See if we've gone beyond the
                jae     StrNotThere     ; last position allowable.
                mov     ah, es:[di]     ;Get the next input character.
                cmp     ah, 'a'         ;Convert input character to
                jb      DoCmp2          ; upper case if it's a lower
                cmp     ah, 'z'         ; case character.
                ja      DoCmp2
                and     ah, 5fh
DoCmp2:         cmp     al, ah          ;Compare input character against
                je      CmpLoop         ; pattern string.

                pop     di              ;Mismatch: back to restart point,
                pop     si              ; restore the pattern pointer, and
                inc     di              ; resume scanning one char later.
                jmp     ScanLoop

StrNotThere:    add     sp, 2           ;Remove di from stack.
CantFind1st:    add     sp, 2           ;Remove si from stack.
MTiSFailure:    pop     si
                pop     di
                mov     ax, di          ;Return failure position in AX.
                popf
                clc                     ;Return failure.
                ret

MTSSuccess2:    add     sp, 4           ;Remove restart DI and pattern SI.
                inc     di              ;DI is at the last char matched;
                                        ; step one past it.
MTSSuccess:     mov     ax, di          ;Return next position in AX.
                pop     si
                pop     di
                popf
                stc                     ;Return success.
                ret
MatchToiStr     endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                lesi    TestString
                ldxi    TestPat
                xor     cx, cx
                match
                jnc     NoMatch
                print
                byte    "Matched",cr,lf,0
                jmp     Quit

NoMatch:        print
                byte    "Did not match",cr,lf,0

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

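The rule about ds described above can be sketched as follows. This is a minimal, hypothetical matching function, not part of the Standard Library: the names MatchAnyCount and MatchCount are invented for illustration. It matches any single character and counts its successes in a global variable, so it has to point ds at dseg before touching that variable and restore ds afterwards. It assumes the same dseg/cseg layout as the program above.

```asm
dseg            segment para public 'data'
MatchCount      word    0               ;Hypothetical global counter.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; MatchAnyCount- Hypothetical matching function. Matches one character
;                (any character) and counts successful matches.
;
; inputs:   es:di- next input character, cx- last position plus one
; outputs:  ax- di+1 and carry set on success,
;           ax- original di and carry clear on failure.

MatchAnyCount   proc    far
                cmp     di, cx          ;Already at the allowable end?
                jae     MACFail
                cmp     byte ptr es:[di], 0 ;At the zero terminating byte?
                je      MACFail

                push    ds              ;ds holds the H.O. word of matchparm,
                mov     ax, dseg        ; so point it at our data segment
                mov     ds, ax          ; before touching any globals.
                inc     MatchCount
                pop     ds              ;Restore the caller's ds.

                mov     ax, di          ;Success: ax = last char matched
                inc     ax              ; plus one,
                stc                     ; carry set.
                ret

MACFail:        mov     ax, di          ;Failure: ax = original di,
                clc                     ; carry clear.
                ret
MatchAnyCount   endp
cseg            ends
```

Only ax is modified, which is acceptable because ax carries the return value; every other register, including ds, leaves the function unchanged.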
(buy | sell) [0-9]+ shares of (ibm | apple | hp | dec)
While it is easy to devise a Standard Library pattern that recognizes strings of this form, calling the match routine would only tell you that you have a legal buy or sell command. It does not tell you whether you are to buy or sell, whose stock to trade, or how many shares are involved. Of course, you could take the cross product of (buy | sell) with (ibm | apple | hp | dec) and generate eight different regular expressions that uniquely determine whether you're buying or selling and whose stock you're trading, but you can't process the integer values this way (unless you're willing to have millions of regular expressions). A better solution would be to extract substrings from the legal pattern and process these substrings after you verify that you have a legal buy or sell command. For example, you could extract buy or sell into one string, the digits into another, and the company name into a third. After verifying the syntax of the command, you could process the individual strings you've extracted. The UCR Standard Library patgrab routine provides this capability for you.
You normally call patgrab after calling match and verifying that it matches the input string. Patgrab expects a single parameter - a pointer to a pattern recently processed by match. Patgrab creates a string on the heap consisting of the characters matched by the given pattern and returns a pointer to this string in es:di. Note that patgrab only returns a string associated with a single pattern data structure, not a chain of pattern data structures. Consider the following pattern:
PatToGrab       pattern {matchstr, str1, 0, Pat2}
Pat2            pattern {matchstr, str2}
str1            byte    "Hello",0
str2            byte    " there",0
Calling match on PatToGrab will match the string "Hello there". However, if after calling match you call patgrab and pass it the address of PatToGrab, patgrab will return a pointer to the string "Hello".
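As a rough sketch of that call sequence, the fragment below runs match on PatToGrab and then grabs the substring. HelloStr and the NoHello label are assumptions made for illustration; the calls mirror those used in this chapter's sample programs, plus the Standard Library's puts and putcr output routines. Going by the description above, it should print only the text matched by PatToGrab itself, not the text matched by Pat2.

```asm
; Assumed data, declared in dseg alongside the patterns above:
;
;   HelloStr    byte    "Hello there",0     ;Hypothetical input string.

                lesi    HelloStr        ;es:di -> input string.
                ldxi    PatToGrab       ;dx:si -> first pattern in the list.
                xor     cx, cx          ;cx=0, as in this chapter's examples.
                match
                jnc     NoHello         ;Carry clear: the match failed.

                lesi    PatToGrab       ;es:di -> pattern whose text we want.
                patgrab                 ;es:di -> heap string with that text.
                puts                    ;Print the grabbed substring.
                putcr
                free                    ;Return the string to the heap.
NoHello:
```

The string patgrab returns lives on the heap, so the fragment releases it with free once it is no longer needed, just as the chapter's longer sample program does.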
Of course, you might want to collect a string that is the concatenation of several strings matched within your pattern (i.e., a portion of the pattern list). This is where calling the sl_match2 pattern matching function comes in handy. Consider the following pattern:
Numbers         pattern {sl_match2, FirstNumber}
FirstNumber     pattern {anycset, digits, 0, OtherDigs}
OtherDigs       pattern {spancset, digits}
This pattern matches the same strings as
Numbers         pattern {anycset, digits, 0, OtherDigs}
OtherDigs       pattern {spancset, digits}
So why bother with the extra pattern that calls sl_match2? As it turns out, the sl_match2 matching function lets you create parenthetical patterns. A parenthetical pattern is a pattern list that the pattern matching routines (especially patgrab) treat as a single pattern. Although the match routine will match the same strings regardless of which version of Numbers you use, patgrab will produce two entirely different strings depending upon your choice of the above patterns. If you use the latter version, patgrab will only return the first digit of the number. If you use the former version (with the call to sl_match2), then patgrab returns the entire string matched by sl_match2, and that turns out to be the entire string of digits.
The following sample program demonstrates how to use parenthetical patterns to extract the pertinent information from the stock command presented earlier. It uses parenthetical patterns for the buy/sell command, the number of shares, and the company name.
                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list

dseg            segment para public 'data'

; Variables used to hold the number of shares bought/sold, a pointer to
; a string containing the buy/sell command, and a pointer to a string
; containing the company name.

Count           word    0
CmdPtr          dword   ?
CompPtr         dword   ?

; Some test strings to try out:

Cmd1            byte    "Buy 25 shares of apple stock",0
Cmd2            byte    "Sell 50 shares of hp stock",0
Cmd3            byte    "Buy 123 shares of dec stock",0
Cmd4            byte    "Sell 15 shares of ibm stock",0
BadCmd0         byte    "This is not a buy/sell command",0

; Patterns for the stock buy/sell command:
;
; StkCmd matches buy or sell and creates a parenthetical pattern
; that contains the string "buy" or "sell".

StkCmd          pattern {sl_match2, buyPat, 0, skipspcs1}

buyPat          pattern {matchistr,buystr,sellpat}
buystr          byte    "BUY",0

sellpat         pattern {matchistr,sellstr}
sellstr         byte    "SELL",0

; Skip zero or more white space characters after the buy command.

skipspcs1       pattern {spancset, whitespace, 0, CountPat}

; CountPat is a parenthetical pattern that matches one or more
; digits.

CountPat        pattern {sl_match2, Numbers, 0, skipspcs2}
Numbers         pattern {anycset, digits, 0, RestOfNum}
RestOfNum       pattern {spancset, digits}

; The following patterns match " shares of " allowing any amount
; of white space between the words.

skipspcs2       pattern {spancset, whitespace, 0, sharesPat}

sharesPat       pattern {matchistr, sharesStr, 0, skipspcs3}
sharesStr       byte    "SHARES",0

skipspcs3       pattern {spancset, whitespace, 0, ofPat}

ofPat           pattern {matchistr, ofStr, 0, skipspcs4}
ofStr           byte    "OF",0

skipspcs4       pattern {spancset, whitespace, 0, CompanyPat}

; The following parenthetical pattern matches a company name.
; The patgrab-available string will contain the corporate name.

CompanyPat      pattern {sl_match2, ibmpat}

ibmpat          pattern {matchistr, ibm, applePat}
ibm             byte    "IBM",0

applePat        pattern {matchistr, apple, hpPat}
apple           byte    "APPLE",0

hpPat           pattern {matchistr, hp, decPat}
hp              byte    "HP",0

decPat          pattern {matchistr, decstr}
decstr          byte    "DEC",0

                include stdsets.a

dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg

; DoBuySell-    This routine processes a stock buy/sell command.
;               After matching the command, it grabs the components
;               of the command and outputs them as appropriate.
;               This routine demonstrates how to use patgrab to
;               extract substrings from a pattern string.
;
;               On entry, es:di must point at the buy/sell command
;               you want to process.

DoBuySell       proc    near
                ldxi    StkCmd
                xor     cx, cx
                match
                jnc     NoMatch

                lesi    StkCmd
                patgrab
                mov     word ptr CmdPtr, di
                mov     word ptr CmdPtr+2, es

                lesi    CountPat
                patgrab
                atoi                    ;Convert digits to integer
                mov     Count, ax
                free                    ;Return storage to heap.

                lesi    CompanyPat
                patgrab
                mov     word ptr CompPtr, di
                mov     word ptr CompPtr+2, es

                printf
                byte    "Stock command: %^s\n"
                byte    "Number of shares: %d\n"
                byte    "Company to trade: %^s\n\n",0
                dword   CmdPtr, Count, CompPtr

                les     di, CmdPtr
                free
                les     di, CompPtr
                free
                ret

NoMatch:        print
                byte    "Illegal buy/sell command",cr,lf,0
                ret
DoBuySell       endp

Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax
                meminit

                lesi    Cmd1
                call    DoBuySell
                lesi    Cmd2
                call    DoBuySell
                lesi    Cmd3
                call    DoBuySell
                lesi    Cmd4
                call    DoBuySell
                lesi    BadCmd0
                call    DoBuySell

Quit:           ExitPgm
Main            endp
cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

Sample program output:

Stock command: Buy
Number of shares: 25
Company to trade: apple

Stock command: Sell
Number of shares: 50
Company to trade: hp

Stock command: Buy
Number of shares: 123
Company to trade: dec

Stock command: Sell
Number of shares: 15
Company to trade: ibm

Illegal buy/sell command