[Chapter Sixteen][Previous] [Next] [Art of Assembly][Randall Hyde]

Art of Assembly: Chapter Sixteen


16.8 - Some Sample Pattern Matching Applications
16.8.1 - Converting Written Numbers to Integers

16.8 Some Sample Pattern Matching Applications


The best way to learn how to convert a pattern matching problem to the respective pattern matching algorithms is by example. The following sections provide several examples of some small pattern matching problems and their solutions.


16.8.1 Converting Written Numbers to Integers


One interesting pattern matching problem is to convert written (English) numbers to their integer equivalents. For example, take the string "one hundred ninety-two" and convert it to the integer 192. Although written numbers represent a pattern quite a bit more complex than the ones we've seen thus far, a little study will show that it is easy to decompose such strings.

The first thing we will need to do is enumerate the English words we will need to process written numbers. This includes the following words:

zero, one, two, three, four, five, six, seven, eight, nine, ten, eleven twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, thirty, forty, fifty sixty, seventy, eighty, ninety, hundred, and thousand.

With this set of words we can build all the values between zero and 65,535 (the values we can represent in a 16 bit integer.

Next, we've got to decide how to put these words together to form all the values between zero and 65,535. The first thing to note is that zero only occurs by itself, it is never part of another number. So our first production takes the form:




Number   zero | NonZero

The next thing to note is that certain values may occur in pairs, denoting addition. For example, eighty-five denotes the sum of eighty plus five. Also note that certain other pairs denote multiplication. If you have a statement like "two hundred" or "fifteen hundred" the "hundred" word says multiply the preceding value by 100. The multiplicative words, "hundred" and "thousand" , are also additive. Any value following these terms is added in to the total; e.g., "one hundred five" means 1*100+5. By combining the appropriate rules, we obtain the following grammar




NonZero 		Thousands Maybe100s | Hundreds

Thousands 		Under100 thousand

Maybe100s 		Hundreds | 

Hundreds 		Under100 hundred After100 | Under100

After100 		Under100 | 

Under100 		Tens Maybe1s| Teens | ones

Maybe1s  		Ones | 

ones 	one | two | three | four | five | six | seven | eight | nine

teens 	ten | eleven | twelve | thirteen | fourteen | fifteen | sixteen |

	seventeen | eighteen | nineteen

tens 	twenty | thirty | forty | fifty | sixty | seventy | eighty | ninety

The final step is to add semantic actions to actually convert the strings matched by this grammar to integer values. The basic idea is to initialize an accumulator value to zero. Whenever you encounter one of the strings that ones, teens, or tens matches, you add the corresponding value to the accumulator. If you encounter the hundred or thousand strings, you multiply the accumulator by the appropriate factor. The complete program to do the conversion follows:





; Numbers.asm
;
; This program converts written English numbers in the range "zero"
; to "sixty five thousand five hundred thirty five" to the corresponding
; integer value.

                .xlist
                include         stdlib.a
                includelib      stdlib.lib
                matchfuncs
                .list


dseg            segment para public 'data'

Value           word    0                       ;Store results here.
HundredsVal     word    0
ThousandsVal    word    0


Str0            byte    "twenty one",0
Str1            byte    "nineteen hundred thirty-five",0
Str2            byte    "thirty three thousand two hundred nineteen",0
Str3            byte    "three",0
Str4            byte    "fourteen",0
Str5            byte    "fifty two",0
Str6            byte    "seven hundred",0
Str7            byte    "two thousand seven",0
Str8            byte    "four thousand ninety six",0
Str9            byte    "five hundred twelve",0
Str10           byte    "twenty three thousand two hundred ninety-five",0
Str11           byte    "seventy-five hundred",0
Str12           byte    "sixty-five thousand",0
Str13           byte    "one thousand",0


; The following grammar is what we use to process the numbers.
; Semantic actions appear in the braces.
;
; Note: begin by initializing Value, HundredsVal, and ThousandsVal to zero.
;
; N             -> separators zero
;               | N4
;
; N4            -> do1000s maybe100s
;               | do100s
;
; Maybe100s     -> do100s
;               | <empty string>
;
; do1000s       -> Under100 "THOUSAND" separators
;                               {ThousandsVal := Value*1000}
;
; do100s        -> Under100 "HUNDRED"
;                       {HundredsVal := Value*100} After100
;               | Under100
;
; After100      -> {Value := 0} Under100
;               | {Value := 0} <empty string>
;
; Under100              -> {Value := 0} try20 try1s
;               | {Value := 0} doTeens
;               | {Value := 0} do1s
;
; try1s         -> do1s | <empty string>
;
; try20         -> "TWENTY" {Value := Value + 20}
;               | "THIRTY" {Value := Value + 30}
;               | ...
;               | "NINETY" {Value := Value + 90}
;
; doTeens               -> "TEN"        {Value := Value + 10}
;               | "ELEVEN"      {Value := Value + 11}
;               | ...
;               | "NINETEEN"    {Value := Value + 19}
;
; do1s          -> "ONE"        {Value := Value + 1}
;               | "TWO" {Value := Value + 2}
;               | ...
;               | "NINE"        {Value := Value + 9}


separators      pattern {anycset, delimiters, 0, delim2}
delim2          pattern {spancset, delimiters}
doSuccess       pattern {succeed}
AtLast          pattern {sl_match2, separators, AtEOS, AtEOS}
AtEOS           pattern {EOS}


N               pattern {sl_match2, separators, N2, N2}
N2              pattern {matchistr, zero, N3, AtLast}
zero            byte    "ZERO",0

N3              pattern {sl_match2, N4, 0, AtLast}
N4              pattern {sl_match2, do1000s, do100s, Maybe100s}
Maybe100s       pattern {sl_match2, do100s, AtLast, AtLast}

do1000s         pattern {sl_match2, Under100, 0, do1000s2}
do1000s2        pattern {matchistr, str1000, 0, do1000s3}
do1000s3        pattern {sl_match2, separators, do1000s4, do1000s5}
do1000s4        pattern {EOS, 0, 0, do1000s5}
do1000s5        pattern {Get1000s}
str1000         byte    "THOUSAND",0

do100s          pattern {sl_match2, do100s1, Under100, After100}
do100s1         pattern {sl_match2, Under100, 0, do100s2}
do100s2         pattern {matchistr, str100, 0, do100s3}
do100s3         pattern {sl_match2, separators, do100s4, do100s5}
do100s4         pattern {EOS, 0, 0, do100s5}
do100s5         pattern {Get100s}
str100          byte    "HUNDRED",0

After100        pattern {SetVal, 0, 0, After100a}
After100a       pattern {sl_match2, Under100, doSuccess}

Under100        pattern {SetVal, 0, 0, Under100a}
Under100a       pattern {sl_match2, try20, Under100b, Do1orE}
Under100b       pattern {sl_match2, doTeens, do1s}

Do1orE          pattern {sl_match2, do1s, doSuccess, 0}



NumPat          macro   lbl, next, Constant, string
                local   try, SkipSpcs, val, str, tryEOS
lbl             pattern {sl_match2, try, next}
try             pattern {matchistr, str, 0, SkipSpcs}
SkipSpcs                pattern {sl_match2, separators, tryEOS, val}
tryEOS          pattern {EOS, 0, 0, val}
val             pattern {AddVal, Constant}
str             byte    string
                byte    0
                endm

                NumPat  doTeens, try11, 10, "TEN"
                NumPat  try11, try12, 11, "ELEVEN"
                NumPat  try12, try13, 12, "TWELVE"
                NumPat  try13, try14, 13, "THIRTEEN"
                NumPat  try14, try15, 14, "FOURTEEN"
                NumPat  try15, try16, 15, "FIFTEEN"
                NumPat  try16, try17, 16, "SIXTEEN"
                NumPat  try17, try18, 17, "SEVENTEEN"
                NumPat  try18, try19, 18, "EIGHTEEN"
                NumPat  try19, 0, 19, "NINETEEN"

                NumPat  do1s, try2, 1, "ONE"
                NumPat  try2, try3, 2, "TWO"
                NumPat  try3, try4, 3, "THREE"
                NumPat  try4, try5, 4, "FOUR"
                NumPat  try5, try6, 5, "FIVE"
                NumPat  try6, try7, 6, "SIX"
                NumPat  try7, try8, 7, "SEVEN"
                NumPat  try8, try9, 8, "EIGHT"
                NumPat  try9, 0, 9, "NINE"

                NumPat  try20, try30, 20, "TWENTY"
                NumPat  try30, try40, 30, "THIRTY"
                NumPat  try40, try50, 40, "FORTY"
                NumPat  try50, try60, 50, "FIFTY"
                NumPat  try60, try70, 60, "SIXTY"
                NumPat  try70, try80, 70, "SEVENTY"
                NumPat  try80, try90, 80, "EIGHTY"
                NumPat  try90, 0, 90, "NINETY"

                include stdsets.a

dseg            ends



cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg


; Semantic actions for our grammar:
;
;
;
; Get1000s-     We've just processed the value one..nine, grab it from
;               the value variable, multiply it by 1000, and store it
;               into thousandsval.

Get1000s        proc    far
                push    ds
                push    dx
                mov     ax, dseg
                mov     ds, ax

                mov     ax, 1000
                mul     Value
                mov     ThousandsVal, ax
                mov     Value, 0

                pop     dx
                mov     ax, di                  ;Required by sl_match.
                pop     ds
                stc                             ;Always return success.
                ret
Get1000s        endp


; Get100s-      We've just processed the value one..nine, grab it from
;               the value variable, multiply it by 100, and store it
;               into hundredsval.

Get100s         proc    far
                push    ds
                push    dx
                mov     ax, dseg
                mov     ds, ax

                mov     ax, 100
                mul     Value
                mov     HundredsVal, ax
                mov     Value, 0

                pop     dx
                mov     ax, di                  ;Required by sl_match.
                pop     ds
                stc                             ;Always return success.
                ret
Get100s         endp


; SetVal-       This routine sets Value to whatever is in si

SetVal          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                mov     Value, si
                mov     ax, di
                pop     ds
                stc
                ret
SetVal          endp

; AddVal-       This routine sets adds whatever is in si to Value

AddVal          proc    far
                push    ds
                mov     ax, dseg
                mov     ds, ax
                add     Value, si
                mov     ax, di
                pop     ds
                stc
                ret
AddVal          endp


; Succeed matches the empty string. In other words, it matches anything
; and always returns success without eating any characters from the input
; string.

Succeed         proc    far
                mov     ax, di
                stc
                ret
Succeed         endp


; This subroutine expects a pointer to a string containing the English
; version of an integer number. It converts this to an integer and
; prints the result.

ConvertNumber   proc    near
                mov     value, 0
                mov     HundredsVal, 0
                mov     ThousandsVal, 0

                ldxi    N
                xor     cx, cx
                match
                jnc     NoMatch
                mov     al, "'"
                putc
                puts
                print
                byte    "' = ", 0
                mov     ax, ThousandsVal
                add     ax, HundredsVal
                add     ax, Value
                putu
                putcr
                jmp     Done

NoMatch:        print
                byte    "Illegal number",cr,lf,0

Done:           ret
ConvertNumber   endp



Main            proc
                mov     ax, dseg
                mov     ds, ax
                mov     es, ax

                meminit                         ;Init memory manager.

; Union in a "-" to the delimiters set because numbers can have
; dashes in them.

                lesi    delimiters
                mov     al, '-'
                addchar

; Some calls to test the ConvertNumber routine and the conversion process.

                lesi    Str0
                call    ConvertNumber
                lesi    Str1
                call    ConvertNumber
                lesi    Str2
                call    ConvertNumber
                lesi    Str3
                call    ConvertNumber
                lesi    Str4
                call    ConvertNumber
                lesi    Str5
                call    ConvertNumber
                lesi    Str6
                call    ConvertNumber
                lesi    Str7
                call    ConvertNumber
                lesi    Str8
                call    ConvertNumber
                lesi    Str9
                call    ConvertNumber
                lesi    Str10
                call    ConvertNumber
                lesi    Str11
                call    ConvertNumber
                lesi    Str12
                call    ConvertNumber
                lesi    Str13
                call    ConvertNumber


Quit:           ExitPgm
Main            endp

cseg            ends

sseg            segment para stack 'stack'
stk             db      1024 dup ("stack ")
sseg            ends

zzzzzzseg       segment para public 'zzzzzz'
LastBytes       db      16 dup (?)
zzzzzzseg       ends
                end     Main

Sample output:





'twenty one' = 21
'nineteen hundred thirty-five' = 1935
'thirty three thousand two hundred nineteen' = 33219
'three' = 3
'fourteen' = 14
'fifty two' = 52
'seven hundred' = 700
'two thousand seven' = 2007
'four thousand ninety six' = 4096
'five hundred twelve' = 512
'twenty three thousand two hundred ninety-five' = 23295
'seventy-five hundred' = 7500
'sixty-five thousand' = 65000
'one thousand' = 1000
16.8 - Some Sample Pattern Matching Applications
16.8.1 - Converting Written Numbers to Integers


Art of Assembly: Chapter Sixteen - 29 SEP 1996

[Chapter Sixteen][Previous] [Next] [Art of Assembly][Randall Hyde]



Number of Web Site Hits since Jan 1, 2000: