HelloStr byte 5,"HELLO"
Length-prefixed strings are often called Pascal strings since this is the type of string variable supported by most versions of Pascal[6].
Another popular way to specify string lengths is to use zero-terminated strings. A zero-terminated string consists of a sequence of characters terminated with a zero byte. These strings are often called C-strings since they are the type used by the C/C++ programming languages. The UCR Standard Library, since it mimics the C standard library, also uses zero-terminated strings.
Pascal strings are much better than C/C++ strings for several reasons. First, computing the length of a Pascal string is trivial: you need only fetch the first byte (or word) of the string and you've got its length. Computing the length of a C/C++ string is considerably less efficient, because you must scan the entire string (e.g., using the scasb instruction) for a zero byte. If the C/C++ string is long, this can take a long time. Furthermore, C/C++ strings cannot contain the NUL character (a zero byte), since that byte marks the end of the string. On the other hand, C/C++ strings can be any length, yet require only a single extra byte of overhead. Pascal strings can be no longer than 255 characters when using only a single length byte; for longer strings, you'll need two bytes to hold the length. Since most strings are less than 256 characters in length, this isn't much of a disadvantage.
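To make the cost difference concrete, here is a minimal sketch in C (chosen over assembly only so that it can be compiled and checked on any machine). The names pstr_len and zstr_len are hypothetical and are not part of the UCR Standard Library; the sketch assumes a single length byte.

```c
#include <assert.h>
#include <stddef.h>

/* Length of a length-prefixed (Pascal) string: a single byte fetch,
   no matter how long the string is. */
size_t pstr_len(const unsigned char *p)
{
    assert(p != NULL);
    return p[0];
}

/* Length of a zero-terminated (C) string: every character must be
   examined until the zero byte turns up, much as scasb would do. */
size_t zstr_len(const char *z)
{
    size_t n = 0;

    assert(z != NULL);
    while (z[n] != '\0')
        n++;
    return n;
}
```

For the string HELLO both functions return 5, but pstr_len reads one byte while zstr_len reads six (five characters plus the terminator).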
An advantage of zero-terminated strings is that they are easy to use in an assembly language program. This is particularly true of strings that are so long they require multiple source code lines in your assembly language programs. Counting up every character in a string is so tedious that it's not even worth considering. However, you can write a macro which will easily build Pascal strings for you:
PString         macro   String
                local   StringLength, StringStart
                byte    StringLength
StringStart     byte    String
StringLength    =       $-StringStart
                endm
                 .
                 .
                 .
                PString "This string has a length prefix"
As long as the string fits entirely on one source line, you can use this macro to generate Pascal style strings.
Common string functions like concatenation, length, substring, index, and others are much easier to write when using length-prefixed strings, so we'll use Pascal strings unless otherwise noted. Furthermore, the UCR Standard Library provides a large number of C/C++ string functions, so there is no need to replicate those functions here.
You can assign one length-prefixed string to another using the movsb instruction. For example, if you want to assign the length-prefixed string String1 to String2, use the following:

; Presumably, ES and DS are set up already

                lea     si, String1
                lea     di, String2
                mov     ch, 0           ;Extend len to 16 bits.
                mov     cl, String1     ;Get string length.
                inc     cx              ;Include length byte.
                cld                     ;Copy toward higher addresses.
        rep     movsb
This code increments cx by one before executing rep movsb because the length byte holds the number of characters in the string and does not count itself.
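The same copy can be sketched in C as a cross-check of the byte count. This is only an illustration: pstr_assign is a hypothetical name, and the caller is assumed to supply a destination of at least 256 bytes so that any single-length-byte string fits.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a length-prefixed string, length byte included.
   dest is assumed to have room for up to 256 bytes. */
void pstr_assign(unsigned char *dest, const unsigned char *src)
{
    size_t count;

    assert(dest != NULL && src != NULL);
    count = (size_t)src[0] + 1;     /* characters plus the length byte ("inc cx") */
    memcpy(dest, src, count);       /* plays the role of "rep movsb" */
}
```

Copying only src[0] bytes would transfer the length byte and drop the last character, which is exactly the mistake the extra increment avoids.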
Generally, string variables can be initialized to constants by using the PString macro described earlier. However, if you need to set a string variable to some constant value at run time, you can write a StrAssign subroutine that assigns the string immediately following the call instruction. The following program does exactly that:
                include         stdlib.a
                includelib      stdlib.lib

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg, es:dseg, ss:sseg

; String assignment procedure

MainPgm         proc    far
                mov     ax, seg dseg
                mov     ds, ax
                mov     es, ax

                lea     di, ToString
                call    StrAssign
                byte    "This is an example of how the "
                byte    "StrAssign routine is used",0
                nop
                ExitPgm
MainPgm         endp

StrAssign       proc    near
                push    bp
                mov     bp, sp
                pushf
                push    ds
                push    si
                push    di
                push    cx
                push    ax
                push    di              ;Save again for use later.
                push    es
                cld

; Get the address of the source string

                mov     ax, cs
                mov     es, ax
                mov     di, 2[bp]       ;Get return address.
                mov     cx, 0ffffh      ;Scan for as long as it takes.
                mov     al, 0           ;Scan for a zero.
        repne   scasb                   ;Compute the length of string.
                neg     cx              ;Convert length to a positive #.
                dec     cx              ;Because we started with -1, not 0.
                dec     cx              ;skip zero terminating byte.

; Now copy the strings

                pop     es              ;Get destination segment.
                pop     di              ;Get destination address.
                mov     al, cl          ;Store length byte.
                stosb

; Now copy the source string.

                mov     ax, cs
                mov     ds, ax
                mov     si, 2[bp]
        rep     movsb

; Update the return address and leave:

                inc     si              ;Skip over zero byte.
                mov     2[bp], si

                pop     ax
                pop     cx
                pop     di
                pop     si
                pop     ds
                popf
                pop     bp
                ret
StrAssign       endp

cseg            ends

dseg            segment para public 'data'
ToString        byte    255 dup (0)
dseg            ends

sseg            segment para stack 'stack'
                word    256 dup (?)
sseg            ends
                end     MainPgm
This code uses the scas instruction to determine the length of the string immediately following the call instruction. Once the code determines the length, it stores this length into the first byte of the destination string and then copies the text following the call to the string variable. After copying the string, this code adjusts the return address so that it points just beyond the zero terminating byte. Then the procedure returns control to the caller.
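The three data steps just described (find the length, store it, copy the characters) can be sketched in C. This is not a translation of StrAssign: the hypothetical function str_assign below takes an ordinary pointer to the source string rather than reading it from the code stream after the call, so there is no return address to adjust. It also assumes a single length byte and rejects longer strings with an assertion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Store the zero-terminated string src into dest in length-prefixed form.
   dest must have room for strlen(src) + 1 bytes. Returns the length stored. */
size_t str_assign(unsigned char *dest, const char *src)
{
    size_t len;

    assert(dest != NULL && src != NULL);
    len = strlen(src);              /* the "repne scasb" step */
    assert(len <= 255);             /* must fit in one length byte */
    dest[0] = (unsigned char)len;   /* the "stosb" step */
    memcpy(dest + 1, src, len);     /* the "rep movsb" step; terminator not copied */
    return len;
}
```

Note that the zero terminator is used only to find the end of the source; it never reaches the destination, because the length byte already carries that information.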
Of course, this string assignment procedure isn't very efficient, but it's very easy to use. Setting up es:di is all that you need to do to use this procedure. If you need fast string assignment, simply use the movs instruction as follows:
; Presumably, DS and ES have already been set up.

                lea     si, SourceString
                lea     di, DestString
                mov     cx, LengthSource
                cld                     ;Copy toward higher addresses.
        rep     movsb
                 .
                 .
                 .
SourceString    byte    LengthSource-1
                byte    "This is an example of how the "
                byte    "StrAssign routine is used"
LengthSource    =       $-SourceString
DestString      byte    256 dup (?)
Using in-line instructions requires considerably more setup (and typing!), but it is much faster than the StrAssign procedure. If you don't like the typing, you can always write a macro to do the string assignment for you.
String comparison was covered in the discussion of the cmps instruction. Other than providing some concrete examples, there is no reason to consider this subject any further.

The following examples assume that es and ds are pointing at the proper segments containing the destination and source strings.

The following code compares the length-prefixed string Str1 to Str2:

                lea     si, Str1
                lea     di, Str2
                inc     si              ;Skip the length bytes so that
                inc     di              ; cmpsb sees only characters.

; Get the minimum length of the two strings.

                mov     al, Str1
                mov     cl, al
                cmp     al, Str2
                jb      CmpStrs
                mov     cl, Str2

; Compare the two strings.

CmpStrs:        mov     ch, 0
                cld
        repe    cmpsb
                jne     StrsNotEqual

; If CMPS thinks they're equal, compare their lengths
; just to be sure.

                cmp     al, Str2
StrsNotEqual:
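At label StrsNotEqual, the flags will contain all the pertinent information about the relative order of these two strings. Because characters and length bytes are compared as unsigned values, you can use the unsigned conditional jump instructions (ja, jb, je, and so on) to test the result of this comparison.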
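The comparison rule (compare characters up to the shorter length, then let the lengths break a tie) can be sketched in C as a cross-check. The name pstr_compare is hypothetical, and the sign of the return value stands in for the flags: negative means the first string is lower, zero means equal, positive means higher.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two length-prefixed strings.
   Returns <0, 0, or >0 as a is below, equal to, or above b. */
int pstr_compare(const unsigned char *a, const unsigned char *b)
{
    size_t la, lb, n;
    int r;

    assert(a != NULL && b != NULL);
    la = a[0];
    lb = b[0];
    n = (la < lb) ? la : lb;            /* minimum of the two lengths */

    r = memcmp(a + 1, b + 1, n);        /* unsigned byte compare, like "repe cmpsb" */
    if (r != 0)
        return r;                       /* a character differed */

    return (la > lb) - (la < lb);       /* equal so far: shorter string is lower */
}
```

Under this rule "HELL" ranks below "HELLO" because it is a prefix of the longer string, while "B" ranks above "AA" because the first character decides before the lengths are ever consulted.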