segment directive is usually the class type. The class type specifies the ordering of segments that do not have the same segment name. This operand consists of a symbol enclosed by apostrophes (quotation marks are not allowed here). Generally, you should use the following names: CODE (for segments containing program code); DATA (for segments containing variables, constant data, and tables); CONST (for segments containing constant data and tables); and STACK (for a stack segment). The following program section illustrates their use:
CSEG            segment public 'CODE'
                mov     ax, bx
                ret
CSEG            ends

DSEG            segment public 'DATA'
Item1           byte    0
Item2           word    0
DSEG            ends

CSEG            segment public 'CODE'
                mov     ax, 10
                add     ax, Item2               ;Item2 is a word, so it matches ax in size
                ret
CSEG            ends

SSEG            segment stack 'STACK'
STK             word    4000 dup (?)
SSEG            ends

C2SEG           segment public 'CODE'
                ret
C2SEG           ends
                end
The actual loading procedure is accomplished as follows. The assembler locates the first segment in the file. Since it's a public combined segment, MASM concatenates all other CSEG segments to the end of this segment. Finally, since its class is 'CODE', MASM appends all segments (C2SEG) with the same class afterwards. After processing these segments, MASM scans the source file for the next uncombined segment and repeats the process. In the example above, the segments will be loaded in the following order: CSEG, CSEG (2nd occurrence), C2SEG, DSEG, and then SSEG. The general rule concerning how your files will be loaded into memory is the following:
If readonly is the first operand of the segment directive, the assembler will generate an error if it encounters any instruction that attempts to write to this segment. This is most useful for code segments, though it is possible to imagine a read-only data segment. This option does not actually prevent you from writing to this segment at run-time. It is very easy to trick the assembler and write to this segment anyway. However, by specifying readonly you can catch some common programming errors you would otherwise miss. Since you will rarely place writable variables in your code segments, it's probably a good idea to make your code segments readonly.
READONLY operand:
seg1            segment readonly para public 'DATA'
                 .
                 .
                 .
seg1            ends
.386, .486, or .586 in your program. If you want to use 32 bit instructions, you will have to explicitly tell MASM to use 16 bit segments. The use16, use32, and flat operands to the segment directive let you specify the segment size.
use16 operand. This tells MASM that the segment is a 16 bit segment and it assembles the code accordingly. If you use one of the directives to activate the 80386 or later instruction sets, you should put use16 in all your code segments or MASM will generate bad code.
use16 operand:
seg1            segment para public use16 'data'
                 .
                 .
                 .
seg1            ends
The use32 and flat operands tell MASM to generate code for a 32 bit segment. Since this text does not deal with protected mode programming, we will not consider these options. See the MASM Programmer's Guide for more details.
If you want to force use16 as the default in a program that allows 80386 or later instructions, there is one way to accomplish this. Place the following directive in your program before any segments:
                option  segment:use16
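As a rough sketch of where this directive sits in a complete source file, the listing below assumes MASM 6.x and a 16 bit DOS target. MASM 6.x spells the directive option, with no leading period. The names Total and Main are hypothetical and exist only for illustration.

```asm
; Sketch only: assumes MASM 6.x and a 16 bit DOS program.
; Total and Main are hypothetical names.

                .386                            ;Enable 80386 instructions
                option  segment:use16           ;Segments now default to 16 bits

DSEG            segment para public 'DATA'
Total           dword   0
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG

Main            proc    far
                mov     ax, DSEG                ;Point ds at the data segment
                mov     ds, ax
                mov     eax, 12345678h          ;32 bit register, 16 bit segment
                mov     Total, eax
                mov     ax, 4C00h               ;Return to DOS
                int     21h
Main            endp
CSEG            ends

SSEG            segment para stack 'STACK'
                word    100h dup (?)
SSEG            ends
                end     Main
```

With the option directive in place, none of the segment directives needs its own use16 operand.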
segment directive. For most programs, the following three segments should prove sufficient:
DSEG            segment para public 'DATA'

; Insert your variable definitions here

DSEG            ends

CSEG            segment para public use16 'CODE'

; Insert your program instructions here

CSEG            ends

SSEG            segment para stack 'STACK'
stk             word    1000h dup (?)
EndStk          equ     this word
SSEG            ends
                end
The SHELL.ASM file automatically declares these three segments for you. If you always make a copy of the SHELL.ASM file when writing a new assembly language program, you probably won't need to worry about segment declarations and segmentation in general.
ds register. So computing the difference of the last byte in your program and the PSP will produce the length of your program. The following code segment computes the length of a program in paragraphs:
CSEG            segment public 'CODE'
                mov     ax, ds                  ;Get PSP segment address
                sub     ax, seg LASTSEG         ;AX = PSP - LASTSEG (a negative value)
                neg     ax                      ;AX = LASTSEG - PSP

; AX now contains the length of this program (in paragraphs)
                 .
                 .
                 .
CSEG            ends

; Insert ALL your other segments here.

LASTSEG         segment para public 'LASTSEG'
LASTSEG         ends
                end
ds:, cs:, ss:, es:, fs:, or gs:. When used in front of an address expression, a segment prefix instructs the 80x86 to fetch its memory operand from the specified segment rather than the default segment. For example, mov ax, cs:I[bx] loads the accumulator from address I+bx within the current code segment. If the cs: prefix were absent, this instruction would normally load the data from the current data segment. Likewise, mov ds:[bp],ax stores the accumulator into the memory location pointed at by the bp register in the current data segment (remember, whenever using bp as a base register it points into the stack segment).
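The following sketch puts default and overridden accesses side by side. It assumes MASM 6.x and a 16 bit DOS target; DTable, CTable, and Main are hypothetical names, and the assume directive it uses only tells MASM which segments cs and ds will hold. The prefix byte values in the comments are the standard 80x86 segment override opcodes.

```asm
; Sketch only: assumes MASM 6.x and a 16 bit DOS program.
; DTable, CTable, and Main are hypothetical names.

DSEG            segment para public 'DATA'
DTable          word    10, 20, 30
DSEG            ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG

CTable          word    1, 2, 3                 ;Table stored in the code segment

Main            proc    far
                mov     ax, DSEG                ;Point ds at the data segment
                mov     ds, ax
                mov     bx, 2                   ;Byte offset of the second word

                mov     ax, DTable[bx]          ;Default segment (ds): ax = 20
                mov     ax, cs:CTable[bx]       ;cs: prefix (2Eh):     ax = 2

                mov     bp, sp
                mov     ax, [bp]                ;bp defaults to the stack segment
                mov     ax, ds:[bp]             ;ds: prefix (3Eh) forces the data segment

                mov     ax, 4C00h               ;Return to DOS
                int     21h
Main            endp
CSEG            ends

SSEG            segment para stack 'STACK'
                word    100h dup (?)
SSEG            ends
                end     Main
```

The es:, ss:, fs:, and gs: prefixes work the same way and encode as 26h, 36h, 64h, and 65h respectively.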
ds segment register (or stack segment). Likewise, all code references (jumps, calls, etc.) are always relative to the current code segment. There is only one catch - how does the assembler know which segment is the data segment and which is the code segment (or other segment)? The segment directive doesn't tell you what type of segment it happens to be in the program. Remember, a data segment is a data segment because the ds register points at it. Since the ds register can be changed at run time (using an instruction like mov ds,ax), any segment can be a data segment. This has some interesting consequences for the assembler. When you specify a segment in your program, not only must you tell the CPU that a segment is a data segment, but you must also tell the assembler where and when that segment is a data (or code/stack/extra/F/G) segment. The assume directive provides this information to the assembler.
The assume directive takes the following form:
assume {CS:seg} {DS:seg} {ES:seg} {FS:seg} {GS:seg} {SS:seg}
The braces surround optional items; you do not type the braces as part of these operands. Note that there must be at least one operand. Seg is either the name of a segment (defined with the segment directive) or the reserved word nothing. Multiple operands in the operand field of the assume directive must be separated by commas. Examples of valid assume directives:
                assume  DS:DSEG
                assume  CS:CSEG, DS:DSEG, ES:DSEG, SS:SSEG
                assume  CS:CSEG, DS:NOTHING
The assume directive tells the assembler that you have loaded the specified segment register(s) with the segment addresses of the specified value. Note that this directive does not modify any of the segment registers; it simply tells the assembler to assume the segment registers are pointing at certain segments in the program. Like the processor selection and equate directives, the assume directive modifies the assembler's behavior from the point MASM encounters it until another assume directive changes the stated assumption.
Consider the following program:
DSEG1           segment para public 'DATA'
var1            word    ?
DSEG1           ends

DSEG2           segment para public 'DATA'
var2            word    ?
DSEG2           ends

CSEG            segment para public 'CODE'
                assume  CS:CSEG, DS:DSEG1, ES:DSEG2

                mov     ax, seg DSEG1
                mov     ds, ax
                mov     ax, seg DSEG2
                mov     es, ax

                mov     var1, 0
                mov     var2, 0
                 .
                 .
                 .
                assume  DS:DSEG2
                mov     ax, seg DSEG2
                mov     ds, ax
                mov     var2, 0
                 .
                 .
                 .
CSEG            ends
                end
Whenever the assembler encounters a symbolic name, it checks to see which segment contains that symbol. In the program above, var1 appears in the DSEG1 segment and var2 appears in the DSEG2 segment. Remember, the 80x86 microprocessor doesn't know about segments declared within your program; it can only access data in segments pointed at by the cs, ds, es, ss, fs, and gs segment registers. The assume statement in this program tells the assembler the ds register points at DSEG1 for the first part of the program and at DSEG2 for the second part of the program.
When the assembler encounters an instruction of the form mov var1,0, the first thing it does is determine var1's segment. It then compares this segment against the list of assumptions the assembler makes for the segment registers. If you didn't declare var1 in one of these segments, then the assembler generates an error claiming that the program cannot access that variable. If the symbol (var1 in our example) appears in one of the currently assumed segments, then the assembler checks to see if it is the data segment. If so, then the instruction is assembled as described in the appendices. If the symbol appears in a segment other than the one that the assembler assumes ds points at, then the assembler emits a segment override prefix byte, specifying the actual segment that contains the data.
In the example program above, MASM would assemble mov var1,0 without a segment prefix byte. MASM would assemble the first occurrence of the mov var2,0 instruction with an es: segment prefix byte since the assembler assumes es, rather than ds, is pointing at segment DSEG2. MASM would assemble the second occurrence of this instruction without the es: segment prefix byte since the assembler, at that point in the source file, assumes that ds points at DSEG2. Keep in mind that it is very easy to confuse the assembler. Consider the following code:
CSEG            segment para public 'CODE'
                assume  CS:CSEG, DS:DSEG1, ES:DSEG2

                mov     ax, seg DSEG1
                mov     ds, ax
                 .
                 .
                 .
                jmp     SkipFixDS

                assume  DS:DSEG2

FixDS:          mov     ax, seg DSEG2
                mov     ds, ax
SkipFixDS:
                 .
                 .
                 .
CSEG            ends
                end
Notice that this program jumps around the code that loads the ds register with the segment value for DSEG2. This means that at label SkipFixDS the ds register contains a pointer to DSEG1, not DSEG2. However, the assembler isn't bright enough to realize this problem, so it blindly assumes that ds points at DSEG2 rather than DSEG1. This is a disaster waiting to happen. Because the assembler assumes you're accessing variables in DSEG2 while the ds register actually points at DSEG1, such accesses will reference memory locations in DSEG1 at the same offset as the variables accessed in DSEG2. This will scramble the data in DSEG1 (or cause your program to read incorrect values for the variables assumed to be in segment DSEG2).
For beginning programmers, the best solution to the problem is to avoid using multiple (data) segments within your programs as much as possible. Save the multiple segment accesses for the day when you're prepared to deal with problems like this. As a beginning assembly language programmer, simply use one code segment, one data segment, and one stack segment and leave the segment registers pointing at each of these segments while your program is executing. The assume directive is quite complex and can get you into a considerable amount of trouble if you misuse it. Better not to bother with fancy uses of assume until you are quite comfortable with the whole idea of assembly language programming and segmentation on the 80x86.
The nothing reserved word tells the assembler that you haven't the slightest idea where a segment register is pointing. It also tells the assembler that you're not going to access any data relative to that segment register unless you explicitly provide a segment prefix to an address. A common programming convention is to place assume directives before all procedures in a program. Since segment pointers to declared segments in a program rarely change except at procedure entry and exit, this is the ideal place to put assume directives:
                assume  ds:P1Dseg, cs:cseg, es:nothing

Procedure1      proc    near
                push    ds                      ;Preserve DS
                push    ax                      ;Preserve AX
                mov     ax, P1Dseg              ;Get pointer to P1Dseg into the
                mov     ds, ax                  ; ds register.
                 .
                 .
                 .
                pop     ax                      ;Restore ax's value.
                pop     ds                      ;Restore ds' value.
                ret
Procedure1      endp
The only problem with this code is that MASM still assumes that ds points at P1Dseg when it encounters code after Procedure1. The best solution is to put a second assume directive after the endp directive to tell MASM it doesn't know anything about the value in the ds register:
                 .
                 .
                 .
                ret
Procedure1      endp
                assume  ds:nothing
Although the next statement in the program will probably be yet another assume directive giving the assembler some new assumptions about ds (at the beginning of the procedure that follows the one above), it's still a good idea to adopt this convention. If you fail to put an assume directive before the next procedure in your source file, the assume ds:nothing statement above will keep the assembler from assuming you can access variables in P1Dseg.
Segment override prefixes always override any assumptions made by the assembler. The instruction mov ax, cs:var1 always loads the ax register with the word at offset var1 within the current code segment, regardless of where you've defined var1. The main purpose behind the segment override prefixes is handling indirect references. If you have an instruction of the form mov ax,[bx], the assembler assumes that bx points into the data segment. If you really need to access data in a different segment, you can use a segment override, like this: mov ax, es:[bx].
In general, if you are going to use multiple data segments within your program, you should use full segment:offset names for your variables. E.g., mov ax, DSEG1:I and mov bx,DSEG2:J. This does not eliminate the need to load the segment registers or make proper use of the assume directive, but it will make your program easier to read and help MASM locate possible errors in your program.
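A minimal sketch of this naming style follows, assuming MASM 6.x and a 16 bit DOS target. The variables I and J, their initial values, and the procedure name Main are hypothetical. The segment registers are still loaded by hand and the assume directive is still required; the segment names on the operands only document where each variable lives.

```asm
; Sketch only: assumes MASM 6.x and a 16 bit DOS program.
; I, J, their values, and Main are hypothetical.

DSEG1           segment para public 'DATA'
I               word    5
DSEG1           ends

DSEG2           segment para public 'DATA'
J               word    7
DSEG2           ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG, ds:DSEG1, es:DSEG2

Main            proc    far
                mov     ax, DSEG1               ;ds must really point at DSEG1
                mov     ds, ax
                mov     ax, DSEG2               ;es must really point at DSEG2
                mov     es, ax

                mov     ax, DSEG1:I             ;I is in DSEG1, reached through ds
                mov     bx, DSEG2:J             ;J is in DSEG2, reached through es
                add     ax, bx                  ;ax = 5 + 7

                mov     ax, 4C00h               ;Return to DOS
                int     21h
Main            endp
CSEG            ends

SSEG            segment para stack 'STACK'
                word    100h dup (?)
SSEG            ends
                end     Main
```

Because es is assumed to point at DSEG2, MASM emits the es: prefix on the second load without one being written in the source.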
The assume directive is actually quite useful for other things besides just setting the default segment. You'll see some more uses for this directive a little later in this chapter.
group, that lets you treat two segments as the same physical segment without abandoning the structure and modularity of your program.
The group directive lets you create a new segment name that encompasses the segments it groups together. For example, if you have two segments named "Module1Data" and "Module2Data" that you wish to combine into a single physical segment, you could use the group directive as follows:
ModuleData group Module1Data, Module2Data
The only restriction is that the end of the second module's data must be no more than 64 kilobytes away from the start of the first module in memory. MASM and the linker will not automatically combine these segments and place them together in memory. If there are other segments between these two in memory, then the total of all such segments must be less than 64K in length. To reduce this problem, you can use the class operand to the segment directive to tell the linker to combine the two segments in memory by using the same class name:
ModuleData      group   Module1Data, Module2Data

Module1Data     segment para public 'MODULES'
                 .
                 .
                 .
Module1Data     ends
                 .
                 .
                 .
Module2Data     segment byte public 'MODULES'
                 .
                 .
                 .
Module2Data     ends
With declarations like those above, you can use "ModuleData" anywhere MASM allows a segment name: as the operand to a mov instruction, as an operand to the assume directive, etc. The following example demonstrates the usage of the ModuleData segment name:
                assume  ds:ModuleData

Module1Proc     proc    near
                push    ds                      ;Preserve ds' value.
                push    ax                      ;Preserve ax's value.
                mov     ax, ModuleData          ;Load ds with the segment address
                mov     ds, ax                  ; of ModuleData.
                 .
                 .
                 .
                pop     ax                      ;Restore ax's and ds' values.
                pop     ds
                ret
Module1Proc     endp
                assume  ds:nothing
Of course, using the group directive in this manner hasn't really improved the code. Indeed, by using a different name for the data segment, one could argue that using group in this manner has actually obfuscated the code. However, suppose you had a code sequence that needed to access variables in both the Module1Data and Module2Data segments. If these segments were physically and logically separate you would have to load two segment registers with the addresses of these two segments in order to access their data concurrently. This would cost you a segment override prefix on all the instructions that access one of the segments. If you cannot spare an extra segment register, the situation will be even worse: you'll have to constantly load new values into a single segment register as you access data in the two segments. You can avoid this overhead by combining the two logical segments into a single physical segment and accessing them through their group rather than individual segment names.
If you group two or more segments together, all you're really doing is creating a pseudo-segment that encompasses the segments appearing in the group directive's operand field. Grouping segments does not prevent you from accessing the individual segments in the grouping list. The following code is perfectly legal:
                assume  ds:Module1Data
                mov     ax, Module1Data
                mov     ds, ax
                 .
        < Code that accesses data in Module1Data >
                 .
                assume  ds:Module2Data
                mov     ax, Module2Data
                mov     ds, ax
                 .
        < Code that accesses data in Module2Data >
                 .
                assume  ds:ModuleData
                mov     ax, ModuleData
                mov     ds, ax
                 .
        < Code that accesses data in both Module1Data and Module2Data >
                 .
                 .
                 .
When the assembler processes segments, it usually starts the location counter value for a given segment at zero. Once you group a set of segments, however, an ambiguity arises; grouping two segments causes MASM and the linker to concatenate the variables of one or more segments to the end of the first segment in the group list. They accomplish this by adjusting the offsets of all symbols in the concatenated segments as though they were all symbols in the same segment. The ambiguity exists because MASM allows you to reference a symbol in its segment or in the group segment. The symbol has a different offset depending on the choice of segment. To resolve the ambiguity, MASM uses the following algorithm:
If an assume directive associates the segment name with a segment register but does not associate a segment register with the group name, then MASM uses the offset of the symbol within its segment.
If an assume directive associates the group name with a segment register but does not associate a segment register with the symbol's segment name, MASM uses the offset of the symbol within the group.
If an assume directive provides segment register association with both the symbol's segment and its group, MASM will pick the offset that would not require a segment override prefix. For example, if the assume directive specifies that ds points at the group name and es points at the segment name, MASM will use the group offset if the default segment register would be ds since this would not require MASM to emit a segment override prefix opcode. If either choice results in the emission of a segment override prefix, MASM will choose the offset (and segment override prefix) associated with the symbol's segment.
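The three rules above can be sketched as follows, assuming MASM 6.x and a 16 bit DOS target. DataGrp, Data1, Data2, A, B, and Main are hypothetical names, and the comments restate the rules as given in the text rather than the output of any particular assembler version. In every case the segment register is loaded to match the assume directive, so each load fetches the same word, B.

```asm
; Sketch only: assumes MASM 6.x and a 16 bit DOS program.
; DataGrp, Data1, Data2, A, B, and Main are hypothetical names.

DataGrp         group   Data1, Data2

Data1           segment para public 'DATA'
A               word    1
Data1           ends

Data2           segment para public 'DATA'
B               word    2
Data2           ends

CSEG            segment para public 'CODE'
                assume  cs:CSEG

Main            proc    far

; Rule 1: only the symbol's segment is assumed.
                assume  ds:Data2, es:nothing
                mov     ax, Data2
                mov     ds, ax
                mov     ax, B                   ;Offset of B within Data2

; Rule 2: only the group is assumed.
                assume  ds:DataGrp
                mov     ax, DataGrp
                mov     ds, ax
                mov     ax, B                   ;Offset of B within DataGrp

; Rule 3: both are assumed; ds (the default) points at the group.
                assume  ds:DataGrp, es:Data2
                mov     ax, Data2
                mov     es, ax
                mov     ax, B                   ;Group offset, no override prefix

                mov     ax, 4C00h               ;Return to DOS
                int     21h
Main            endp
CSEG            ends

SSEG            segment para stack 'STACK'
                word    100h dup (?)
SSEG            ends
                end     Main
```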
MASM uses the algorithm above if you specify a variable name without a segment prefix. If you specify a segment register override prefix, then MASM may choose an arbitrary offset. Often, this turns out to be the group offset. So the following instruction sequence, without an assume directive telling MASM that the BadOffset symbol is in Data2, may produce bad object code:
DataSegs        group   Data1, Data2, Data3
                 .
                 .
                 .
Data2           segment
                 .
                 .
                 .
BadOffset       word    ?
                 .
                 .
                 .
Data2           ends
                 .
                 .
                 .
                assume  ds:nothing, es:nothing, fs:nothing, gs:nothing
                mov     ax, Data2               ;Force ds to point at data2 despite
                mov     ds, ax                  ; the assume directive above.
                mov     ax, ds:BadOffset        ;May use the offset from DataSegs
                                                ; rather than Data2!
If you want to force the correct offset, use the variable name containing the complete segment:offset address form:
; To force the use of the offset within the DataSegs group use an instruction
; like the following:

                mov     ax, DataSegs:BadOffset

; To force the use of the offset within Data2, use:

                mov     ax, Data2:BadOffset
You must use extra care when working with groups within your assembly language programs. If you force MASM to use an offset within some particular segment (or group) and the segment register is not pointing at that particular segment or group, MASM may not generate an error message and the program will not execute correctly. Reading the offsets MASM prints in the assembly listing will not help you find this error. MASM always displays the offsets within the symbol's segment in the assembly listing. The only way to really detect that MASM and the linker are using bad offsets is to get into a debugger like CodeView and look at the actual machine code bytes produced by the linker and loader.