Memory models for assembly libraries for Turbo C

Question

Turbo C follows the Intel Memory Model where in Tiny, Small and Compact models calling a function is near but in Medium, Large and Huge models calling a function uses far calls.

If I want to develop a Turbo C library using assembly, is there a way to make this library functional for all memory models? My problem is that functions in the library will return control using either RETN or RETF, and either way they will malfunction in half of the memory models.

Is there a method to detect memory model from within the program, so as to notify the function in some way and make it decide between RETN and RETF?

Simply no. The issue is due to basic 8086 workings. No (runtime) information about the type of call taken is recorded on the stack (or anywhere else). Code has to be assembled specifically for near or far calls. That's why (back then) every library was delivered in every (useful) fashion. — Raffzahn
– Raffzahn, Commented Nov 21, 2019 at 10:36
My guess is that you would have to use NEAR and SHORT jumps for all models and keep the code restrained. NEAR and SHORT jumps cause the IP to be updated while FAR jumps cause CS and IP to be updated. So, even using JMP will change the instruction at compile time. Other things to consider are self-modifying code, storing function data on the stack, or using batch files to compile in different models, etc. — Natural Number Guy
– Natural Number Guy, Commented Nov 22, 2019 at 3:06
Not exactly "runtime" per se, I would not be surprised if there is some internal compile time defined STRING for the model being used. I remember vaguely such a thing for putting FAR or not into variables. If you manage to use it from asm.... — Rui F Ribeiro
– Rui F Ribeiro, Commented Nov 22, 2019 at 15:06

Stephen Kitt · Accepted Answer · 2019-11-21 10:55:45Z

I don’t think there’s a surefire way to detect the memory model being used at run-time, or even adjust code post-build in an object during linking. Libraries were provided in multiple variants, one for each supported memory model.

It is however possible to write code which will adjust to different memory models at build time, so a single assembly file can be used to produce all the required object file variants. In your procedure declarations, you can write

MYPROCEDURE proc DIST

where DIST is a macro you define as NEAR or FAR depending on the memory model (either detected, or specified by flags you provide when building all the variants of your library), and then the assembler will generate RETN or RETF as appropriate. You can also get the assembler to adjust CALL sites by declaring the external symbols appropriately each time (extrn MYPROCEDURE:DIST for TASM).

You will also need to handle segment registers differently in some cases (at least in the tiny memory model).

If you want to handle C calling conventions, things get a little more complex, especially if you need to handle pointers passed as arguments — their size will change depending on the memory model.
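On the C side, the memory model is visible at compile time. The sketch below assumes Turbo C's predefined model macros (`__TINY__`, `__SMALL__`, `__MEDIUM__`, `__COMPACT__`, `__LARGE__`, `__HUGE__`); the names `MODEL_NAME`, `FAR_CODE`, `FAR_DATA`, `model_name` and `pointer_arg_bytes` are made up for illustration. On any other compiler none of the model macros is defined, so it falls back to reporting "flat".

```c
#include <assert.h>
#include <stddef.h>

/* Turbo C predefines exactly one of these macros per memory model.
   The "flat" fallback is an assumption for non-DOS compilers. */
#if defined(__TINY__)
#define MODEL_NAME "tiny"
#elif defined(__SMALL__)
#define MODEL_NAME "small"
#elif defined(__MEDIUM__)
#define MODEL_NAME "medium"
#elif defined(__COMPACT__)
#define MODEL_NAME "compact"
#elif defined(__LARGE__)
#define MODEL_NAME "large"
#elif defined(__HUGE__)
#define MODEL_NAME "huge"
#else
#define MODEL_NAME "flat"
#endif

/* 1 when functions default to far calls, 0 otherwise. */
#if defined(__MEDIUM__) || defined(__LARGE__) || defined(__HUGE__)
#define FAR_CODE 1
#else
#define FAR_CODE 0
#endif

/* 1 when unqualified data pointers default to far, 0 otherwise. */
#if defined(__COMPACT__) || defined(__LARGE__) || defined(__HUGE__)
#define FAR_DATA 1
#else
#define FAR_DATA 0
#endif

/* Name of the memory model this translation unit was compiled for. */
const char *model_name(void)
{
    return MODEL_NAME;
}

/* Bytes taken by n unqualified data-pointer arguments; under Turbo C's
   stack-based C calling convention this is what shifts the stack layout. */
size_t pointer_arg_bytes(int n)
{
    assert(n >= 0);
    return (size_t)n * sizeof(void *);
}
```

Compiled with Turbo C in the small model, `pointer_arg_bytes(2)` would be expected to return 4, and in the large model 8, which is the stack layout difference an assembly routine has to account for.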

See Ralf Brown’s AMISLIB for one such implementation.

Good answer. Usually it should be possible to make the handling of segment registers model-agnostic. What may be more relevant is parameter passing, as pointer handling can vary according to memory model, which leads to a different stack layout and the handling differences that follow from it (loading segment registers as well, etc.) — Raffzahn
– Raffzahn, Commented Nov 21, 2019 at 10:41
@Raffzahn: Memory model affects the size of pointers declared without qualifiers, but even when using small/medium model it's often useful to have a few pointers which are qualified far and can be used to access memory outside the main 64K. — supercat
– supercat, Commented Nov 21, 2019 at 18:27

supercat · Accepted Answer · 2019-11-21 17:09:29Z

Given a function declaration like

void far copyAnywhere(void far *dest, void far *src, unsigned len);

it will be usable from within any memory model except huge. If called from within e.g. small model code, passing non-qualified pointers, the generated machine code would look something like (arguments are pushed right to left)

; Set up argument len
push [len]
; Set up argument src
push ds
push [src]
; Set up argument dest
push ds
push [dest]
; Do the call, pushing both CS and IP
push cs
call _copyAnywhere ; Just pushes IP

The effect of memory models is to determine whether functions and pointers default to being near or far, but explicit far qualifiers can be used to handle cases contrary to that default.
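As a rough sketch of the C side of this approach, the header and a stand-in implementation might look like the following. The prototype is the one given above; the C body merely stands in for the assembly routine, and defining `far` away when `__TURBOC__` is absent is an assumption made so the example also compiles with a flat-model compiler.

```c
#include <assert.h>
#include <stddef.h>

/* `far` is a Turbo C keyword. Defining it to nothing elsewhere is an
   assumption of this sketch, not something Turbo C itself needs. */
#ifndef __TURBOC__
#define far
#endif

/* Same prototype as above: far function, far data pointers. */
void far copyAnywhere(void far *dest, void far *src, unsigned len);

/* Byte-by-byte reference implementation standing in for the
   assembly version; it does not handle overlapping buffers. */
void far copyAnywhere(void far *dest, void far *src, unsigned len)
{
    unsigned char far *d = (unsigned char far *)dest;
    unsigned char far *s = (unsigned char far *)src;

    assert(d != NULL && s != NULL);
    while (len-- > 0)
        *d++ = *s++;
}
```

A caller compiled in the small model can pass ordinary near pointers; with the prototype in scope the compiler widens them to far pointers, as in the push sequence shown above.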

Ah yes, so one could write all the assembly-language functions assuming far pointers and far calls, and declare them appropriately in the C-language headers... (At some cost in stack usage however.) — Stephen Kitt
– Stephen Kitt, Commented Nov 21, 2019 at 17:33
@StephenKitt: Also at cost--sometimes significant and sometimes not--of having to deal with objects being in different segments. — supercat
– supercat, Commented Nov 21, 2019 at 17:36
Yes, indeed. In development terms, for a library capable of supporting far models, you’d need to write the code for that anyway... The cost here is really that nothing can be optimised for smaller memory models. — Stephen Kitt
– Stephen Kitt, Commented Nov 21, 2019 at 17:39
It might also be workable to put a far call wrapper around code written to use near natively, then use different link-time bindings to call either the near or far entry point depending on the memory model of the calling code, via macros in the include file for your library. This of course only works if the library is capable of working internally using near calls (i.e. it doesn't need to access data passed by the caller via far pointers). — Ken Gober
– Ken Gober, Commented Nov 21, 2019 at 18:21
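The wrapper idea from the last comment could be sketched roughly as follows. The names `lib_add`, `lib_add_near`, `lib_add_far` and `lib_selfcheck` are hypothetical, the C bodies stand in for assembly routines, and defining `near` and `far` away outside Turbo C is an assumption made so the sketch compiles with a flat-model compiler.

```c
#include <assert.h>

/* `near` and `far` are Turbo C keywords; defined away elsewhere
   (an assumption of this sketch). */
#ifndef __TURBOC__
#define near
#define far
#endif

/* Hypothetical library entry points: the routine itself is written
   for near calls, with a thin far wrapper around it. */
int near lib_add_near(int a, int b);
int far lib_add_far(int a, int b);

/* The include file binds the public name to the entry point that
   matches the caller's code model (Turbo C model macros assumed). */
#if defined(__MEDIUM__) || defined(__LARGE__) || defined(__HUGE__)
#define lib_add lib_add_far
#else
#define lib_add lib_add_near
#endif

/* Near-native implementation. */
int near lib_add_near(int a, int b)
{
    return a + b;
}

/* Far wrapper; it must live in the same code segment as the near
   routine so that its near call reaches it. */
int far lib_add_far(int a, int b)
{
    return lib_add_near(a, b);
}

/* Both entry points should agree whichever one lib_add expands to. */
void lib_selfcheck(void)
{
    assert(lib_add(2, 3) == lib_add_far(2, 3));
}
```

Callers only ever write `lib_add(...)`; which symbol that resolves to is decided when the header is compiled, so nothing is tested at run time.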

score 5 · Accepted Answer · 2019-11-22 05:11:23Z

You can't really detect at runtime which memory model the C code was compiled with. I suppose you could check some sort of variable that indicates what model was used, but you'd be constantly testing it in your assembly code, making your code horribly inefficient. A much better way to handle multiple memory models with a library is to assemble a separate version of each function for each memory model. Fortunately, this doesn't mean you need to write a separate function for each memory model. You can use certain features of your assembler, assuming you're using TASM or another MASM-derived assembler, so that you only write each function once.

Handling far and near RET

The specific problem of RETN or RETF can be handled more or less automatically. Normally you would just use the RET instruction, and the assembler will automatically pick the correct near or far return instruction based on how the procedure you use it in is defined. For example:

nearproc PROC NEAR
    ret ; generates opcode C3, near RET
nearproc ENDP

farproc PROC FAR
    ret ; generates opcode CB, far RET
farproc ENDP

The .MODEL directive

Normally if you don't specify NEAR or FAR, the default is NEAR, but this can be changed by the .MODEL directive:

    .MODEL SMALL
    .CODE
smallproc PROC
    ret ; generates opcode C3, near RET
smallproc ENDP

    .MODEL LARGE
    .CODE
largeproc PROC
    ret ; generates opcode CB, far RET
largeproc ENDP

Specifying a calling convention

The .MODEL directive can also set a default calling convention when using the assembler's facilities for handling arguments to functions:

    .MODEL COMPACT, C
    .CODE
memcpy PROC dest:PTR, src:PTR, len:WORD
    push cx
IF @DataSize
    ; FAR data model
    push ds
    push es
    les di, [dest]
    lds si, [src]
ELSE
    ; NEAR data model
    mov di, [dest]
    mov si, [src]
ENDIF
    mov cx, [len]
    cld
    rep movsb
IF @DataSize
    ; FAR data model
    pop es
    pop ds
ENDIF
    pop cx
    ret
memcpy ENDP

Because the "C" language was specified with the .MODEL directive, the assembler handles generating the necessary prologue (push bp, then mov bp, sp) and epilogue (pop bp) and figures out where the arguments to the function live on the stack relative to BP. It automatically handles the fact that arguments will have different sizes depending on whether the code is using a near or far data model. It also automatically adds an underscore (_) to the name of the function, so the actual symbol being defined is _memcpy.

It doesn't, however, handle picking the instructions needed to load the arguments. As you can see, this code is conditionalized on the @DataSize predefined symbol. This symbol is set to 0 for near data models and to 1 for far data models (and to 2 for the huge model, which the IF test above also treats as far). There's also a @CodeSize predefined symbol you can use to determine the code model.

The .MODEL directive and the @DataSize and @CodeSize symbols it sets are all you need, not only to ensure the correct RET instruction is used, but also to let you handle the fact that your code will need to change in order to accommodate the different memory models.

This is true whether or not you use the argument handling, automatic prologue and epilogue, and other similar facilities of the assembler. These can be a double-edged sword, though, as the code they generate isn't always optimal. Instead of specifying a calling convention with the .MODEL directive, you can also specify it on a procedure-by-procedure basis as an additional parameter in the PROC directive. Use NOLANGUAGE to tell the assembler not to do any of this code generation for the current procedure.

How to pass a memory model on the command line

Now there's one problem left: how to tell the assembler which memory model to use. The simple way would be to define a symbol using the /D command line option, giving the name of the memory model you want:

ml /c /DMemModel=MEDIUM memcpy.asm

You could then use MemModel with the .MODEL directive:

 .MODEL MemModel, C

The problem with this is that it only works with MASM 6. With TASM or MASM 5 the /D option works differently. These assemblers would define MemModel as a symbol that's equal to the symbol MEDIUM, whereas MASM 6 defines MemModel as a text macro that expands to MEDIUM. Since TASM and MASM 5 would define MemModel as a symbol, you'll get an error because the .MODEL directive doesn't accept symbols.

Unfortunately, if you're using either of these older assemblers you need something more complicated. The command line is almost the same, but note that the quotes are important:

tasm /ml /DMemModel="MEDIUM" memcpy.asm
masm /ml /DMemModel="MEDIUM" memcpy.asm;

(The /ml options are there so the assemblers define _memcpy in lower case, otherwise they'll convert it to uppercase _MEMCPY.)

For this to work, you'll need this code at the start of your assembly source:

IFIDNI <"TINY">, %MemModel
    .MODEL TINY, C
ELSEIFIDNI <"SMALL">, %MemModel
    .MODEL SMALL, C
ELSEIFIDNI <"MEDIUM">, %MemModel
    .MODEL MEDIUM, C
ELSEIFIDNI <"COMPACT">, %MemModel
    .MODEL COMPACT, C
ELSEIFIDNI <"LARGE">, %MemModel
    .MODEL LARGE, C
ELSEIFIDNI <"HUGE">, %MemModel
    .MODEL HUGE, C
ENDIF

To save having to repeat this code in every source file, you can put it in an include file:

 INCLUDE memmodel.inc
