Timeline for When and why is fwait necessary when using the 8087 coprocessor?

Current License: CC BY-SA 4.0

15 events

when	what		by	license	comment
Jul 29 at 17:45	comment	added	supercat		...triggering the "PPU write" signal. Set the timer to expire after copying a suitable number of bytes, set up the stack properly, and perform an RTI to set D at the starting source address. Then when the timer expires, the interrupt would set I and resume normal operation. Code could use this not only for OAM loads, but general PPU memory copy operations as well, 8x as fast as would be possible via "normal" means.
Jul 29 at 17:41	comment	added	supercat		@dirkt: The NES uses a chip that combines a 6502 core whose decimal-mode circuitry is disabled with some audio, DMA, and I/O hardware. Given the layout of the chip, adding logic to jinx the instruction latches in certain situations could have been simpler, cheaper, and better than some of the things the chip actually does. For example, if the chip had a good countdown timer, then instead of having a store of nn to `$4014` trigger a bulk copy from `$nn00-$nnFF` to `$2004`, having D set and I clear could have caused all instruction fetches to behave as though fetching CMP #immed while...
Jul 29 at 13:51	comment	added	dirkt		@supercat: There is an extension of the 6502 via a "coprocessor" that adds 6 new registers and 44 new instructions. Not NES and not PPU, and not "practical" for a low-cost design such as the NES, but doable.
Jul 29 at 9:13	history	edited	Omar and Lorraine	CC BY-SA 4.0	edited body
Jul 28 at 14:42	comment	added	supercat		@MichaelKarcher: I hadn't thought about the ROM code issue, which is a bit ironic since I work in embedded development, but I always tended to think of 8086 build tools as targeting the PC. I can see how having an assembler produce linker fixups could be useful, though assembling code twice (once for use with an FPU and once for use without) wouldn't seem like a huge burden either.
Jul 28 at 6:36	comment	added	Michael Karcher		@supercat The cost of (possible) emulation is more than the library size. You also have to reserve 10 interrupt vectors and you cannot put FPU-using code in ROM. The object format was supposed to be as portable as possible, being suitable as a distribution format for 8086 machine code (especially in LIB files), so delaying the emulator/8087 decision to the link stage avoided the need to generate multiple OBJ file versions. Note that the decision to emit x87 by the Assembler and provide the fixup values in the mutation library allows the library to choose the interrupt vector numbers.
Jul 28 at 6:21	history	edited	Michael Karcher	CC BY-SA 4.0	Edited the bytes emitted by FWAIT (Thanks Stephen Kitt)
Jul 27 at 15:20	history	edited	Sep Roland	CC BY-SA 4.0	corrected grammar/spelling Also I'm sure that the 8087 didn't appear in 8087!!!
Jul 26 at 17:12	comment	added	Stephen Kitt		@supercat the Turbo C 2 8087 library is 3540 bytes in size, compared to 15384 bytes for the emulation library (which includes the 8087/287 support code as well).
Jul 26 at 17:05	comment	added	supercat		@StephenKitt: How big would an "emulation library" be if it only had to work on systems with an 8087 (patching code, but not having to actually emulate FPU operations)?
Jul 26 at 17:04	comment	added	supercat		I like the discussion of the anticipated uses of ESC. I've sometimes wondered about whether it would have been practical for something like the 6502 core in the NES to have logic to interpret some opcodes whose bottom bits are 11 as transfers between the PPU and memory. That would have required replacing some of the I/O pins on the DIP40 with address and control pins for the PPU, but would have doubled the speed of transferring data between memory and the PPU, which would have reduced the hardships imposed by the inability to access data mid-frame.
Jul 26 at 17:01	comment	added	Stephen Kitt		@supercat the advantage is that you can assemble once into an object file, then produce executables with or without emulation without re-assembling.
Jul 26 at 16:59	comment	added	supercat		How much advantage was there to having the assembler generate special link info as opposed to telling it whether an interrupt handler would be present and having it generate the INT-based code directly? On a system with an FPU, I think the interrupt handler would, the first time it was executed, patch the interrupt-based code into a combination of instructions that would, except in the case of a lone FWAIT or an FPU instruction without an FWAIT, be the same size and execute just as quickly as if the FPU instruction had been specified directly.
Jul 26 at 12:10	comment	added	Stephen Kitt		See also the ASM-86/LINK-86 description starting at page S-61 of the Intel manual (it doesn’t go into as much detail but it shows that this setup was available from the beginning).
Jul 25 at 20:59	history	answered	Michael Karcher	CC BY-SA 4.0