Patching DGS Software's Mini Golf to fix crashes with HD Audio driver on Windows 3.1

After patching Pinball three months ago, this is essentially another instalment in the "patching dodgy media code in retro Windows games" series…

Recently I installed a HD Audio driver on my Windows 3.1 (Standard Mode) system running on my modern PC from 2014. It only works when booting using Microsoft's HIMEM.SYS and no Memory Manager, it slows down Windows startup (and seems to completely stall the shutdown), and I haven't figured out how to turn up the volume, but still, it works! I have Windows 3.1 running natively on my modern PC, with sound!

With that, I decided to have a quick game of Mini Golf, since that's just about the only game I have for Windows 3.1 that actually plays any sound! I've had a copy of this game forever, but this would be a "new" way of playing it, of sorts! The game is from DGS Software, and appears to be a version of this, although the word "Twisted" doesn't appear anywhere in my copy, and mine has more courses than mentioned on that page…

Anyway, when I fired up the game, sure enough, I heard the jingle playing on the DGS logo, and I was able to start playing a course. However, after a few putts, I got a message about a General Protection fault (#GP), with "Close" and "Ignore" options (but the latter just made it pop up again). My first thought was that this was a bug in the HD Audio driver itself – perhaps it wasn't holding up as well on my hardware as I thought. After all, as I mentioned, the driver is rather choosy about what sort of environment it can run in. Still, the issue with Mini Golf seemed pretty systematic, and I decided it merited further investigation.

The Investigation

I downloaded the latest Win16 build (yes, they are still published regularly!) of OpenWatcom V2, and installed it under Windows 3.1. Then, having booted natively with the HDA driver in place, I used the OpenWatcom debugger to run GOLF.EXE. Again, I started playing, and again, I got a #GP after a short while, but this time it was trapped by the debugger!

So, what was happening? Firstly, the fault was inside GOLF.EXE, so it was nothing to do with the sound driver per se, which was encouraging. This [*] was the offending instruction:

2:6c98:       c5 4e 0a                  lds   cx, [bp+0Ah]

So, it's loading a far pointer into DS:CX. Windows is in Standard Mode, which means the CPU is in Protected Mode, so this can indeed cause a #GP if the segment being loaded into the DS register isn't a valid selector in the GDT/LDT, or if the program doesn't have the privilege to access it.

Right then, what far pointer was on the stack at SS:[BP+0Ah]?


Oh dear. While FFFF could be a Ring-3 LDT selector (if the OS had set the LDT limit to the maximum possible), this definitely does not look like a valid pointer!

I looked at the next few lines to see what it tries to do with this pointer:

2:6c9b:       8c da                     mov   dx, ds
2:6c9d:       8b c2                     mov   ax, dx
2:6c9f:       0b c1                     or    ax, cx
2:6ca1:       74 39                     jz    6cdc

So it immediately moves the segment of this far pointer into the general-purpose register DX, copies it into AX, and or-s it with the offset in CX. If the segment and offset are both zero, the or sets the zero flag, and in that case, it jumps to the end of the function. Realizing this, I manually used the debugger to change the FFFF:FFFF on the stack to 0000:0000, and restarted execution. Sure enough, the game proceeded, albeit with some minor graphical glitches due to the interruption!

However, it wasn't long before another #GP occurred, at the same instruction and for the same reason. I could keep manually changing the Fs to 0s, but this was really no way to play the game! I needed to dig in and understand how this crazy pointer was being passed to this function.

I went up the stack to get to the calling function, and see where the "pointer" was coming from. The progression included the following:

2:7271:       2b 46 fa                  sub   ax, [bp-06h]
2:7274:       1b 56 fc                  sbb   dx, [bp-04h]
2:7277:       89 46 e8                  mov   [bp-18h], ax
2:727a:       89 56 ea                  mov   [bp-16h], dx
2:72ec:       8b 46 e8                  mov   ax, [bp-18h]
2:72ef:       8b 56 ea                  mov   dx, [bp-16h]
2:72f2:       89 46 e4                  mov   [bp-1Ch], ax
2:72f5:       89 56 e6                  mov   [bp-1Ah], dx
2:7304:       ff 76 e6                  push  word ptr [bp-1Ah]
2:7307:       ff 76 e4                  push  word ptr [bp-1Ch]
2:730a:       ff 76 ee                  push  word ptr [bp-12h]
2:730d:       ff 76 ec                  push  word ptr [bp-14h]
2:7310:       9a 6a 6c 0a 72            call  2:6c6a

So the "pointer" makes its way from DX:AX at 2:7274, to SS:[BP-18h] at 2:727a, to SS:[BP-1Ch] at 2:72f5, and eventually onto the stack frame for the called function at 2:7307. Now, an immediate red flag is the use of the sub and sbb instructions! These are "subtract" and "subtract with borrow" respectively, and imply that the contents of DX:AX are being treated as a straightforward 32-bit integer, and not as a far pointer!

Going a bit further back, here's where this 32-bit integer is coming from:

2:7225:       c7 46 f0 04 00            mov   word ptr [bp-10h], 0004h
2:722a:       c4 5e 06                  les   bx, [bp+06h]
2:722d:       26 ff 77 06               push  word ptr es:[bx+06h]
2:7231:       8d 46 f0                  lea   ax, [bp-10h]
2:7234:       8c d2                     mov   dx, ss
2:7236:       52                        push  dx
2:7237:       50                        push  ax
2:7238:       6a 08                     push  word 0008h
2:723a:       9a ff ff 00 00            call  MMSYSTEM.412    ; <WAVEOUTGETPOSITION>
2:723f:       8b 46 f2                  mov   ax, [bp-0Eh]
2:7242:       8b 56 f4                  mov   dx, [bp-0Ch]
2:7245:       89 46 fa                  mov   [bp-06h], ax
2:7248:       89 56 fc                  mov   [bp-04h], dx
2:7262:       c4 5e 06                  les   bx, [bp+06h]
2:7265:       26 c4 5f 08               les   bx, es:[bx+08h]
2:7269:       26 8b 47 0c               mov   ax, es:[bx+0Ch]
2:726d:       26 8b 57 0e               mov   dx, es:[bx+0Eh]
2:7271:       2b 46 fa                  sub   ax, [bp-06h]
2:7274:       1b 56 fc                  sbb   dx, [bp-04h]

It all seems to start with a call to WAVEOUTGETPOSITION, with a stack frame looking like this:

SS:[SP]        cbmmt == 8
SS:[SP+2]      pmmt  == SS:[BP-10h]
SS:[SP+6]      hwo   == word ptr ES:[BX+06h]

Which means that there is an MMTIME structure at SS:[BP-10h], of size 8 bytes, looking something like this [†]:

SS:[BP-10h]    wType == 4 == TIME_BYTES
SS:[BP-0Eh]    cb    == ? (to be filled in by the function call)
SS:[BP-0Ah]    (2 padding bytes)

There is also a far pointer at SS:[BP+06h] to some other type of structure. This structure has a HWAVEOUT at offset 6 (hence the use of ES:[BX+06h] after loading the far pointer into ES:BX).

Immediately after the function call, from 2:723f to 2:7248, the cb value in the MMTIME (which has just been filled in by the function) is loaded into DX:AX, and then stored at SS:[BP-06h]. This value is a 32-bit integer representing the number of bytes traversed by the wave device while playing a sound.

Later, at 2:7262, the far pointer at SS:[BP+06h] is again loaded into ES:BX, and from there, another far pointer to some structure is loaded from ES:[BX+08h]. This latter structure has another 32-bit integer at offset 12 (0Ch), which is loaded into DX:AX at 2:7269 and 2:726d. It is from this integer, presumably representing the total number of bytes in the current sound sample, that the cb value is subtracted.

I used a breakpoint to step through this code path a few times, as I played the game. It seems that WAVEOUTGETPOSITION was always getting called when the sample was completely finished, so this subtraction should have been returning a harmless zero. However, about half the time, the function call was reporting cb equal to one byte more than there actually were in the sample! (Presumably, this was due to a lack of sanity checking in the HD Audio driver, which explains why these #GP faults were only happening with that driver.) As a result, the subtraction was returning -1, or, as a 32-bit integer, FFFFFFFFh!

So now, let's return to the offending sequence of instructions, and include a few more to make things really clear:

2:6c98:       c5 4e 0a                  lds   cx, [bp+0Ah]
2:6c9b:       8c da                     mov   dx, ds
2:6c9d:       8b c2                     mov   ax, dx
2:6c9f:       0b c1                     or    ax, cx
2:6ca1:       74 39                     jz    6cdc
2:6ca3:       c5 76 f6                  lds   si, [bp-0Ah]
2:6ca6:       c4 7e fa                  les   di, [bp-06h]

Right, this use of the lds instruction is clearly not sane. The value at SS:[BP+0Ah] is a 32-bit integer, not a far pointer. As such, the upper half is not actually used as a segment – note that it is immediately moved into a general purpose register (DX), and the selector in DS is almost immediately replaced, via another unrelated use of the lds instruction, at 2:6ca3.

This abuse of lds seems to have been a bit of code golf (!) played by someone writing this routine directly in assembly. It would work fine in Real or Virtual 8086 Mode (e.g. in DOS), because any 16-bit integer is acceptable as a segment value. It also usually "works" in Protected Mode, because the samples are short enough that the upper half of the 32-bit integer tends to be zero (which is always valid as a "null" segment selector).

Anyway, after this badly-coded check for zero, what does this routine do if the value is not zero? Well, apparently, it tries to fill in all the bytes in the sound sample buffer that have not been played yet, with some new value. If it overflows a segment while doing this, it tries to move to the next segment:

2:6cb8:       83 c6 01                  add   si, 01h
2:6cbb:       75 07                     jnz   6cc4
2:6cbd:       8c d8                     mov   ax, ds
2:6cbf:       05 00 10                  add   ax, 1000h
2:6cc2:       8e d8                     mov   ds, ax
2:6cc4:       83 c7 01                  add   di, 01h
2:6cc7:       75 07                     jnz   6cd0
2:6cc9:       8c c0                     mov   ax, es
2:6ccb:       05 00 10                  add   ax, 1000h
2:6cce:       8e c0                     mov   es, ax
2:6cd0:       e2 d8                     loop  6caa

It's copying stuff from DS:SI to ES:DI, and if SI or DI wraps around to zero on increment, it adds 1000h to DS or ES respectively. Again, this would work fine in Real or Virtual 8086 Mode, but not in Protected Mode! Adding 1000h to a Real-Mode segment advances its base by 10000h, compensating for overflow in a 16-bit offset, but adding 1000h to a Protected-Mode selector is undefined (unless you know exactly what's in the Descriptor Tables, which this code does not)!

Again, under normal circumstances, this shouldn't even come up, since the samples in question are so short, but this kind of code simply should not be written when the application can be run in Protected Mode! It's strange, because I found things like this in other parts of the program:

3:08ea:       8c c1                     mov   cx, es
3:08ec:       81 c1 ff ff               add   cx, KERNEL.114  ; <__AHINCR>
3:08f0:       8e c1                     mov   es, cx

This block indicates an awareness in the program at large that running under Protected-Mode Windows is a possibility. Presumably, this sane block was compiler-generated, while, as I said before, the offending function was hand-coded in assembly, and probably copied from some DOS program that always runs in Real or Virtual 8086 Mode…

Anyway, clearly, they never imagined that there would be a negative argument passed to this function. It would run completely wild trying to fill in FFFFh bytes in a buffer [‡] and possibly overwrite some critical data structure. Or, in Protected Mode, it would just cause another #GP after running off the end of the buffer segment!

The Fix

With all that in mind, I had two problems to fix with this function:

  • Abuse of the lds instruction to check if a 32-bit integer is zero
  • Failure to check if that integer is negative

I fired up GHex and made the following edits to GOLF.EXE:

Offset Original byte values New byte values
0x1D1C9 c5 4e fa 8c da 8b c2 0b c1 74 66 8b 4e fa 66 09 c9 66 90 7e
0x1D2D8 c5 4e 0a 8c da 8b c2 0b c1 74 66 8b 4e 0a 66 09 c9 66 90 7e

The second of these changed the sequence above:

2:6c98:       c5 4e 0a                  lds   cx, [bp+0Ah]
2:6c9b:       8c da                     mov   dx, ds
2:6c9d:       8b c2                     mov   ax, dx
2:6c9f:       0b c1                     or    ax, cx
2:6ca1:       74 39                     jz    6cdc

To the following new one:

2:6c98:       66 8b 4e 0a               mov   ecx, [bp+0Ah]
2:6c9c:       66 09 c9                  or    ecx, ecx
2:6c9f:       66 90                     nop
2:6ca1:       7e 39                     jle   6cdc

Yikes, three 66h-prefixed instructions in a row! Well, now this function won't run on anything older than a 386… 🙂️ The first of the two edits in the table above makes a similar change to a function at another point in the program.

So now, the entire 32-bit integer is loaded into the 32-bit ECX register, which is then or-ed with itself to set the zero or sign flag according to its contents. The two-byte nop is just there for padding. The replacement of jz with jle ensures that the function is skipped if the integer is zero or negative (as opposed to just zero). If the integer is positive, its bottom half is still in the 16-bit register CX, like before, so the function should still work as intended.

I booted into Windows 3.1, fired up my patched Mini Golf game, and I was able to play a full round without being bothered by any more #GP faults. Mission accomplished!


[*]All the disassembly snippets in this article were actually taken after the fact, using Semblance with the -D option. So, they don't look 100% identical to what was written in the OpenWatcom debugger, but they give the right idea!
[†]This listing includes 4 == TYPE_BYTES, which I determined by looking at MMSYSTEM.H from the Win16 API headers, also handily included in the OpenWatcom installation!
[‡]Not FFFFFFFFh since it seems it doesn't actually use that upper half at all…