m88k-based OpenBSD ports switch to gcc 3.3.6 as the system compiler. And there was much rejoicing.
Over the last couple of weeks, I have been working towards getting a working
gcc 3.3 compiler for m88k. There are still a couple of bugs in the new
variadic function argument handling code: they prevent the stack protector
code from being used, and the code also seems to fail if a variable-sized
type is passed before the variadic part. I'll be investigating this shortly;
in the meantime I have been able to successfully switch my mvme88k system to gcc3.
I am now awaiting gcc3-built luna88k kernel test reports, to switch to gcc3
for real (read: in the official OpenBSD tree).
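For illustration only, here is the general shape of a regression test for variadic argument handling: a named aggregate argument passed ahead of the variadic part, then a few ints read back with va_arg. This is a minimal sketch and not the failing case itself; struct blob and sum_after_struct are made-up names, and a fixed-size struct merely stands in for the variable-sized type mentioned above.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical aggregate passed by value ahead of the variadic part. */
struct blob {
	char bytes[24];
	int tag;
};

/*
 * Returns b.tag plus the sum of `count` ints read from the variadic part.
 * If the compiler computes the start of the variadic area incorrectly,
 * the values read by va_arg() (and thus the sum) come out wrong.
 */
int
sum_after_struct(struct blob b, int count, ...)
{
	va_list ap;
	int i, total;

	total = b.tag;
	va_start(ap, count);
	for (i = 0; i < count; i++)
		total += va_arg(ap, int);
	va_end(ap);

	return total;
}
```

Calling it with a tag of 100 and the variadic values 1, 2 and 3 should yield 106 on a compiler whose variadic handling is correct.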
I finally completed my rework of the cache handling code, which I had started early this year. Using write-back caching on more mappings exposed a lot of painful bugs which took a lot of time and energy to fix. I am now working on the TLB management code, to speed up 88110 operation.
Been working on memory management and cache bowels again, to try and speed the system up a bit. I am quite puzzled as MVME188 systems hit spectacular failures, while MVME187 and MVME197 systems run fine. More investigation is needed...
After a long week of intensive code-churning, all mvme88k systems are stable again, and I finally implemented the intrusive memory management changes I had been sitting on for years! Soon in a snapshot on an ftp mirror near you...
Looks like the 88100 regression is caused by a bug in the cache handling routines. Forcing the cache to write-through on 88100 makes them run nicely... but then, as this also affects timing, it may still be a race. I need to tinker with the 88200 cache routines.
Working on emptying (well, seriously shrinking) my mvme88k todolist. I finally
completed the VBR page relocation diff I had been working on since July
2004. That is, I finally found the most important (and stupid) bug in it, and
then all the remaining bugs were easy to tame.
However, 88100 systems (both MVME187 and MVME188) panic very early during
multiuser boot at the moment, so I need to track this down (MVME197 being
perfectly happy in the meantime...)
GENERIC.MP is currently completing a make -j2 build (it is in
the install phase). So I'll pretend SMP kernels are supported now,
although there is a noticeable clock drift under load.
But I have SMP improvements in the pipeline, which should improve this.
Unfortunately, GENERIC.MP dies from stack exhaustion during build. The sort-of good thing is, this seems to be a 100% reproducible behaviour, which should help me debug this issue.
GENERIC.MP runs stably on a MVME197DP board now. There are occasional
spurious interrupt messages, which I'll need to investigate, but the
system is stable overall.
If it survives a make build I'll call it a day.
I found why the SMP kernels eventually freeze on MVME197DP. It turns out
some sequence I wrote back in the MVME188 times expects not to be reentered,
and does whatever is necessary to ensure this. It works on 88100, but on
88110 it can get interrupted by NMIs... and very bad things happen then.
Back to the drawing board. It's easy to fix with ugly code, and I'll try
to do something at least half-decent. But it's almost midnight, so I'll
get a night of sleep first...
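To illustrate why such a sequence breaks once something can interrupt it, here is a toy model in C. It is not the actual kernel sequence (which is not shown here); scratch, seq_shared, seq_private and fake_nmi are all hypothetical names. The hook stands in for an NMI arriving in the middle of the sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed save slot shared by every run of the sequence. */
static int scratch;

/* Called in the middle of a sequence; models an interrupt arriving. */
typedef void (*intr_hook)(void);

/* Non-reentrant flavour: parks its value in the shared slot. */
int
seq_shared(int value, intr_hook hook)
{
	scratch = value;	/* save in the fixed slot */
	if (hook != NULL)
		hook();		/* an NMI may land here */
	return scratch;		/* restore: wrong if the slot was reused */
}

/* Reentrant flavour: each run saves its value in its own stack frame. */
int
seq_private(int value, intr_hook hook)
{
	int saved = value;

	if (hook != NULL)
		hook();
	return saved;
}

/* Models an NMI handler running the shared-slot sequence itself. */
void
fake_nmi(void)
{
	(void)seq_shared(-1, NULL);
}
```

With no hook, seq_shared(42, NULL) hands back 42. With fake_nmi as the hook, the nested run overwrites the slot and the outer run gets the wrong value back, while seq_private is unaffected. A real fix also has to deal with register and stack state, which this sketch ignores.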
I am running out of ideas on what goes wrong with MVME197 SMP kernels. The GENERIC.MP kernel will boot single user, run a few processes, and then hang so well the abort switch won't help.
Been working on MVME327A and MVME197 SMP recently, although neither of them works at the moment...
The MVME328XT boards now work. Stupid me didn't realize these boards do not expose any signals to P2, and do not use P2 adapter boards. No wonder I was not seeing any drives.
The code driving the 88410 secondary cache on the 197SP and 197DP is subtly flawed, and this caused invalid data to be read from (or written to) disks connected to the on-board osiop controller. I did not notice this earlier because I had been doing the MVME197 work mostly with a VME cage whose disks are connected to a MVME328 controller (which did not expose this issue).
So I checked in a workaround, and I'll have to dig through the errata information again. But since I don't have information about the 88110-88410 communication procedures, I don't know if a real fix will ever appear (I think the invalidation command returns before the operation is completed).
I have replaced the MVME197LE board I had been using for the last snapshot with a 197DP with 128MB. This will hopefully gain a bit more time for builds, and this will make sure I will notice the secondary cache problem if it ever reappears.
A new snapshot has been made, and should appear soon on the mirrors (files dated January 1st). This is the first snapshot built on a MVME197 board (a 50MHz MVME197LE with 64MB of memory), replacing the 33MHz MVME187 board which had built all the snapshots and releases for more than four years now! And it took only about two thirds of the time it took with the MVME187, even though creating the tarballs probably took longer because of the smaller caches (2x8KB on the 197LE vs 2x16KB on the 187).
I found one last debugging flag I had forgotten to remove in the processor status register. This flag caused instructions accessing memory (i.e. loads and stores) to force a synchronization, defeating the superscalar gain.
After disabling this behaviour, there was a nice speed boost (crypto performance going up by 50% to 120%, depending on the algorithm, and kernel compile time being 20% faster). Of course, some regression was to be expected, and the 40MHz board problems came back, but disappeared after increasing the processor bus timeout again.
These 88110-based boards are really starting to rock!
40MHz MVME197LE boards are (hopefully) fixed for good, and instruction cache is enabled again on them! Looks like I found the right way to make them happy.
The 88110 FPU trap code is far from passing John Hauser's TestFloat test suite, but the most blatant problems have been fixed.
I think this is solid enough to try and build a snapshot on a 88110, so I just started a make build on a 197DP loaded with 384MB of memory. Looking forward to the results.
After seven hours of intensive work (including cooking a nice dinner), the rewritten 88110 FPU exception code works nicely and better than ever. There are still some rough edges to polish, but all the IEEE754 regression tests in the tree now pass. I will run John Hauser's extensive test suite in the next few days.
The new snapshot is uploading; of course, it does not contain the FPU work. Soon to hit an ftp mirror near you.
Also, I tinkered a bit with the 40MHz boards over the last couple of days, to no avail. Freezing half the instruction cache makes it run longer, but does not help in the long run. I am now looking at the BusSwitch programming, in case increasing some latencies would help.
I have been working more on the aviion port recently. Some of the interrupt system changes will benefit the MVME188 systems as well, but I need to make them cohabit nicely with the m68k-style interrupt system used on MVME187 and MVME197.
A new mvme88k snapshot is currently being built. It should be completed on Sunday or early Monday (GMT).
A workaround for the instability of MVME197LE boards with 40MHz processors has been committed. Unfortunately, it has a huge impact on performance. I will try a few awkward ideas; maybe I'll find something less aggressive.