Updated 20151121
Help yourself!
I am no longer working on OpenBSD for the mvme88k hardware.
Despite the previous news entry, I am still maintaining a (currently private)
mvme88k repository.
I am considering building unofficial (and unsigned) OpenBSD releases for the
few mvme88k hobbyists around.
The m88k development log will now be kept on the
OpenBSD/aviion page.
The mvme88k port will be dropped from OpenBSD after the 5.5 release (and has
already been removed from the trunk). It was not worth the effort anymore.
Note that the other m88k-based ports (aviion, luna88k) are still alive and
kicking.
The last two weeks have been spent slaving on 88100 systems, trying to track down
and fix odd (and not easily reproducible) failures which had never been seen
on 88110 systems.
After spending a lot of time in the 88100-specific code paths of exception
processing, I finally fixed two bugs, both found in the ``data access
emulation'' logic.
The first bug would cause interrupted signed loads of halfwords or bytes not to sign-extend the value. It was introduced by a seemingly innocent prototype change I made in summer 2004, while doing some code shuffling to share more code between the (then new) luna88k port and the mvme88k port. It most often showed up as awk aborting early with a bogus (but always identical) syntax error message, yet running the failing command again would then work as intended countless times...
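Completing an interrupted byte or halfword load comes down to widening the loaded value as either signed or unsigned. The minimal C sketch below uses a made-up helper name and is not the OpenBSD data access emulation code; the symptom described above is what you get if a signed load ends up on the zero-extending path.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: complete an interrupted byte or halfword load once
 * the fault has been resolved. size is 1 or 2; is_signed selects the
 * sign-extending forms (ld.b, ld.h) over the zero-extending ones
 * (ld.bu, ld.hu).
 */
static uint32_t
complete_load(const void *p, unsigned int size, int is_signed)
{
	if (size == 1) {
		uint8_t b = *(const uint8_t *)p;
		/* Widen through a signed type so bit 7 is copied upwards. */
		return is_signed ? (uint32_t)(int32_t)(int8_t)b : (uint32_t)b;
	} else {
		uint16_t h;
		memcpy(&h, p, sizeof(h));
		/* Same for halfwords: bit 15 must be propagated. */
		return is_signed ? (uint32_t)(int32_t)(int16_t)h : (uint32_t)h;
	}
}
```

A signed byte load of 0xff must yield 0xffffffff; returning 0x000000ff instead is exactly the kind of silent corruption that makes awk misparse its input.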
The second bug would cause interrupted atomic operations (i.e. xmem assembly instructions) not to be replayed correctly: the store was performed correctly, but instead of the previous value, the address of the memory location was put in the destination register. This bug was triggered by processes linked against libpthread invoking fork(), which would often cause a page fault on an xmem instruction in the child. The incorrect returned value would cause libpthread to spin on a lock, thinking it could not be acquired, when it had in fact been free and had just been acquired. The unfortunate process would call sched_yield() until killed. The bug itself is much older, and was traced back to the initial mvme88k port in 1995. Interestingly enough, the Mach codebase that port was based upon did not have this bug.
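To see why a wrong destination register is enough to hang libpthread, here is a simplified C model of replaying an exchange and of a spinlock built on it. The register file layout and function names are hypothetical; only the exchange semantics (memory gets the register, the register gets the old memory contents) follow the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified register file saved at fault time. */
struct regs {
	uint32_t r[32];
};

/*
 * Replay an interrupted exchange after the page fault is resolved:
 * memory receives the old contents of rD, and rD receives what memory
 * held before the store. Both halves matter.
 */
static void
xmem_replay(struct regs *rf, unsigned int rd, uint32_t *addr)
{
	uint32_t old = *addr;

	*addr = rf->r[rd];
	rf->r[rd] = old;
}

/* The buggy replay: the store is right, but rD ends up with the address. */
static void
xmem_replay_buggy(struct regs *rf, unsigned int rd, uint32_t *addr)
{
	*addr = rf->r[rd];
	rf->r[rd] = (uint32_t)(uintptr_t)addr;	/* addresses are 32-bit on the 88k */
}

/* Spinlock-style attempt: swap in 1, we own the lock if the old value was 0. */
static bool
try_lock(struct regs *rf, unsigned int rd, uint32_t *lock, bool buggy)
{
	rf->r[rd] = 1;
	if (buggy)
		xmem_replay_buggy(rf, rd, lock);
	else
		xmem_replay(rf, rd, lock);
	return rf->r[rd] == 0;
}
```

With the buggy replay, the lock word is set to 1 (the lock is taken) but the caller sees a non-zero "old value" and concludes it failed, so every later attempt fails too.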
With these two bugs out of the way, 88100 systems are now much more reliable. I am now investigating userland instability in multiprocessor kernels under low-memory conditions...
For some reason, I had thought the only VME memory boards compatible with AngelFire
boards were MVME236 boards (none of which I have). However, glancing at the
SVR3/88k release notes, I noticed MVME224 boards are listed as compatible with
the MVME181... So I set mine up to respond to the 8-16MB A32 VME address range, and
I got a 16MB system!
The system boots multiuser with 16MB, which makes me believe we have yet another
low-memory swap problem (likely an emergency memory area sized too small
with 8MB). This would explain why a ktrace of the ``hanging'' vi.recover
would show it spinning during an ld.so PLT update (and thus with
all signals masked, which prevents ^C or ^Z from working).
After further tinkering, and playing with some of my oldest 188 gear
(especially my 188, not 188A, board assembly), it turns out that, due to
silicon bugs, older 88100 and 88200 systems also need their page table
pages to be mapped cache inhibited.
Doing this fixed the 188 board, which would suffer from frequent segmentation
faults without this. I guess I now understand why Mach on the Luna88k was
mapping page tables cache inhibited, and why OpenBSD/mvme88k did too until quite
recently.
The good news is that the kernel now decides at runtime which cache
settings it can use, so as not to penalize the overwhelming majority of
88100-based systems.
For the record, the processor and CMMU revision chart is now:
88100 version | 88200 version | page tables | regular memory |
<= 9 | <= 6 | cache inhibited | write-through |
<= 8 | cache inhibited | write-back | write-through | write-back |
And the combinations I can currently test are:
Update: while searching for errata information, I stumbled upon a February 1990 comp.arch post from Ron Widell, then at Motorola, who says he isn't aware of any C82N errata (lucky man), and that the previous mask, C64D, has two issues:
In the same message, Mr. Widell also mentions no known errata for the 88200 "since Jan, '89", and that "we may soon see 40MHz (Maybe even 50?) devices" ...
While that likely won't fix the vi.recover freeze, it gives me enough information to tell whether a given system has the DIV/DIVU errata. I'm going to try having the compiler emit calls to libc for these instructions, where libc would still do the `manually check for zero' dance, but ld.so would override these with a direct divide routine, similar to what the sparc ld.so does to replace the integer multiply and divide routines with faster ones on v8 systems. That would hopefully make most of our 88k-based systems run faster (except for statically-linked binaries).
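The scheme can be modelled in plain C. Everything here is an assumption for illustration: the function names are hypothetical, the ld.so override is approximated by a function pointer chosen once at startup rather than real symbol interposition, and raise(SIGFPE) stands in for the trap the affected processors may fail to take.

```c
#include <signal.h>
#include <stdbool.h>

typedef unsigned int (*udiv_fn)(unsigned int, unsigned int);

/*
 * Conservative routine, as libc would export it: test the divisor
 * explicitly, because affected 88100 masks cannot be trusted to trap
 * on a zero divisor.
 */
static unsigned int
udiv_checked(unsigned int n, unsigned int d)
{
	if (d == 0) {
		raise(SIGFPE);	/* stand-in for the trap the hardware may miss */
		return 0;
	}
	return n / d;
}

/* Direct routine: on a sane processor, the hardware traps on d == 0. */
static unsigned int
udiv_direct(unsigned int n, unsigned int d)
{
	return n / d;
}

/* Compiler-emitted calls go through here; the default is the safe one. */
static udiv_fn udiv_impl = udiv_checked;

/* Hypothetical startup hook, standing in for the ld.so override. */
static void
udiv_select(bool has_div_errata)
{
	udiv_impl = has_div_errata ? udiv_checked : udiv_direct;
}

static unsigned int
udiv_call(unsigned int n, unsigned int d)
{
	return udiv_impl(n, d);
}
```

Statically-linked binaries never go through the dynamic linker, so they would keep the checked routine, which matches the caveat above.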
Kernel runs again. It turns out that, on this old 88100 and 88200 system,
the xmem instruction corrupts the cache line it operates on if that line is
dirty. The single-processor kernel, when running on an 88100, uses xmem in two
places: to fetch and reset the software interrupt register, and to atomically
clear page table entries.
Since the memory used for page table entries is forced write-through, no ill
effect occurs when they are cleared. However, the software interrupt register
shares a cache line with the current cpu_info struct, whose
contents get clobbered...
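Since the corruption only hurts whatever shares the dirty line with the xmem target, one way to contain it is to give the software interrupt register a cache line of its own. A minimal C11 sketch, assuming the MC88200's 16-byte cache lines; the names are hypothetical, and atomic_exchange stands in for the xmem fetch-and-reset.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: MC88200 cache lines are 16 bytes (four words). */
#define M88200_CACHE_LINE	16

/*
 * Hypothetical layout: the software interrupt word sits alone in its own
 * cache line, so whatever xmem does to the rest of that line only ever
 * touches padding, never cpu_info fields.
 */
struct softint_line {
	alignas(M88200_CACHE_LINE) _Atomic uint32_t pending;
	char pad[M88200_CACHE_LINE - sizeof(uint32_t)];
};

static struct softint_line softint;

/* Post software interrupt bits. */
static void
softint_post(uint32_t mask)
{
	atomic_fetch_or(&softint.pending, mask);
}

/* Fetch and reset in one atomic swap: the job xmem does on the 88100. */
static uint32_t
softint_fetch_and_clear(void)
{
	return atomic_exchange(&softint.pending, 0);
}
```

The cost is twelve wasted bytes per register, which is why it qualifies as a workaround rather than a fix.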
At the moment, I am using a simple workaround of allocating a complete
cache line for the software interrupt register. I need to decide how to fix the
issue in a nicer way. There are two options:
I am quite convinced the problems I have been stumbling upon are CPU errata (keep in mind the CPU and both CMMUs are very old revisions, much older than what the luna88k prototypes were fitted with, and well below the revisions covered by the commonly known errata). Shuffling a few things around eventually got the system booting multiuser (swapping like mad, of course). Then I started cleaning up the kernel code, and now the machine no longer boots. Grr. I'll nevertheless call it a day and carefully reread my current changes tomorrow when I'm fully awake.
Boot log (note how appropriate the fortune is):
Copyright Motorola Inc. 1988, 1989, All Rights Reserved
MVME181 Debugger/Diagnostics Release Version 3.01 - 08/17/89
COLD Start
Exception: Interrupt (Abort)
Vector Number =1, Address =008
SXIP =FF81D5C8 SNIP =FF81C662 TPSR =A00003F0
1) Continue System Start Up
2) Select Alternate Boot Device
3) Go to System Debugger
4) Initiate Service Call
5) Display System Test Errors
6) Dump Memory to Tape
Enter Menu #: 3
181-Diag>SD
181-Bug>LO 0
181-Bug>GO
Effective address: 00680000
>> OpenBSD/mvme88k sboot [1.1]
Network Controllers/Nodes Supported
Driver CLUN DLUN Name Address Ethernet Address
le0 2 0 VME376 $ffff1200 00:00:77:83:ac:56
boot: test -as
boot: client IP address: 10.0.1.138
boot: client name: bourbouillou
root addr=10.0.1.1 path=/netboot/bourbouillou/root
2511152+450388 [52+124720+109413]=0x30c4c4
Start @ 0x10000
Controller Address 0xffff1200
CPU0 is associated to 2 MC88200 CMMUs
[ using 234560 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2013 OpenBSD. All rights reserved. http://www.OpenBSD.org
OpenBSD 5.3-current (GENERIC) #59: Tue May 14 19:49:06 GMT 2013
miod@tarentaine.gentiane.org:/usr/src/sys/arch/mvme88k/compile/GENERIC
real mem = 8388608 (8MB)
avail mem = 4980736 (4MB)
mainbus0 at root: Motorola MVME181, 20MHz
cpu0: M88100 rev 0x8, 2 CMMU
cpu0: M88200 (16K) rev 0x5, full Icache, M88200 (16K) rev 0x5, full Dcache
angelfire0 at mainbus0 addr 0xff800000
dart0 at angelfire0 offset 0x640000 ipl 5: console
vme0 at angelfire0 offset 0x680000: system controller
vmes0 at vme0
le0 at vmes0 addr 0xffff1200 ipl 3 vec 0x0: address 00:00:77:83:ac:56
le0: 128 receive buffers, 32 transmit buffers
vs0 at vmes0 addr 0xffff9000 ipl 2 vec 0x1 vec 0x2: Jaguar
vs0: channel 0
scsibus0 at vs0: 8 targets, initiator 7
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00371805
sd0: 4095MB, 512 bytes/sector, 8388315 sectors
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00375704
sd1: 4095MB, 512 bytes/sector, 8388315 sectors
vmel0 at vme0
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
boot device: le0
root device (default le0): sd1a
swap device (default sd1b): (return)
root on sd1a swap on sd1b dump on sd1b
WARNING: / was not properly unmounted
WARNING: preposterous time in file system
WARNING: clock lost 4739 days -- CHECK AND RESET THE DATE!
Enter pathname of shell or RETURN for sh:
@ / [/] # date
Fri Dec 31 00:00:06 GMT 1999
@ / [/] # fsck -p
/dev/sd1a (6f4eefc062e7a39d.a): 1125 files, 17058 used, 23053 free (253 frags, 2850 blocks, 0.6% fragmentation)
/dev/sd1a (6f4eefc062e7a39d.a): MARKING FILE SYSTEM CLEAN
/dev/sd1h (6f4eefc062e7a39d.h): FREE BLK COUNT(S) WRONG IN SUPERBLK (SALVAGED)
/dev/sd1h (6f4eefc062e7a39d.h): 8867 files, 98607 used, 302824 free (1896 frags, 37616 blocks, 0.5% fragmentation)
/dev/sd1h (6f4eefc062e7a39d.h): MARKING FILE SYSTEM CLEAN
/dev/sd1d (6f4eefc062e7a39d.d): 3 files, 3 used, 50540 free (20 frags, 6315 blocks, 0.0% fragmentation)
/dev/sd1d (6f4eefc062e7a39d.d): MARKING FILE SYSTEM CLEAN
/dev/sd1e (6f4eefc062e7a39d.e): 21961 files, 305419 used, 96004 free (4356 frags, 11456 blocks, 1.1% fragmentation)
/dev/sd1e (6f4eefc062e7a39d.e): MARKING FILE SYSTEM CLEAN
/dev/sd1i (6f4eefc062e7a39d.i): 1645 files, 3271 used, 1017416 free (152 frags, 127158 blocks, 0.0% fragmentation)
/dev/sd1i (6f4eefc062e7a39d.i): MARKING FILE SYSTEM CLEAN
/dev/sd1f (6f4eefc062e7a39d.f): 592 files, 3105 used, 26678 free (206 frags, 3309 blocks, 0.7% fragmentation)
/dev/sd1f (6f4eefc062e7a39d.f): MARKING FILE SYSTEM CLEAN
/dev/sd1g (6f4eefc062e7a39d.g): 2 files, 2 used, 50029 free (21 frags, 6251 blocks, 0.0% fragmentation)
/dev/sd1g (6f4eefc062e7a39d.g): MARKING FILE SYSTEM CLEAN
/dev/sd0i (571a16618db10047.i): 98996 files, 473481 used, 203801 free (16833 frags, 23371 blocks, 2.5% fragmentation)
/dev/sd0i (571a16618db10047.i): MARKING FILE SYSTEM CLEAN
@ / [/] # mount -at ffs
@ / [/] # umount /data
@ / [/] # umount /usr/obj
@ / [/] # umount /usr/src
@ / [/] # mount -u -o ro /usr
@ / [/] # d
Filesystem 1K-blocks Used Avail Capacity Mounted on
/dev/sd1a 80222 34116 42096 45% /
/dev/sd1d 101086 6 96026 0% /tmp
/dev/sd1e 802846 610838 151866 80% /usr
/dev/sd1f 59566 6210 50378 11% /var
/dev/sd1g 100062 4 95056 0% /var/tmp
@ / [/] # sync
@ / [/] # ^D
swapctl: adding 571a16618db10047.b as swap device at priority 0
setting tty flags
pf enabled
ddb.console: 0 -> 1
vm.swapencrypt.enable: 1 -> 0
kern.splassert: 1 -> 2
starting network
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons: portmap ypbind.
savecore: no core dump
checking quotas: done.
clearing /tmp
starting pre-securelevel daemons:.
setting kernel security level: kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
^Cstarting network daemons: sshd sendmail inetd.
starting local daemons: cron.
Tue May 14 20:31:55 GMT 2013
OpenBSD/mvme88k (bourbouillou.gentiane.org) (console)
login: miod
Password: (you'll never know)
Last login: Sun May 12 10:09:14 on ttyp0 from tazenat
OpenBSD 5.3-current (GENERIC) #59: Tue May 14 19:49:06 GMT 2013
Welcome to OpenBSD: The proactively secure Unix-like operating system.
Please use the sendbug(1) utility to report bugs in the system. Before reporting a bug, please try to reproduce it with the latest version of the code. With bug reports, please try to ensure that enough information to reproduce the problem is enclosed, and if a known fix for it exists, include that as well.
You have mail.
>>> Terminal: /dev/console
43rd Law of Computing: Anything that can go wr
fortune: Segmentation violation -- Core dumped
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ sysctl hw
hw.machine=mvme88k
hw.model=Motorola MVME181, 20MHz
hw.ncpu=1
hw.byteorder=4321
hw.pagesize=4096
hw.disknames=sd0:571a16618db10047,sd1:6f4eefc062e7a39d
hw.diskcount=2
hw.physmem=8388608
hw.usermem=8376320
hw.ncpufound=1
hw.allowpowerdown=1
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ pstat -ks
Device 1K-blocks Used Avail Capacity Priority
/dev/sd1b 130341 2188 128153 2% 0
/dev/sd0b 131449 2348 129101 2% 0
Total 261791 4536 257255 2%
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ w
9:02PM up 1:11, 1 user, load averages: 6.94, 6.53, 5.58
USER TTY FROM LOGIN@ IDLE WHAT
miod co - 8:48PM 0 w
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ uname -a
OpenBSD bourbouillou.gentiane.org 5.3 GENERIC#59 mvme88k
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ date
Tue May 14 21:14:10 GMT 2013
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $ /sbin/shutdown -h now
Shutdown NOW!
shutdown: [pid 29679]
miod@bourbouillou OpenBSD/mvme88k [/users/miod] $
System shutdown time has arrived
/etc/rc.shutdown in progress...
/etc/rc.shutdown complete.
syncing disks... done
System halted.
Press any key to reboot...
After a hectic evening of coding and testing, I have reached single user mode!
181-Bug>GO
Effective address: 00680000
>> OpenBSD/mvme88k sboot [1.1]
Network Controllers/Nodes Supported
Driver CLUN DLUN Name Address Ethernet Address
le0 2 0 VME376 $ffff1200 00:00:77:83:ac:56
boot: test -as
boot: client IP address: 10.0.1.138
boot: client name: bourbouillou
root addr=10.0.1.1 path=/netboot/bourbouillou/root
2511104+450484 [52+124704+109344]=0x30c49c
Start @ 0x10000
Controller Address 0xffff1200
CPU0 is associated to 2 MC88200 CMMUs
[ using 234472 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2013 OpenBSD. All rights reserved. http://www.OpenBSD.org
OpenBSD 5.3-current (GENERIC) #38: Mon May 13 21:40:40 GMT 2013
miod@tarentaine.gentiane.org:/usr/src/sys/arch/mvme88k/compile/GENERIC
real mem = 8388608 (8MB)
avail mem = 4980736 (4MB)
mainbus0 at root: Motorola MVME181, 20MHz
cpu0: M88100 rev 0x8, 2 CMMU
cpu0: M88200 (16K) rev 0x5, full Icache, M88200 (16K) rev 0x5, full Dcache
afcon0 at mainbus0 addr 0xff800000
dart0 at afcon0 offset 0x640000 ipl 5: console
vme0 at afcon0 offset 0x680000: system controller
vmes0 at vme0
le0 at vmes0 addr 0xffff1200 ipl 3 vec 0x0: address 00:00:77:83:ac:56
le0: 128 receive buffers, 32 transmit buffers
vs0 at vmes0 addr 0xffff9000 ipl 2 vec 0x1 vec 0x2: Jaguar
vs0: channel 0
scsibus0 at vs0: 8 targets, initiator 7
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00371805
sd0: 4095MB, 512 bytes/sector, 8388315 sectors
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00375704
sd1: 4095MB, 512 bytes/sector, 8388315 sectors
vmel0 at vme0
vscsi0 at root
scsibus1 at vscsi0: 256 targets
softraid0 at root
scsibus2 at softraid0: 256 targets
boot device: le0
root device (default le0): sd1a
swap device (default sd1b):
root on sd1a swap on sd1b dump on sd1b
WARNING: clock lost 4881 days -- CHECK AND RESET THE DATE!
Enter pathname of shell or RETURN for sh:
parity error
@ / [/] # fsck -p
/dev/sd1a (6f4eefc062e7a39d.a): file system is clean; not checking
/dev/sd1h (6f4eefc062e7a39d.h): file system is clean; not checking
/dev/sd1d (6f4eefc062e7a39d.d): file system is clean; not checking
/dev/sd0i (571a16618db10047.i): file system is clean; not checking
/dev/sd1e (6f4eefc062e7a39d.e): file system is clean; not checking
/dev/sd1i (6f4eefc062e7a39d.i): file system is clean; not checking
/dev/sd1f (6f4eefc062e7a39d.f): file system is clean; not checking
/dev/sd1g (6f4eefc062e7a39d.g): file system is clean; not checking
@ / [/] # mount -at ffs
@ / [/] # sysctl hw
hw.machine=mvme88k
hw.model=Motorola MVME181, 20MHz
hw.ncpu=1
hw.byteorder=4321
hw.pagesize=4096
hw.disknames=sd0:571a16618db10047,sd1:6f4eefc062e7a39d
hw.diskcount=2
hw.physmem=8388608
hw.usermem=8376320
hw.ncpufound=1
hw.allowpowerdown=1
@ / [/] #
I still need to write code to read and write the TOD clock (which would get rid of the
``clock lost 4881 days'' warning), and the VME interrupt handling gets confused
after a while, failing to get VME interrupt vector numbers in time.
I am also worried about the parity error interrupt. I can't do anything but
acknowledge it... and dropping into the debugger might be a tad too harsh.
Tried booting multiuser... the system eventually panicked with a kernel write to a NULL pointer. The traceback hints at a bug in the CMMU code. It could also be an 88100 erratum, this particular processor using a very old mask (pre-C82N, so the ld.usr errata applies, but that one is supposedly handled correctly by the OpenBSD kernel).
Using a board lent to me by Matti Nummi, I am working towards MVME181 support.
I was expecting this work to be as simple as MVME141 support in
mvme68k land, but this beast has a very strange
interrupt controller, whose operation I have not figured out yet.
Here is the current boot log:
Copyright Motorola Inc. 1988, 1989, All Rights Reserved
MVME181 Debugger/Diagnostics Release Version 3.01 - 08/17/89
COLD Start
sio txrx: Transmit/Rec.................................... RUNNING > PASSED
Autoboot in progress... To abort hit <BREAK>
Autoboot Failed
1) Continue System Start Up
2) Select Alternate Boot Device
3) Go to System Debugger
4) Initiate Service Call
5) Display System Test Errors
6) Dump Memory to Tape
Enter Menu #: 3
181-Diag>SD
181-Bug>LO 0
181-Bug>GO
Effective address: 00680000
>> OpenBSD/mvme88k sboot [1.1]
Network Controllers/Nodes Supported
Driver CLUN DLUN Name Address Ethernet Address
le0 2 0 VME376 $ffff1200 00:00:77:83:ac:56
boot: bsd
boot: client IP address: 10.0.1.138
boot: client name: bourbouillou
root addr=10.0.1.1 path=/netboot/bourbouillou/root
2511096+450472 [52+124608+109259]=0x30c3d4
Start @ 0x10000
Controller Address 0xffff1200
CPU0 is associated to 2 MC88200 CMMUs
[ using 234292 bytes of bsd ELF symbol table ]
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2013 OpenBSD. All rights reserved. http://www.OpenBSD.org
OpenBSD 5.3-current (GENERIC) #23: Sun May 12 20:25:46 GMT 2013
miod@tarentaine.gentiane.org:/usr/src/sys/arch/mvme88k/compile/GENERIC
real mem = 8388608 (8MB)
avail mem = 4980736 (4MB)
mainbus0 at root: Motorola MVME181, 20MHz
cpu0: M88100 rev 0x8, 2 CMMU
cpu0: M88200 (16K) rev 0x5, full Icache, M88200 (16K) rev 0x5, full Dcache
afcon0 at mainbus0 addr 0xff800000
dart0 at afcon0 offset 0x640000 ipl 3: console
vme0 at afcon0 offset 0x680000: system controller
vmes0 at vme0
le0 at vmes0 addr 0xffff1200 ipl 3 vec 0x0: address 00:00:77:83:ac:56
le0: 128 receive buffers, 32 transmit buffers
vs0 at vmes0 addr 0xffff9000 ipl 2 vec 0x1 vec 0x2: Jaguar
vs0: channel 0
scsibus0 at vs0: 8 targets, initiator 7
sd0 at scsibus0 targ 0 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00371805
sd0: 4095MB, 512 bytes/sector, 8388315 sectors
sd1 at scsibus0 targ 1 lun 0: <SEAGATE, ST15150N, 0011> SCSI2 0/direct fixed serial.SEAGATE_ST15150N_00375704
sd1: 4095MB, 512 bytes/sector, 8388315 sectors
vmel0 at vme0
As we close in on the 5.3 release, I can look back at the work achieved
during this release cycle and call it a day (well, a release).
I only wish Allen Briggs could share this joy... Rest in peace pal, the world
is missing you.
PIC bugs are still being found, but I have almost completed building a whole PIC userland. Still, every day turns out to need one gcc fix and one ld bugfix...
Turns out -fPIC still had problems: the previous fixes would prevent proper PIC relocations from being generated for the second instruction loading a symbol address from the GOT (while -fpic, using only one instruction, was unaffected). I have hopefully been able to improve this (i.e. fix my previous fix), and am looking forward to delivering an OpenBSD/mvme88k snapshot with shared libraries!
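Why -fPIC needs a relocation on a second instruction while -fpic does not: with -fpic the GOT offset is assumed to fit in the 16-bit immediate of a single load, whereas with -fPIC it is assembled from two 16-bit halves, each patched by its own relocation. The C model below assumes the halves are combined with a plain OR (as an or.u/or instruction pair would), so no carry adjustment is needed; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* -fpic (assumed): one load whose unsigned 16-bit immediate is the GOT offset. */
static bool
fits_fpic(uint32_t got_offset)
{
	return got_offset <= 0xffffu;
}

/* -fPIC (assumed): each half of the offset gets its own relocation. */
static uint16_t
hi16(uint32_t v)
{
	return (uint16_t)(v >> 16);
}

static uint16_t
lo16(uint32_t v)
{
	return (uint16_t)(v & 0xffffu);
}

/* What the instruction pair computes: upper half first, lower half ORed in. */
static uint32_t
build_offset(uint16_t hi, uint16_t lo)
{
	return ((uint32_t)hi << 16) | lo;
}
```

If the relocation for the second (lo16) instruction is never emitted, its immediate stays unpatched and the computed GOT offset is wrong, while -fpic code, with a single instruction, is unaffected.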
A few fixes went in. Hopefully -fPIC is reliable now.
Lazy binding works. -fPIC (as opposed to -fpic) doesn't. Still, getting closer...
After being stuck with a relocation problem in ld for almost two weeks, I have finally seen the light (a tweak to as made the problem go away). ld.so is working to some extent (lazy binding doesn't work yet). I am now trying to build a snapshot with shared libraries and dynamically linked binaries in /usr, before I enable shared libraries.
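Lazy binding means an imported function is only resolved the first time it is called; the call slot is then patched so later calls go straight to the target. The following is a generic C model of that idea, with a hypothetical symbol table and slot structure; it is not how the m88k ld.so lays out its PLT and GOT.

```c
#include <stddef.h>
#include <string.h>

typedef int (*fn_t)(int);

static int twice(int x) { return 2 * x; }
static int square(int x) { return x * x; }

/* Hypothetical exported symbols of an already loaded library. */
static const struct {
	const char *name;
	fn_t addr;
} symtab[] = {
	{ "twice", twice },
	{ "square", square },
};

/* One slot per imported function; target is NULL until first use. */
struct slot {
	const char *name;
	fn_t target;
};

static int resolver_calls;

static fn_t
lookup(const char *name)
{
	for (size_t i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++)
		if (strcmp(symtab[i].name, name) == 0)
			return symtab[i].addr;
	return NULL;
}

/* First call takes the slow path and patches the slot; later calls go direct. */
static int
call_slot(struct slot *s, int arg)
{
	if (s->target == NULL) {
		resolver_calls++;
		s->target = lookup(s->name);
		if (s->target == NULL)
			return -1;	/* unresolved symbol; a real linker would abort */
	}
	return s->target(arg);
}
```

Without lazy binding, every slot would be resolved at load time instead, which is slower to start but simpler to get right.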
The switch to ELF has been completed and committed. Next step, a PIC toolchain...
I have reached the point where I have a working and self-hosting ELF mvme88k port. I need to clean up a few things in the compiler changes, and the m88k ports will then be ready to switch from a.out to ELF as their binary file format.
OpenBSD 5.5 is the last release.
Between releases, -CURRENT snapshots are made available on
a semi-regular basis.
The switch from gcc 2.95.3 to gcc 3.3.6, as well as the switch from a.out to ELF, have been completed.
Shared libraries and dynamically-linked binaries seem to work well; the current
ld.so deliberately refuses to load binaries with text relocations,
to help spot binaries which are not correctly linked (e.g. against the
static libgcc.a instead of the -fPIC libgcc.a). Once the bad citizens in ports
are fixed, I'll probably add support for text relocations to cope with
minor errors, and we'll see.
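A dynamic linker can spot text relocations by scanning the object's dynamic section for DT_TEXTREL, or for the DF_TEXTREL bit in DT_FLAGS. The C sketch below uses the standard <elf.h> types and constants but is generic, not OpenBSD's ld.so code; the error message is made up.

```c
#include <elf.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * True if a 32-bit ELF dynamic section asks for text relocations, through
 * either the DT_TEXTREL tag or the DF_TEXTREL bit of DT_FLAGS.
 */
static bool
has_textrel(const Elf32_Dyn *dyn)
{
	for (; dyn->d_tag != DT_NULL; dyn++) {
		if (dyn->d_tag == DT_TEXTREL)
			return true;
		if (dyn->d_tag == DT_FLAGS && (dyn->d_un.d_val & DF_TEXTREL))
			return true;
	}
	return false;
}

/* Strict policy: refuse the object rather than make its text writable. */
static int
check_object(const char *name, const Elf32_Dyn *dyn)
{
	if (has_textrel(dyn)) {
		fprintf(stderr, "%s: text relocations not supported\n", name);
		return -1;
	}
	return 0;
}
```

Relaxing the policy later would mean temporarily remapping the text writable, applying the relocations, and restoring the protection.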
Update: it turns out this was a bad idea. DISP26 relocations cannot
work when libc is loaded more than 256MB away from the statically-linked code
branching to it. It might be better to always link against the -fPIC libgcc to avoid this
kind of problem.
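The limit comes from the size of the displacement field: 26 bits of 32-bit words span 2^26 words, i.e. a 256MB window. The C check below assumes the field is signed and relative to the branch instruction's own address, which places that window around the branch; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: a DISP26 field is a signed 26-bit count of 32-bit words,
 * relative to the address of the branch instruction itself.
 */
static bool
disp26_reaches(uint32_t branch, uint32_t target)
{
	int64_t delta = (int64_t)target - (int64_t)branch;
	int64_t words;

	if (delta % 4 != 0)
		return false;		/* not a word-aligned target */
	words = delta / 4;
	return words >= -(INT64_C(1) << 25) && words < (INT64_C(1) << 25);
}
```

A shared libc mapped far from the executable's text fails this check no matter how the relocation is processed, which is why fixing it in ld.so is not an option.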
I have started working on a gcc 4.2.1 m88k backend (basically updating the
3.3.6 backend to cope with newer trends in gcc development). The resulting
code quality is much worse than 3.3.6 at the moment, but hopefully this can
get improved.
Supported cards:
Not supported yet: