E450 Support and bus_space cleanup

First of all, if you don't like experimental code, don't even think of using this.

First Generation

Here are some changes that let my E450 run OpenBSD -current. There were a few things that needed to be fixed and this was my first hack at it:

  • The E450 ("SUNW,Ultra-4") seems to need the same OpenFirmware work-around as the E250 ("SUNW,Ultra-250"). Without docs regarding the "test"/"test-method" code, I can only say that without the change, OF REDs back to the OF console when called. [Checked in 30-Jan-03]
  • The bus_space_map2() call modifies the ASI of the bus tag (indirectly, it really modifies an index into a table that looks up the ASI). This causes all sorts of grief (especially when the root bus tag is clobbered). Then meaning of any bus_space*() call using that tag then depends on what was passed to bus_space_map2() the last time someone called it with the same bus tag (or, in some cases, with any child of the same bus tag). The implemented change is to make bus_space_handle_t a struct that includes the ASI and pass it by value. That isn't quite as bad as it sounds since most (all?) calls that use it in the critical path get inlined anyway.
  • Many places simply cast addresses to bus_space_handle_t's. Quite apart from the stupendously ugly hack that it is, using a struct for bus_space_handle_t breaks this code completely.
  • The streaming buffer code assumed flushes used 8-byte buffers to indicate completion. The psycho actually writes 64-bytes to a truncated 64-byte boundary (leading to trouble if the pointer supplied to the hardware doesn't point to an appropriate area of the required size).
  • The streaming buffer code was missing a some membar()s (??? The v9, US-II, IIi, and III manuals all seem to have different things to say about this).
  • The psycho's interrupts are mapped to UPA port 0. Since this box only has CPUs 1 and 3 populated, no interrupts ever got through. intr_establish() now insists on remapping interrupts to point to the current CPU (this needs to be revisited for MP support, but even then, there are certain advantages to sending interrupts to someplace other than /dev/null).
  • The mainbus bus tag has been made const. This puts it in "rodata" which makes it much more difficult to accidentally clobber it (the system faults). This could perhaps be useful for other kernel objects. This change is not required to get the E450 booting.
  • bus_space_tag_t was made a pointer to const. The only code that has any reason to muck with it should be working with (a pointer to) the underlying struct anyway. (There is no abstraction in a constructor; it needs to know what it's creating.) This change is not required to get the E450 booting.
  • In order to try to be more strict about argument types and to prevent the odd bugs that can result from using macros that declare variables (consider a macro with an argument that evaluates to the name of one of the local variables), much of bus_space*() code in bus.h was converted from macros to inline functions. A spot check suggests that the generated code is pretty much identical.

This diff against -current gets my E450 up and running well enough to do “make build” (there are still occasional messages along the lines of, “timeout delayed -3,” but they are no longer spewing out continually and since they only show up on DEBUG kernels, I don't know if this has anything to do with these patches). It is also happy on my U5. Here's what an OpenBSD E450 boot looks like. This diff was made against -current as of Nov 6th; there may be some conflicts if applied to a later version of current (hint: check out with -D "Nov 7"). I'm leaving this diff around as it has run both a U5 and E450 without grief since early November (no warmboot issues, panic()s, or other strangeness).

Second Generation

I'm working on a cleaner solution for these issues. The basic idea is to add some diagnostic code to the bus API, get rid of bus_space_map2(), and force each different kind of bus access to use a bus tag appropriate for that particular way to access a bus (so, each bus tag has an ASI for normal accesses). A flag bit has been added to allow bus_space_map() to support pre-mapped addresses (previously mapped by the boot loader). The streaming buffer code still needs to be split into per-buffer pieces. e450_new_20021213.diff.gz.

Third Generation

A later generation of this code adds a script to generate the four copies of each fundamental bus_space operation (one for each of the word sizes: 1, 2, 4, 8). Some more awk hacking and it should be possible to generate the raw and non-raw functions from the same template. Most of the implementations for these functions are huge and have been moved out of the headers to save kernel space (looks like GENERIC shrinks by ~200k). This makes experimenting with the bus_space implementation much less painful. Some changes/optimizations to the low-level ld*/st* operations have been made; much of it is from FreeBSD's sparc64 port. The existing *_wenable() code in clock.c has been changed to use a new function: bus_space_protect(). (This could also have been handled by calling the pmap code directly.) Previous versions of the clock.c change had no effect other than rendering the RTC unreadable. There are also a few minor changes that allow the kernel to compile with gcc-3.2.x. At any rate, a full diff between my “sys” tree and -current that appears to be happy on Ultra 1, Ultra 5, Blade 100, and Enterprise 450 can be found here: bus_space_20030215b.diff.gz.

The End

bus_space_20030215.bdiff.gz was merged into the OpenBSD CVS repository on Feb. 16th. All resulting gremlins, glitches, and grief are probably my fault. Let me (henric [ at ] openbsd [ period ] com) know so that I can fix them. If at all possible, build a kernel with the below debug options and send dmesg output and the output of a trace if the system drops into ddb.

If there are problems on a machine, it is probably a good idea to enable bus_space assertions. Make sure the BSDB_ASSERT flag is set when sparc64/sparc64/machdep.c initializes bus_space_debug and use a kernel config like this:

include "../sys/arch/sparc64/conf/GENERIC"

option          DEBUG
makeoptions     DEBUG="-g"      # compile full symbol table

option          BUS_SPACE_DEBUG

To do

  • The ASIs used by the new code should be reviewed to make sure they make sense (which is hopefully equivalent to making sure they match what was intended before the bus_space changes).
  • The streaming buffer code needs to be reviewed. There are six PCI busses in this box, each with a streaming buffer. The right one (or all of 'em) need to be properly flushed after each DMA or it could lead to nasty, difficult to reproduce data IO corruption.
  • The whole sparc64 codebase should probably be reviewed for proper use of of membars. The system runs TSO (?), but that only relates to interactions within the coherency domain. Talking hardware devices and the like can have surprising results if the proper membars are not there. BTW, getting this thing to run RMO (or PSO) might not be so hard, after all, there is no existing MP infrastructure worry about. A config option coupled with some kind of conditional membar directive could be implemented.
  • Review the ASI/membar/etc. usage in pmap.
  • IOMMU and streaming buffer access needs to happen at a safe enough PL such that it doesn't get clobbered if one driver interrupts another while it is being used. (Can this happen?)
  • Verify clobbers for the inline assembly (esp., “cc” and “memory“).
  • The sparc64 tree uses two each of sbusvar.h and sbusreg.h. (A variant of the former also apears in the sparc tree.)
  • The code that creates new bus space tags can probably be consolidated.
  • The code that searches for parent tag with non-zero function pointers can probably be consolidated.
  • There are redundant membars scattered throughout (and some that may not be redundant).
  • The default_type member of bus_space_tag_t isn't used for anything close to what the name suggests.
  • There may be some debugging code still left in there (pciide/wdc comes to mind).
  • Some places in the code use C++ style comments. This is to make sure I remember to revisit them before thinking about checking in stuff. (“XXX” was already used.)
  • Console drivers should protect themselves from BSDB_ALL_ACCESS by setting the BSHDB_NO_ACCESS flag. This is only relevant when BUS_SPACE_DEBUG is defined in the kernel config (not the default). At present, only the sab.c driver does so and it does it unconditionally (it should only do so when it is the console).
  • Testing on other boxes...

Notes

Here's a diagnostic boot on the thing.

This is what booting a FreeBSD ISO looks like and here's the same for Gentoo.

Sun has some stuff to say about the E450.

Unsafe Interrupt Code ("softclock issue")

The sparc64 code that deals with pending interrupts uses the same field both as a link in a null-terminated, singly-linked list and as a flag to indicate that the handler is already on the list. For those who don't know what a singly-linked list is, trust me when I say that this is not a good thing.

The most common symptoms reported for this include ping stopping after one packet, top refusing to update automatically, and that PF states stop timing out. The simplest way verify that this is the bug that is causing the problem, is to break into ddb(4) (send a serial break if using a serial console), type “break softclock”, and then continue the debugger with “c”. If the debugger doesn't stop in softclock() almost immediately, then this patch should hopefully fix it. If it does stop, type “delete softclock” to remove the breakpoint, and “c” to continue running. If softclock() is operating normally, this is what the console should look like:

kdb breakpoint at 124a7f0
Stopped at      Debugger+0x4:   nop
ddb> break softclock
ddb> c
kdb breakpoint at 109904c
Breakpoint at   softclock:      save            %sp, -0xd0, %sp
ddb> delete softclock
ddb> c

If softclock() has stopped, then expect to see this:

kdb breakpoint at 124a7f0
Stopped at      Debugger+0x4:   nop
ddb>
ddb> break softclock
ddb> c

If the second “Breakpoint at” never appears, this interrupt issue it is likely the cause.

For those crazies out there that need to keep a box running at all costs, use the debugger to find where softclock_si is pointing—it is a pointer to a struct intrhand (defined in sparc64/include/cpu.h), then change the ih_pending field such that it points to zero instead of pointing to itself (yes, it usually does point to itself; that's really bad, but not as bad as it could be since the way entries are removed from the list usually causes the otherwise infinite loop to be broken). softclock() should start running again after that.

The following patch adds an ih_busy field to struct intrhand. This is used to indicate when the handler is in use. The ih_pending field is now only used as a link in the pending list. Some things were also moved around slightly to make it safe (safer?) to insert a software interrupt while the handler is running.

sparc64_intr_20030319.diff.gz This was checked in to -current on 20-Mar-03.

Ultra 5 Boot Hang

At least some Sun U5 systems occasionally hang during startup right after “pcons at mainbus0 not configured.” This may be more common during warm than cold boots. On at least one U5, this hang is due to the CMD PCI0646U IDE controller chip (this one has the date code 9909, plus the following text “KAG0” and “0007 Japan”) entering an unexpected state. The hang is the result of the IDE interrupt handler being repeatedly entered without either of the interrupt status bits being true. Without any interrupt status bits, the handler returns and is then immediately reentered. It is unknown how widespread this is or if related boxes have the same problem (e.g., the Ultra 10).

With a logic probe on the CMD chip's INTA# line (pin 84), it was verified that the CMD chip is asserting the interrupt line. The interrupt can be deasserted by masking the secondary channel interrupt in the MRDMODE register (writing 0x20 to offset 0x71). Unmasking the interrupt in the MRDMODE register reasserts INTA#. (The same can be accomplished by switching the chip between PCI and legacy modes.) The interrupt status bits in the CFR, ARTTIM23, and MRDMODE registers all read false (0). Attempting to clear the phantom interrupt by writing “1”s to these bits has no effect.

Here is a dump of the nearby registers during one of these events:

ddb> x /bx 0xe0039850,20
0xe0039850:     40  ec  0   c0  3f  c0  0   8c  3f  40  0   0   0   0   0
0xe003985f:     0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
0xe003986e:     0   0
ddb> x /bx 0xe0039870,20
0xe0039870:     8   0   20  0   f4  ff  ed  be  8   0   20  0   dc  fb  ff
0xe003987f:     ff  0   0   0   0   0   0   0   0   0   0   0   0   0   0
0xe003988e:     0   0
ddb>

CMD's website does provides some basic information on the CMD PCI0646U, but not enough to shed any light on this problem.

Experimentation has suggested a workaround, but without more insight into the underlying cause, one would hesitate to call it a “fix”. Diffs were created against -current (24-Jan-03) and against the 3.2 branch; the latter has not been tested: u5hang-current_20030124.diff.gz and u5hang-3.2.diff.gz. It was checked in to -current on 30-Jan-03. A U5 that does hang regularly with an unpatched kernel has been running without trouble for over 48 hours with “sleep 70; shutdown -r now” in rc.local. Using delay(5000) instead of wdcreset() does hang, which suggests that the underlying problem is not simply a race (granted, the execution time of wdcreset() and delay(5000) is unlikely to be the same). Regardless, it is believed that the wdcreset() call is harmless and the consistent lack of hangs suggests the patch has some merit.

It has been suggested that the following would clear the phantom interrupt:

bus_space_read_1(wdc_cp->cmd_iot, wdc_cp->cmd_ioh, wdr_status & _WDC_REGMASK);

One of the earlier bus_space iterations triggered some odd IDE behavior during startup. One of the symptoms is identical to the boot hang problem (at least as far as one can see from a dmesg). Adding the read from the status register to the IDE interrupt handler does clear the stuck interrupt condition (see pciide.c in the bus_space diff for details). The logic probe experiment should be repeated with this code applied to a clean tree.

Slight changes to the way bus_space does 8-bit reads/writes can cause messages like these during startup instead (which eventually recover by themselves):

pciide0:1:0: device timeout, c_bcount=58, c_skip=0, status=58, ireason=01

Note that the PCI config space accesses always use 32-bit wide operations. This implies that whatever is triggering the “device timeout” lies outside the cmd0646U-specific code. Consider also that every emulated 8-bit access to PCI config space will read/write to the 24 adjacent bits as well with no guarantee that this is a NOP. One shudders to think of the subtlety of the bugs that could result.

The only thing that does suggest itself is that there is something going wrong before the wdcreset() call. Workarounds and hacks to undo the damage are better than nothing, but a solution to the underlying problem would be even better.

Any additional information, even of the, “Hey, it fixes it for me,” variety would be appreciated. I've heard from a few people that this patch does fix their U5 reboot problems (thanks).

(Please note that the ratio of hours worked to lines of code produced was rather large.)

Intel Pro/1000 Support

FreeBSD's “em” version 1.6.6 has been merged and it seems to be working on sparc64. This adds support for a number of new chips. Some more testing is still on the agenda. There's a good chance that all PCI-capable platforms can now use this driver, but the only platform other than sparc64 where it has been reported to work is x86.

For 3.3-current systems: em_20030613b.diff.gz
For 3.3-stable systems: em-stable_20030613.diff.gz

These changes were checked in to -current on June 13th.