Author: Michael Boehmer
last modified: 17.05.2005

The DMA problem on Prometheus

The Prometheus PCI bus board can operate with up to four PCI cards. In normal use with a VGA card, and also with 10 MBit Ethernet cards, everything works just fine.
Problems start when the system contains PCI cards that need DMA (Direct Memory Access) to work correctly. Almost all modern PCI cards use this feature (USB cards, 100 MBit Ethernet, sound cards, ...).
The Amiga system freezes shortly after the software using the DMA card has been started; in other cases the VGA display becomes distorted or data errors occur.

How DMA works on Prometheus

You may regard the PCI part of Prometheus as a completely isolated computer system. PCI cards located there can operate on their own without even knowing what is happening on the Amiga side.
DMA is a crucial feature of PCI bus operation: a card takes the role of a bus master and performs data transfers on its own, just like a CPU would.

In every DMA transfer there is a master (initiator) and a slave (target). The master requests transfer time on the PCI bus from the arbiter (more on this later) and starts its transfer as soon as it is granted the bus.
DMA on Prometheus will always be between two PCI cards, and never between a PCI card and the Zorro bus.

In general, the DMA card transfers its data to or from the PCI graphics card, using part of the display memory as a data buffer. Data coming from the DMA card is read from this buffer by the Amiga CPU after the DMA transfer; in the other direction, the Amiga CPU deposits the data intended for the DMA card in the buffer and instructs the DMA card to fetch it from there.

Signal description

To understand the problem and the logic analyzer timings, some explanation is needed. Please keep in mind that a "#" following a signal name indicates that the signal is active low.
The signals FRAME#, IRDY#, TRDY# and DEVSEL# are the basic handshaking signals for PCI data transfers. The CBE[3:0]# signals set the transfer type of the access; they are also used as byte enables (data strobes) in the data phases of the access.
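
Because all of these signals are active low, reading a trace means looking for low levels. A minimal sketch of how two basic PCI conditions can be checked on sampled signals: a data phase completes on a clock where IRDY# and TRDY# are both asserted, and the bus is idle when neither FRAME# nor IRDY# is asserted. The sample values and function names are made up for illustration; they are not taken from the Prometheus analyzer snapshots.

```python
# Minimal sketch: evaluate sampled PCI handshake signals.
# Assumption: one sample per rising PCI clock edge, levels stored as
# 0 (electrically low) or 1 (high). The sample trace is invented.

def asserted(level):
    """FRAME#, IRDY#, TRDY# and DEVSEL# are active low: 0 means asserted."""
    return level == 0

def data_phase_done(sample):
    """Data is transferred on a clock where both IRDY# and TRDY# are asserted."""
    return asserted(sample["IRDY#"]) and asserted(sample["TRDY#"])

def bus_idle(sample):
    """The bus is idle when neither FRAME# nor IRDY# is asserted."""
    return not asserted(sample["FRAME#"]) and not asserted(sample["IRDY#"])

samples = [
    {"FRAME#": 0, "IRDY#": 1, "TRDY#": 1},  # address phase
    {"FRAME#": 0, "IRDY#": 0, "TRDY#": 1},  # master ready, target inserts a wait state
    {"FRAME#": 0, "IRDY#": 0, "TRDY#": 0},  # first data phase completes
    {"FRAME#": 1, "IRDY#": 0, "TRDY#": 0},  # FRAME# released: last data phase
    {"FRAME#": 1, "IRDY#": 1, "TRDY#": 1},  # bus idle again
]

transfers = sum(data_phase_done(s) for s in samples)
print(transfers)  # 2
```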

AD[17:16] are two of the 32 multiplexed address / data lines.

REQ0# is the bus request line from PCI slot 0 (here: USB card) to the arbiter; REQ1# is the same for the Voodoo3 card in PCI slot 1. The arbiter is granting the bus to a requesting master by asserting its grant line: GNT0# for the USB card, GNT1# for the Voodoo3.

One Zorro III signal is of concern here: /SLAVE is an active low signal indicating that the Amiga CPU is accessing the Prometheus card (and, in this case, the PCI bus).

Big Brother: the arbiter

There is a single unit in the Prometheus system which is responsible for bus arbitration. Masters that want to access the PCI bus must tell the arbiter so, and the arbiter then distributes access slots among them. A master that requests bus access but does not use its grant will time out, and the next requesting bus master is served.
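
The request/grant idea can be illustrated with a small arbiter model. This is a minimal sketch, not the Prometheus arbiter: the round-robin order, the class name and the GRANT_TIMEOUT value of 16 clocks are assumptions chosen for illustration. Each call to clock() represents one PCI clock; a master that is granted the bus but never starts a transfer loses its turn after the timeout, and the next requesting master is served.

```python
# Minimal sketch of request/grant arbitration with a grant timeout.
# Assumptions (not the Prometheus arbiter logic): masters are served
# round-robin, and GRANT_TIMEOUT is a hypothetical number of clocks a
# granted master may stay silent before it loses its turn.

GRANT_TIMEOUT = 16  # hypothetical, in PCI clocks

class RoundRobinArbiter:
    def __init__(self, n_masters):
        self.n = n_masters
        self.last = n_masters - 1  # so that master 0 is looked at first
        self.granted = None        # index of the master holding GNTx#
        self.wait = 0              # clocks since grant without a transfer
        self.active = False        # granted master has started a transfer

    def _next_requester(self, requests):
        for step in range(1, self.n + 1):
            m = (self.last + step) % self.n
            if requests[m]:
                return m
        return None

    def clock(self, requests, started=False):
        """Advance one clock; requests[i] is True while REQi# is asserted.
        Returns the master that holds the grant after this clock."""
        if self.granted is not None:
            if started:
                self.active = True
            elif not self.active:
                self.wait += 1
            timed_out = not self.active and self.wait >= GRANT_TIMEOUT
            if not requests[self.granted] or timed_out:
                self.last = self.granted  # turn is over
                self.granted = None
        if self.granted is None:
            self.granted = self._next_requester(requests)
            self.wait = 0
            self.active = False
        return self.granted

arb = RoundRobinArbiter(3)
reqs = [True, True, False]
print(arb.clock(reqs))  # 0: master 0 is granted first
grants = [arb.clock(reqs) for _ in range(GRANT_TIMEOUT)]
print(grants[-1])       # 1: master 0 never started and timed out
```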

In the Prometheus system there are five potential masters: the four PCI slots and the Amiga CPU on the Zorro III bus. (More precisely, when DMA is used one slot holds the graphics card, so only three real masters remain on PCI cards as long as the graphics card does not do DMA on its own.)

While arbitrating between PCI cards is fairly easy, getting the Amiga CPU into the game is rather tricky. The CPU does not know anything about the state of the PCI bus, and even asking for the current status wouldn't help: a single Zorro III cycle takes about 175ns minimum (on Prometheus about 300ns), whereas one PCI cycle takes about 100ns, so the CPU will never see a current status.
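
The figures in this paragraph can be put side by side. A back-of-the-envelope sketch, treating the quoted times as exact: while the CPU performs a single Prometheus access to read a status, the PCI bus may already have gone through about three cycles, so whatever it reads is already outdated.

```python
# Rough arithmetic using only the figures quoted above.
ZORRO_MIN_NS = 175         # fastest Zorro III cycle
ZORRO_PROMETHEUS_NS = 300  # typical Zorro III cycle to Prometheus
PCI_CYCLE_NS = 100         # one PCI cycle

def pci_cycles_per_zorro_cycle(zorro_ns, pci_ns=PCI_CYCLE_NS):
    """PCI cycles that can complete while a single Zorro III cycle is running."""
    return zorro_ns / pci_ns

print(pci_cycles_per_zorro_cycle(ZORRO_MIN_NS))         # 1.75
print(pci_cycles_per_zorro_cycle(ZORRO_PROMETHEUS_NS))  # 3.0
```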

The main problem is that Zorro III cycles cannot be extended beyond a limit of 1us; if a bus cycle takes longer, it is terminated by the Buster chip, which results in a Guru Meditation. Therefore accesses from Zorro III to PCI must always be given the highest priority.
If the Amiga CPU starts an access, the arbiter takes the PCI bus away from any PCI master currently holding it and grants it to the Amiga CPU. This means that a running PCI DMA transfer has to be finished quickly (this shouldn't be a problem, as PCI cards know about this rule).
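
A rough sketch of what the priority rule implies, assuming the figures quoted earlier (about 300 ns per Prometheus Zorro III access, about 100 ns per PCI cycle) and ignoring arbitration latency and wait states. next_owner() always prefers the Zorro side; the lowest-slot-first order among PCI cards is only a placeholder. zorro_cycle_fits() checks whether the CPU access still ends within the 1us limit if the current PCI master first needs a given number of PCI cycles to get off the bus. Both function names are hypothetical.

```python
# Minimal sketch of the priority rule and the 1 us budget.
# Assumptions: rough timing figures from the text; arbitration latency
# and wait states are ignored; among PCI slots the lowest requesting
# slot wins (a hypothetical choice for illustration).

ZORRO_LIMIT_NS = 1000  # Buster breaks Zorro III cycles longer than this
ZORRO_CYCLE_NS = 300   # approximate Prometheus Zorro III cycle
PCI_CYCLE_NS = 100     # approximate PCI cycle

def next_owner(zorro_request, pci_requests):
    """A Zorro III access always wins over any PCI master."""
    if zorro_request:
        return "zorro"
    for slot, req in enumerate(pci_requests):
        if req:
            return slot
    return None

def zorro_cycle_fits(pci_cycles_left):
    """True if the CPU access still ends within the limit when the current
    PCI master needs pci_cycles_left more cycles to get off the bus."""
    return pci_cycles_left * PCI_CYCLE_NS + ZORRO_CYCLE_NS <= ZORRO_LIMIT_NS

max_cycles = (ZORRO_LIMIT_NS - ZORRO_CYCLE_NS) // PCI_CYCLE_NS
print(max_cycles)  # 7
```

Under these rough assumptions, the running PCI master has only about seven PCI cycles to leave the bus before the Amiga access would exceed the limit.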

The smoking gun

So let's take a closer look at the problem. I could reproduce the error condition simply by using a NEC USB 2.0 card as PCI master together with a Voodoo3 as DMA buffer. The USB card is in PCI slot 0, the Voodoo3 in PCI slot 1.

[Figure: Snapshot of the broken PCI access (logic analyzer timing of the bug)]

What is happening:

As a result of this bug (which will hit every Prometheus out there once in a while, at statistically random moments), both the DMA transfer between the USB card and the Voodoo3 memory and the data transfer between the Amiga CPU and the Voodoo3 card have produced scrambled data.

Bug fix

I included a small hack in the Prometheus arbiter. It simply delays the internal grant signal for the Amiga CPU after deasserting the PCI GNTx# (i.e. getting the current PCI master off the bus).
Now the Amiga CPU access is delayed until the current master has finished its DMA transfer.
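
The difference between the original arbiter and the workaround can be modelled in a few lines. This minimal sketch uses the standard PCI idle condition (FRAME# and IRDY# both deasserted) as the test for "the current master has finished"; the real arbiter logic may look at different signals, and both the trace and the function names are invented. With delay_until_idle=False the CPU gets the bus in the very clock in which GNT0# is removed, overlapping the running DMA burst; with the workaround the grant is held back until the bus is idle.

```python
# Minimal sketch of the workaround. Levels: 0 = low = asserted.
# Assumption: the trace is invented for illustration and starts at the
# clock where /SLAVE is asserted and the arbiter removes GNT0#.

def bus_idle(sample):
    """Bus idle: neither FRAME# nor IRDY# is asserted."""
    return sample["FRAME#"] == 1 and sample["IRDY#"] == 1

def cpu_grant_clock(trace, delay_until_idle=True):
    """Clock index at which the arbiter grants the bus to the Amiga CPU.

    delay_until_idle=False models the original arbiter, which hands the bus
    to the CPU in the same clock it removes GNTx#, even though the PCI
    master may still be in the middle of its transfer."""
    if not delay_until_idle:
        return 0
    for clk, sample in enumerate(trace):
        if bus_idle(sample):
            return clk
    return None  # master still busy at the end of the trace

trace = [
    {"FRAME#": 0, "IRDY#": 0},  # DMA burst running, GNT0# just removed
    {"FRAME#": 0, "IRDY#": 0},  # master keeps transferring
    {"FRAME#": 1, "IRDY#": 0},  # last data phase
    {"FRAME#": 1, "IRDY#": 1},  # bus idle: safe to start the CPU access
]

print(cpu_grant_clock(trace, delay_until_idle=False))  # 0: collides with the DMA
print(cpu_grant_clock(trace))                          # 3: waits for the idle bus
```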

[Figure: Snapshot of the workaround (logic analyzer timing of the bug fix)]

What is happening now: in principle the same as explained above. The main difference is that the Amiga CPU access (visible by /SLAVE asserted low), which occurs in parallel to a running PCI DMA transfer, no longer disturbs the DMA transfer. The arbiter takes the PCI master off the bus (GNT0# deasserted), waits for the end of the PCI cycle (TRDY# asserted by the PCI slave) and then starts its own PCI access (FRAME# asserted low).
The USB card is not happy about being taken off the bus, so it immediately requests the bus again by asserting its REQ0# line; at the end of the Amiga CPU access the arbiter grants the bus back to the USB card (GNT0# asserted again). The USB card then performs one more DMA transfer.

New arbiter under development

To get rid of the known limitations of the current arbiter (which has serious problems with fair arbitration), a new arbiter concept is under development. PCI bus arbitration is based on a few simple rules, which are easy to understand but hard to implement.

Rules of engagement:

A first version of this arbiter is now under test: slot 0 holds a NEC USB card, slot 1 an 8139 Ethernet card, and slot 3 the Voodoo3 card used as DMA buffer.

[Figure: Snapshot of first tests with a new arbiter design (logic analyzer timing)]

You can see that the Zorro master is accessing the bus frequently (/SLAVE and /DTACK signals mark each Zorro access). At the same time the USB controller is asking for bus access, and also the NIC wants to transfer data.
On the left side, the bus is granted to the USB controller (three small time slots on GNT0#, between high-priority Zorro accesses); the waiting NIC is granted the bus after the USB controller has used up its maximum number of clock cycles (four small time slots on GNT1#). Please note that the USB controller wants the bus again (REQ0# is asserted), but the arbiter keeps regranting it to the NIC, as the NIC's time slot is not yet used up.
After this, the PCI bus is given to the USB controller (again three time slots on GNT0#), then the NIC finishes its transfer (which didn't fit into its first granting period) with one small GNT1#; the rest of this time slot is skipped as the NIC doesn't want the bus anymore, so the pending request of the USB controller on REQ0# is served.
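
The behaviour visible in this snapshot can be approximated by a few rules: Zorro III accesses always win, each PCI master gets a limited budget of clock cycles per turn, a master that still requests the bus is regranted while its budget lasts even if another master is waiting, and a turn ends early when the master drops its REQx#. A minimal sketch under those assumptions; the budget value, the round-robin order and the class name are hypothetical, and this is not the actual arbiter design.

```python
# Minimal sketch of the arbitration behaviour seen in the snapshot.
# Assumptions: MAX_CLOCKS is a hypothetical per-turn budget, a Zorro III
# access pauses the current turn without consuming its budget, and PCI
# masters take turns round-robin.

MAX_CLOCKS = 4  # hypothetical budget per turn, in PCI clocks

class SlotArbiter:
    def __init__(self, n_slots):
        self.n = n_slots
        self.last = n_slots - 1  # so that slot 0 is looked at first
        self.current = None      # PCI master owning the current turn
        self.used = 0            # clocks used in the current turn

    def _next(self, requests):
        for step in range(1, self.n + 1):
            slot = (self.last + step) % self.n
            if requests[slot]:
                return slot
        return None

    def clock(self, zorro_request, requests):
        """Return "zorro", a slot number, or None for this PCI clock."""
        if zorro_request:
            return "zorro"  # highest priority; the PCI turn is only paused
        if self.current is not None:
            budget_spent = self.used >= MAX_CLOCKS
            if budget_spent or not requests[self.current]:
                self.last = self.current  # end the turn, move on
                self.current = None
        if self.current is None:
            self.current = self._next(requests)
            self.used = 0
        if self.current is not None:
            self.used += 1  # this clock counts against the turn
        return self.current

arb = SlotArbiter(2)
both = [True, True]  # USB (slot 0) and NIC (slot 1) both request
print([arb.clock(False, both) for _ in range(9)])
# [0, 0, 0, 0, 1, 1, 1, 1, 0]
```

During the NIC's turn the USB controller keeps requesting, yet the NIC is regranted until its budget is spent, which matches the regranting described above.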

This is work in progress.