68kMLA Classic Interface
Phoenix: Open-Source NuBus FPGA Accelerator for 68040 Macs
Posted by: Sailcat on 2026-02-10 01:41:40
Hey all,
I've been designing an open-source NuBus FPGA accelerator card targeting the Quadra 700 (and other 68040 NuBus machines). The idea is to give the 68040 hardware acceleration for things it was never designed to do — TLS 1.3 cryptography, hardware blitting, basic video decode, DSP, and general-purpose compute — without replacing anything about Classic Mac OS.
I have a KiCad schematic, a design document, and a BOM ready for review. Looking for feedback before I move toward PCB layout.
GitHub repo: https://github.com/sailcat/phoenix-nubus
What's on the card
- FPGA: Lattice ECP5 LFE5U-85F (BGA-381) — chose this specifically for the open-source toolchain (Yosys + nextpnr + Project Trellis). 84k LUTs, which gives room for all target cores simultaneously with about 25% headroom.
- SRAM: 2x IS61WV51216EBLL — 1MB each, 10ns async. One dedicated to crypto/watchdog buffers, one to blitter/DSP scratch.
- SDRAM: IS42S16320F — 64MB, 166MHz capable. Frame buffer, video decode reference frames, bulk storage.
- Flash: W25Q128JVSIQ — 16MB SPI. Stores multiple bitstream images (switchable via DIP).
- Level shifters: 3x SN74LVC16T245 — bidirectional 5V/3.3V translation for the full NuBus bus (AD[31:0] + control signals).
- Power: 3x TLV62568 buck converters with TPS3700 supervisor for sequenced power-up (1.1V core → 2.5V aux → 3.3V I/O, per ECP5 requirements).
- Clock: 50MHz MEMS oscillator. FPGA PLLs derive everything else internally. 10MHz NuBus clock comes in through a level shifter.
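As a quick sanity check on the memory sizes in the list above, here is a minimal sketch that derives each capacity from the part's organisation. The word counts and widths are assumptions read off the part densities (512K x 16, 32M x 16, 128 Mbit) and should be re-checked against the datasheets before layout.

```python
# Organisation as (words, bits per word). These figures are assumptions
# taken from each part's nominal density, not from the original post.
MEMORY_PARTS = {
    "IS61WV51216EBLL": (512 * 1024, 16),      # 512K x 16 async SRAM
    "IS42S16320F": (32 * 1024 * 1024, 16),    # 32M x 16 SDRAM
    "W25Q128JVSIQ": (128 * 1024 * 1024, 1),   # 128 Mbit SPI flash
}


def capacity_mib(words: int, bits_per_word: int) -> float:
    """Return the capacity in MiB for a memory of the given organisation."""
    return words * bits_per_word / 8 / (1024 * 1024)


capacities = {part: capacity_mib(*org) for part, org in MEMORY_PARTS.items()}
total_sram_mib = 2 * capacities["IS61WV51216EBLL"]  # two SRAM chips on the card

for part, mib in capacities.items():
    print(f"{part}: {mib:g} MiB")
print(f"SRAM total (2 chips): {total_sram_mib:g} MiB")
```

Under those assumptions the numbers agree with the list: 1MB per SRAM, 64MB of SDRAM, and 16MB of flash.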
Estimated BOM is $60-90 at qty 1. Total power draw around 2-3W, well within the NuBus per-slot budget.
Target accelerator cores
| Core | Est. LUTs | What it does |
|---|---|---|
| Crypto Engine | ~12k | AES-256, ChaCha20, SHA-256, Curve25519 — enough for TLS 1.3 |
| Graphics Blitter | ~8k | Hardware blit, scale, rotate, alpha blend |
| Video Decode | ~15k | Motion compensation, color space conversion, targeting 15-20fps @ 320x240 |
| DSP | ~10k | 8-channel mixer, sample rate conversion, wavetable synthesis |
| Compute Unit | ~8k | Vector MAC, matrix multiply |
| Watchdog | ~2k | Bus monitor + DMA for memory protection via 68040 MMU |
Total is ~63k LUTs out of 84k available.
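The budget arithmetic can be checked with a short sketch. The per-core figures are copied from the table above. The six cores sum to about 55k, so the quoted ~63k presumably includes roughly 8k for logic the table does not list. Attributing that remainder to things like the NuBus interface and internal bus fabric is an assumption here, not something the post states.

```python
# Per-core estimates from the table, in thousands of LUTs.
CORE_LUTS_K = {
    "Crypto Engine": 12,
    "Graphics Blitter": 8,
    "Video Decode": 15,
    "DSP": 10,
    "Compute Unit": 8,
    "Watchdog": 2,
}
FPGA_LUTS_K = 84      # LFE5U-85F capacity as quoted in the post
QUOTED_TOTAL_K = 63   # total quoted in the post


def headroom(used_k: float, available_k: float) -> float:
    """Return the unused fraction of the FPGA fabric."""
    return 1 - used_k / available_k


cores_k = sum(CORE_LUTS_K.values())
unlisted_k = QUOTED_TOTAL_K - cores_k  # logic not itemised in the table
quoted_headroom = headroom(QUOTED_TOTAL_K, FPGA_LUTS_K)
cores_only_headroom = headroom(cores_k, FPGA_LUTS_K)

print(f"Listed cores: ~{cores_k}k LUTs")
print(f"Unlisted logic implied by the quoted total: ~{unlisted_k}k LUTs")
print(f"Headroom at the quoted total: {quoted_headroom:.0%}")
print(f"Headroom counting only the listed cores: {cores_only_headroom:.0%}")
```

The quoted total of 63k out of 84k is what gives the 25% headroom mentioned in the component list.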
How it talks to the Mac
The card sits in a standard NuBus slot and maps its registers into the assigned 256MB address window. A system extension (INIT) detects the card via Slot Manager, installs the interrupt handler, and exposes a shared library (PhoenixLib) with C-callable APIs for each accelerator core. Software talks to the card through memory-mapped register writes — nothing exotic.
The card also supports DMA bus mastering for bulk transfers (frame buffer writes, watchdog shadow copies) and generates interrupts via /NMRQ for completion notifications.
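To make the address window concrete, here is a minimal sketch of how a Macintosh NuBus slot number maps to its address ranges in 32-bit mode: the 256MB super slot space at $s000 0000 and the 16MB standard slot space at $Fs00 0000. The slot number 0xE and the register offset 0x100 are purely illustrative. The real slot comes from the Slot Manager, and the actual register map is defined in the design document, not here.

```python
SUPER_SLOT_SIZE = 0x1000_0000      # 256 MB per slot
STANDARD_SLOT_SIZE = 0x0100_0000   # 16 MB per slot


def slot_windows(slot: int) -> dict:
    """Return (first, last) addresses of both windows for a Mac NuBus slot."""
    if not 0x9 <= slot <= 0xE:
        raise ValueError("Macintosh NuBus slot IDs run from 0x9 to 0xE")
    super_base = slot << 28
    standard_base = 0xF000_0000 | (slot << 24)
    return {
        "super": (super_base, super_base + SUPER_SLOT_SIZE - 1),
        "standard": (standard_base, standard_base + STANDARD_SLOT_SIZE - 1),
    }


def register_address(slot: int, offset: int) -> int:
    """Hypothetical helper: address of a register at `offset` in the super slot window."""
    if not 0 <= offset < SUPER_SLOT_SIZE:
        raise ValueError("offset falls outside the 256 MB window")
    return (slot << 28) + offset


example = slot_windows(0xE)  # 0xE is only an example slot number
for name, (first, last) in example.items():
    print(f"{name}: 0x{first:08X}-0x{last:08X}")
print(f"register at offset 0x100: 0x{register_address(0xE, 0x100):08X}")
```

On the Mac side the driver would get the base address from the Slot Manager rather than computing it, so this is only a way to see where the window lands.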
The companion bridge concept
The card handles acceleration. A Raspberry Pi on the local network handles the internet-facing stuff — TLS termination, HTTP content simplification (strip tracking/JS/autoplay), protocol translation, media transcoding. The Pi prepares data, ships it to the card over Ethernet, and the card's crypto/video/DSP engines do the heavy lifting. The Mac never touches the raw modern internet directly.
Current state
The schematic is architecturally complete — all components selected, all inter-sheet connections defined, design constraints documented. The full pin-level wiring (especially the BGA-381 fanout) needs to be finished in KiCad before layout. The design document in the repo covers component rationale, NuBus register map, PCB stackup notes, FPGA fabric allocation, and a preliminary C API.
No HDL written yet. No PCB layout started.
What I need from you
Specific questions:
- Quadra 700 NuBus cage clearance — Does anyone have physical measurements of the vertical clearance above a NuBus card in the Q700? I need to confirm component height constraints. The board is designed as a short Eurocard (100mm height) but I need to know if tall components on the top side are going to be a problem.
- NuBus Declaration ROM — Has anyone here written an sResource directory from scratch? I've read Designing Cards and Drivers for the Macintosh Family but practical experience with the format would be incredibly helpful. If anyone has disassembled the ROM from an existing NuBus card and has notes, I'd love to see them.
- ECP5 vs Artix-7 — I went with the ECP5 for open-toolchain reasons, but the Artix-7 (XC7A100T) has more fabric and better DSP blocks. Anyone have strong opinions here? The ECP5 is well-proven in the open-source FPGA community (ULX3S, etc.) but I'm open to arguments.
- Level shifting approach — Is three 74LVC16T245s the right call for the NuBus interface, or is there a better approach people have used? The bidirectional direction control adds a bit of complexity since the AD bus is multiplexed. Particularly interested in how to handle /NMRQ (open-drain on NuBus).
- PCB layout help — I'm looking for someone experienced with BGA fanout on 4-layer boards. The ECP5 BGA-381 at 0.8mm pitch is the hard part. This would be a paid gig, not asking for free labor. If anyone does this kind of work or can recommend someone, please reach out.
- NuBus connector sourcing — Where are people getting DIN 41612 Type C connectors for new card designs these days? Any preferred suppliers?
Everything is on GitHub, open source (leaning CERN-OHL-S for hardware, MIT for software). Happy to answer questions about any part of the design.
Edited to Add:
I want to call out @Melkhior's NuBusFPGA project — which I discovered via the similar threads prompt right after posting this. That project has already solved several of the problems I'm facing, particularly the Declaration ROM development workflow (QEMU digital twin approach), NuBus bus interface design, and the LiteX/Wishbone bus fabric for connecting multiple devices inside the FPGA. Phoenix is targeting a different use case (accelerator cores rather than video output), but the NuBus interface engineering and driver architecture are directly relevant, and I expect to learn a lot from that codebase. If you're interested in NuBus FPGA work generally, that thread is essential reading.
Posted by: GRudolf94 on 2026-02-10 02:42:09
Were these KiCad files... written by AI?
Posted by: Phipli on 2026-02-10 03:00:48
@Sailcat I usually just use the report button to report my own post rather than DM wthww
Posted by: Sailcat on 2026-02-10 03:01:59
The design is mine, the KiCad files were generated with AI assistance. I'm stronger on the architecture and systems design side than I am on KiCad — so I used Claude to get the schematic into files from my design spec. The component selections, architecture, and design constraints are all my own work. The pin-level wiring still needs to be finished by hand in KiCad before layout, which I noted in the post. If that's a dealbreaker for anyone here I understand, but the engineering is real and I'm here to get it right.
Posted by: GRudolf94 on 2026-02-10 03:10:59
Not a problem with Claude per se, more of a problem with the files containing a bunch of invalid syntax that doesn't open in either of the KiCad installs I have here (and still being broken after me manually patching them a bunch) 🙂
To be honest, there's not a lot to review here, and a lot more to do before this is close enough to go on a PCB. You ideally have to make the pinmap, HDL, and layout walk together. I've not written a single line of VHDL in close to 10 years, so I won't be much help with that, but you really want that part to be better-defined before presenting it, or worrying about other details of the physical implementation... which are mostly trivial (clearances are well-documented I think, even for the Q700; modern '245s are fast enough to be used as NuBus buffers; open drain is open drain and !NMRQ is slave-to-master only, so not a lot to worry about; DIN41612 connectors are still made and widely available).
Posted by: Melkhior on 2026-02-10 07:14:58
You found the NuBusFPGA, but here are some answers that may not be in the repo:
- ECP5 vs Artix-7 — I went with the ECP5 for open-toolchain reasons, but the Artix-7 (XC7A100T) has more fabric and better DSP blocks. Anyone have strong opinions here? The ECP5 is well-proven in the open-source FPGA community (ULX3S, etc.) but I'm open to arguments.
Unless you need a specific feature of the FPGA (like 7-series' TMDS signalling for HDMI), I'd say
* Toolchain you're accustomed to for design, synthesis & simulation
* Package complexity/number of pins; 1.0mm pitch BGA are easier to deal with than 0.8mm, QFP might be an option for some FPGAs but probably not at the size you want (see DoubleVision). The power supply/decoupling might be tricky as well.
* Spartan-7 might be a better alternative to Artix-7 if you don't care for the high-speed I/Os only available in Artix-7
* No idea for ECP, but Spartan/Artix can handle DDR2/3 memory
- Level shifting approach — Is three 74LVC16T245s the right call for the NuBus interface, or is there a better approach people have used? The bidirectional direction control adds a bit of complexity since the AD bus is multiplexed. Particularly interested in how to handle /NMRQ
The CB3T family is expensive and not a true level shifter, but the parts are bidirectional and really, really fast. For NuBus they are probably overkill (didn't stop me): the delay from a normal shifter is unlikely to be an issue, and a normal shifter would likely be cheaper.
- NuBus connector sourcing — Where are people getting DIN 41612 Type C connectors for new card designs these days? Any preferred suppliers?
Those DIN connectors, in particular NuBus' 3x32-pin version, are still very available (the 3x40 of the '030 PDS less so; the unrelated 3M PAK50 from the '040 PDS is unobtainium, unfortunately). Pick the cheapest you can get.
Posted by: paws on 2026-02-10 12:12:37
Interesting idea!
You mention some music/audio related functionality. I would encourage you to look at Digidesign's Sound Accelerator/Audiomedia and SampleCell cards.
The Sound Accelerator and Audiomedia cards were basically a Motorola 56k DSP chip with its "host port" connected via NuBus and audio input/output hardware. Early versions of the ProTools DAW, the Sound Designer audio editor, and the Turbosynth synthesizer were partially written as 56k DSP programmes that were uploaded from the host to the card. There is some documentation available, and I also have archived a DIY programme somebody made to load 56k DSP programmes to an Audiomedia card, as well as the toolchain.
The SampleCell hardware was basically a hardware sampler in an ASIC controlled by a very decent software editor running on the Mac. The ASIC, of course, isn't documented, nor is the protocol used to communicate. But with an FPGA running on the NuBus I think it'd be possible to reverse engineer it fairly quickly.
It might be a different sort of project than what you're interested in, but I think the thing to consider here is software. There's a *lot* of person-hours in this software, which was made for professional use and honestly still holds up. Making new hardware to talk to existing high-quality software is something to consider, I think.
Posted by: joshc on 2026-02-10 13:05:36
I’m not sure the world needs AI generated hardware for old Macs.
This sounds like it would be highly expensive.
Posted by: Nixontheknight on 2026-02-10 13:43:26
I wonder if this can be made to work over the version of LC-PDS that the LC III and later have
Posted by: Melkhior on 2026-02-10 13:59:44
I wonder if this can be made to work over the version of LC-PDS that the LC III and later have
Physically the pinout is different, but logically it's equivalent to the IIsi/SE/30 (except it's 25/33 MHz in the LC III/III+ instead of 20/16). So a different board is required, but the gateware will need a new pinlist at most and the software can be the same.
EDIT: oops, I was thinking of the PDS. This is meant to be a NuBus device, so no; it will require a completely different bus interface to talk to any PDS slot. Doable, but not as easy as adapting from one of the '030 PDS to another.
Posted by: Nixontheknight on 2026-02-10 14:01:04
Physically the pinout is different, but logically it's equivalent to the IIsi/SE/30 (except it's 25/33 MHz in the LC III/III+ instead of 20/16). So a different board is required, but the gateware will need a new pinlist at most and the software can be the same.
and higher in the 68040 LCs and PowerPC 5xxx/6xxx Performas
Posted by: Byrd on 2026-02-10 22:34:07
The companion bridge concept
The card handles acceleration. A Raspberry Pi on the local network handles the internet-facing stuff — TLS termination, HTTP content simplification (strip tracking/JS/autoplay), protocol translation, media transcoding. The Pi prepares data, ships it to the card over Ethernet, and the card's crypto/video/DSP engines do the heavy lifting. The Mac never touches the raw modern internet directly
Not an EE, and I don't get the recent projects here of "let's throw AI at it and put it on GitHub", but why not a variant of the PiStorm in a Mac? We are cheap and nobody is going to fork out $1000+ for this thing that the Amiga enthusiast might consider.
Posted by: Phipli on 2026-02-10 23:46:32
and higher in the 68040 LCs and PowerPC 5xxx/6xxx Performas
If you mean the speed of the LC PDS is higher in those machines: to maintain compatibility it is 16MHz in the 630/6200/6300 etc., regardless of the computer's bus speed.
Posted by: Melkhior on 2026-02-11 02:46:01
This sounds like it would be highly expensive.
nobody is going to fork out $1000+ for this thing that the Amiga enthusiast might consider.
Costs for entry-level FPGAs and PCB manufacturing have gone way down over the years. I just did a trial quote at JLCPCB for a 6-layer board (with 0.2/0.35 via-in-pad and double-sided assembly, far from the cheapest option!) with an onboard smallish Artix-7 (50T-1) + DDR2, and it was less than $600 for 5 populated boards (a big fifth of which is the cost of the FPGA itself). I expected double to triple that from older attempts.
ECP5 85k parts are a bit more expensive, but that's only $5-$10 more per device (cheaper Artix in the $20-30 range, large ECP5 in the $25-$40 range). Using 74LVC instead of 74CB3T is cheaper; of the other components only the SRAMs are costly (the rest are needed anyway and comparable to my test). From the original description, if the OP can route everything on a similar setup (6 layers only), my educated guess is the per-board cost should be at most $150 (fully assembled) for ultra-small volume (5 pieces), and as usual go significantly down for higher volume.