Cortex Ascended Base · Chicago, IL

The four physical machines behind Cortex & Mercury

Two production GPU machines in a Chicago apartment, one Mac near the ISU campus in Bloomington-Normal, and a Raspberry Pi 5 still being prepared. Together they form the entire compute fleet that runs every demo on this site — no AWS region, no rented cloud GPU pool.

Seratonin
Primary GPU · runs TRIBE v2 + most narration · Chicago, IL
● live · primary
GPU
NVIDIA RTX 5090 · 32 GB GDDR7 · sm_120 (Blackwell) · 575 W TDP
CPU
AMD Ryzen 7 9800X3D · 8c / 16t · 4.7 GHz · 3D V-Cache
RAM
64 GB DDR5-4800 · 2× 32 GB
Mobo
MSI X870E-P PRO · AM5 · WiFi 7
OS
Windows 11 Pro · WSL2 Ubuntu 24.04 · CUDA 12.8
Network
Tailscale (seratonin) · Funnel exposed publicly
Role
FastAPI backend (:8773) + inference router (:8766) + Vite frontend (:5173) + Mercury web (:9119)
What it does for the demo: hosts the public URL you're looking at right now. Runs TRIBE v2 (~6 GB VRAM) for cortical-surface BOLD prediction, then runs Gemma 4 E4B locally via Ollama for the four parallel persona narrations. Tailscale Funnel makes the Vite dev server reachable from any phone browser without authentication.
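The four-way persona fan-out can be sketched as below. This is an illustrative sketch, not the production router: the persona names and the `narrate_all` helper are made up, and the model call (in reality a POST to a local Ollama server) is injected as a plain function so the structure can be shown and tested offline.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative persona names — the real four personas aren't listed on this page.
PERSONAS = ["clinician", "poet", "engineer", "skeptic"]

def narrate_all(bold_summary: str, generate, personas=PERSONAS) -> dict[str, str]:
    """Run one narration per persona in parallel; `generate` stands in for
    the local Ollama call. Returns {persona: narration_text}."""
    with ThreadPoolExecutor(max_workers=len(personas)) as pool:
        futures = {
            p: pool.submit(generate, f"As a {p}, narrate: {bold_summary}")
            for p in personas
        }
        return {p: f.result() for p, f in futures.items()}

# Stand-in generator so the sketch runs without a GPU or Ollama server:
print(narrate_all("visual cortex activity rising", lambda prompt: prompt.upper()))
```

Because the four narrations are independent, the fan-out is embarrassingly parallel; a thread pool is enough since each worker just blocks on I/O to the model server.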
Big Apple
Overflow narration · Apple Silicon · Chicago, IL (next to Seratonin)
● live · overflow
Chip
Apple M4 Max · 16 cores (12P + 4E) · 40-core GPU · 16-core Neural Engine
RAM
48 GB unified memory (LPDDR5X) · 546 GB/s bandwidth
Storage
1 TB NVMe · 7.4 GB/s read
OS
macOS Sequoia 15.x · MLX + Ollama (Metal backend)
Network
Tailscale (big-apple · 100.93.240.52)
Models
gemma4:e4b (~9 GB), gemma4:26b (~21 GB), gemma4:31b (~21 GB), gemma4:e2b
Role
Round-robin overflow when the 5090 is busy. ~90 tokens/sec on Gemma 4 E4B.
What it does for the demo: when Seratonin is mid-TRIBE-inference and a narration job arrives, the inference router transparently sends it here. The user sees no difference — the M4 Max produces ~30-token-per-sentence narrations in 0.3s end-to-end. Acts as a hot standby so the demo URL never blocks on a single GPU.
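The overflow policy above can be reduced to a few lines. A minimal sketch, assuming the router sees a simple busy flag from the primary — the `NarrationRouter` name and the busy-check are hypothetical; the hostnames are the real Tailscale names used on this page:

```python
from itertools import cycle

class NarrationRouter:
    """Prefer the primary GPU box; round-robin across standbys when it's busy."""

    def __init__(self, primary: str, standbys: list[str]):
        self.primary = primary
        self._standbys = cycle(standbys)  # endless round-robin iterator

    def pick(self, primary_busy: bool) -> str:
        # If the 5090 is mid-TRIBE-inference, spill to the next standby.
        return next(self._standbys) if primary_busy else self.primary

router = NarrationRouter("seratonin", ["big-apple", "miniapple"])
```

The `cycle` iterator keeps its position across calls, so repeated overflow requests alternate between standbys instead of hammering the first one.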
Mini Apple
ISU research node · Apple Silicon · Bloomington-Normal, IL
● standby · research node
Chip
Apple Silicon (Mac mini class)
OS
macOS Sequoia 15.x · Ollama on Metal
Network
Tailscale (miniapple)
Location
Near the ISU campus in Normal, IL — used for in-person research meetings with collaborators (Mangolika Bhattacharya, Sally Xie, Rosangela Follmann)
Role
Tertiary node. Available as a third Ollama backend if both Chicago machines are saturated.
Why it exists: so that live demos and collaboration sessions at ISU don't have to round-trip to Chicago. The inference-router design assumes any Tailscale node can be added to or removed from the load-balancer pool with zero config changes.
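One plausible way to keep pool membership zero-config is to derive it from `tailscale status --json`, whose output includes a `Peer` map with each node's `HostName` and `Online` state. The helper below is an assumption about how that could look, not the actual router code; the subprocess call and the follow-up probe of the Ollama port are elided, with a hand-written sample standing in:

```python
def ollama_pool(status: dict, known_backends: set[str]) -> list[str]:
    """Return hostnames of online Tailscale peers that serve Ollama.
    `status` is the parsed output of `tailscale status --json`."""
    peers = status.get("Peer", {}).values()
    return sorted(p["HostName"] for p in peers
                  if p.get("Online") and p["HostName"] in known_backends)

# Hand-written sample in the shape tailscaled reports:
sample = {"Peer": {
    "k1": {"HostName": "big-apple", "Online": True},
    "k2": {"HostName": "miniapple", "Online": False},
    "k3": {"HostName": "baby-pi",   "Online": True},
}}
print(ollama_pool(sample, {"big-apple", "miniapple"}))  # → ['big-apple']
```

Under this scheme, bringing Mini Apple into the pool is just `tailscale up` plus starting Ollama; the router discovers it on the next status poll.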
Baby Pi
Edge node · Raspberry Pi 5 · work in progress
○ WIP · not deployed
Hardware
Raspberry Pi 5 · 8 GB · ARM Cortex-A76 (4 cores @ 2.4 GHz)
Plan
AdGuard Home + Tailscale subnet router for Ascended Base. No LLM inference (the Pi is too small for anything bigger than BitNet, which we deprioritized).
Status
SD-card prep, AdGuard, and Tailscale install scripts are written; the hardware has not yet been flashed or connected.
Honest status: this is on the to-do list, not in production. Listed here so the diagram on /about lines up with what's actually on the workbench.