TraceHub is a Kafka-shaped logging platform packed into a single monolith you can read end-to-end. Sequence-based ingest, ACK/replay protocol, Redis-backed queue, and a real-time dashboard — wired so the failure modes are visible, not abstract.
Every log carries a monotonically increasing seq per service. The backend computes the highest contiguous seq it has, sends the producer an ACK with the exact missing set, and queues those gaps as replay requests. No magic — just three contracts you can read in an afternoon.
1. Ingest: each service buffers logs in a Redis sorted set keyed by sequence, then flushes batches via POST /ingest.
2. ACK: the response returns the highest contiguous seq seen and lists every gap; the producer trims its buffer up to ackTill.
3. Replay: missing seqs land in replay_requests; the worker pulls them from the producer buffer, dedupes, and inserts.
{ "service": "payment-service", "batch": [ { "seq": 1097, "level": "info", "message": "charge.created" }, { "seq": 1099, "level": "error", "message": "gateway.timeout" } ] }
{ "batchId": "ack_1705312200", "ackTill": 1099, "missing": [1088, 1092], "status": "partial", "replay": { "queued": 2 } } // producer trims up to seq=1099 // gaps queued in replay_requests
The Next.js dashboard subscribes to a single Socket.io channel. Each page renders one slice of the same broadcast: queue, replays, services, raw stream, system overview.
Fault-injection presets: 5s recovery · 10s failure · 3s latency · clear state.
Overview: EPS chart, service status, queue depth, ACK p99 — the screen you keep open while breaking things.
Live Logs: every log:new event renders as a row; filter by service, level, or search text.
Replay: pending + completed replays, manual trigger by service · seq range, live replay:completed events.
Queue: depth chart for queue:logs and queue:retry, worker liveness, sim events.
Services: EPS, error rate, ACK state per producer; drill in when one service starts gapping.
POST /metrics/control exposes five controls: crash worker, drop network, delay ACK, flush queue, and reset.
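For example (the { "action": ... } payload shape is an assumption, not the documented contract; check the route handler for the real field names):

curl -X POST http://localhost:3001/metrics/control \
  -H "content-type: application/json" \
  -d '{ "action": "crash_worker" }'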
Every command is one you can paste. Every directory is one you can read. Follow top-to-bottom and you'll have producers writing to Postgres through the queue in under two minutes.
Everything else — Postgres, Redis — is provisioned by compose. Verify once:
docker --version   # >= 24
node --version     # v20.x
pnpm --version     # >= 8
One repo, three apps under /apps: backend, dashboard, producer-simulator.
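Roughly (comments summarize the sections below):

tracehub/
├─ apps/
│  ├─ backend/             # ingest, ACK, queue worker, Socket.io
│  ├─ dashboard/           # Next.js UI
│  └─ producer-simulator/  # seq-numbered log generators
├─ docker-compose.yml
└─ start.sh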
git clone https://github.com/CodeWithZezo/tracehub
cd tracehub
Docker Compose boots postgres, redis, backend, dashboard, and three producer-simulator instances simultaneously.
docker compose up --build
# or use the helper
chmod +x start.sh && ./start.sh up
The backend health probe pings both Postgres and Redis. If both are "up", you're live.
curl http://localhost:3001/health
# => { "status": "ok", "postgres": "up", "redis": "up" }
You don't need the simulator. Send your own batch and read the ACK shape directly.
curl -X POST http://localhost:3001/ingest \
-H "content-type: application/json" \
-d '{
"service": "payment-service",
"batch": [
{ "seq": 1, "level": "info", "message": "hello" },
{ "seq": 2, "level": "error", "message": "oops" }
]
}'
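For a fresh service with no gaps, the ACK should report ackTill: 2 with an empty missing set, along these lines (the non-partial status value is an assumption):

{ "batchId": "ack_...", "ackTill": 2, "missing": [], "status": "ok" }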
The producer-simulator starts automatically; its logs land within seconds of the queue becoming healthy.
open http://localhost:3000
# pages:
#   /            ▸ system overview
#   /live-logs   ▸ real-time stream
#   /queue       ▸ depth + worker
#   /replay      ▸ manual replay
Keep Postgres + Redis in containers, run the three apps natively for hot reload.
# 1. just the dependencies
docker compose up postgres redis -d

# 2. backend
cd apps/backend && pnpm install && pnpm dev

# 3. producer (new terminal)
cd apps/producer-simulator && pnpm install && pnpm dev

# 4. dashboard (new terminal)
cd apps/dashboard && pnpm install && pnpm dev
No SDK, no codegen. Curl-able from day one.
metrics:snapshot     // SystemMetrics, every 2s
log:new              // individual LogEntry
ack:sent             // { service, ackTill, missing[] }
replay:completed     // { service, count, latencyMs }
worker:crashed       // auto-recovers in 5s
worker:recovered
sim:control          // fault-injection broadcast
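A minimal consumer of that channel, assuming socket.io-client v4 against the backend's default namespace on :3001; payload fields are taken from the list above:

import { io } from "socket.io-client";

const socket = io("http://localhost:3001");

// pushed every 2s; this drives the overview page
socket.on("metrics:snapshot", (snapshot) => console.log("metrics", snapshot));

// per-producer ACK state; a non-empty missing[] means the service is gapping
socket.on("ack:sent", ({ service, ackTill, missing }) => {
  if (missing.length) console.warn(`${service} gapping below ${ackTill}:`, missing);
});

socket.on("worker:crashed", () => console.warn("worker down, auto-recovery in 5s"));
socket.on("replay:completed", ({ service, count, latencyMs }) =>
  console.info(`replayed ${count} logs for ${service} in ${latencyMs}ms`)
);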
CREATE TABLE logs (
  id         BIGSERIAL PRIMARY KEY,
  service    TEXT NOT NULL,
  seq        BIGINT NOT NULL,
  level      TEXT,
  message    TEXT,
  request_id TEXT,
  ts         TIMESTAMPTZ,
  UNIQUE (service, seq)
);

CREATE TABLE ack_state ( ... );
CREATE TABLE replay_requests ( ... );
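That UNIQUE (service, seq) constraint is what makes replay idempotent: the worker can insert replayed rows blindly and let Postgres drop duplicates. A sketch, with illustrative values:

INSERT INTO logs (service, seq, level, message, request_id, ts)
VALUES ('payment-service', 1092, 'error', 'gateway.timeout', NULL, now())
ON CONFLICT (service, seq) DO NOTHING;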
Each route is structured to peel off into its own service when you're ready — ingest, worker, query, metrics, replay all share the same Postgres schema and Redis keyspace.