production-grade · learnable in one repo

Distributed logging,
made legible.

TraceHub is a Kafka-shaped logging platform packed into a single monolith you can read end-to-end. Sequence-based ingest, ACK/replay protocol, Redis-backed queue, and a real-time dashboard — wired so the failure modes are visible, not abstract.

Node 20 · Express · Socket.io · PostgreSQL 16 · Redis 7 · Next.js 14 · Docker Compose
stack
monorepo · 3 apps
setup
compose up
license
MIT · open
system architecture read end-to-end
auth-svc (seq 1042), payment-svc (seq 8821), notif-svc (seq 3309) → Backend API (POST /ingest, ack: partial, 3 missing) → Redis (queue:logs, queue:retry, producer:*, ack:*) → LogWorker (batch of 50) → PostgreSQL (logs, ack_state) → Dashboard (Next.js, WS live)
core protocol

Sequence in. ACK out.
Replay when reality lies.

Every log carries a monotonically increasing seq per service. The backend computes the highest contiguous seq it has, sends the producer an ACK with the exact missing set, and queues those gaps as replay requests. No magic — just three contracts you can read in an afternoon.
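One plausible reading of the ACK computation, matching the example response shown later in this section (ackTill is the highest seq received, missing lists every gap below it). A sketch only: the names and shapes here are assumptions, not TraceHub's actual source.

```typescript
type Ack = { ackTill: number; missing: number[]; status: "ok" | "partial" };

// Given the seqs received for a service and the previous ack high-water mark,
// compute the new ackTill and the exact set of missing seqs below it.
function computeAck(received: number[], prevAckTill: number): Ack {
  const seen = new Set(received);
  const ackTill = Math.max(prevAckTill, ...received);
  const missing: number[] = [];
  for (let s = prevAckTill + 1; s <= ackTill; s++) {
    if (!seen.has(s)) missing.push(s); // a gap the producer must replay
  }
  return { ackTill, missing, status: missing.length ? "partial" : "ok" };
}
```

Feeding it a batch with two dropped seqs reproduces the partial ACK shape from the example below.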

01 · INGEST

Producer batches by seq

Each service buffers logs in a Redis sorted set keyed by sequence, then flushes batches via POST /ingest.

producer:payment batch · 100 logs
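A sketch of the producer side. TraceHub keeps this buffer in a Redis sorted set scored by seq; a Map stands in here, and the class and method names are illustrative assumptions, not the repo's code.

```typescript
type LogEntry = { seq: number; level: string; message: string };

class ProducerBuffer {
  private pending = new Map<number, LogEntry>();

  add(entry: LogEntry): void {
    this.pending.set(entry.seq, entry);
  }

  // Drain the lowest-seq entries into a POST /ingest body. Entries stay in
  // the buffer until an ACK confirms them, so unacked logs survive for replay.
  nextBatch(service: string, batchSize: number) {
    const batch = [...this.pending.values()]
      .sort((a, b) => a.seq - b.seq)
      .slice(0, batchSize);
    return { service, batch };
  }
}
```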
02 · ACK

Backend replies with truth

Returns the highest seq it has seen and lists every gap below it. The producer trims its buffer up to ackTill, keeping only the missing seqs for replay.

{ "ackTill": 1099, "missing": [1088, 1092], "status": "partial" }
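One way to reconcile "trim up to ackTill" with the replay step that follows: drop everything at or below ackTill except the seqs the backend reported missing, which must stay buffered for the replay worker. An illustrative assumption, not the repo's actual code.

```typescript
// Trim a producer buffer (seq -> log entry) after an ACK arrives.
function trimOnAck(
  buffer: Map<number, unknown>,
  ackTill: number,
  missing: number[],
): void {
  const keep = new Set(missing);
  for (const seq of buffer.keys()) {
    // Acked and not missing: the backend has it, so the copy can go.
    if (seq <= ackTill && !keep.has(seq)) buffer.delete(seq);
  }
}
```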
03 · REPLAY

Worker fills the gaps

Missing seqs land in replay_requests. The worker pulls them from the producer buffer, dedupes, and inserts.

replay_requests
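The replay path can be sketched as a pure resolution step: look each missing seq up in the producer buffer, skip seqs already stored (the UNIQUE (service, seq) constraint makes inserts idempotent in any case), and return the rows to insert. Names here are illustrative, not the repo's code.

```typescript
type Row = { service: string; seq: number; message: string };

function resolveReplays(
  missing: number[],
  producerBuffer: Map<number, Row>,
  alreadyStored: Set<number>,
): Row[] {
  const rows: Row[] = [];
  for (const seq of missing) {
    const row = producerBuffer.get(seq);
    if (row && !alreadyStored.has(seq)) rows.push(row); // dedupe before insert
  }
  return rows;
}
```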
POST /ingest · request
{
  "service": "payment-service",
  "batch": [
    { "seq": 1097, "level": "info",
      "message": "charge.created" },
    { "seq": 1099, "level": "error",
      "message": "gateway.timeout" }
  ]
}
200 OK · ack response
{
  "batchId":  "ack_1705312200",
  "ackTill":  1099,
  "missing":  [1088, 1092],
  "status":   "partial",
  "replay":   { "queued": 2 }
}
// producer trims up to seq=1099
// gaps queued in replay_requests
dashboard · 5 pages

Five views, one source of truth.

The Next.js dashboard subscribes to a single Socket.io channel. Each page renders one slice of the same broadcast: queue, replays, services, raw stream, system overview.

Overview · Live Logs · Replay · Queue · Services

socket connected
EPS 28.4 ↑ streaming
Queue 142 (queue:logs)
ACK Rate 98.1% ↑ healthy
Replays 3 pending
Live log stream
10:24:01  INFO   JWT issued
10:24:01  ERROR  card declined
10:24:02  WARN   bounce rate ↑
10:24:03  INFO   refund processed
Fault injection

Crash Worker · 5s recovery
Drop Network · 10s failure
Delay ACK · 3s latency
Reset All · clear state

/ · OVERVIEW

System metrics

EPS chart, service status, queue depth, ACK p99 — the screen you keep open while breaking things.

/live-logs

Real-time stream

Every log:new event renders as a row. Filter by service, level, or search text.

/replay

Replay center

Pending + completed replays, manual trigger by service · seq range, live replay:completed events.

/queue

Queue monitor

Depth chart for queue:logs and queue:retry, worker liveness, sim events.

/services

Per-service detail

EPS, error rate, ACK state per producer. Drill in when one service starts gapping.

FAULT INJECTION

Break it on purpose

POST /metrics/control exposes five controls: crash worker, drop network, delay ACK, flush queue, reset.

step-by-step · from clone to ack

Six steps. No hidden magic.

Every command is one you can paste. Every directory is one you can read. Follow top-to-bottom and you'll have producers writing to Postgres through the queue in under two minutes.

01
prerequisites

Install Docker & Node 20

Everything else — Postgres, Redis — is provisioned by compose. Verify once:

~ shell
docker --version    # >= 24
node --version      # v20.x
pnpm --version      # >= 8
02
clone

Pull the monolith

One repo, three apps under /apps: backend, dashboard, producer-simulator.

~ shell
git clone https://github.com/CodeWithZezo/tracehub
cd tracehub
03
start

One command to rule them all

Docker Compose boots postgres, redis, backend, dashboard, and the three producer-simulators simultaneously.

~ shell
docker compose up --build

# or use the helper
chmod +x start.sh && ./start.sh up
04
verify

Check the health endpoint

The backend health probe pings both Postgres and Redis. If both are "up", you're live.

~ shell
curl http://localhost:3001/health

# =>
{ "status": "ok", "postgres": "up", "redis": "up" }
05
send a batch

POST your first ingest

You don't need the simulator. Send your own batch and read the ACK shape directly.

~ shell
curl -X POST http://localhost:3001/ingest \
  -H "content-type: application/json" \
  -d '{
    "service": "payment-service",
    "batch": [
      { "seq": 1, "level": "info",  "message": "hello" },
      { "seq": 2, "level": "error", "message": "oops" }
    ]
  }'
06
observe

Open the dashboard

Logs land within seconds of the queue becoming healthy. The producer-simulator starts automatically.

~ browser
open http://localhost:3000

# pages:
#   /          ▸ system overview
#   /live-logs ▸ real-time stream
#   /queue     ▸ depth + worker
#   /replay    ▸ manual replay
optional · local dev without docker

Run apps natively with pnpm

Keep Postgres + Redis in containers, run the three apps natively for hot reload.

~ shell
# 1. just the dependencies
docker compose up postgres redis -d

# 2. backend
cd apps/backend     && pnpm install && pnpm dev

# 3. producer (new terminal)
cd apps/producer-simulator && pnpm install && pnpm dev

# 4. dashboard (new terminal)
cd apps/dashboard   && pnpm install && pnpm dev
api surface

Eight routes. One websocket.

No SDK, no codegen. Curl-able from day one.

verb   path              purpose
POST   /ingest           Receive a log batch, return ACK with ackTill + missing seqs.
GET    /logs             Query logs by service, level, search text, time range.
GET    /logs/replay      List pending and completed replay requests.
POST   /logs/replay      Manually trigger a replay for a service / seq range.
GET    /metrics          Snapshot of EPS, queue depth, error rate, ACK p99.
GET    /metrics/queue    Queue depth + worker liveness + retry stats.
POST   /metrics/control  Fault-injection: crash, delay, drop, flush, reset.
GET    /health           Liveness probe — pings Postgres + Redis.
websocket events → client
socket.io
metrics:snapshot   // SystemMetrics, every 2s
log:new            // individual LogEntry
ack:sent           // { service, ackTill, missing[] }
replay:completed   // { service, count, latencyMs }
worker:crashed     // auto-recovers in 5s
worker:recovered
sim:control        // fault-injection broadcast
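The payload shapes above, written as TypeScript types a dashboard page could subscribe against. Only the fields shown in the event list are from the source; the dispatcher and any names beyond them are assumptions.

```typescript
interface AckSent { service: string; ackTill: number; missing: number[] }
interface ReplayCompleted { service: string; count: number; latencyMs: number }

// Event name -> payload type, so each handler only sees its own payload.
type ServerEvents = {
  "ack:sent": AckSent;
  "replay:completed": ReplayCompleted;
};

function dispatch<K extends keyof ServerEvents>(
  handlers: { [E in keyof ServerEvents]?: (p: ServerEvents[E]) => void },
  event: K,
  payload: ServerEvents[K],
): void {
  handlers[event]?.(payload);
}
```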
postgres schema
init.sql
CREATE TABLE logs (
  id          BIGSERIAL PRIMARY KEY,
  service     TEXT NOT NULL,
  seq         BIGINT NOT NULL,
  level       TEXT,
  message     TEXT,
  request_id  TEXT,
  ts          TIMESTAMPTZ,
  UNIQUE (service, seq)
);
CREATE TABLE ack_state    ( ... );
CREATE TABLE replay_requests ( ... );
open source · MIT

The whole stack fits in one tab.
Read it. Break it. Extract it.

Each route is structured to peel off into its own service when you're ready — ingest, worker, query, metrics, replay all share the same Postgres schema and Redis keyspace.

tracehub/
├── apps/
│   ├── backend/ // express + socket.io
│   ├── dashboard/ // next.js 14
│   └── producer-simulator/ // fake EC2 svcs
├── postgres/init.sql
├── redis/redis.conf
├── shared/src/index.ts
└── docker-compose.yml