Data Storage

Cortex uses a layered storage architecture optimized for simplicity and reliability.

SQLite + FTS5

The primary data store is SQLite with FTS5 for full-text search:

  • Notes table: id, title, content, note_type, tags, importance, state, timestamps
  • Links table: source_id, target_id, relation, timestamps
  • FTS5 index: Full-text search over title and content
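
As a rough illustration of the layout above, here is a minimal sketch using Python's built-in sqlite3 module. The column names come from the lists above. The column types, the timestamp column names (created_at, updated_at), the notes_fts table name, the trigger, and the open_store and search helpers are all assumptions made for the example, not Cortex's actual schema. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical schema: column names follow the lists above; types, defaults
# and the names notes_fts / notes_ai are assumptions for illustration.
SCHEMA = """
CREATE TABLE notes (
    id          INTEGER PRIMARY KEY,
    title       TEXT NOT NULL,
    content     TEXT NOT NULL,
    note_type   TEXT,
    tags        TEXT,
    importance  REAL,
    state       TEXT,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE links (
    source_id   INTEGER NOT NULL REFERENCES notes(id),
    target_id   INTEGER NOT NULL REFERENCES notes(id),
    relation    TEXT NOT NULL,
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
-- External-content FTS5 table that indexes title and content of notes.
CREATE VIRTUAL TABLE notes_fts USING fts5(
    title, content, content='notes', content_rowid='id'
);
-- Keep the index in sync on insert (update/delete triggers omitted).
CREATE TRIGGER notes_ai AFTER INSERT ON notes BEGIN
    INSERT INTO notes_fts(rowid, title, content)
    VALUES (new.id, new.title, new.content);
END;
"""


def open_store(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def search(conn, query):
    """Return (id, title) pairs matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT rowid, title FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

Calling search(conn, "writer") after inserting a few notes returns the matching rows ordered by FTS5's built-in relevance ranking.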

SQLite was chosen over Postgres for:

  • Zero infrastructure (embedded database)
  • Excellent read performance
  • Simple backups and migration
  • WAL mode, which lets reads proceed concurrently with a write

SQLite serializes writes. This is fine for single-user or low-write workloads. If you need high write throughput, consider Postgres.
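
To make the WAL behaviour concrete, the sketch below opens a connection in WAL mode with Python's sqlite3 module. The connect_wal helper and the 5-second timeout are assumptions for illustration; the source does not show how Cortex configures its connections.

```python
import sqlite3


def connect_wal(path, timeout=5.0):
    """Open a SQLite database file and switch it to WAL journaling.

    Returns the connection and the journal mode SQLite reports back.
    The timeout is how long a writer waits for the write lock before
    giving up, since SQLite allows only one writer at a time.
    """
    conn = sqlite3.connect(path, timeout=timeout)
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode
```

WAL mode is a property of the database file, so it persists across connections once set. It applies only to file-backed databases; an in-memory database reports a different journal mode.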

ChromaDB

Vector embeddings for semantic search:

  • Model: all-MiniLM-L6-v2 (384-dimensional embeddings)
  • Storage: Persistent on disk (./chroma_data/)
  • Usage: Semantic similarity search, auto-linking, recall queries

ChromaDB runs in-process, so no separate service is needed.
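
The idea behind semantic similarity search can be sketched in plain Python. This is not ChromaDB's API: ChromaDB handles embedding and nearest-neighbour lookup internally. The cosine metric, the toy 3-dimensional vectors in the usage note, and the cosine_similarity and top_k helpers are assumptions chosen for illustration; the real embeddings have 384 dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, embeddings, k=3):
    """Return the ids of the k stored vectors most similar to query_vec, best first.

    embeddings maps a note id to its embedding vector.
    """
    scored = [
        (cosine_similarity(query_vec, vec), note_id)
        for note_id, vec in embeddings.items()
    ]
    scored.sort(reverse=True)
    return [note_id for _, note_id in scored[:k]]
```

For example, with embeddings {"a": [1, 0, 0], "b": [0.9, 0.1, 0], "c": [0, 1, 0]} and the query [1, 0, 0], the two closest notes are "a" and then "b". Auto-linking and recall queries can be thought of as the same lookup run against a new note's embedding or a query's embedding.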

Litestream Replication

Litestream provides continuous replication of the SQLite database to S3-compatible storage:

SQLite WAL changes → Litestream → Cloudflare R2 bucket

Configuration

litestream.yml
dbs:
  - path: /app/data/cortex.db
    replicas:
      - type: s3
        endpoint: https://ACCOUNT_ID.r2.cloudflarestorage.com
        bucket: cortex-backup
        access-key-id: ${R2_ACCESS_KEY}
        secret-access-key: ${R2_SECRET_KEY}

Recovery

On container start, entrypoint.sh restores from backup only if the local database file is missing or empty:

if [ ! -f "$DB_PATH" ] || [ ! -s "$DB_PATH" ]; then
  litestream restore -config "$LITESTREAM_CONFIG" "$DB_PATH"
fi

This ensures data survives container restarts without overwriting existing local data.
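
The restore condition is small enough to express as a single predicate. The Python sketch below mirrors the shell test for readers who want to reuse the check elsewhere, for example in a health check. The needs_restore name is hypothetical and is not part of Cortex or Litestream.

```python
import os


def needs_restore(db_path):
    """Mirror the entrypoint check: True when the file is missing or empty.

    os.path.isfile corresponds to the shell's -f test and a size of zero
    corresponds to the negated -s test.
    """
    return not os.path.isfile(db_path) or os.path.getsize(db_path) == 0
```

Because the predicate is false for any non-empty database file, a restart with intact local data skips the restore step entirely.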

Data Flow

User creates note
  ↓
FastAPI handler
  ├── ZettelStore.insert_note() → SQLite
  ├── ChromaDB.add() → Vector embedding
  ├── Auto-link scan → SQLite (new links)
  └── emit_activity() → SSE stream
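
The fan-out in the flow above can be sketched with in-memory stand-ins. Only insert_note and the four-step order come from the source. The InMemoryStore class, the insert_link method, the create_note signature, the "related" relation, and the event shape are hypothetical, and the real handler is a FastAPI endpoint writing to SQLite, ChromaDB, and an SSE stream.

```python
class InMemoryStore:
    """Hypothetical stand-in for ZettelStore; the real one writes to SQLite."""

    def __init__(self):
        self.notes = {}
        self.links = []

    def insert_note(self, title, content):
        note_id = len(self.notes) + 1
        self.notes[note_id] = {"title": title, "content": content}
        return note_id

    def insert_link(self, source_id, target_id, relation):
        self.links.append((source_id, target_id, relation))


def create_note(store, vectors, events, title, content, embed, find_related):
    """Run the four steps of the data flow in order and return the new note id.

    vectors is a dict standing in for the vector store, events is a list
    standing in for the activity stream, and embed / find_related are
    caller-supplied functions (assumptions for this sketch).
    """
    note_id = store.insert_note(title, content)            # 1. persist the note
    vectors[note_id] = embed(content)                       # 2. store its embedding
    for target_id in find_related(note_id, vectors):        # 3. auto-link scan
        store.insert_link(note_id, target_id, "related")
    events.append({"type": "note_created", "id": note_id})  # 4. emit activity
    return note_id
```

Passing the stores and helper functions in as arguments keeps each step replaceable, which makes the ordering easy to test without a database or an embedding model.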