Data Storage
Cortex uses a layered storage architecture optimized for simplicity and reliability.
SQLite + FTS5
The primary data store is SQLite with FTS5 for full-text search:
- Notes table: id, title, content, note_type, tags, importance, state, timestamps
- Links table: source_id, target_id, relation, timestamps
- FTS5 index: Full-text search over title and content
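The layout above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not Cortex's actual schema: the column types, the `created_at`/`updated_at` names, the `notes_fts` table name, and the helpers `open_store`, `add_note`, and `search` are all assumptions. It also assumes the local SQLite build includes FTS5.

```python
import sqlite3

# Hypothetical schema: column names follow the list above, types are guesses.
SCHEMA = """
CREATE TABLE IF NOT EXISTS notes (
    id         INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    content    TEXT NOT NULL,
    note_type  TEXT,
    tags       TEXT,
    importance INTEGER,
    state      TEXT,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS links (
    source_id  INTEGER NOT NULL REFERENCES notes(id),
    target_id  INTEGER NOT NULL REFERENCES notes(id),
    relation   TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE VIRTUAL TABLE IF NOT EXISTS notes_fts USING fts5(title, content);
"""


def open_store(path=":memory:"):
    """Open a database and create the tables if they do not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_note(conn, title, content):
    """Insert a note and mirror its text into the FTS5 index."""
    cur = conn.execute(
        "INSERT INTO notes (title, content) VALUES (?, ?)", (title, content)
    )
    conn.execute(
        "INSERT INTO notes_fts (rowid, title, content) VALUES (?, ?, ?)",
        (cur.lastrowid, title, content),
    )
    conn.commit()
    return cur.lastrowid


def search(conn, query):
    """Return note ids matching the full-text query, best match first."""
    rows = conn.execute(
        "SELECT rowid FROM notes_fts WHERE notes_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]
```

Keeping the FTS5 rowid equal to the note id makes it easy to join search hits back to the notes table.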
SQLite was chosen over Postgres for:
- Zero infrastructure (embedded database)
- Excellent read performance
- Simple backups and migration
- WAL mode for concurrent reads
SQLite serializes writes. This is fine for single-user or low-write workloads. If you need high write throughput, consider Postgres.
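To make the WAL behaviour concrete, the sketch below enables WAL on a database and shows that a reader is not blocked while another connection holds an uncommitted write. The helper names `open_wal` and `read_during_write`, the table layout, and the 5-second timeout are illustrative assumptions, not Cortex settings.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open a connection and switch the database to write-ahead logging."""
    conn = sqlite3.connect(path, timeout=5.0)  # wait up to 5 s for a lock
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    return conn, mode


def read_during_write(path):
    """Count rows from a second connection while a write is still pending."""
    writer, mode = open_wal(path)
    writer.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT)")
    writer.execute("INSERT INTO notes (title) VALUES ('first')")
    writer.commit()

    reader = sqlite3.connect(path)
    # This INSERT opens a write transaction that is not committed yet.
    writer.execute("INSERT INTO notes (title) VALUES ('pending')")
    # The reader is not blocked and sees only committed rows.
    during = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]
    writer.commit()
    after = reader.execute("SELECT COUNT(*) FROM notes").fetchall()[0][0]

    reader.close()
    writer.close()
    return mode, during, after


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(read_during_write(os.path.join(tmp, "cortex.db")))
```

A second writer attempting to write at the same moment would have to wait for the lock, which is the serialization described above.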
ChromaDB
Vector embeddings for semantic search:
- Model: all-MiniLM-L6-v2 (384-dimensional embeddings)
- Storage: Persistent on disk (./chroma_data/)
- Usage: Semantic similarity search, auto-linking, recall queries
ChromaDB runs in-process, so no separate service is needed.
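As an illustration of how similarity-based auto-linking could work, here is a plain-Python sketch that ranks existing notes by cosine similarity to a new note's embedding. It deliberately does not use the ChromaDB API, which would perform the nearest-neighbour lookup in practice. The function names, the 0.8 threshold, and the tiny 3-dimensional vectors (standing in for 384-dimensional embeddings) are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_links(new_id, new_vec, existing, threshold=0.8):
    """Return ids of existing notes similar enough to link, best first.

    `existing` maps note id -> embedding; `threshold` is a hypothetical cutoff.
    """
    scored = [
        (cosine_similarity(new_vec, vec), note_id)
        for note_id, vec in existing.items()
        if note_id != new_id
    ]
    return [note_id for score, note_id in sorted(scored, reverse=True)
            if score >= threshold]


if __name__ == "__main__":
    notes = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.9, 0.1, 0.0]}
    print(suggest_links("new", [1.0, 0.0, 0.0], notes))
```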
Litestream Replication
Litestream provides continuous replication of the SQLite database to S3-compatible storage:
SQLite WAL changes → Litestream → Cloudflare R2 bucket
Configuration
litestream.yml
dbs:
  - path: /app/data/cortex.db
    replicas:
      - type: s3
        endpoint: https://ACCOUNT_ID.r2.cloudflarestorage.com
        bucket: cortex-backup
        access-key-id: ${R2_ACCESS_KEY}
        secret-access-key: ${R2_SECRET_KEY}
Recovery
On container start, entrypoint.sh restores from backup only if the local database file is missing or empty:
if [ ! -f "$DB_PATH" ] || [ ! -s "$DB_PATH" ]; then
  litestream restore -config "$LITESTREAM_CONFIG" "$DB_PATH"
fi
This ensures data survives container restarts without overwriting existing local data.
Data Flow
User creates note
  ↓
FastAPI handler
  ├── ZettelStore.insert_note() → SQLite
  ├── ChromaDB.add() → Vector embedding
  ├── Auto-link scan → SQLite (new links)
  └── emit_activity() → SSE stream
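The flow above can be sketched as a single function that performs the four steps in order. Everything here is a stand-in: the `ZettelStore` class body, `insert_link`, the `WordOverlapIndex` (a toy replacement for the vector store that treats notes sharing a word as similar), the "related" relation, and the `events` list used in place of `emit_activity()` and the SSE stream are all hypothetical.

```python
import sqlite3


class ZettelStore:
    """Hypothetical stand-in; only insert_note() is named in the flow above."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE notes (id INTEGER PRIMARY KEY, title TEXT, content TEXT)"
        )
        self.conn.execute(
            "CREATE TABLE links (source_id INTEGER, target_id INTEGER, relation TEXT)"
        )

    def insert_note(self, title, content):
        cur = self.conn.execute(
            "INSERT INTO notes (title, content) VALUES (?, ?)", (title, content)
        )
        self.conn.commit()
        return cur.lastrowid

    def insert_link(self, source_id, target_id, relation):
        self.conn.execute(
            "INSERT INTO links VALUES (?, ?, ?)", (source_id, target_id, relation)
        )
        self.conn.commit()


class WordOverlapIndex:
    """Toy replacement for the vector store: notes sharing a word are 'similar'."""

    def __init__(self):
        self.words = {}

    def add(self, note_id, text):
        self.words[note_id] = set(text.lower().split())

    def similar(self, note_id):
        mine = self.words[note_id]
        return [other for other, words in self.words.items()
                if other != note_id and mine & words]


def create_note(store, index, events, title, content):
    """Run the four steps of the data flow in order."""
    note_id = store.insert_note(title, content)            # 1. SQLite
    index.add(note_id, f"{title} {content}")               # 2. embedding stand-in
    for target_id in index.similar(note_id):               # 3. auto-link scan
        store.insert_link(note_id, target_id, "related")
    events.append({"type": "note_created", "id": note_id})  # 4. activity event
    return note_id
```

Writing to SQLite first means the note has an id before anything else refers to it.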