Memex Architecture¶

Memex is a local-first, markdown-native second-brain framework optimized for strategic synthesis, not passive note collection. The markdown vault is the source of truth; schemas, indexes, skills, and evals are operating layers around it.

Lineage¶

Primary credit goes to Andrej Karpathy's LLM Knowledge Bases / LLM Wiki pattern: raw sources stay intact, an LLM maintains a persistent markdown wiki, and answers can be filed back into the wiki so knowledge compounds. Memex extends that idea with schemas, skills, validation, governance, evals, and a clean public/private vault boundary.

Garry Tan's GStack and GBrain are major operational influences: GStack for skill-based agent workflows, and GBrain for retrieval, graph traversal, cited synthesis, and local-first brain operations.

See Lineage and Credits.

Design goals¶

Durability — notes should remain useful years later.
Traceability — synthesized claims should point back to sources.
Composability — individual notes should become inputs for maps, essays, decisions, and strategy.
Searchability — metadata should support both human browsing and future machine retrieval.
Agent-operability — workflows should be repeatable by an AI agent with citations and guardrails.
Low ceremony — structured enough to compound, simple enough to use daily.

System layers¶

Raw/source layer — original or lightly cleaned inputs: articles, exports, transcripts, PDFs, bookmarks, conversations, research snippets.
Compiled wiki/content layer — human-readable markdown entities, synthesized items, maps, and decision notes.
Schema and index layer — JSON schemas, front matter conventions, content indexes, graph edges, and provenance.
Machine layer — lexical indexes, embeddings, metadata tables, and eval traces.
Agent workflow layer — repeatable skills for ingest, query, curation, briefing, markdown editing, and evaluation.
Interface layer — chat, CLI, editor, GitHub Pages, and scheduled maintenance.
Harness layer — execution boundaries, tool contracts, context policy, lifecycle state, observability, verification, and governance.

See Harness Architecture for the ETCLOVG mapping.

Memory horizons¶

Horizon	Memex layer	Purpose	Example artifacts
Active context	Current agent/session window	Immediate reasoning state	Current prompt, retrieved files, tool results
Mid-term state	Session and task handoff files	Recover work across turns, compactions, and restarts	Daily memory files, task notes, state templates
Long-term memory	Brain/Memex markdown plus indexes	Durable recall and synthesis across projects	Entity pages, source notes, items, GBrain index

Long-running work should not treat chat history as the source of truth. Important state must move into durable files that record provenance, uncertainty markers, and last-verified timestamps.
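One way to picture such a durable state entry is the sketch below. The `StateEntry` class, its field names, the certainty vocabulary, and the bullet layout are all hypothetical; this document does not prescribe a file format for handoff state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StateEntry:
    """Hypothetical shape for one durable state entry."""
    statement: str
    provenance: str          # where the statement came from, e.g. a file path
    certainty: str           # assumed vocabulary: "confirmed", "likely", "unverified"
    last_verified: datetime  # must be timezone-aware

    def to_markdown(self) -> str:
        # Normalize the timestamp to UTC so entries compare cleanly across machines.
        stamp = self.last_verified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return (
            f"- {self.statement}\n"
            f"  - provenance: {self.provenance}\n"
            f"  - certainty: {self.certainty}\n"
            f"  - last verified: {stamp}\n"
        )


entry = StateEntry(
    statement="Example statement carried over from a session",
    provenance="content/sources/example.md",
    certainty="unverified",
    last_verified=datetime(2026, 5, 14, 7, 50, tzinfo=timezone(timedelta(hours=3))),
)
print(entry.to_markdown())
```

Appending entries like this to a handoff file keeps the three properties visible to both a human reader and the next agent session.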

Information model¶

1. Sources¶

Path: content/sources/

Sources are evidence and raw material. They are not expected to be polished.

Examples:

book notes
article excerpts
transcripts
meeting notes
research snippets
copied source markdown

2. Items¶

Path: content/items/

Items are synthesized notes. Each item should include:

title
type
ingestion or creation date
source references
tags
core thesis
key ideas
reusable frameworks or questions

Items should be opinionated. A good item says what matters and why.

3. Index¶

Path: content/index.jsonl

The index provides machine-readable metadata for each item, stored as one JSON object per line (the JSON Lines format).

Required fields:

id
title
path
source_files
tags
ingested_at
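A minimal sketch of checking an index against the required fields listed above. The function name `validate_index_lines`, the tolerance for blank lines, and the sample records (including the idea that `id` mirrors the file slug) are assumptions for illustration, not documented Memex behavior.

```python
import json

# Required fields as listed in this document.
REQUIRED_FIELDS = ("id", "title", "path", "source_files", "tags", "ingested_at")


def validate_index_lines(lines):
    """Return (line_number, problem) tuples for the lines of a JSONL index."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption, not a documented rule)
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "expected a JSON object"))
            continue
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            problems.append((number, "missing fields: " + ", ".join(missing)))
    return problems


# Hypothetical sample: one complete record and one incomplete record.
sample = [
    json.dumps({
        "id": "2026-05-09-technological-republic-ai-deterrence",
        "title": "Example Title",
        "path": "content/items/2026-05-09-technological-republic-ai-deterrence.md",
        "source_files": ["content/sources/example.md"],
        "tags": ["AI", "strategy"],
        "ingested_at": "2026-05-14T07:50:00+03:00",
    }),
    '{"id": "incomplete", "title": "No path"}',
]
print(validate_index_lines(sample))
```

The check only confirms that each field is present. Type checks (for example, that `tags` is a list) would belong in a JSON schema from the schema and index layer.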

4. Entity pages¶

Future or expanded vaults can use typed entity pages for:

person
company
concept
project
source
decision
relationship
synthesis
index

Entity pages make the memex graph-readable and allow agents to resolve people, companies, concepts, and relationships directly.

5. Maps¶

Future path: content/maps/

Maps connect multiple items into larger structures:

mental models
strategic themes
decision frameworks
company or market theses
reading programs

Examples:

AI deterrence and sovereign capability
Digital banking infrastructure in emerging markets
Long games and compounding institutions
Clear thinking under uncertainty

Naming conventions¶

Use stable, human-readable slugs:

YYYY-MM-DD-short-topic-slug.md

Example:

2026-05-09-technological-republic-ai-deterrence.md
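The convention can be sketched as a small generator and validator. The helper names `make_filename` and `is_valid_filename`, and the choice to restrict slugs to lowercase ASCII letters, digits, and hyphens, are assumptions; the document specifies only the `YYYY-MM-DD-short-topic-slug.md` shape.

```python
import re
import unicodedata
from datetime import date

# Date prefix, then hyphen-separated lowercase ASCII words, then ".md".
SLUG_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}-[a-z0-9]+(?:-[a-z0-9]+)*\.md$")


def make_filename(day: date, topic: str) -> str:
    """Build a dated, slugged filename from a free-text topic."""
    # Fold accents to ASCII, lowercase, and collapse other characters to hyphens.
    ascii_topic = unicodedata.normalize("NFKD", topic).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_topic.lower()).strip("-")
    if not slug:
        raise ValueError("topic must contain at least one letter or digit")
    return f"{day.isoformat()}-{slug}.md"


def is_valid_filename(name: str) -> bool:
    """Check the shape and confirm the date prefix is a real calendar date."""
    if SLUG_PATTERN.match(name) is None:
        return False
    try:
        date.fromisoformat(name[:10])
    except ValueError:
        return False
    return True


print(make_filename(date(2026, 5, 9), "Technological Republic: AI Deterrence"))
# prints "2026-05-09-technological-republic-ai-deterrence.md"
```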

Metadata conventions¶

Items use YAML front matter. Keep field names consistent so future tooling can parse them.

---
title: "Example Title"
type: memex_ingest
ingested_at: "2026-05-14T07:50:00+03:00"
source_files:
  - "content/sources/example.md"
tags:
  - "AI"
  - "strategy"
---
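To illustrate why consistent field names matter for tooling, here is a minimal sketch of a reader for the front matter shape shown above. It handles only top-level `key: value` scalars and `key:` followed by `- item` lists; it is not a YAML parser, and a real vault would more likely use a YAML library. The name `parse_front_matter` is hypothetical.

```python
from datetime import datetime


def parse_front_matter(text):
    """Parse the restricted front matter subset shown above into a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("document does not start with a front matter block")
    try:
        end = next(i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---")
    except StopIteration:
        raise ValueError("front matter block is not closed") from None

    data = {}
    current_key = None
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # A list item belongs to the most recent key that opened a list.
            if current_key is None or not isinstance(data[current_key], list):
                raise ValueError(f"list item without a list key: {line!r}")
            data[current_key].append(stripped[2:].strip().strip('"'))
            continue
        key, sep, value = stripped.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value': {line!r}")
        current_key = key.strip()
        value = value.strip()
        # An empty value opens a list; anything else is kept as a string.
        data[current_key] = [] if value == "" else value.strip('"')
    return data


example = """---
title: "Example Title"
type: memex_ingest
ingested_at: "2026-05-14T07:50:00+03:00"
source_files:
  - "content/sources/example.md"
tags:
  - "AI"
  - "strategy"
---
Body text.
"""

meta = parse_front_matter(example)
ingested = datetime.fromisoformat(meta["ingested_at"])  # timezone-aware datetime
print(meta["title"], meta["tags"], ingested.isoformat())
```

Quoting the timestamp in the front matter keeps it a plain string, so the tool decides how to interpret it.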

Claim discipline¶

Every analytical answer should separate:

Cited fact — grounded in a note or source path.
Inference — the agent's judgment based on cited facts.
Unknown — missing or insufficient evidence.
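The three categories can be modeled as a small data structure so that an answer cannot label something a cited fact without a path. This is a sketch under assumptions: `ClaimKind`, `Claim`, and `group_by_kind` are hypothetical names, and the claim texts are placeholders, not vault content.

```python
from dataclasses import dataclass
from enum import Enum


class ClaimKind(Enum):
    CITED_FACT = "cited fact"
    INFERENCE = "inference"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Claim:
    text: str
    kind: ClaimKind
    sources: tuple = ()  # note or source paths backing the claim

    def __post_init__(self):
        # A cited fact without a path is not traceable, so reject it early.
        if self.kind is ClaimKind.CITED_FACT and not self.sources:
            raise ValueError("a cited fact needs at least one source path")


def group_by_kind(claims):
    """Bucket claims by kind so an answer can present each kind separately."""
    grouped = {kind: [] for kind in ClaimKind}
    for claim in claims:
        grouped[claim.kind].append(claim)
    return grouped


claims = [
    Claim("Placeholder fact taken from a source note", ClaimKind.CITED_FACT,
          ("content/sources/example.md",)),
    Claim("Placeholder judgment built on that fact", ClaimKind.INFERENCE),
    Claim("Placeholder question the vault cannot answer yet", ClaimKind.UNKNOWN),
]

for kind, items in group_by_kind(claims).items():
    print(kind.value, [c.text for c in items])
```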

Public and private boundary¶

This repository contains the framework: documentation, templates, schemas, skills, eval methodology, synthetic example vaults, and public-source examples.

Real deployments should keep private vault data in a separate, access-controlled repository. Private vaults can reuse the same schema and skills without exposing real people, meetings, investor materials, customer records, or strategy.

Search and retrieval modes¶

Use the smallest retrieval mode that can answer the question:

Exact lookup: known slug or known file path.
Keyword search: names, companies, exact phrases, source titles.
Hybrid query: concepts, analogies, strategic themes, fuzzy recall.
Graph traversal: neighboring people, companies, relationships, decisions.
Eval set: regression checks for expected entities, citation quality, and synthesis quality.
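One possible reading of "smallest mode first" is an escalation loop: try the cheapest mode, and move to the next only when it returns nothing. The sketch below assumes that reading. `retrieve_smallest`, the stand-in retrievers, and the one-note in-memory corpus are hypothetical; hybrid query and graph traversal are left out because they would need an embedding index and a graph index.

```python
def retrieve_smallest(query, retrievers):
    """Try (mode, retriever) pairs in order; return the first non-empty result."""
    for mode, retriever in retrievers:
        hits = retriever(query)
        if hits:
            return mode, hits
    return None, []


# Stand-in corpus: path mapped to note text (illustrative pairing only).
NOTES = {
    "content/items/2026-05-09-technological-republic-ai-deterrence.md":
        "AI deterrence and sovereign capability",
}


def exact_lookup(query):
    # Only succeeds when the query is a known file path.
    return [query] if query in NOTES else []


def keyword_search(query):
    # Case-insensitive substring match over note text.
    needle = query.lower()
    return [path for path, text in NOTES.items() if needle in text.lower()]


# Ordered from cheapest to most expensive.
MODES = [("exact lookup", exact_lookup), ("keyword search", keyword_search)]

print(retrieve_smallest("sovereign capability", MODES))
```

Returning the mode name alongside the hits lets an agent record which retrieval path produced each citation.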

Agent operating rule¶

Agents should not treat the vault as generic prompt context. They should resolve entities, retrieve sources, separate cited facts from inference, write durable updates only with provenance, run validation after structural edits, and avoid publishing private vault material.

Future automation¶

The current repository is intentionally simple. Later automation can add:

index rebuilding
duplicate detection
semantic search
graph extraction
tag normalization
source-to-item traceability checks
privacy scans
benchmark reports
weekly synthesis reports
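As an illustration of one item on this list, a source-to-item traceability check could look like the sketch below. It assumes the index lives at `content/index.jsonl` and that `source_files` holds paths relative to the vault root; `find_missing_sources` and the demo vault are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def find_missing_sources(vault_root, index_path="content/index.jsonl"):
    """Return (item_id, source_path) pairs whose source file is absent on disk."""
    root = Path(vault_root)
    missing = []
    with open(root / index_path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            for source in record.get("source_files", []):
                if not (root / source).is_file():
                    missing.append((record.get("id"), source))
    return missing


def demo():
    """Build a throwaway vault with one present and one missing source."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "content" / "sources").mkdir(parents=True)
        (root / "content" / "sources" / "example.md").write_text("raw notes", encoding="utf-8")
        records = [
            {"id": "a", "source_files": ["content/sources/example.md"]},
            {"id": "b", "source_files": ["content/sources/missing.md"]},
        ]
        index = root / "content" / "index.jsonl"
        index.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
        return find_missing_sources(root)


print(demo())
```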