Technical Architecture: GMC For-Benefit Economy RAG Agent
Version: 1.0
Date: 2026-06-23
1. System Overview
┌─────────────────────────────────────────────────────────────────────┐
│ User Interface Layer │
│ ┌──────────────────┐ ┌─────────────────┐ ┌───────────────────┐ │
│ │ CLI (Rich) │ │ Streamlit UI │ │ API (FastAPI) │ │
│ │ (primary) │ │ (optional) │ │ (future) │ │
│ └────────┬─────────┘ └────────┬────────┘ └────────┬──────────┘ │
└───────────┼─────────────────────┼─────────────────────┼────────────┘
│ │ │
┌───────────▼─────────────────────▼─────────────────────▼────────────┐
│ Orchestration Layer │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ Session Manager │ │
│ │ • Workspace CRUD • Conversation History • Metadata │ │
│ │ • Decision Logging • State Persistence • User Mgmt │ │
│ └───────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼──────────────────────────────────┐ │
│ │ Query Router (Classifier) │ │
│ │ • Parse user intent • Detect reasoning paths │ │
│ │ • Select retrieval mode (Quick/Standard/Deep) │ │
│ │ • Route to path executor │ │
│ └───────────────────────────┬──────────────────────────────────┘ │
│ │ │
│ ┌───────────────────────────▼──────────────────────────────────┐ │
│ │ Reasoning Path Executor │ │
│ │ • Path 1: Pillar Alignment • Path 5: Legal Check │ │
│ │ • Path 2: For-Benefit Design • Path 6: Ecosystem Mapping │ │
│ │ • Path 3: GNH Impact • Path 7: Mindful Capitalism │ │
│ │ • Path 4: Sustainability • Comparative Mode │ │
│ └───────────────────────────┬──────────────────────────────────┘ │
└───────────────────────────────┼────────────────────────────────────┘
│
┌───────────────────────────────▼────────────────────────────────────┐
│ Retrieval Pipeline │
│ │
│ ┌─────────────┐ ┌──────────────┐ ┌─────────────┐ ┌──────────┐ │
│ │ Query │ │ Metadata │ │ Hybrid │ │ Rerank │ │
│ │ Expansion │→│ Filter │→│ Search │→│ (Cross- │ │
│ │ (3 variants)│ │ (pillar, │ │ (Dense + │ │ encoder) │ │
│ │ │ │ type, date)│ │ BM25) │ │ │ │
│ └─────────────┘ └──────────────┘ └──────┬──────┘ └────┬─────┘ │
│ │ │ │
│ ┌─────────────────────────────────────────▼───────────────▼─────┐ │
│ │ Context Assembly │ │
│ │ • Format chunks with source metadata │ │
│ │ • Inject into system prompt │ │
│ │ • Trim to fit context window (128K tokens) │ │
│ └────────────────────────────┬──────────────────────────────────┘ │
└───────────────────────────────┼────────────────────────────────────┘
│
┌───────────────────────────────▼────────────────────────────────────┐
│ Generation Layer │
│ │
│ ┌──────────────────────────────────────────────────────────────┐ │
│ │ LLM (Ollama / Qwen 2.5) │ │
│ │ • System prompt (grounding + reasoning paths) │ │
│ │ • Retrieved context (15-25 chunks) │ │
│ │ • User query │ │
│ │ → Structured evaluation output │ │
│ └──────────────────────────────────────────────────────────────┘ │
└───────────────────────────────┼────────────────────────────────────┘
│
┌───────────────────────────────▼────────────────────────────────────┐
│ Storage Layer │
│ │
│ ┌──────────────────┐ ┌──────────────┐ ┌────────────────────┐ │
│ │ ChromaDB │ │ SQLite FTS5 │ │ File System │ │
│ │ (vector store) │ │ (BM25 index)│ │ (raw docs .jsonl) │ │
│ │ • 1024-dim dense │ │ • full-text │ │ • session logs │ │
│ │ • metadata index │ │ • keyword │ │ • decision logs │ │
│ │ • persistent (disk)│ │ search │ │ • exports │ │
│ └──────────────────┘ └──────────────┘ └────────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘
2. Component Details
2.1 Embeddings
Model: intfloat/multilingual-e5-large
| Property | Value |
|---|---|
| Dimensions | 1024 |
| Max tokens | 512 |
| Languages | 100+ (English + Dzongkha capable) |
| Model type | Encoder-only (XLM-RoBERTa-large backbone), bi-encoder |
| Memory | ~2GB loaded |
| Source | HuggingFace (permissive license) |
Usage pattern:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("intfloat/multilingual-e5-large")
# Prefix queries with "query: " and documents with "passage: "
query_embedding = model.encode("query: " + user_query)
doc_embedding = model.encode("passage: " + chunk_text)
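Fallback model: BAAI/bge-large-en-v1.5 (1024 dim, English-only, slightly higher accuracy on English text). Because it is English-only, it cannot stand in for Dzongkha retrieval; consider it only if the corpus turns out to be effectively English-only. Switching embedding models requires re-embedding every chunk, since vectors from different models are not comparable.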
2.2 Vector Store: ChromaDB
| Property | Value |
|---|---|
| Version | 0.5.x+ |
| Storage | Persistent (chroma_db/ directory) |
| Distance | Cosine (set explicitly via hnsw:space; Chroma's default is L2) |
| Metadata filtering | Supported (pillar, doc_type, tier, date) |
| Max collection size | Unlimited (disk-bound) |
Collection schema:
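import chromadb

# Embedded on-disk client. To talk to the Compose `chromadb` service instead,
# use chromadb.HttpClient(host=..., port=...).
client = chromadb.PersistentClient(path="data/chroma_db")
collection = client.create_collection(
    name="gmc_documents",
    metadata={"hnsw:space": "cosine"},  # cosine must be requested explicitly
)
# Document schema
collection.add(
    embeddings=[[...], ...],        # 1024-dim float arrays
    documents=["chunk text", ...],  # raw text
    metadatas=[{
        "doc_id": "GMC-001",
        "title": "GMC Homepage",
        "pillar": "pillar-all",
        "doc_type": "core-vision",
        "tier": 1,
        "language": "en",
        "chunk_index": 3,
        "url": "https://gmc.bt",
        "ingested_at": "2026-06-23"
    }, ...],
    ids=["GMC-001-chunk-003", ...]
)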
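The metadata filtering listed in the table can be expressed as a Chroma `where` dictionary. The helper below is a minimal sketch, not the project's actual code: the function name `build_where` and its parameters are hypothetical, and it only covers the "pillar OR doc_type" shape used in the query flow of section 4.

```python
from typing import Iterable, Optional


def build_where(pillars: Optional[Iterable[str]] = None,
                doc_types: Optional[Iterable[str]] = None) -> Optional[dict]:
    """Build a Chroma-style `where` filter: pillar IN (...) OR doc_type IN (...)."""
    clauses = []
    if pillars:
        clauses.append({"pillar": {"$in": list(pillars)}})
    if doc_types:
        clauses.append({"doc_type": {"$in": list(doc_types)}})
    if not clauses:
        return None        # no filtering requested
    if len(clauses) == 1:
        return clauses[0]  # a single clause is returned bare, without the $or wrapper
    return {"$or": clauses}


where = build_where(["health-wellness", "tourism"], ["case-study", "core-vision"])
print(where)
# Assumed usage: collection.query(query_embeddings=[...], n_results=50, where=where)
```

Returning `None` when no filter applies lets the caller pass the result straight through as the `where` argument.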
2.3 Full-Text Index: SQLite FTS5
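-- External-content table: rows live in `documents`, and the index must be
-- kept in sync with it (for example via triggers).
CREATE VIRTUAL TABLE docs_fts USING fts5(
    doc_id, title, chunk_text, pillar,
    content='documents',
    content_rowid='rowid'
);
-- BM25 query (space-separated terms are implicitly ANDed; join them with OR
-- for broader recall)
SELECT * FROM docs_fts
WHERE docs_fts MATCH 'geothermal energy sustainability'
ORDER BY rank
LIMIT 50;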
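A minimal sketch of issuing the BM25 query from Python with the standard `sqlite3` module follows. It assumes an SQLite build with FTS5 enabled, uses a simplified in-memory table (no external content table) with made-up rows, and joins the terms with OR; the helper names are hypothetical.

```python
import re
import sqlite3


def to_fts5_or_query(text: str) -> str:
    """Quote each word and join with OR so that any single term can match."""
    terms = re.findall(r"\w+", text.lower())
    return " OR ".join(f'"{term}"' for term in terms)


def bm25_search(conn: sqlite3.Connection, query: str, limit: int = 50):
    """Return (doc_id, rank) rows, best first (FTS5 rank is lower-is-better)."""
    match_expr = to_fts5_or_query(query)
    if not match_expr:
        return []
    sql = (
        "SELECT doc_id, rank FROM docs_fts "
        "WHERE docs_fts MATCH ? ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (match_expr, limit)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory index with illustrative rows only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs_fts USING fts5(doc_id, title, chunk_text, pillar)"
    )
    conn.executemany(
        "INSERT INTO docs_fts VALUES (?, ?, ?, ?)",
        [
            ("DEMO-1", "Energy", "Geothermal energy supports sustainability goals", "energy"),
            ("DEMO-2", "Tourism", "Tourism cooperative planning", "tourism"),
            ("DEMO-3", "Policy", "Energy policy overview", "energy"),
        ],
    )
    return conn


rows = bm25_search(demo_connection(), "geothermal energy sustainability")
print([doc_id for doc_id, _rank in rows])
```

With OR semantics, a chunk that mentions only "energy" is still returned, but it ranks below the chunk that matches all three terms.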
2.4 Hybrid Search Fusion
Dense and sparse result lists are merged with Reciprocal Rank Fusion (RRF), which scores each document by summing 1 / (k + rank) over the lists it appears in:
def reciprocal_rank_fusion(dense_results, sparse_results, k=60):
    """Merge two ranked lists of (doc_id, score) pairs; only the ranks are used."""
    scores = {}
    for results in (dense_results, sparse_results):
        # Ranks are 1-based, as in the standard RRF formula 1 / (k + rank).
        for rank, (doc_id, _score) in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
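As a worked example of the fusion step, the sketch below merges two tiny ranked lists. It assumes 1-based ranks (the usual RRF convention) and takes plain lists of ids rather than (doc_id, score) pairs; the ids and the name `rrf_merge` are made up for illustration.

```python
def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of doc ids; score = sum of 1 / (k + rank), rank 1-based."""
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


dense = ["A", "B", "C"]  # hypothetical dense hits, best first
sparse = ["B", "D"]      # hypothetical BM25 hits, best first
fused = rrf_merge([dense, sparse])
print(fused)  # → ['B', 'A', 'D', 'C']
```

"B" wins because it appears in both lists (1/62 + 1/61), while "A" (1/61) edges out "D" (1/62) and "C" (1/63), which each appear in only one.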
2.5 Cross-Encoder Reranker
Model: BAAI/bge-reranker-v2-m3
| Property | Value |
|---|---|
| Type | Cross-encoder (query + doc pair) |
| Max tokens | 8192 |
| Languages | Multilingual |
| Memory | ~4GB loaded |
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reranker = AutoModelForSequenceClassification.from_pretrained(
    "BAAI/bge-reranker-v2-m3"
)
reranker.eval()  # inference mode (disables dropout)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")

pairs = [[query, chunk] for chunk in top_75_chunks]
inputs = tokenizer(pairs, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():  # no gradients needed for scoring
    scores = reranker(**inputs).logits.view(-1).tolist()
# Re-sort by reranker score, keep top 15-25
scored = sorted(zip(top_75_chunks, scores), key=lambda x: x[1], reverse=True)
top_chunks = [chunk for chunk, _ in scored[:20]]  # 20 as in the Standard-mode flow
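The raw reranker outputs are unbounded logits. If a 0-1 relevance score is wanted (for example, to store in a log or apply a cutoff), a sigmoid can be applied before sorting. The following is a small sketch under that assumption; the function names and the default `k=20` are illustrative, not part of the original design.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))


def top_k_by_score(chunks, logits, k=20):
    """Pair each chunk with its normalized score and keep the k best."""
    scored = [(chunk, sigmoid(logit)) for chunk, logit in zip(chunks, logits)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical chunks and logits, for illustration only.
best = top_k_by_score(["a", "b", "c"], [-1.0, 2.0, 0.0], k=2)
print(best)
```

The sigmoid is monotonic, so the ordering is the same as sorting by raw logits; only the scale changes.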
2.6 LLM: Qwen 2.5 (via Ollama)
Model: qwen2.5:14b (recommended) or qwen2.5:7b (minimum)
| Property | Value |
|---|---|
| Context window | 128K tokens |
| Languages | 29+ officially supported, including English and Chinese; Dzongkha is not on the official list and must be tested |
| Quantization | Q4_K_M (7B ~4.5GB, 14B ~9GB) |
| API | Ollama REST API (localhost:11434) |
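import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5:14b",
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query_with_context}
    ],
    "stream": False,  # return one JSON object instead of a stream of chunks
    "options": {
        "temperature": 0.3,
        "top_p": 0.9,
        "num_predict": 4096,  # Ollama's option name for the max output tokens
        "num_ctx": 32768      # example value; Ollama's default context is far below 128K
    }
}, timeout=300)
answer = response.json()["message"]["content"]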
Why Qwen 2.5:
- Strong multilingual support (important for Dzongkha)
- 128K context (can fit full system prompt + many retrieved chunks)
- Competes with Llama 3.1 on English benchmarks
- Available via Ollama (easy local deployment)
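Because the context window has to hold the system prompt, the retrieved chunks, and the generated reply together, it helps to estimate the required size before setting Ollama's `num_ctx` option. The sketch below is back-of-the-envelope arithmetic only: the 512-token chunk size (borrowed from the embedder limit), the 2,000-token system prompt, and the rounding step are assumptions, and the function name is hypothetical.

```python
import math


def estimate_num_ctx(n_chunks: int, chunk_tokens: int = 512,
                     system_tokens: int = 2000, reply_tokens: int = 4096,
                     step: int = 1024) -> int:
    """Tokens needed for prompt + chunks + reply, rounded up to a multiple of `step`."""
    needed = n_chunks * chunk_tokens + system_tokens + reply_tokens
    return math.ceil(needed / step) * step


for n in (15, 20, 25):
    print(n, estimate_num_ctx(n))
```

Under these assumptions, 20 chunks need 16,384 tokens and 25 chunks need 19,456, well below 128K but well above a small default context.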
3. File System Layout
/home/nahar/Documents/code/GMC/
├── docs/
│ ├── information-directory.md # document index
│ ├── prd.md # product requirements document
│ ├── system-prompt.md # system prompt (all reasoning paths)
│ └── technical-architecture.md # this file
│
├── data/
│ ├── raw/ # raw scraped documents
│ │ ├── gmc_bt/ # from gmc.bt
│ │ ├── fourthsector_org/ # from fourthsector.org
│ │ ├── wikipedia/ # from Wikipedia
│ │ └── media/ # from news/media sources
│ ├── processed/ # chunked documents (JSONL)
│ │ ├── chunks.jsonl # all chunks with metadata
│ │ └── summaries.jsonl # document-level summaries
│ └── chroma_db/ # ChromaDB persistent store
│
├── agent/
│ ├── __init__.py
│ ├── main.py # CLI entry point
│ ├── router.py # query classifier
│ ├── paths/
│ │ ├── __init__.py
│ │ ├── pillar_alignment.py # Path 1
│ │ ├── for_benefit_design.py # Path 2
│ │ ├── gnh_impact.py # Path 3
│ │ ├── sustainability.py # Path 4
│ │ ├── legal_check.py # Path 5
│ │ ├── ecosystem_mapping.py # Path 6
│ │ ├── mindful_capitalism.py # Path 7
│ │ └── comparative.py # Comparative mode
│ ├── retrieval/
│ │ ├── __init__.py
│ │ ├── embedder.py # embedding generation
│ │ ├── hybrid_search.py # dense + BM25 search
│ │ ├── reranker.py # cross-encoder reranking
│ │ └── context.py # context assembly
│ ├── storage/
│ │ ├── __init__.py
│ │ ├── vector_store.py # ChromaDB interface
│ │ ├── fulltext.py # SQLite FTS5 interface
│ │ ├── document_store.py # JSONL document store
│ │ └── decision_logger.py # session logging
│ ├── session/
│ │ ├── __init__.py
│ │ ├── workspace.py # workspace manager
│ │ └── history.py # conversation history
│ └── prompts/
│ ├── __init__.py
│ ├── system.py # base system prompt
│ ├── paths/ # path-specific prompt templates
│ └── router.py # router classifier prompt
│
├── scripts/
│ ├── scrape_gmc.py # scrape gmc.bt
│ ├── scrape_fourthsector.py # scrape fourthsector.org
│ ├── scrape_wikipedia.py # scrape relevant Wikipedia pages
│ ├── ingest.py # chunk + embed + store
│ └── test_retrieval.py # manual retrieval quality tests
│
├── tests/
│ ├── test_retrieval.py
│ ├── test_paths.py
│ ├── test_router.py
│ └── fixtures/ # test document fixtures
│
├── exports/ # user exports (markdown/JSON)
│
├── docker-compose.yml # Ollama + ChromaDB + agent
├── Dockerfile # agent container
├── requirements.txt # Python dependencies
└── Makefile # common commands
4. Data Flow: End-to-End Query
User: "Evaluate a wellness tourism cooperative in GMC"
1. ROUTER CLASSIFIES
→ Path 1 (Pillar: Health/Wellness + Tourism)
→ Path 2 (For-Benefit: cooperative = stakeholder governance)
→ Path 3 (GNH: community vitality, health, living standards)
→ Path 7 (Mindful Capitalism: balance check)
→ Retrieval mode: STANDARD
2. QUERY EXPANSION (3 variants)
a. "wellness tourism cooperative in GMC"
b. "for-benefit cooperative health tourism Gelephu"
c. "GMC wellness cooperative enterprise design"
3. METADATA FILTER
pillar IN (health-wellness, tourism) OR doc_type IN (case-study, core-vision)
4. HYBRID SEARCH
Dense(top 50) + BM25(top 50) → RRF merge → top 75
5. RERANKER
Cross-encoder scores → sort → top 20 chunks
6. CONTEXT ASSEMBLY
Inject top 20 chunks with [Source: ...] tags into system prompt
7. GENERATION
LLM receives: system prompt + 20 chunks + user query
→ Outputs structured evaluation per Path 1, 2, 3, 7
→ Each claim cites source
→ Gap analysis included
8. LOG
Full query + chunks + response saved to session workspace
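Step 6 above (context assembly) can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the `[Source: doc_id#chunk_index]` tag layout, the function names, and the rough four-characters-per-token estimate are all hypothetical stand-ins.

```python
def estimate_tokens(text: str) -> int:
    """Crude heuristic (~4 characters per token); swap in a real tokenizer for accuracy."""
    return max(1, len(text) // 4)


def assemble_context(chunks, max_tokens: int) -> str:
    """Tag each chunk with its source and stop before the token budget is exceeded.

    `chunks` is a list of dicts with doc_id, chunk_index and text, sorted best first.
    """
    parts, used = [], 0
    for chunk in chunks:
        block = f"[Source: {chunk['doc_id']}#{chunk['chunk_index']}]\n{chunk['text']}"
        cost = estimate_tokens(block)
        if used + cost > max_tokens:
            break  # lower-ranked chunks are dropped first
        parts.append(block)
        used += cost
    return "\n\n".join(parts)


# Made-up chunks for illustration.
demo_chunks = [
    {"doc_id": "GMC-001", "chunk_index": 3, "text": "a" * 40},
    {"doc_id": "GMC-001", "chunk_index": 4, "text": "b" * 40},
]
print(assemble_context(demo_chunks, max_tokens=100))
```

Because the chunks arrive sorted by reranker score, trimming from the tail discards the least relevant material first.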
5. Document Ingestion Pipeline
┌─────────────────┐
│ Web Scraper │
│ (httpx + │
│ BeautifulSoup)│
└────────┬────────┘
│ raw HTML
▼
┌─────────────────┐
│ Text Extractor │
│ (markdownify, │
│ readability) │
└────────┬────────┘
│ clean text
▼
┌─────────────────┐
│ Metadata │
│ Annotator │
│ (pillar, type, │
│ tier, lang) │
└────────┬────────┘
│ annotated text
▼
┌─────────────────┐
│ Hierarchical │
│ Chunker │
│ (summary + │
│ semantic + │
│ sentence) │
└────────┬────────┘
│ chunks
▼
┌──────────────┴──────────────┐
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Embedder │ │ Text Indexer │
│ (E5-large) │ │ (SQLite FTS5) │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ ChromaDB │ │ SQLite DB │
│ (vectors + │ │ (full-text + │
│ metadata) │ │ metadata) │
└──────────────────┘ └──────────────────┘
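The sentence-level tier of the chunker in the diagram above could look roughly like this. It is a simplified sketch: it covers only sentence packing (not the summary or semantic tiers), counts words rather than model tokens, and the function names and the `max_words` default are assumptions. The id format follows the `GMC-001-chunk-003` pattern from section 2.2.

```python
import re


def split_sentences(text: str):
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def pack_sentences(doc_id: str, text: str, max_words: int = 120):
    """Greedily pack whole sentences into chunks of at most `max_words` words."""
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)  # an over-long sentence becomes its own chunk
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return [
        {"id": f"{doc_id}-chunk-{i:03d}", "chunk_index": i, "text": chunk_text}
        for i, chunk_text in enumerate(chunks)
    ]


demo = pack_sentences("GMC-001", "One two three. Four five six. Seven eight nine.", max_words=6)
print([c["id"] for c in demo])
```

Keeping sentences whole avoids splitting a claim from its qualifier, at the cost of slightly uneven chunk sizes.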
6. Workspace & Session Data Model
from __future__ import annotations  # lets Session reference Message before it is defined

from dataclasses import dataclass
from datetime import datetime
from typing import List, Literal


@dataclass
class Workspace:
    id: str                  # "digital-assets-framework"
    name: str                # "Digital Assets Framework"
    created_at: datetime
    metadata_filters: dict   # {"pillar": "pillar-finance-digital"}
    session_count: int = 0


@dataclass
class Session:
    id: str                  # uuid
    workspace_id: str        # foreign key to workspace
    created_at: datetime
    messages: List[Message]


@dataclass
class Message:
    role: Literal["user", "assistant"]
    content: str
    retrieved_chunks: List[ChunkRef]  # which chunks were used
    paths_activated: List[int]        # which paths ran
    timestamp: datetime


@dataclass
class ChunkRef:
    doc_id: str       # "GMC-004"
    chunk_index: int  # 3
    text: str         # "The intention of this pledge is clear..."
    score: float      # relevance score
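To persist these records in the JSONL session logs, each message can be flattened to one JSON line. The sketch below redefines trimmed versions of `Message` and `ChunkRef` so that it runs on its own; the helper name `to_jsonl_line` and the ISO-8601 timestamp encoding are assumptions, not something the data model above prescribes.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ChunkRef:
    doc_id: str
    chunk_index: int
    text: str
    score: float


@dataclass
class Message:
    role: str
    content: str
    retrieved_chunks: List[ChunkRef] = field(default_factory=list)
    paths_activated: List[int] = field(default_factory=list)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def to_jsonl_line(message: Message) -> str:
    """Serialize one message as a single JSON line."""
    record = asdict(message)  # recurses into nested dataclasses
    record["timestamp"] = message.timestamp.isoformat()  # datetime is not JSON-native
    return json.dumps(record, ensure_ascii=False)


msg = Message(
    role="assistant",
    content="ok",
    retrieved_chunks=[ChunkRef("GMC-004", 3, "example text", 0.9)],
    paths_activated=[1, 2],
    timestamp=datetime(2026, 6, 23, tzinfo=timezone.utc),
)
print(to_jsonl_line(msg))
```

`ensure_ascii=False` keeps Dzongkha or other non-ASCII text readable in the log file instead of escaping it.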
7. Deployment
Docker Compose
version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ./data/ollama:/root/.ollama
    command: serve
    restart: unless-stopped
  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - ./data/chroma_db:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
    restart: unless-stopped
  agent:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./agent:/app/agent
      - ./exports:/app/exports
      - ./docs:/app/docs
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - CHROMA_HOST=http://chromadb:8000
    depends_on:
      - ollama
      - chromadb
    stdin_open: true
    tty: true
    command: python -m agent.main
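The agent container receives `OLLAMA_HOST` and `CHROMA_HOST` as full URLs, whereas a client such as `chromadb.HttpClient` takes a host and a port separately. A minimal sketch of reading and splitting those values is shown below; the helper name and the localhost defaults are assumptions.

```python
import os
from urllib.parse import urlparse


def service_endpoint(env_var: str, default_url: str):
    """Read a service URL from the environment and return (host, port)."""
    url = os.environ.get(env_var, default_url)
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"{env_var} must be a full URL such as http://host:port, got {url!r}")
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    return parsed.hostname, port


chroma_host, chroma_port = service_endpoint("CHROMA_HOST", "http://localhost:8000")
ollama_host, ollama_port = service_endpoint("OLLAMA_HOST", "http://localhost:11434")
print(chroma_host, chroma_port, ollama_host, ollama_port)
# Assumed usage: chromadb.HttpClient(host=chroma_host, port=chroma_port)
```

Falling back to localhost keeps the same code usable outside Docker, where the services are reached on their published ports.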
Makefile Commands
.PHONY: scrape ingest run test export rebuild

# Scrape documents
scrape:
	python scripts/scrape_gmc.py
	python scripts/scrape_fourthsector.py
	python scripts/scrape_wikipedia.py
# Ingest into vector store
ingest:
	python scripts/ingest.py
# Run the agent CLI (`run` rather than `up`, so the terminal is attached for input)
run:
	docker-compose run --rm agent
# Run tests
test:
	python -m pytest tests/ -v
# Export decision log
export:
	python -m agent.export --workspace $(workspace)
# Rebuild vector store (full reset)
rebuild:
	rm -rf data/chroma_db data/processed
	$(MAKE) ingest
8. Requirements
requirements.txt
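# Core
sentence-transformers>=3.0.0
chromadb>=0.5.0
ollama>=0.3.0
httpx>=0.27.0
requests>=2.31.0          # used by the Ollama REST example in section 2.6
# Retrieval
numpy>=1.24.0
scipy>=1.11.0
transformers>=4.40.0      # imported directly by the reranker (section 2.5)
torch>=2.1.0              # imported directly by the reranker (section 2.5)
# Storage
datasets>=2.18.0
sqlite-utils>=3.36
# Scraping
beautifulsoup4>=4.12.0
markdownify>=0.12.0
readability-lxml>=0.8.1
lxml>=5.1.0
# CLI
rich>=13.0.0
typer>=0.12.0
# Utilities
python-dateutil>=2.8.0
pydantic>=2.0.0
# Testing
pytest>=8.0.0             # required by `make test`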
9. Performance Budgets
| Operation | Budget | Measured by |
|---|---|---|
| Query classification | <200ms | router.invoke() |
| Hybrid search (50+50) | <500ms | hybrid_search.search() |
| Reranker (75 pairs) | <3s | reranker.rerank() |
| Context assembly | <100ms | context.assemble() |
| LLM generation (Qwen 2.5 7B) | <8s | LLM response time |
| Total (Standard mode) | <15s | End-to-end |
| Document embedding | <1s/doc | embedder.embed_document() |
| Ingestion throughput | ~50 docs/min | Full pipeline |
Profiling output (expected for Standard mode):
Router: 45ms
├── Intent classification: 22ms
├── Path selection: 15ms
└── Mode selection: 8ms
Retrieval: 2.1s
├── Query expansion: 12ms
├── Metadata filter: 4ms
├── Dense search (ChromaDB): 280ms
├── Sparse search (SQLite FTS5): 120ms
├── RRF merge: 2ms
└── Reranker (75 pairs): 1.7s
Context assembly: 95ms
├── Format chunks: 40ms
├── Apply citations: 35ms
└── Trim to window: 20ms
Generation (Qwen 2.5 7B, Q4_K_M): 8.2s
├── Prompt processing: 1.1s
└── Token generation (600 tokens @ ~85 t/s): 7.1s
Total: 10.4s
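The budgets and the expected profile can be compared mechanically. The sketch below hard-codes the figures from this section (in milliseconds); how sub-steps are grouped under each budget line is an assumption, and the names are hypothetical. Note that with these numbers the expected generation time (8.2s) slightly exceeds its <8s budget, while the total stays well under 15s.

```python
# Budgets from the table above, in ms; None means no budget is listed for that stage.
BUDGET_MS = {
    "router": 200,
    "query_prep": None,       # query expansion + metadata filter
    "hybrid_search": 500,
    "reranker": 3000,
    "context_assembly": 100,
    "generation": 8000,
}
TOTAL_BUDGET_MS = 15000

# Expected Standard-mode profile from the listing above, in ms.
EXPECTED_MS = {
    "router": 45,
    "query_prep": 12 + 4,
    "hybrid_search": 280 + 120 + 2,  # dense + sparse + RRF merge
    "reranker": 1700,
    "context_assembly": 95,
    "generation": 8200,
}


def over_budget(measured, budgets=BUDGET_MS):
    """Return the stages whose measured time exceeds their budget."""
    return [
        stage for stage, ms in measured.items()
        if budgets.get(stage) is not None and ms > budgets[stage]
    ]


total_ms = sum(EXPECTED_MS.values())
print("over budget:", over_budget(EXPECTED_MS))
print("total:", total_ms, "ms, within total budget:", total_ms <= TOTAL_BUDGET_MS)
```

A check like this could run after each profiling pass so that a regression in any single stage is flagged even when the end-to-end total still looks healthy.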
10. Development Phases
Phase 0: Foundation (Week 1)
- Scrape Wave 1 documents → data/raw/
- Write scripts/ingest.py (chunk → embed → store)
- Set up ChromaDB + SQLite FTS5
- Verify: ping query returns all 8 pillars
Phase 1: RAG Pipeline (Week 2)
- Implement retrieval/ module
- Integrate Ollama
- CLI interface (basic chat)
- Test 10 sample queries manually
Phase 2: Paths (Week 3)
- Implement all 7 path modules in agent/paths/
- Implement router.py
router.py - Context assembly with citations
- Structured output formatting
Phase 3: Workspaces (Week 4)
- Workspace CRUD
- Decision logger
- Export
- Multi-session support
Phase 4: Polish (Week 5-6)
- Wave 2+ document ingestion
- Dzongkha testing
- Performance tuning (quantization, batch size)
- Team dogfooding
- Iterate