Technical Architecture: GMC For-Benefit Economy RAG Agent

Version: 1.0
Date: 2026-06-23

1. System Overview

┌─────────────────────────────────────────────────────────────────────┐
│                        User Interface Layer                         │
│  ┌──────────────────┐  ┌─────────────────┐  ┌───────────────────┐  │
│  │  CLI (Rich)      │  │  Streamlit UI    │  │  API (FastAPI)    │  │
│  │  (primary)       │  │  (optional)      │  │  (future)         │  │
│  └────────┬─────────┘  └────────┬────────┘  └────────┬──────────┘  │
└───────────┼─────────────────────┼─────────────────────┼────────────┘
            │                     │                     │
┌───────────▼─────────────────────▼─────────────────────▼────────────┐
│                      Orchestration Layer                            │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │                    Session Manager                            │  │
│  │  • Workspace CRUD     • Conversation History    • Metadata   │  │
│  │  • Decision Logging   • State Persistence      • User Mgmt  │  │
│  └───────────────────────────┬──────────────────────────────────┘  │
│                              │                                       │
│  ┌───────────────────────────▼──────────────────────────────────┐  │
│  │                  Query Router (Classifier)                    │  │
│  │  • Parse user intent    • Detect reasoning paths             │  │
│  │  • Select retrieval mode (Quick/Standard/Deep)               │  │
│  │  • Route to path executor                                   │  │
│  └───────────────────────────┬──────────────────────────────────┘  │
│                              │                                       │
│  ┌───────────────────────────▼──────────────────────────────────┐  │
│  │                  Reasoning Path Executor                      │  │
│  │  • Path 1: Pillar Alignment    • Path 5: Legal Check         │  │
│  │  • Path 2: For-Benefit Design  • Path 6: Ecosystem Mapping  │  │
│  │  • Path 3: GNH Impact         • Path 7: Mindful Capitalism  │  │
│  │  • Path 4: Sustainability     • Comparative Mode            │  │
│  └───────────────────────────┬──────────────────────────────────┘  │
└───────────────────────────────┼────────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────────┐
│                       Retrieval Pipeline                           │
│                                                                     │
│  ┌─────────────┐  ┌──────────────┐  ┌─────────────┐  ┌──────────┐ │
│  │  Query       │  │  Metadata    │  │  Hybrid     │  │  Rerank  │ │
│  │  Expansion  │→│  Filter      │→│  Search     │→│  (Cross- │ │
│  │  (3 variants)│  │  (pillar,   │  │  (Dense +   │  │  encoder) │ │
│  │              │  │  type, date)│  │  BM25)      │  │          │ │
│  └─────────────┘  └──────────────┘  └──────┬──────┘  └────┬─────┘ │
│                                            │               │       │
│  ┌─────────────────────────────────────────▼───────────────▼─────┐ │
│  │                    Context Assembly                            │ │
│  │  • Format chunks with source metadata                        │ │
│  │  • Inject into system prompt                                  │ │
│  │  • Trim to fit context window (128K tokens)                  │ │
│  └────────────────────────────┬──────────────────────────────────┘ │
└───────────────────────────────┼────────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────────┐
│                      Generation Layer                               │
│                                                                     │
│  ┌──────────────────────────────────────────────────────────────┐  │
│  │              LLM (Ollama / Qwen 2.5)                         │  │
│  │  • System prompt (grounding + reasoning paths)               │  │
│  │  • Retrieved context (15-25 chunks)                          │  │
│  │  • User query                                                │  │
│  │  → Structured evaluation output                               │  │
│  └──────────────────────────────────────────────────────────────┘  │
└───────────────────────────────┼────────────────────────────────────┘
                                │
┌───────────────────────────────▼────────────────────────────────────┐
│                        Storage Layer                                │
│                                                                     │
│  ┌──────────────────┐  ┌──────────────┐  ┌────────────────────┐  │
│  │  ChromaDB         │  │  SQLite FTS5 │  │  File System       │  │
│  │  (vector store)   │  │  (BM25 index)│  │  (raw docs .jsonl) │  │
│  │  • 1024-dim dense │  │  • full-text │  │  • session logs    │  │
│  │  • metadata index │  │  • keyword   │  │  • decision logs   │  │
│  │  • persistent (disk)│ │  search     │  │  • exports         │  │
│  └──────────────────┘  └──────────────┘  └────────────────────┘  │
└─────────────────────────────────────────────────────────────────────┘

2. Component Details

2.1 Embeddings

Model: intfloat/multilingual-e5-large

Property	Value
Dimensions	1024
Max tokens	512
Languages	100+ (English + Dzongkha capable)
Model type	Encoder-only (XLM-RoBERTa-based), bi-encoder
Memory	~2GB loaded
Source	HuggingFace (permissive license)

Usage pattern:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Prefix queries with "query: " and documents with "passage: "
query_embedding = model.encode("query: " + user_query)
doc_embedding   = model.encode("passage: " + chunk_text)

Fallback model: BAAI/bge-large-en-v1.5 (1024 dim, English-only, slightly higher accuracy on English text). Use if Dzongkha performance is unsatisfactory.
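The two models expect different input prefixes, so swapping in the fallback is more than a change of model name. The following is a minimal sketch of keeping those conventions in one place. `EmbeddingSpec`, `SPECS`, and `prepare` are hypothetical names introduced for illustration; the BGE query instruction is the one published on its model card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmbeddingSpec:
    """Per-model input conventions (hypothetical helper, not a library class)."""
    name: str
    query_prefix: str
    passage_prefix: str


SPECS = {
    "primary": EmbeddingSpec(
        name="intfloat/multilingual-e5-large",
        query_prefix="query: ",
        passage_prefix="passage: ",
    ),
    "fallback": EmbeddingSpec(
        name="BAAI/bge-large-en-v1.5",
        # BGE v1.5 prefixes only queries; passages are embedded as-is.
        query_prefix="Represent this sentence for searching relevant passages: ",
        passage_prefix="",
    ),
}


def prepare(text: str, spec: EmbeddingSpec, is_query: bool) -> str:
    """Return the text with the prefix the chosen model expects."""
    prefix = spec.query_prefix if is_query else spec.passage_prefix
    return prefix + text.strip()


print(prepare("wellness tourism cooperative", SPECS["primary"], is_query=True))
```

Whichever model is chosen, the whole corpus has to be re-embedded after a switch, since vectors produced by different models are not comparable.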

2.2 Vector Store: ChromaDB

Property	Value
Version	0.5.x+
Storage	Persistent (chroma_db/ directory)
Distance	Cosine (set explicitly via hnsw:space; ChromaDB's default is squared L2)
Metadata filtering	Supported (pillar, doc_type, tier, date)
Max collection size	Unlimited (disk-bound)

Collection schema:

import chromadb

# Persistent on-disk client; the path follows the layout in section 3
client = chromadb.PersistentClient(path="data/chroma_db")

collection = client.get_or_create_collection(
    name="gmc_documents",
    metadata={"hnsw:space": "cosine"}   # without this, Chroma uses L2 distance
)

# Document schema
collection.add(
    embeddings=[[...], ...],           # 1024-dim float arrays
    documents=["chunk text", ...],      # raw text
    metadatas=[{
        "doc_id": "GMC-001",
        "title": "GMC Homepage",
        "pillar": "pillar-all",
        "doc_type": "core-vision",
        "tier": 1,
        "language": "en",
        "chunk_index": 3,
        "url": "https://gmc.bt",
        "ingested_at": "2026-06-23"
    }, ...],
    ids=["GMC-001-chunk-003", ...]
)
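The metadata fields in the schema can be turned into a ChromaDB `where` filter at query time. The sketch below only builds the filter dictionary and does not call ChromaDB. `build_where` is a hypothetical helper; `$in` and `$or` are ChromaDB filter operators, and the pillar and doc-type values are the ones used in the data-flow example in section 4.

```python
from typing import Any, Dict, List, Optional


def build_where(
    pillars: Optional[List[str]] = None,
    doc_types: Optional[List[str]] = None,
) -> Optional[Dict[str, Any]]:
    """Build a `where` filter matching ANY of the given pillars or doc types."""
    clauses: List[Dict[str, Any]] = []
    if pillars:
        clauses.append({"pillar": {"$in": pillars}})
    if doc_types:
        clauses.append({"doc_type": {"$in": doc_types}})

    if not clauses:
        return None           # no filter: search the whole collection
    if len(clauses) == 1:
        return clauses[0]     # "$or" needs at least two sub-clauses
    return {"$or": clauses}


where = build_where(["health-wellness", "tourism"], ["case-study", "core-vision"])
print(where)
```

The result would be passed to the collection as `collection.query(query_embeddings=[...], n_results=50, where=where)`.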

2.3 Full-Text Index: SQLite FTS5

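-- External-content FTS5 table: the text itself lives in a separate `documents`
-- table with these columns, which must exist and be kept in sync with the index.
CREATE VIRTUAL TABLE docs_fts USING fts5(
    doc_id, title, chunk_text, pillar,
    content='documents',
    content_rowid='rowid'
);

-- BM25 query. Space-separated terms are an implicit AND in FTS5;
-- join them with OR to match chunks that contain any of the terms.
SELECT * FROM docs_fts
WHERE docs_fts MATCH 'geothermal energy sustainability'
ORDER BY rank   -- rank defaults to bm25(); lower values are better matches
LIMIT 50;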
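The same query can be issued from Python with the standard `sqlite3` module. This is a sketch under assumptions: it uses a standalone in-memory table named `chunks_fts` rather than the external-content table above, the sample rows are invented, and `to_fts5_or_query` and `bm25_search` are hypothetical helpers. User text is converted to quoted terms joined with OR so that any term can match.

```python
import re
import sqlite3


def to_fts5_or_query(text: str) -> str:
    """Quote each word and join with OR so a chunk matching any term is returned."""
    tokens = re.findall(r"\w+", text.lower())
    return " OR ".join(f'"{t}"' for t in tokens)


def bm25_search(conn: sqlite3.Connection, text: str, limit: int = 50):
    """Return (doc_id, score) rows, best first; bm25() scores are lower-is-better."""
    match = to_fts5_or_query(text)
    if not match:
        return []
    sql = (
        "SELECT doc_id, bm25(chunks_fts) AS score "
        "FROM chunks_fts WHERE chunks_fts MATCH ? "
        "ORDER BY rank LIMIT ?"
    )
    return conn.execute(sql, (match, limit)).fetchall()


# Illustrative data only; requires an SQLite build with FTS5 enabled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks_fts USING fts5(doc_id UNINDEXED, chunk_text)")
conn.executemany(
    "INSERT INTO chunks_fts (doc_id, chunk_text) VALUES (?, ?)",
    [
        ("A", "geothermal energy projects and sustainability goals"),
        ("B", "wellness tourism cooperative"),
        ("C", "energy policy overview"),
        ("D", "airport and connectivity plans"),
    ],
)
results = bm25_search(conn, "geothermal energy sustainability")
print([doc_id for doc_id, _ in results])
```

Passing the match expression as a bound parameter keeps user text out of the SQL string itself.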

2.4 Hybrid Search Fusion

Dense and sparse results are merged with Reciprocal Rank Fusion (RRF): each document earns 1 / (k + rank) from every list it appears in, and the contributions are summed:

def reciprocal_rank_fusion(dense_results, sparse_results, k=60):
    """Merge two ranked lists of (doc_id, score) pairs; raw scores are ignored."""
    scores = {}
    # Ranks start at 1, as in the standard RRF formula 1 / (k + rank)
    for rank, (doc_id, _) in enumerate(dense_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    for rank, (doc_id, _) in enumerate(sparse_results, start=1):
        scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
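With three query variants and two retrievers, there may be up to six ranked lists to fuse rather than two. A minimal sketch of RRF generalized to any number of lists follows. `rrf_merge` and its `top_n` parameter are hypothetical; ranks are counted from 1, as in the usual RRF definition, and each input list is assumed to contain a document ID at most once.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple


def rrf_merge(
    ranked_lists: Iterable[Sequence[str]],
    k: int = 60,
    top_n: int = 75,
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of doc IDs; each list adds 1 / (k + rank) per document."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    fused = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return fused[:top_n]


dense = ["a", "b", "c"]    # best-first IDs from the vector search
sparse = ["b", "c", "a"]   # best-first IDs from BM25
fused = rrf_merge([dense, sparse])
print([doc_id for doc_id, _ in fused])   # → ['b', 'a', 'c']
```

Because only ranks are used, the cosine and BM25 scores never need to be put on a common scale.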

2.5 Cross-Encoder Reranker

Model: BAAI/bge-reranker-v2-m3

Property	Value
Type	Cross-encoder (query + doc pair)
Max tokens	8192
Languages	Multilingual
Memory	~4GB loaded

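import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

reranker = AutoModelForSequenceClassification.from_pretrained(
    "BAAI/bge-reranker-v2-m3"
)
reranker.eval()  # inference mode (disables dropout)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-reranker-v2-m3")

pairs = [[query, chunk] for chunk in top_75_chunks]
# Cap the pair length to bound memory; raise max_length if chunks are longer
inputs = tokenizer(pairs, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")
with torch.no_grad():  # scoring only, no gradients needed
    scores = reranker(**inputs).logits.view(-1).float().tolist()

# Re-sort by reranker score, keep top 15-25 (20 shown here)
scored = sorted(zip(top_75_chunks, scores), key=lambda x: x[1], reverse=True)
top_chunks = [chunk for chunk, _ in scored[:20]]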
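One way to choose a cut-off inside the 15-25 range is to map the raw logits to [0, 1] with a sigmoid and keep every chunk above a threshold, clamped to the range. This is only a sketch of that idea: `select_top` and the 0.5 threshold are assumptions introduced here, not values from this design.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def select_top(
    chunks: Sequence[str],
    logits: Sequence[float],
    min_keep: int = 15,
    max_keep: int = 25,
    threshold: float = 0.5,   # assumed value; tune on real queries
) -> List[Tuple[str, float]]:
    """Keep chunks scoring above threshold, but never fewer than min_keep or more than max_keep."""
    ranked = sorted(
        ((chunk, sigmoid(logit)) for chunk, logit in zip(chunks, logits)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    above = sum(1 for _, prob in ranked if prob >= threshold)
    keep = max(min_keep, min(max_keep, above))
    return ranked[:keep]


chunks = [f"chunk-{i}" for i in range(30)]
logits = [10 - i for i in range(30)]       # illustrative scores only
kept = select_top(chunks, logits)
print(len(kept), kept[0][0])
```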

2.6 LLM: Qwen 2.5 (via Ollama)

Model: qwen2.5:14b (recommended) or qwen2.5:7b (minimum)

Property	Value
Context window	Up to 128K tokens (Ollama's default num_ctx is much smaller; set it explicitly)
Languages	29+ languages incl. English and Chinese; Dzongkha is not named in the model card, so its quality must be verified
Quantization	Q4_K_M (7B ~4.5GB, 14B ~9GB)
API	Ollama REST API (localhost:11434)

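import requests

response = requests.post("http://localhost:11434/api/chat", json={
    "model": "qwen2.5:14b",
    "messages": [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query_with_context}
    ],
    "stream": False,             # one JSON object instead of a token stream
    "options": {
        "temperature": 0.3,
        "top_p": 0.9,
        "num_predict": 4096,     # Ollama's name for the output-token limit
        "num_ctx": 32768         # assumed value; raise toward 128K as memory allows
    }
}, timeout=300)
response.raise_for_status()
answer = response.json()["message"]["content"]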

Why Qwen 2.5:
- Strong multilingual support (important for Dzongkha)
- 128K context (can fit full system prompt + many retrieved chunks)
- Competes with Llama 3.1 on English benchmarks
- Available via Ollama (easy local deployment)

3. File System Layout

/home/nahar/Documents/code/GMC/
├── docs/
│   ├── information-directory.md    # document index
│   ├── prd.md                      # product requirements document
│   ├── system-prompt.md            # system prompt (all reasoning paths)
│   └── technical-architecture.md   # this file
│
├── data/
│   ├── raw/                        # raw scraped documents
│   │   ├── gmc_bt/                 # from gmc.bt
│   │   ├── fourthsector_org/       # from fourthsector.org
│   │   ├── wikipedia/              # from Wikipedia
│   │   └── media/                  # from news/media sources
│   ├── processed/                  # chunked documents (JSONL)
│   │   ├── chunks.jsonl            # all chunks with metadata
│   │   └── summaries.jsonl         # document-level summaries
│   └── chroma_db/                  # ChromaDB persistent store
│
├── agent/
│   ├── __init__.py
│   ├── main.py                     # CLI entry point
│   ├── router.py                   # query classifier
│   ├── paths/
│   │   ├── __init__.py
│   │   ├── pillar_alignment.py     # Path 1
│   │   ├── for_benefit_design.py   # Path 2
│   │   ├── gnh_impact.py           # Path 3
│   │   ├── sustainability.py       # Path 4
│   │   ├── legal_check.py          # Path 5
│   │   ├── ecosystem_mapping.py    # Path 6
│   │   ├── mindful_capitalism.py   # Path 7
│   │   └── comparative.py          # Comparative mode
│   ├── retrieval/
│   │   ├── __init__.py
│   │   ├── embedder.py             # embedding generation
│   │   ├── hybrid_search.py        # dense + BM25 search
│   │   ├── reranker.py             # cross-encoder reranking
│   │   └── context.py              # context assembly
│   ├── storage/
│   │   ├── __init__.py
│   │   ├── vector_store.py         # ChromaDB interface
│   │   ├── fulltext.py             # SQLite FTS5 interface
│   │   ├── document_store.py       # JSONL document store
│   │   └── decision_logger.py      # session logging
│   ├── session/
│   │   ├── __init__.py
│   │   ├── workspace.py            # workspace manager
│   │   └── history.py              # conversation history
│   └── prompts/
│       ├── __init__.py
│       ├── system.py               # base system prompt
│       ├── paths/                  # path-specific prompt templates
│       └── router.py               # router classifier prompt
│
├── scripts/
│   ├── scrape_gmc.py               # scrape gmc.bt
│   ├── scrape_fourthsector.py      # scrape fourthsector.org
│   ├── scrape_wikipedia.py         # scrape relevant Wikipedia pages
│   ├── ingest.py                   # chunk + embed + store
│   └── test_retrieval.py           # manual retrieval quality tests
│
├── tests/
│   ├── test_retrieval.py
│   ├── test_paths.py
│   ├── test_router.py
│   └── fixtures/                   # test document fixtures
│
├── exports/                        # user exports (markdown/JSON)
│
├── docker-compose.yml              # Ollama + ChromaDB + agent
├── Dockerfile                      # agent container
├── requirements.txt                # Python dependencies
└── Makefile                        # common commands

4. Data Flow: End-to-End Query

User: "Evaluate a wellness tourism cooperative in GMC"

1. ROUTER CLASSIFIES
   → Path 1 (Pillar: Health/Wellness + Tourism)
   → Path 2 (For-Benefit: cooperative = stakeholder governance)
   → Path 3 (GNH: community vitality, health, living standards)
   → Path 7 (Mindful Capitalism: balance check)
   → Retrieval mode: STANDARD

2. QUERY EXPANSION (3 variants)
   a. "wellness tourism cooperative in GMC"
   b. "for-benefit cooperative health tourism Gelephu"
   c. "GMC wellness cooperative enterprise design"

3. METADATA FILTER
   pillar IN (health-wellness, tourism) OR doc_type IN (case-study, core-vision)

4. HYBRID SEARCH
   Dense(top 50) + BM25(top 50) → RRF merge → top 75

5. RERANKER
   Cross-encoder scores → sort → top 20 chunks

6. CONTEXT ASSEMBLY
   Inject top 20 chunks with [Source: ...] tags into system prompt

7. GENERATION
   LLM receives: system prompt + 20 chunks + user query
   → Outputs structured evaluation per Path 1, 2, 3, 7
   → Each claim cites source
   → Gap analysis included

8. LOG
   Full query + chunks + response saved to session workspace
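Step 6 (context assembly) can be sketched as follows. The function formats reranked chunks with `[Source: ...]` tags and stops before a token budget is exceeded. `assemble_context` and `estimate_tokens` are hypothetical names, and the four-characters-per-token estimate is a rough assumption standing in for a real tokenizer.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Crude estimate (about 4 characters per token); an assumption, not a tokenizer."""
    return max(1, len(text) // 4)


def assemble_context(chunks: List[Dict[str, str]], budget_tokens: int) -> str:
    """Format best-first chunks with source tags, stopping before the budget is exceeded."""
    parts: List[str] = []
    used = 0
    for chunk in chunks:
        block = f"[Source: {chunk['doc_id']} | {chunk['title']}]\n{chunk['text']}"
        cost = estimate_tokens(block)
        if used + cost > budget_tokens:
            break   # chunks are ordered by relevance, so drop the tail
        parts.append(block)
        used += cost
    return "\n\n".join(parts)


# Illustrative chunks only
chunks = [
    {"doc_id": "GMC-001", "title": "GMC Homepage", "text": "a" * 400},
    {"doc_id": "GMC-004", "title": "Pledge", "text": "b" * 400},
]
context = assemble_context(chunks, budget_tokens=150)
print(context.count("[Source:"))
```

Trimming whole chunks from the tail, rather than cutting text mid-chunk, keeps every remaining citation intact.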

5. Document Ingestion Pipeline

                    ┌─────────────────┐
                    │  Web Scraper    │
                    │  (httpx +       │
                    │   BeautifulSoup)│
                    └────────┬────────┘
                             │ raw HTML
                             ▼
                    ┌─────────────────┐
                    │  Text Extractor │
                    │  (markdownify,  │
                    │   readability)  │
                    └────────┬────────┘
                             │ clean text
                             ▼
                    ┌─────────────────┐
                    │  Metadata       │
                    │  Annotator      │
                    │  (pillar, type, │
                    │   tier, lang)   │
                    └────────┬────────┘
                             │ annotated text
                             ▼
                    ┌─────────────────┐
                    │  Hierarchical   │
                    │  Chunker        │
                    │  (summary +     │
                    │   semantic +    │
                    │   sentence)     │
                    └────────┬────────┘
                             │ chunks
                             ▼
              ┌──────────────┴──────────────┐
              │                             │
              ▼                             ▼
     ┌──────────────────┐        ┌──────────────────┐
     │  Embedder        │        │  Text Indexer    │
     │  (E5-large)      │        │  (SQLite FTS5)   │
     └────────┬─────────┘        └────────┬─────────┘
              │                           │
              ▼                           ▼
     ┌──────────────────┐        ┌──────────────────┐
     │  ChromaDB        │        │  SQLite DB       │
     │  (vectors +      │        │  (full-text +    │
     │   metadata)      │        │   metadata)      │
     └──────────────────┘        └──────────────────┘
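The sentence-level stage of the chunker could look like the sketch below. It covers only that one level of the hierarchy (no summary or semantic pass). `chunk_sentences`, the 1200-character limit, and the one-sentence overlap are assumptions chosen to stay under the embedder's 512-token input limit; the ID format follows the `GMC-001-chunk-003` pattern from section 2.2.

```python
import re
from typing import Dict, List


def chunk_sentences(
    doc_id: str,
    text: str,
    max_chars: int = 1200,   # assumed size, not a value from this design
    overlap: int = 1,        # sentences repeated at the start of the next chunk
) -> List[Dict[str, object]]:
    """Greedily pack whole sentences into chunks of at most roughly max_chars."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return [
        {"id": f"{doc_id}-chunk-{i:03d}", "doc_id": doc_id, "chunk_index": i, "text": c}
        for i, c in enumerate(chunks)
    ]


demo = chunk_sentences("GMC-001", "One. Two. Three. Four.", max_chars=10, overlap=0)
print([c["text"] for c in demo])
```

A single sentence longer than `max_chars` is kept whole in this sketch, so the embedder would truncate it.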

6. Workspace & Session Data Model

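from __future__ import annotations  # lets Session and Message reference classes defined further down

from dataclasses import dataclass
from datetime import datetime
from typing import List, Literal

@dataclass
class Workspace: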
    id: str                          # "digital-assets-framework"
    name: str                        # "Digital Assets Framework"
    created_at: datetime
    metadata_filters: dict           # {"pillar": "pillar-finance-digital"}
    session_count: int = 0

@dataclass
class Session:
    id: str                          # uuid
    workspace_id: str                # foreign key to workspace
    created_at: datetime
    messages: List[Message]

@dataclass
class Message:
    role: Literal["user", "assistant"]
    content: str
    retrieved_chunks: List[ChunkRef]  # which chunks were used
    paths_activated: List[int]        # which paths ran
    timestamp: datetime

@dataclass
class ChunkRef:
    doc_id: str                      # "GMC-004"
    chunk_index: int                 # 3
    text: str                        # "The intention of this pledge is clear..."
    score: float                     # relevance score
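For decision logging, records of this shape can be written one per line to a JSONL file. The sketch below uses a trimmed, hypothetical `LogEntry` rather than the full `Message` class, and stores the timestamp as an ISO-8601 string so that the record is directly JSON-serializable.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class ChunkRef:
    doc_id: str
    chunk_index: int
    score: float


@dataclass
class LogEntry:
    """One decision-log record (hypothetical, trimmed from Message)."""
    role: str
    content: str
    paths_activated: List[int] = field(default_factory=list)
    retrieved_chunks: List[ChunkRef] = field(default_factory=list)
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def to_jsonl_line(entry: LogEntry) -> str:
    """Serialize one entry as a single JSON line; asdict() recurses into ChunkRef."""
    return json.dumps(asdict(entry), ensure_ascii=False)


entry = LogEntry(
    role="assistant",
    content="Evaluation summary",
    paths_activated=[1, 2, 3, 7],
    retrieved_chunks=[ChunkRef("GMC-004", 3, 0.91)],
)
line = to_jsonl_line(entry)
print(line)
```

Appending each line to a per-session file keeps the log readable with ordinary text tools and easy to replay.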

7. Deployment

Docker Compose

version: '3.8'
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"
    volumes:
      - ./data/ollama:/root/.ollama
    command: serve
    restart: unless-stopped

  chromadb:
    image: chromadb/chroma:latest
    ports:
      - "8000:8000"
    volumes:
      - ./data/chroma_db:/chroma/chroma
    environment:
      - IS_PERSISTENT=TRUE
    restart: unless-stopped

  agent:
    build: .
    ports:
      - "8080:8080"
    volumes:
      - ./data:/app/data
      - ./agent:/app/agent
      - ./exports:/app/exports
      - ./docs:/app/docs
    environment:
      - OLLAMA_HOST=http://ollama:11434
      - CHROMA_HOST=http://chromadb:8000
    depends_on:
      - ollama
      - chromadb
    stdin_open: true
    tty: true
    command: python -m agent.main

Makefile Commands

# Recipe lines must be indented with a tab character, not spaces.
.PHONY: scrape ingest run test export rebuild

# Scrape documents
scrape:
	python scripts/scrape_gmc.py
	python scripts/scrape_fourthsector.py
	python scripts/scrape_wikipedia.py

# Ingest into vector store
ingest:
	python scripts/ingest.py

# Run the agent CLI (interactive, so `run` is used to attach stdin)
run:
	docker-compose run --rm agent

# Run tests
test:
	python -m pytest tests/ -v

# Export decision log (usage: make export workspace=<workspace-id>)
export:
	python -m agent.export --workspace $(workspace)

# Rebuild vector store (full reset)
rebuild:
	rm -rf data/chroma_db data/processed
	$(MAKE) ingest

8. Requirements

requirements.txt

# Core
sentence-transformers>=3.0.0
chromadb>=0.5.0
ollama>=0.3.0
httpx>=0.27.0
requests>=2.31.0  # used by the Ollama REST example in section 2.6

# Retrieval
numpy>=1.24.0
scipy>=1.11.0

# Storage
datasets>=2.18.0
sqlite-utils>=3.36

# Scraping
beautifulsoup4>=4.12.0
markdownify>=0.12.0
readability-lxml>=0.8.1
lxml>=5.1.0

# CLI
rich>=13.0.0
typer>=0.12.0

# Utilities
python-dateutil>=2.8.0
pydantic>=2.0.0

# Testing (used by `make test`)
pytest>=8.0.0

9. Performance Budgets

Operation	Budget	Measured by
Query classification	<200ms	router.invoke()
Hybrid search (50+50)	<500ms	hybrid_search.search()
Reranker (75 pairs)	<3s	reranker.rerank()
Context assembly	<100ms	context.assemble()
LLM generation (Qwen 2.5 7B)	<8s	LLM response time
Total (Standard mode)	<15s	End-to-end
Document embedding	<1s/doc	embedder.embed_document()
Ingestion throughput	~50 docs/min	Full pipeline

Profiling output (expected for Standard mode):

Router: 45ms
  ├── Intent classification: 22ms
  ├── Path selection: 15ms
  └── Mode selection: 8ms

Retrieval: 2.1s
  ├── Query expansion: 12ms
  ├── Metadata filter: 4ms
  ├── Dense search (ChromaDB): 280ms
  ├── Sparse search (SQLite FTS5): 120ms
  ├── RRF merge: 2ms
  └── Reranker (75 pairs): 1.7s

Context assembly: 95ms
  ├── Format chunks: 40ms
  ├── Apply citations: 35ms
  └── Trim to window: 20ms

Generation (Qwen 2.5 7B, Q4_K_M): 8.2s
  ├── Prompt processing: 1.1s
  └── Token generation (600 tokens @ ~85 t/s): 7.1s

Total: 10.4s
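A profile like this can be collected with a small timing helper that also flags stages exceeding their budget. The sketch below is one possible shape: `StageTimer` and the stage names are hypothetical, and the budget values are copied from the table in this section.

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator

# Budgets in seconds, taken from the performance table
BUDGETS = {
    "router": 0.2,
    "hybrid_search": 0.5,
    "reranker": 3.0,
    "context": 0.1,
    "generation": 8.0,
}


class StageTimer:
    """Records wall-clock time per named stage."""

    def __init__(self) -> None:
        self.timings: Dict[str, float] = {}

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the stage raised an exception
            self.timings[name] = time.perf_counter() - start

    def over_budget(self) -> Dict[str, float]:
        """Return the stages whose measured time exceeds their budget."""
        return {
            name: seconds
            for name, seconds in self.timings.items()
            if seconds > BUDGETS.get(name, float("inf"))
        }


timer = StageTimer()
with timer.stage("router"):
    time.sleep(0.01)          # stand-in for the real router call
print(timer.timings, timer.over_budget())
```

Applied to the expected profile, such a check would flag generation, since 8.2s exceeds its 8s budget.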