Architecture Overview

A new session's first read for understanding how the system is shaped. Single page, dense, end-to-end. Other pages drill in.


What the system does

  1. Collects public ESG data about a set of pilot institutions from structured upstream sources (NZBA membership list, SBTi targets dashboard, PRB signatories, etc.).
  2. Scores each institution against a rule catalogue (148 rules across pillars E / S / G, sub-criteria, and themes).
  3. Displays the scores, coverage, and underlying signals through a server-rendered web UI gated behind Cloudflare Access.
  4. Refreshes weekly via a cron-triggered scrape + score cycle.

Stage 1 (universal ESG) is live. Stage 2 (financials triangulation — ESG × credit × returns) has ESG in place; credit and returns dimensions are placeholders.


Two-stage architecture

The system is split into two scoring stages, recorded in ADR-style narrative in the design note.

Stage | Scope | Composite | Status
------|-------|-----------|-------
Stage 1: ESG | All institutions (financial + non-financial) | E + S + G pillar weighted average, coverage-weighted per ADR-0001 | Live
Stage 2: Triangulation | Financial institutions only (GICS 40) | Stage 1 ESG + Credit + Returns | ESG in place; Credit/Returns awaiting

The original v0.4 spec was banks-only single-stage; the two-stage split emerged in cycle 1 of the v0.5 work to honestly separate universal ESG screening from the financials-only triangulation thesis. See the design note for the full rationale.
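
To make the Stage 1 composite concrete, here is a minimal sketch of one way a coverage-weighted pillar average could be computed. The equal pillar weights, the 0–100 score scale, the `{ score, coverage }` shape, and the function name `stage1Composite` are all assumptions for illustration; ADR-0001 remains the authority on the actual formula.

```javascript
// Sketch only: the pillar weights and the weighting formula are assumptions,
// not the definitions recorded in ADR-0001.
const PILLAR_WEIGHTS = { E: 1, S: 1, G: 1 }; // hypothetical equal weights

/**
 * pillars: { E: { score, coverage }, S: {...}, G: {...} }
 *   score    - pillar score on an assumed 0-100 scale
 *   coverage - fraction (0-1) of the pillar's rules that produced a signal
 */
function stage1Composite(pillars, weights = PILLAR_WEIGHTS) {
  let weightedSum = 0;
  let weightTotal = 0;
  for (const [pillar, { score, coverage }] of Object.entries(pillars)) {
    // A pillar counts in proportion to its weight and to how much of it was observed.
    const w = (weights[pillar] ?? 0) * coverage;
    weightedSum += w * score;
    weightTotal += w;
  }
  // No coverage anywhere: return null rather than a misleading zero.
  return weightTotal === 0 ? null : weightedSum / weightTotal;
}

const example = {
  E: { score: 80, coverage: 0.5 },
  S: { score: 40, coverage: 0.25 },
  G: { score: 60, coverage: 0.25 },
};
console.log(stage1Composite(example)); // 65
```

Under this assumed formula, a pillar with little coverage pulls the composite less than a well-covered one, and an institution with no signals at all gets no score instead of a zero.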


High-level data flow

┌───────────────────────────────────────────────────────────────────────┐
│                                                                        │
│   Upstream sources                                                     │
│   ─────────────                                                        │
│   • NZBA, SBTi, PRB (membership lists — live)                          │
│   • BHRRC, UK MSA, UK GPG, BankTrack, PAX (planned)                    │
│   • Corporate disclosure / IFRS S2 (heavy engineering, future)         │
│                                                                        │
│              │                                                         │
│              ▼ (scrapers, src/scrapers/*.js)                           │
│                                                                        │
│   SQLite at data/esg.db                                                │
│   ─────────────────────                                                │
│   • Reference: rule, signal_source, peer_group, gics_classification    │
│   • Institutions: institution × 8 pilot                                │
│   • Run records: scrape_run (one per scraper execution)                │
│   • Signals: signal (one row per run × institution × rule when found)  │
│   • Scores: score_sub_criterion, score_pillar,                         │
│             score_stage1_esg, score_stage2_composite                   │
│                                                                        │
│              │                                                         │
│              ▼ (scoring engine, src/scoring/*.js)                      │
│                                                                        │
│   Scores written back to DB                                            │
│   ─────────────────────                                                │
│   • Per rule, per pillar, per stage, per run                           │
│   • Both raw (v0.4) and coverage-weighted (ADR-0001)                   │
│                                                                        │
│              │                                                         │
│              ▼ (read interface, ADR-0002, pending)                     │
│                                                                        │
│   Web UI at esg-screen.org                                             │
│   ───────────────────────                                              │
│   • Server-rendered EJS/Handlebars via Express                         │
│   • Cloudflare Access OTP gate                                         │
│   • Five pages: institution index, institution detail,                 │
│     methodology, runs, watchlist                                       │
│                                                                        │
└───────────────────────────────────────────────────────────────────────┘

For the data model in detail, see Data model. For the scoring methodology in prose, see Scoring. For the stack (hosting, deployment, secrets), see Stack. For the UI design, see UI pages and Design system.
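
As an illustration of how the signal table relates to coverage, the sketch below derives per-institution coverage from signal rows for a single run. The field names (`run_id`, `institution_id`, `rule_id`) and the helper `coverageByInstitution` are assumed for the example and may not match the real schema in data/esg.db.

```javascript
// Sketch only: field names are assumptions, not the actual signal schema.
const TOTAL_RULES = 148; // size of the rule catalogue

function coverageByInstitution(signals, runId, totalRules = TOTAL_RULES) {
  const rulesSeen = new Map(); // institution_id -> Set of rule_ids with a signal
  for (const s of signals) {
    if (s.run_id !== runId) continue; // only count rows from the requested run
    if (!rulesSeen.has(s.institution_id)) {
      rulesSeen.set(s.institution_id, new Set());
    }
    rulesSeen.get(s.institution_id).add(s.rule_id);
  }
  const coverage = {};
  for (const [institution, rules] of rulesSeen) {
    coverage[institution] = rules.size / totalRules;
  }
  return coverage;
}

// Tiny worked example with a 4-rule catalogue so the fractions are easy to read.
const sampleSignals = [
  { run_id: 1, institution_id: 'bank-a', rule_id: 'E1' },
  { run_id: 1, institution_id: 'bank-a', rule_id: 'G5' },
  { run_id: 1, institution_id: 'bank-b', rule_id: 'E1' },
  { run_id: 2, institution_id: 'bank-a', rule_id: 'S2' },
];
console.log(coverageByInstitution(sampleSignals, 1, 4));
// { 'bank-a': 0.5, 'bank-b': 0.25 }
```

Because a signal row exists only when something was found, an institution with no rows in a run simply does not appear in the result; callers would need to treat a missing key as zero coverage.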


Where each ADR's decision lives in the stack

ADR | Decision | Where it lives
----|----------|---------------
0001 | Coverage-weighted scoring | Migration 011, scoring engine modules
0002 | Server-rendered read interface | src/ui/, Express routes
0003 | Source register + Methodology page | Migration 012, /methodology route
0004 | RAG thresholds | src/config/rag.js, base stylesheet
0005 | Theme-based rule grouping | src/config/rule-themes.js, detail page template
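
As a rough picture of what a threshold config such as src/config/rag.js might contain, the sketch below maps a score to a red/amber/green band. The cut-off values, the 0–100 scale, and the names `RAG_THRESHOLDS` and `ragBand` are placeholders chosen for illustration; the real thresholds are whatever ADR-0004 records.

```javascript
// Sketch only: the cut-offs below are placeholders, not the thresholds
// recorded in ADR-0004 or src/config/rag.js.
const RAG_THRESHOLDS = [
  { min: 70, band: 'green' },
  { min: 40, band: 'amber' },
  { min: 0, band: 'red' },
]; // ordered from highest to lowest minimum

function ragBand(score, thresholds = RAG_THRESHOLDS) {
  // Missing scores (no coverage) get their own band instead of defaulting to red.
  if (score === null || score === undefined || Number.isNaN(score)) return 'none';
  const match = thresholds.find((t) => score >= t.min);
  return match ? match.band : 'none';
}

console.log(ragBand(85)); // green
console.log(ragBand(40)); // amber
console.log(ragBand(null)); // none
```

Keeping the thresholds in one ordered array means the template and the stylesheet can both key off the returned band name without repeating the numbers.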

What the system does NOT do

Recording what's out of scope so it doesn't keep getting raised:

  • No portfolio analysis. Per-institution screening only. A portfolio view is a v2+ surface.
  • No write actions. v1 is read-only. Manual intake (red flags, shareholder data) happens outside the UI.
  • No client login. Cloudflare Access OTP only, 5-address allow list.
  • No paid data sources. Only public data, structurally.
  • No real-time scraping. Weekly cron only. There is no "scrape on demand" surface in v1.
  • No automated email or notification. Watchlist surface in UI; no push.
  • No mobile design. Desktop only. Screening happens at desks.
  • No comparison view in v1. Per-institution depth, not cross-institution comparison. Comparison is a v2 surface.
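
The "no write actions" constraint could be enforced at the routing layer as well as by convention. The following is a hypothetical Express-style middleware, not code from src/ui/; the name `readOnly` and the response text are assumptions.

```javascript
// Sketch only: a hypothetical guard, not taken from the project's source.
const READ_METHODS = new Set(['GET', 'HEAD']);

// Express-style middleware: lets read requests through, rejects everything else.
function readOnly(req, res, next) {
  if (READ_METHODS.has(req.method)) return next();
  res.set('Allow', 'GET, HEAD'); // tell the client which methods are accepted
  res.status(405).send('Read-only: write actions are out of scope for v1');
}

module.exports = { readOnly };
```

In an Express app this would be mounted before the routes, for example with `app.use(readOnly)`, so that any accidental POST or DELETE handler added later is still unreachable.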

Scale assumptions

The system is designed for:

  • ~10 users (the McMillan Grubb team + occasional client view)
  • 8–50 institutions in the pilot, scaling to ~350 banks (PRB-canonical list) when the pilot graduates
  • Weekly refresh cadence
  • ~3,000–20,000 signal rows per year (8k at current rate, more as scrapers land)

SQLite is more than sufficient for this profile. No Postgres or external DB planned. If scale assumptions change materially, revisit — but nothing on the current roadmap pushes those bounds.


Separation from Vextor

This system is fully separate from Vextor. Different repo, different VM, different secrets, different Slack channel, different DNS. The only thing shared is the operator. Operational patterns may look similar because those are Rob's habits, not because the projects are related.

Specifically:

  • Do not read Vextor's CLAUDE.md or canonical docs as authority for this project
  • Different Cloudflare account: mcmillangrubb
  • Different Azure RG: rg-esg-screening
  • Different VM: esg-screening-01
  • Different DNS zone: esg-screen.org

Roadmap, near-term

From the 19 May handoff, the cycle plan to ship a viable v1:

  1. Commit ADRs 0001–0005 + mock + ADR-0006 (this site)
  2. Migration 011 — coverage-weighted scoring
  3. Read interface — pages 1 and 2 (index + detail per mock v4)
  4. Migration 012 + /methodology page (source register)
  5. BHRRC scraper (S2, S5, E7, G5 + watchlist)
  6. UK MSA + UK GPG scrapers (low-effort government registers)
  7. BankTrack + PAX scrapers (financials-only fossil/weapons rules)

After these seven steps, coverage rises from 11% to ~35–45% on financials, banks stop scoring identically, and the watchlist populates with real findings. That's the threshold for a "viable web-based screening product."

Beyond that, deferred:

  • TCFD-shaped gap → IFRS S2 corporate-disclosure scraper
  • Profundo enrichment scraper
  • Corporate website / sustainability report parser (heavy engineering)
  • Comparison view, exception filters, client-facing summaries
  • PDF export
  • PRB seed expansion (~350 banks)
  • Stage 2 credit and returns dimensions