Architecture Overview
A new session's first read for understanding how the system is shaped. Single page, dense, end-to-end. Other pages drill in.
What the system does
- Collects public ESG data about a set of pilot institutions from structured upstream sources (NZBA membership list, SBTi targets dashboard, PRB signatories, etc.).
- Scores each institution against a rule catalogue (148 rules across pillars E / S / G, sub-criteria, and themes).
- Displays the scores, coverage, and underlying signals through a server-rendered web UI gated behind Cloudflare Access.
- Refreshes weekly via a cron-triggered scrape + score cycle.
Stage 1 (universal ESG) is live. Stage 2 (financials triangulation — ESG × credit × returns) has ESG in place; credit and returns dimensions are placeholders.
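The weekly scrape + score cycle described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual orchestration: `runWeeklyCycle`, the scraper object shape (`name`, `scrape`), the `scoreRun` callback, the record fields, and the placeholder rule id are all hypothetical. Letting one failed scraper pass without blocking the others is also an assumption.

```javascript
// Hypothetical orchestration of one weekly cycle. Scraper objects, scoreRun,
// and all record shapes are illustrative, not the project's real API.
async function runWeeklyCycle(scrapers, institutions, scoreRun) {
  const runs = [];
  const signals = [];
  for (const scraper of scrapers) {
    // One run record per scraper execution, mirroring the scrape_run table.
    const run = {
      id: runs.length + 1,
      source: scraper.name,
      startedAt: new Date().toISOString(),
      status: "ok",
    };
    try {
      const found = await scraper.scrape(institutions);
      // Tag every signal with the run that produced it.
      for (const s of found) signals.push({ ...s, runId: run.id });
    } catch (err) {
      // ASSUMPTION: a failing source is recorded and skipped, not fatal.
      run.status = "failed";
      run.error = String((err && err.message) || err);
    }
    runs.push(run);
  }
  // Scoring happens once, after all scrapers have had their turn.
  const scores = scoreRun(signals, institutions);
  return { runs, signals, scores };
}

// Stub scrapers for demonstration only.
const demoScrapers = [
  {
    name: "nzba",
    scrape: async (insts) =>
      insts.map((i) => ({ institutionId: i, ruleId: "example-rule" })),
  },
  {
    name: "broken",
    scrape: async () => {
      throw new Error("upstream changed layout");
    },
  },
];

runWeeklyCycle(demoScrapers, ["bank-a", "bank-b"], (sigs) => sigs.length).then(
  (result) =>
    console.log(result.runs.map((r) => `${r.source}:${r.status}`), result.scores)
);
```

In this sketch the run record is created before the scrape is attempted, so a failed source still leaves a row behind for the runs page to display.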
Two-stage architecture
The system is split into two scoring stages, recorded in ADR-style narrative in the design note.
| Stage | Scope | Composite | Status |
|---|---|---|---|
| Stage 1 — ESG | All institutions (financial + non-financial) | E + S + G pillar weighted average, coverage-weighted per ADR-0001 | Live |
| Stage 2 — Triangulation | Financial institutions only (GICS 40) | Stage 1 ESG + Credit + Returns | ESG in place; Credit/Returns are placeholders |
The original v0.4 spec was banks-only single-stage; the two-stage split emerged in cycle 1 of the v0.5 work to honestly separate universal ESG screening from the financials-only triangulation thesis. See the design note for the full rationale.
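The stage gate in the table can be illustrated with a small helper. A sketch only: `gicsSector` and `applicableStages` are hypothetical names, and the stage labels are made up for the example. The one fact relied on is that the first two digits of a GICS code identify the sector and that 40 is Financials.

```javascript
// Hypothetical helper: decide which scoring stages apply to an institution.
// Assumes gicsCode is a GICS code of 2, 4, 6, or 8 digits whose first two
// digits are the sector (40 = Financials).
const FINANCIALS_SECTOR = "40";

function gicsSector(gicsCode) {
  const code = String(gicsCode).trim();
  if (!/^\d{2}(\d{2}){0,3}$/.test(code)) {
    throw new Error(`Not a GICS code: ${gicsCode}`);
  }
  return code.slice(0, 2);
}

function applicableStages(gicsCode) {
  const stages = ["stage1_esg"]; // Stage 1 applies to every institution
  if (gicsSector(gicsCode) === FINANCIALS_SECTOR) {
    stages.push("stage2_triangulation"); // Stage 2 is financials only
  }
  return stages;
}

console.log(applicableStages("40101010")); // Diversified Banks
console.log(applicableStages("10102010")); // Integrated Oil & Gas
```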
High-level data flow
┌───────────────────────────────────────────────────────────────────────┐
│ │
│ Upstream sources │
│ ───────────── │
│ • NZBA, SBTi, PRB (membership lists — live) │
│ • BHRRC, UK MSA, UK GPG, BankTrack, PAX (planned) │
│ • Corporate disclosure / IFRS S2 (heavy engineering, future) │
│ │
│ │ │
│ ▼ (scrapers, src/scrapers/*.js) │
│ │
│ SQLite at data/esg.db │
│ ───────────────────── │
│ • Reference: rule, signal_source, peer_group, gics_classification │
│ • Institutions: institution × 8 pilot │
│ • Run records: scrape_run (one per scraper execution) │
│ • Signals: signal (one row per run × institution × rule when found) │
│ • Scores: score_sub_criterion, score_pillar, │
│ score_stage1_esg, score_stage2_composite │
│ │
│ │ │
│ ▼ (scoring engine, src/scoring/*.js) │
│ │
│ Scores written back to DB │
│ ───────────────────── │
│ • Per rule, per pillar, per stage, per run │
│ • Both raw (v0.4) and coverage-weighted (ADR-0001) │
│ │
│ │ │
│ ▼ (read interface, ADR-0002, pending) │
│ │
│ Web UI at esg-screen.org │
│ ─────────────────────── │
│ • Server-rendered EJS/Handlebars via Express │
│ • Cloudflare Access OTP gate │
│ • Five pages: institution index, institution detail, │
│ methodology, runs, watchlist │
│ │
└───────────────────────────────────────────────────────────────────────┘
For the data model in detail, see Data model. For the scoring methodology in prose, see Scoring. For the stack (hosting, deployment, secrets), see Stack. For the UI design, see UI pages and Design system.
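To make the middle of the diagram concrete, the sketch below works from signal rows to a coverage figure and a pair of scores. Everything here is an assumption made for illustration: the row fields (`runId`, `institutionId`, `ruleId`, `score`), the helper names, coverage defined as "rules with a signal divided by rules in the catalogue", and above all the weighting step. ADR-0001 defines the real coverage-weighted formula and this page does not restate it, so `raw * coverage` is a stand-in only.

```javascript
// Hypothetical row shape; the real signal columns are defined in the data model.
const TOTAL_RULES = 148; // size of the rule catalogue, from this page

function signalKey(s) {
  return `${s.runId}|${s.institutionId}|${s.ruleId}`;
}

// Keep at most one signal per run x institution x rule (last write wins).
function dedupeSignals(signals) {
  const byKey = new Map();
  for (const s of signals) byKey.set(signalKey(s), s);
  return [...byKey.values()];
}

function summarise(signals, runId, institutionId, totalRules = TOTAL_RULES) {
  const found = dedupeSignals(signals).filter(
    (s) => s.runId === runId && s.institutionId === institutionId
  );
  // ASSUMPTION: coverage = rules with a signal / rules in the catalogue.
  const coverage = found.length / totalRules;
  const raw =
    found.length === 0
      ? null
      : found.reduce((sum, s) => sum + s.score, 0) / found.length;
  // ASSUMPTION: stand-in weighting, not ADR-0001's actual formula.
  const coverageWeighted = raw === null ? null : raw * coverage;
  return { rulesFound: found.length, coverage, raw, coverageWeighted };
}

// Placeholder rule ids and scores, for demonstration only.
const signals = [
  { runId: 1, institutionId: "bank-a", ruleId: "rule-a", score: 80 },
  { runId: 1, institutionId: "bank-a", ruleId: "rule-b", score: 60 },
  { runId: 1, institutionId: "bank-a", ruleId: "rule-b", score: 40 }, // same key, replaces the row above
  { runId: 1, institutionId: "bank-b", ruleId: "rule-a", score: 100 },
];

// A four-rule catalogue keeps the arithmetic easy to follow.
console.log(summarise(signals, 1, "bank-a", 4));
```

Keeping both `raw` and `coverageWeighted` in the result mirrors the diagram's note that both variants are written back per run.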
Where each ADR's decision lives in the stack
| ADR | Decision | Where it lives |
|---|---|---|
| 0001 | Coverage-weighted scoring | Migration 011, scoring engine modules |
| 0002 | Server-rendered read interface | src/ui/, Express routes |
| 0003 | Source register + Methodology page | Migration 012, /methodology route |
| 0004 | RAG thresholds | src/config/rag.js, base stylesheet |
| 0005 | Theme-based rule grouping | src/config/rule-themes.js, detail page template |
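As an illustration of how a config-driven decision such as ADR-0004 might surface in code, here is a minimal RAG lookup. The threshold values, the `ragFor` name, and the `"none"` band for missing scores are hypothetical. The real thresholds live in src/config/rag.js and are not given on this page.

```javascript
// HYPOTHETICAL thresholds, for illustration only. The real values are
// defined in src/config/rag.js per ADR-0004.
const RAG_THRESHOLDS = { green: 70, amber: 40 };

function ragFor(score, thresholds = RAG_THRESHOLDS) {
  // ASSUMPTION: a missing score gets its own band rather than defaulting to red.
  if (score === null || score === undefined || Number.isNaN(score)) {
    return "none";
  }
  if (score >= thresholds.green) return "green";
  if (score >= thresholds.amber) return "amber";
  return "red";
}

console.log([85, 40, 39.9, null].map((s) => ragFor(s)));
```

Keeping the thresholds in one config object means the templates and the stylesheet can key off the returned band name without knowing the numbers.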
What the system does NOT do
Recording what's out of scope so it doesn't keep getting raised:
- No portfolio analysis. Per-institution screening only. A portfolio view is a v2+ surface.
- No write actions. v1 is read-only. Manual intake (red flags, shareholder data) happens outside the UI.
- No client login. Cloudflare Access OTP only, 5-address allow list.
- No paid data sources. Only public data, structurally.
- No real-time scraping. Weekly cron only. There is no "scrape on demand" surface in v1.
- No automated email or notification. Watchlist surface in UI; no push.
- No mobile design. Desktop only. Screening happens at desks.
- No comparison view in v1. Per-institution depth, not cross-institution comparison. Comparison is a v2 surface.
Scale assumptions
The system is designed for:
- ~10 users (the McMillan Grubb team + occasional client view)
- 8–50 institutions in the pilot, scaling to ~350 banks (PRB-canonical list) when the pilot graduates
- Weekly refresh cadence
- ~3,000–20,000 signal rows per year (8k at current rate, more as scrapers land)
SQLite is more than sufficient for this profile. No Postgres or external DB planned. If scale assumptions change materially, revisit — but nothing on the current roadmap pushes those bounds.
Separation from Vextor
This system is fully separate from Vextor. Different repo, different VM, different secrets, different Slack channel, different DNS. The only shared thing is the operator. Operational patterns may look similar because those are Rob's habits, not because the projects are related.
Specifically:
- Do not read Vextor's CLAUDE.md or canonical docs as authority for this project
- Different Cloudflare account: mcmillangrubb
- Different Azure RG: rg-esg-screening
- Different VM: esg-screening-01
- Different DNS zone: esg-screen.org
Roadmap, near-term
From the 19 May handoff, the cycle plan to ship a viable v1:
- Commit ADRs 0001–0005 + mock + ADR-0006 (this site)
- Migration 011 — coverage-weighted scoring
- Read interface — pages 1 and 2 (index + detail per mock v4)
- Migration 012 + /methodology page (source register)
- BHRRC scraper (S2, S5, E7, G5 + watchlist)
- UK MSA + UK GPG scrapers (low-effort government registers)
- BankTrack + PAX scrapers (financials-only fossil/weapons rules)
After these 6 cycles, coverage rises from 11% to ~35–45% on financials, banks stop scoring identically, and the watchlist populates with real findings. That's the threshold for a "viable web-based screening product."
Beyond that, deferred:
- TCFD-shaped gap → IFRS S2 corporate-disclosure scraper
- Profundo enrichment scraper
- Corporate website / sustainability report parser (heavy engineering)
- Comparison view, exception filters, client-facing summaries
- PDF export
- PRB seed expansion (~350 banks)
- Stage 2 credit and returns dimensions