Glossary & Disambiguation
This page is canonical. The system has accumulated several overloaded
terms: source vs signal_source, "coverage" vs "confidence", "raw"
vs "displayed" scores. Other pages link here rather than re-explain
the distinctions.
The structure follows the Vextor ops-doc pattern: one row per distinct meaning, with the canonical form spelled out so confusion patterns can be retired by reference.
Disambiguation: Overloaded Terms
Each overloaded term appears below as one row per distinct meaning, making the overload explicit.
source — 2 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Database table holding source records | DB schema | signal_source (table name in current schema) | The actual table is named signal_source in migrations 001/002/006. ADR-0003 uses the shorter source in prose; both refer to the same thing. |
| Conceptual "data source" the system scrapes | Domain language | "source" or "data source" | Used in user-facing copy (/methodology page) and ADR language. Refers to the abstract upstream (e.g. "the BHRRC source"), not the DB row. |
Common confusion pattern: "Check the source table" is ambiguous —
prefer "check the signal_source table" for DB references; reserve
"source" for domain talk. Migration 012 may rename the table to source
in due course; until it does, the table name is signal_source.
coverage — 3 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Sub-criterion-level binary | DB column | score_sub_criterion.coverage_pct (0.0 or 1.0) | At sub-criterion grain, coverage_pct is binary: 1.0 if any signal exists for this rule × institution × run, 0.0 otherwise. EXISTS-based per migration 009. |
| Pillar / Stage 1 / Stage 2 ratio | DB column | score_pillar.coverage_pct and equivalents (0.0–1.0) | At pillar+ grain, coverage_pct is the unweighted ratio of covered applicable rules to total applicable rules. |
| UI hero metric | Display | coverage (rendered as percentage with RAG band) | What the user sees in the institution detail hero. Same number as Stage 1 coverage_pct, formatted as integer percent. |
Common confusion pattern: "Barclays is at 11% coverage" is the displayed Stage 1 coverage_pct (3 of 27 applicable rules covered). At the rule level individual rules are either covered (1.0) or not (0.0), not 11%. When discussing a rule, say "covered" or "uncovered"; when discussing an institution or pillar, give the percent.
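The two grains can be sketched as below. This is a minimal illustration, assuming rule-level coverage arrives as an array of 0.0/1.0 values for the applicable rules and that the display rounds to the nearest integer. The function names pillarCoveragePct and formatCoverage are hypothetical and not taken from the codebase.

```javascript
// Hypothetical helpers; names are illustrative, not from the codebase.

// Unweighted ratio of covered applicable rules to total applicable rules.
// `ruleCoverage` holds one binary value (0.0 or 1.0) per applicable rule.
function pillarCoveragePct(ruleCoverage) {
  if (ruleCoverage.length === 0) return 0; // assumption: no applicable rules reads as 0
  const covered = ruleCoverage.filter((c) => c === 1.0).length;
  return covered / ruleCoverage.length;
}

// Display form: integer percent (assumption: rounded to nearest).
function formatCoverage(coveragePct) {
  return `${Math.round(coveragePct * 100)}%`;
}

// 3 covered rules out of 27 applicable ones, as in the Barclays example.
const exampleRules = [...Array(3).fill(1.0), ...Array(24).fill(0.0)];
console.log(formatCoverage(pillarCoveragePct(exampleRules))); // "11%"
```

Each element of the input is still binary; the 11% only exists once the values are aggregated.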
confidence — 2 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Per-signal quality measure | DB column | signal.confidence (0.0–1.0) | How sure is the scraper about this finding. e.g. SBTi "not found" returns conf=0.5 because absence could be a name-match issue. |
| Aggregated propagated quality | DB column | score_*.confidence (0.0–1.0) | Coverage × quality at sub-criterion, weighted mean propagating up. |
Confidence is NOT coverage. Confidence answers "how sure are we about this finding"; coverage answers "did we look at all?" A rule with high-confidence boolean=0 (e.g. NatWest's "Commitment removed" on SBTi) is fully covered, even though it's a negative finding.
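A rough sketch of how the aggregated confidence could propagate, under stated assumptions: the weights used at each level are not given on this page, so the weight field below is a placeholder, and both function names are hypothetical.

```javascript
// Hypothetical sketch; weights and function names are assumptions.

// Sub-criterion grain: binary coverage (0.0 or 1.0) times signal quality (0.0 to 1.0).
function subCriterionConfidence(coveragePct, signalConfidence) {
  return coveragePct * signalConfidence;
}

// Higher grains: weighted mean of child confidences.
function weightedMeanConfidence(children) {
  const totalWeight = children.reduce((sum, c) => sum + c.weight, 0);
  if (totalWeight === 0) return 0;
  const weighted = children.reduce((sum, c) => sum + c.confidence * c.weight, 0);
  return weighted / totalWeight;
}

const children = [
  { confidence: subCriterionConfidence(1.0, 1.0), weight: 1 }, // covered, high-confidence negative finding
  { confidence: subCriterionConfidence(1.0, 0.5), weight: 1 }, // covered, "not found" at conf 0.5
  { confidence: subCriterionConfidence(0.0, 0.0), weight: 2 }, // uncovered: no signal at all
];
console.log(weightedMeanConfidence(children)); // 0.375
```

The first child shows the point made above: a negative finding with a confident signal contributes full confidence, because coverage is 1.0.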
score — 4 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Raw v0.4 methodology composite | DB column | score_composite.composite_raw_v04 | Faithful v0.4 calculation including 50/100 base values for uncovered rules. Stored for audit, not displayed. |
| Coverage-weighted displayed composite | DB column | score_composite.composite_coverage_weighted | Per ADR-0001: average only over rules with live signals. This is the headline number. |
| Existing alias column | DB column | score_composite.composite | After migration 011, an alias of composite_coverage_weighted. Kept for backward compat. |
| Rule-level numeric | DB column | score_sub_criterion.score (0–100) and raw_score (0–5) | Per-rule scoring; doesn't apply to a whole institution. |
Common confusion pattern: "Barclays scores 19.2" was the v0.4 raw number under low coverage. Per ADR-0001 the displayed number is now ~66.7 (coverage-weighted). Both exist in the DB. The displayed score is what the UI shows; raw is for audit and convergence checks.
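The difference between the two composites can be illustrated with a simplified sketch. It assumes equal rule weights and a single base value per uncovered rule; the real v0.4 weighting and base values are defined in the workbook and are not reproduced here. Field and function names are hypothetical, and the numbers are made up.

```javascript
// Hypothetical sketch; field names and the equal weighting are assumptions.

function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Raw composite: every applicable rule counts; uncovered rules contribute a base value.
function compositeRaw(rules) {
  return mean(rules.map((r) => (r.covered ? r.score : r.baseValue)));
}

// Coverage-weighted composite: only rules with live signals count.
function compositeCoverageWeighted(rules) {
  const live = rules.filter((r) => r.covered);
  return live.length === 0 ? null : mean(live.map((r) => r.score));
}

// Made-up numbers for illustration only.
const exampleScores = [
  { covered: true, score: 80, baseValue: 50 },
  { covered: true, score: 70, baseValue: 50 },
  { covered: false, score: null, baseValue: 50 },
  { covered: false, score: null, baseValue: 50 },
];
console.log(compositeRaw(exampleScores)); // 62.5
console.log(compositeCoverageWeighted(exampleScores)); // 75
```

With low coverage the base values dominate the raw figure, which is why the two numbers can sit far apart for the same institution and run.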
signal_source / source_id — 3 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Logical source identifier | DB column | signal_source.source_id (e.g. NZBA-MEMBERS) | The string code. Used as FK throughout. |
| Display name of source | UI / docs | name column in signal_source (e.g. "UNEP FI Net-Zero Banking Alliance") | The human-readable label. Don't use the source_id in UI copy. |
| Scraper module | Filesystem | src/scrapers/<name>.js (e.g. nzba.js) | The code that fetches and parses that source. Module name is lowercase, source_id is SCREAMING-KEBAB. |
Pattern note: SBTI-DASHBOARD and SBTI-CORPORATE are legacy
misnomers (source is an Excel file, not a dashboard or CSV). Rename to
SBTI-VALIDATED deferred — known parked item.
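Because the module name cannot be derived mechanically from the source_id, one way to keep the three meanings apart in code is an explicit lookup. The sketch below is an assumption throughout: the regular expression is one reading of SCREAMING-KEBAB, and the pairing of NZBA-MEMBERS with nzba.js is inferred from the examples in the table rather than confirmed.

```javascript
// Hypothetical sketch; the lookup table and helper names are assumptions.

// source_id convention: upper-case segments joined by hyphens (SCREAMING-KEBAB).
const SOURCE_ID_PATTERN = /^[A-Z0-9]+(-[A-Z0-9]+)*$/;

function isSourceId(value) {
  return SOURCE_ID_PATTERN.test(value);
}

// Explicit mapping from source_id to lowercase module name.
const SCRAPER_MODULES = new Map([["NZBA-MEMBERS", "nzba"]]);

function scraperModulePath(sourceId) {
  if (!isSourceId(sourceId)) throw new Error(`Not a source_id: ${sourceId}`);
  const moduleName = SCRAPER_MODULES.get(sourceId);
  if (moduleName === undefined) throw new Error(`No scraper registered for ${sourceId}`);
  return `src/scrapers/${moduleName}.js`;
}

console.log(scraperModulePath("NZBA-MEMBERS")); // "src/scrapers/nzba.js"
console.log(isSourceId("nzba")); // false
```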
run — 3 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Scrape execution | DB row | scrape_run.run_id (integer) | One row per scrape execution. Created by run.js runner. Holds start_at, finish_at, status, error_count. |
| Scoring execution | Conceptual | "score run" | Re-running the scoring engine against an existing scrape run. Doesn't create a new scrape_run row — overwrites score_* rows for the same run_id. |
| Cron-triggered combined | Conceptual | "weekly run" | Sunday 02:00 UTC scheduled execution that does scrape then score. |
Common confusion pattern: "Re-run the scores" doesn't create a new
run_id — it updates the score rows for the existing one. Only the
scraper run creates new run_ids.
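The run_id behaviour can be modelled with a small in-memory stand-in. This is only a sketch: the real system uses database tables, and the class and method names here are hypothetical.

```javascript
// Hypothetical in-memory model; the real system uses DB tables.
class RunStore {
  constructor() {
    this.scrapeRuns = []; // stands in for scrape_run
    this.scoreRows = new Map(); // stands in for score_* rows, keyed by run_id
  }

  // A scrape execution always allocates a new run_id.
  startScrapeRun() {
    const runId = this.scrapeRuns.length + 1;
    this.scrapeRuns.push({ run_id: runId, status: "running" });
    return runId;
  }

  // A score run reuses an existing run_id and overwrites its score rows.
  scoreRun(runId, scores) {
    if (!this.scrapeRuns.some((r) => r.run_id === runId)) {
      throw new Error(`Unknown run_id: ${runId}`);
    }
    this.scoreRows.set(runId, scores);
  }
}

const store = new RunStore();
const firstRunId = store.startScrapeRun();
store.scoreRun(firstRunId, { composite: 60 });
store.scoreRun(firstRunId, { composite: 62 }); // re-score: same run_id, rows replaced
console.log(store.scrapeRuns.length); // 1
console.log(store.scoreRows.get(firstRunId).composite); // 62
```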
institution — 2 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| The entity being screened | DB row | institution.institution_id (LEI) | LEI-keyed. Pilot set is 8: 4 UK banks + 4 non-financials. |
| Legal entity identifier (general) | Domain language | LEI | A 20-character identifier from the GLEIF system. By default this is the holding company's LEI, not an operating subsidiary's (Barclays PLC, not Barclays Bank PLC). |
Pattern note: Default to holding-company LEI for any
institution. Operating subsidiary LEIs are used only for source-specific
lookups (e.g. UK Modern Slavery Act, which is filed by the subsidiary).
That sub-entity ID lives in scraper_config_json.
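The default-plus-override rule might look like the sketch below. The shape of scraper_config_json is not documented on this page, so the subsidiary_lei key is a hypothetical placeholder, as are the function names. The format check covers length and character set only and does not verify LEI check digits. The identifiers are placeholders, not real LEIs.

```javascript
// Hypothetical sketch; the shape of scraper_config_json is an assumption.

// Format check only: 20 upper-case alphanumeric characters.
function looksLikeLei(value) {
  return /^[A-Z0-9]{20}$/.test(value);
}

// Default to the holding-company LEI; use a subsidiary LEI only when the
// source's config names one for this institution.
function leiForSource(institutionId, scraperConfig) {
  const overrides = (scraperConfig && scraperConfig.subsidiary_lei) || {};
  const lei = overrides[institutionId] || institutionId;
  if (!looksLikeLei(lei)) throw new Error(`Malformed LEI: ${lei}`);
  return lei;
}

// Placeholder identifiers padded to 20 characters.
const holdingLei = "HOLDINGCO".padEnd(19, "0") + "1";
const subsidiaryLei = "SUBSIDIARY".padEnd(19, "0") + "2";

const modernSlaveryConfig = { subsidiary_lei: { [holdingLei]: subsidiaryLei } };
console.log(leiForSource(holdingLei, modernSlaveryConfig) === subsidiaryLei); // true
console.log(leiForSource(holdingLei, {}) === holdingLei); // true
```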
rule — 2 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Atomic scoreable unit | DB row | rule.rule_id (e.g. E1.1, G3-PRB.1) | The thing a scraper feeds and the scoring engine evaluates. 148 rows: 89 financial + 24 universal + 35 non-financial. |
| Sub-criterion (grouping above rules) | Conceptual / DB | sub_criterion (e.g. E1) | A group of rules under a sub-criterion code. The v0.4 hierarchy is pillar → sub-criterion → rule. |
Pattern note: ADR-0005 adds a third grouping — themes — for UI presentation only. Themes don't appear in the DB or affect scoring.
applicability — 2 meanings
| Meaning | Scope | Canonical form | Key distinguisher |
|---|---|---|---|
| Which rules apply to which institution | DB column | rule.applicable_sectors (GICS code string) | '40' = financials only; 'ALL' = universal; specific GICS codes = sector-scoped. |
| Which scrapers run for which institution | Conceptual | Routing logic in run.js | Determined at scraper-runner level based on institution sector. Financials-only scrapers (NZBA, SBTi for banks) skip non-financial institutions. |
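Rule-level applicability could be checked along these lines. Two things are assumed here and not confirmed by this page: that multiple codes in applicable_sectors are comma-separated, and that a code matches any institution whose GICS code starts with it. The function name is hypothetical.

```javascript
// Hypothetical sketch; the comma-separated format and prefix matching are assumptions.

// applicableSectors: 'ALL', '40', or a list of GICS codes such as '10,15'.
// institutionGics: the institution's GICS code as a string, e.g. '40101010'.
function ruleApplies(applicableSectors, institutionGics) {
  if (applicableSectors === "ALL") return true;
  return applicableSectors
    .split(",")
    .map((code) => code.trim())
    .filter((code) => code.length > 0)
    .some((code) => institutionGics.startsWith(code));
}

console.log(ruleApplies("ALL", "10102010")); // true
console.log(ruleApplies("40", "40101010")); // true
console.log(ruleApplies("40", "10102010")); // false
```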
Disambiguation: Stages
Stage 1 vs Stage 2
| | Stage 1 | Stage 2 |
|---|---|---|
| Scope | All institutions (financial + non-financial) | Financial institutions only (GICS sector 40) |
| Combines | E pillar + S pillar + G pillar | Stage 1 ESG + Credit + Returns |
| DB table | score_stage1_esg | score_stage2_composite |
| Status | Live | Partial: only the Stage 1 ESG component is live; the credit and returns dimensions are placeholders ("awaiting") |
| Triangulation thesis | No | Yes — original framework justification |
The triangulation thesis (ESG × credit × returns) was the original banks-only justification. The two-stage split keeps the universal ESG-only screening separate from the financials-only triangulation.
Disambiguation: Documentation Surfaces
| Surface | What it is | What it's for |
|---|---|---|
| ADRs (docs/adr/) | Standing design decisions | "Why is the system shaped this way?" |
| Design note (DESIGN_NOTE_v2_two_stage.md) | Architecture rationale | "Why did we go to two-stage?" |
| Migrations | Schema record | "What's in the DB?" |
| v0.4 workbook | Framework spec | Historical reference. Framework lives on; implementation diverges. |
| Slack #esg-screening | Event log | "What changed in the last session?" |
| CLAUDE.md (in repo) | CC-on-VM operating protocol | How Claude behaves on the VM |
| Project instructions (claude.ai) | Chat-session operating protocol | How chat sessions behave |
| ops.esg-screen.org (this site) | Standing reference | The shape of the system, not what just happened |
Quick reference: terms that are NOT overloaded
For completeness — a few terms that get used loosely but have a single canonical meaning:
- GICS: Global Industry Classification Standard. Always the MSCI/S&P 4-level hierarchy. The gics_classification table holds the taxonomy; institution.gics_* columns hold per-institution classification.
- Peer group: peer_group table. Per institution, the system finds peers via GICS fallback ladder: sub-industry → industry → industry group → sector. Thresholds 10/15/25.
- Red flag: manual-at-intake binary per v0.4. Fossil financing, weapons manufacturing, etc. Set on institution, not derived from signals. Per ADR-0004, no amber state.
- Watchlist: surface for findings not captured by score (e.g. NatWest's withdrawn SBTi commitment, BHRRC allegations). Schema pending; expected as part of BHRRC scraper cycle.
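The peer-group fallback ladder can be sketched as a walk from the narrowest GICS level to the widest. How the 10/15/25 thresholds map onto the four levels is not spelled out on this page; the mapping below (one minimum per level, sector accepted unconditionally) is one possible reading and should be treated as an assumption, as should the function and field names.

```javascript
// Hypothetical sketch; how the thresholds map onto ladder levels is an assumption.

// Walk the ladder from narrowest to widest and stop at the first level
// with enough peers. The last level is accepted regardless of size.
function pickPeerLevel(peerCounts, ladder) {
  for (let i = 0; i < ladder.length; i++) {
    const { level, minPeers } = ladder[i];
    const isLast = i === ladder.length - 1;
    if (isLast || (peerCounts[level] || 0) >= minPeers) return level;
  }
  return null; // only reached when the ladder is empty
}

// One possible reading of "Thresholds 10/15/25".
const peerLadder = [
  { level: "sub_industry", minPeers: 10 },
  { level: "industry", minPeers: 15 },
  { level: "industry_group", minPeers: 25 },
  { level: "sector", minPeers: 0 },
];

const counts = { sub_industry: 4, industry: 18, industry_group: 40, sector: 90 };
console.log(pickPeerLevel(counts, peerLadder)); // "industry"
```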
When you find a new disambiguation
This page is canonical. New overloads land here, not in scattered comments. Add a row, link to it, move on. The point is that future-you shouldn't have to re-derive the distinction.