PoliPrism Architecture
Pipeline, schema, API, and deploy reference for the development team
1. Architecture Overview
- Data path: Congress.gov, Open States, FEC, USASpending → INGEST (13 scripts) → Raw Tables (prism_bills, openstates_bills, raw_fec_*) → PROCESS (8 scripts) → Entity Tables + MVs (entities_*, mv_*) → API (FastAPI on :8077) → Browser (JS apps fetch /api/*)
- Build path: BUILD (4 scripts) → Static Assets (CSS, hubs, search JSON, sitemap)
2. Data Pipeline (Three Layers)
Master Runner
- `poliprism-pipeline.sh --ingest`: run all ingest scripts
- `poliprism-pipeline.sh --process`: run all scoring/processing scripts
- `poliprism-pipeline.sh --build`: generate static assets
- `poliprism-pipeline.sh --all`: ingest → process → build
- `poliprism-pipeline.sh --all --only federal-bills,score-federal-bills,css,search`: run only the listed steps
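The runner's flags can be read as two stages. First, expand the selected layers into an ordered step list. Second, optionally filter that list with `--only`.

The Python sketch below shows that selection logic. It is not the actual logic of `poliprism-pipeline.sh`, which is a shell script. It assumes that step names are the wrapper script names without `.sh`, as the `--only` example above suggests.

```python
from typing import Optional

# Step names assumed to be the wrapper script names minus ".sh", in layer order.
LAYERS = {
    "ingest": ["federal-legislators", "state-legislators", "ny-legislators",
               "federal-bills", "federal-bill-texts", "federal-enrichment",
               "state-bills", "state-bill-texts", "fec", "usaspending",
               "state-assets", "profile-photos", "entities"],
    "process": ["promote-bill-texts", "score-federal-bills", "score-state-bills",
                "score-legislators", "classify-organizations", "entity-resolver",
                "standardize-profiles", "refresh-views"],
    "build": ["css", "hubs", "search", "sitemap"],
}


def plan_steps(layer_flag: str, only: Optional[str] = None) -> list[str]:
    """Return the steps to run, in order, for 'ingest', 'process', 'build' or 'all'."""
    layers = ["ingest", "process", "build"] if layer_flag == "all" else [layer_flag]
    steps = [step for layer in layers for step in LAYERS[layer]]
    if only:
        wanted = {name.strip() for name in only.split(",") if name.strip()}
        unknown = wanted - set(steps)
        if unknown:
            raise ValueError(f"unknown step(s): {sorted(unknown)}")
        # Keep pipeline order, not the order typed on the command line.
        steps = [step for step in steps if step in wanted]
    return steps
```

For example, `plan_steps("all", "federal-bills,score-federal-bills,css,search")` yields the four steps in ingest → process → build order.

Rejecting unknown names up front turns a typo into an immediate error rather than a silent no-op.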
Ingest Layer
External APIs/files → raw/staging database tables. Each script is idempotent and independently runnable.
| Script | Source | Target Tables |
|---|---|---|
federal-legislators.sh | unitedstates JSON + Congress.gov | people, people_offices |
state-legislators.sh | Open States API / CSV | people, offices, people_offices |
ny-legislators.sh | NY Open Legislation API | people, people_offices |
federal-bills.sh | Congress.gov API v3 | prism_bills |
federal-bill-texts.sh | Congress.gov (text fetch) | raw_congress_bill_texts |
federal-enrichment.sh | Congress.gov + unitedstates YAML | people (committees, sponsored bills) |
state-bills.sh | Open States API v3 | openstates_bills |
state-bill-texts.sh | HTTP (bill text URLs) | openstates_bills.bill_text |
fec.sh | FEC API (4-phase) | raw_fec_*, campaign_contributions |
usaspending.sh | USASpending API v2 | raw_usaspending_by_district |
state-assets.sh | Wikimedia Commons | state_reference + images |
profile-photos.sh | HTTP (photo URLs) | people.image_path + images |
entities.sh | DB transform (runs last) | entities_legislators, entities_bills, committees, sponsorships |
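One common way to make an ingest script idempotent is to upsert on a natural key. Re-running a batch then updates existing rows instead of duplicating them.

The sketch below uses SQLite from the standard library so that it runs standalone. PostgreSQL supports the same `INSERT ... ON CONFLICT ... DO UPDATE` form using `EXCLUDED`.

The `prism_bills` columns shown (`bill_id`, `title`, `latest_action`) are hypothetical and are not the real schema.

```python
import sqlite3

# Hypothetical columns; the real prism_bills schema is not documented here.
UPSERT_SQL = """
INSERT INTO prism_bills (bill_id, title, latest_action)
VALUES (:bill_id, :title, :latest_action)
ON CONFLICT (bill_id) DO UPDATE SET
    title = excluded.title,
    latest_action = excluded.latest_action
"""


def upsert_bills(conn: sqlite3.Connection, bills: list[dict]) -> None:
    """Insert new bills and overwrite the mutable fields of existing ones."""
    conn.executemany(UPSERT_SQL, bills)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE prism_bills (bill_id TEXT PRIMARY KEY, title TEXT, latest_action TEXT)"
    )
    batch = [{"bill_id": "hr1-119", "title": "Example Act", "latest_action": "Introduced"}]
    upsert_bills(conn, batch)
    upsert_bills(conn, batch)  # re-running the same batch adds no rows
    print(conn.execute("SELECT COUNT(*) FROM prism_bills").fetchone()[0])  # prints 1
```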
Process Layer
Scoring, classification, and view refresh. Runs after ingest; reads from entity tables, writes scores and refreshes materialized views.
| Script | Purpose |
|---|---|
promote-bill-texts.sh | Promote raw bill texts → canonical bill_texts + enqueue scoring |
score-federal-bills.sh | Ollama LLM scoring of federal bills (Prism score + taxonomy) |
score-state-bills.sh | Ollama LLM scoring of state bills |
score-legislators.sh | 4 legislator scores: Prism, effectiveness, bipartisanship, donor alignment |
classify-organizations.sh | Ollama industry classification for PACs/organizations |
entity-resolver.sh | Fuzzy-link entities to FEC/OpenStates IDs (pg_trgm) |
standardize-profiles.sh | Backfill profile_page, external_ids, contact_json |
refresh-views.sh | REFRESH MATERIALIZED VIEW CONCURRENTLY (always last) |
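The file taxonomy describes three of the legislator scores in one phrase each:

- Prism score: a bill average.
- Effectiveness: a bills-passed ratio.
- Bipartisanship: a cross-party cosponsor ratio.

The Python sketch below implements those three definitions. It assumes a legislator's sponsored bills are loaded into a hypothetical `SponsoredBill` shape. Any weighting, minimum-bill threshold or rescaling in the real `score_legislators_*.py` scripts is not documented and is not modeled.

Donor alignment ("PAC $ vs bill taxonomy match") is omitted because its formula is not described.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class SponsoredBill:
    """Hypothetical shape of one bill a legislator sponsored."""
    prism_score: Optional[float]  # None until the bill has been LLM-scored
    passed: bool
    cosponsor_parties: list[str] = field(default_factory=list)


def prism_score(bills: list[SponsoredBill]) -> Optional[float]:
    """Mean Prism score over the legislator's scored bills."""
    scored = [b.prism_score for b in bills if b.prism_score is not None]
    return mean(scored) if scored else None


def effectiveness(bills: list[SponsoredBill]) -> Optional[float]:
    """Share of sponsored bills that passed."""
    return sum(b.passed for b in bills) / len(bills) if bills else None


def bipartisanship(bills: list[SponsoredBill], own_party: str) -> Optional[float]:
    """Share of all cosponsorships that came from outside the sponsor's party."""
    parties = [p for b in bills for p in b.cosponsor_parties]
    if not parties:
        return None
    return sum(p != own_party for p in parties) / len(parties)
```

For example, take a D sponsor with three bills:

- one scored 60 that passed, with cosponsors D and R;
- one scored 40 that did not pass, with cosponsors R and R;
- one unscored bill with no cosponsors.

These give a Prism score of 50.0, an effectiveness of 1/3 and a bipartisanship of 0.75. Unscored bills count toward effectiveness but not toward the average.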
Build Layer
Generates static artifacts only: the bundled stylesheet, hub and shell pages, client-side search indexes, and the sitemap. Data pages fetch their content from the API at runtime.
| Script | Output |
|---|---|
css.sh | assets/css/poliprism.css (bundled from styles.css) |
hubs.sh | Home, /federal/, /states/ grid, /territories/, 404/50x + state legislature shells |
search.sh | legislators-search.json + bills-search.json (Fuse.js client-side search) |
sitemap.sh | sitemap.xml + robots.txt |
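`search.sh` writes the JSON files that Fuse.js searches in the browser. The Python sketch below shows a minimal version of that step, under these assumptions:

- Rows are read as dicts from `mv_legislator_profiles`, since build scripts read the materialized views.
- Only a handful of fields are kept. The field names are hypothetical.

On the client, Fuse.js would then be configured with a `keys` option naming the fields to search.

```python
import json
from pathlib import Path

# Hypothetical field list: enough for Fuse.js to match on and for the UI to link.
INDEX_FIELDS = ("slug", "name", "state", "party", "chamber", "level")


def build_search_index(rows: list[dict]) -> str:
    """Project profile rows onto INDEX_FIELDS and serialize them as compact JSON."""
    records = [{key: row.get(key) for key in INDEX_FIELDS} for row in rows]
    # Stable, case-insensitive name order keeps the output deterministic between builds.
    records.sort(key=lambda r: (r["name"] or "").lower())
    return json.dumps(records, separators=(",", ":"), ensure_ascii=False)


def write_search_index(rows: list[dict], out_path: Path) -> None:
    """Write the index, creating the output directory if needed."""
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(build_search_index(rows), encoding="utf-8")
```

Dropping unused columns keeps the file small, which matters because the browser downloads the whole index before it can search.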
3. Database Schema
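Data Flow
Raw/ingestion tables feed the canonical entity and relationship tables. Those tables in turn feed the materialized views that the API and build scripts read.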
Raw / Ingestion Tables
| Table | Source | Purpose |
|---|---|---|
prism_bills | Congress.gov | Federal bills with Prism scores, sponsors, text |
openstates_bills | Open States API | State bills with classifications, sponsors, text |
raw_congress_bill_texts | Congress.gov | Bill text versions (XML/HTML/PDF → plain text) |
raw_fec_candidates | FEC API | FEC candidate master records |
raw_fec_committees | FEC API | FEC committee records (PACs, party committees) |
raw_fec_contributions | FEC API | Campaign contributions (partitioned by election cycle) |
raw_usaspending_by_district | USASpending API | Federal spending by congressional district |
people | Multiple sources | Base person records (name, bio, photo, terms) |
offices | Seed script | Elected offices (federal + state + county) |
people_offices | Sync scripts | Person ↔ office assignments (is_current flag) |
Canonical Entity Tables
| Table | Purpose |
|---|---|
entities_legislators | One row per legislator + role. All four scoring columns. Links to people via slug. |
entities_bills | Unified bill record (federal + state + NY). Prism score, taxonomy, status. |
entities_committees | Congressional and state committees. |
entities_organizations | PACs, party committees, industry-classified organizations. |
bill_taxonomy | LLM-scored taxonomy: primary/secondary categories, policy actions, flags. |
bill_texts | Canonical bill text versions (promoted from raw tables). |
Relationship Tables
| Table | Links | Purpose |
|---|---|---|
bill_sponsorships | legislator ↔ bill | Sponsor / cosponsor relationships |
committee_memberships | legislator ↔ committee | Committee assignments and roles |
campaign_contributions | legislator ↔ organization | FEC contribution records (PAC + individual) |
bill_votes | legislator ↔ bill | Roll call vote positions (yes/no/present/absent) |
usaspending_district_summary | legislator ↔ spending | Federal contracts/grants per district |
Materialized Views (Single Source of Truth)
Both the API and build scripts read from these pre-computed views. Refreshed by refresh_views.py after any Process step.
| View | Rows | Purpose |
|---|---|---|
mv_legislator_profiles | ~8,100+ | Full legislator cards: scores, office, finance, sponsored bills, votes |
mv_bills_list | ~400+ | Bills with sponsor info, taxonomy, text availability, CRS subjects |
mv_bill_detail | ~400+ | Individual bill pages with vote counts, cosponsor counts, text format |
mv_finance_ideology_gap | ~1,500+ | Donor industry vs legislator ideology alignment analysis |
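`REFRESH MATERIALIZED VIEW CONCURRENTLY` lets readers keep querying a view's old contents while it is rebuilt. PostgreSQL imposes two conditions on it:

- The view must have at least one UNIQUE index that uses only column names and covers all rows.
- The view must already be populated.

The Python sketch below shows a refresh loop in the spirit of `refresh_views.py`. It assumes the views can be refreshed in the listed order; if one view selects from another, the dependency must be refreshed first. It includes a recording cursor for dry runs; a psycopg cursor accepts the same `execute(sql)` call.

```python
# Refresh order (assumption): the listed order; reorder if one view selects from another.
VIEWS = (
    "mv_legislator_profiles",
    "mv_bills_list",
    "mv_bill_detail",
    "mv_finance_ideology_gap",
)


class RecordingCursor:
    """Stand-in cursor for dry runs: records SQL instead of executing it."""

    def __init__(self) -> None:
        self.statements: list[str] = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)


def refresh_all(cur, views=VIEWS, concurrently: bool = True) -> None:
    """Issue one REFRESH per view.

    View names are fixed constants, never request input, so interpolating
    them into the SQL text is safe here.
    """
    keyword = " CONCURRENTLY" if concurrently else ""
    for view in views:
        cur.execute(f"REFRESH MATERIALIZED VIEW{keyword} {view}")
```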
4. API Endpoints
All endpoints are served by FastAPI under /api/; Nginx reverse-proxies those requests to 127.0.0.1:8077.
Federal Members
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/members | chamber, state, party, q, page, per_page | mv_legislator_profiles (level=federal) |
GET /api/members/{bioguide_id} | — | mv_legislator_profiles + people (bio) |
GET /api/chambers/stats | — | mv_legislator_profiles (level=federal) |
All Legislators (Federal + State)
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/legislators | level, chamber, state, party, q, score_min/max, sort, page | mv_legislator_profiles |
GET /api/legislators/{slug} | — | mv_legislator_profiles + people |
GET /api/legislators/stats | — | mv_legislator_profiles |
GET /api/legislators/facets | (same filters as list) | mv_legislator_profiles |
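The list endpoint's filters translate naturally into a parameterized query against `mv_legislator_profiles`. The Python sketch below shows one safe way to do that translation, with these assumptions:

- The column names (`full_name`, `prism_score`) are hypothetical, though `level` matches the `level=federal` filter noted above.
- The sort keys and page size are also hypothetical.
- Placeholders follow psycopg's `%s` style.

```python
from typing import Any, Optional

PAGE_SIZE = 50  # hypothetical; the /api/legislators page size is not documented here

# Whitelisted sort keys map to fixed ORDER BY clauses so request text never
# reaches the SQL string. Column names (full_name, prism_score) are hypothetical.
SORTS = {
    "name": "full_name ASC",
    "score": "prism_score DESC NULLS LAST",
    "state": "state ASC, full_name ASC",
}


def build_legislators_query(
    level: Optional[str] = None,
    chamber: Optional[str] = None,
    state: Optional[str] = None,
    party: Optional[str] = None,
    q: Optional[str] = None,
    score_min: Optional[float] = None,
    score_max: Optional[float] = None,
    sort: str = "name",
    page: int = 1,
) -> tuple[str, list[Any]]:
    """Return (sql, params) for one page of mv_legislator_profiles."""
    where: list[str] = []
    params: list[Any] = []
    for column, value in (("level", level), ("chamber", chamber),
                          ("state", state), ("party", party)):
        if value:
            where.append(f"{column} = %s")
            params.append(value)
    if q:
        where.append("full_name ILIKE %s")  # substring match; % and _ in q are not escaped
        params.append(f"%{q}%")
    if score_min is not None:
        where.append("prism_score >= %s")
        params.append(score_min)
    if score_max is not None:
        where.append("prism_score <= %s")
        params.append(score_max)

    sql = "SELECT * FROM mv_legislator_profiles"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += f" ORDER BY {SORTS.get(sort, SORTS['name'])} LIMIT %s OFFSET %s"
    params += [PAGE_SIZE, (max(page, 1) - 1) * PAGE_SIZE]
    return sql, params
```

Because the facets endpoint takes the same filters, it could reuse the WHERE-clause half of this builder.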
State Members
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/states | — | mv_legislator_profiles (level=state) |
GET /api/states/{abbr}/stats | — | mv_legislator_profiles (state filter) |
GET /api/state-members | state (required), chamber, party, q, sort, page | mv_legislator_profiles (level=state) |
GET /api/state-members/{slug} | — | mv_legislator_profiles + people |
Bills (Federal)
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/bills | congress, q, bill_q, bill_type, prism_category, crs_area, rating, sort, page | mv_bills_list (level=federal) |
GET /api/bills/{bill_id} | include_text | mv_bill_detail |
GET /api/bills/stats | — | mv_bills_list |
GET /api/bills/congresses | — | mv_bills_list |
GET /api/bills/facets | (same filters as list) | mv_bills_list |
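`/api/bills/facets` accepts the same filters as the list endpoint. One common faceting approach counts each facet's values with every active filter applied except that facet's own. The UI can then show how many bills each alternative value would return.

The Python sketch below applies that approach to in-memory rows, with these assumptions:

- Whether the real endpoint works this way, or counts within the fully filtered set, is not stated.
- The facet columns are assumed to share the query parameter names.
- Only equality filters are handled.

```python
from collections import Counter
from typing import Any

# Assumed facet columns, named after /api/bills query parameters.
FACET_FIELDS = ("congress", "bill_type", "prism_category")


def facet_counts(rows: list[dict], filters: dict[str, Any]) -> dict[str, dict[Any, int]]:
    """Count values per facet, ignoring each facet's own filter (equality filters only)."""
    result: dict[str, dict[Any, int]] = {}
    for facet in FACET_FIELDS:
        others = {k: v for k, v in filters.items() if k != facet and v is not None}
        matching = (r for r in rows if all(r.get(k) == v for k, v in others.items()))
        result[facet] = dict(Counter(r.get(facet) for r in matching))
    return result
```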
State Bills
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/state-bills | state_abbr (required), session, q, bill_q, sort | openstates_bills |
GET /api/state-bills/states | — | openstates_bills |
GET /api/state-bills/stats | state_abbr | openstates_bills |
Committees
| Endpoint | Params | Data Source |
|---|---|---|
GET /api/committees | chamber, q, page | entities_committees |
GET /api/committees/{id} | — | entities_committees + memberships + bills |
5. Site Pages & JS Apps
How Pages Work
Every data page is an API-driven shell: a static HTML file loads a JS app, which calls /api/ endpoints and renders the data client-side. No data is baked into the HTML at build time.
Page Inventory
| URL Path | JS App | API Endpoint | Purpose |
|---|---|---|---|
/legislators/ | legislators-app.js | /api/legislators | Browse all legislators (federal + state), filter, sort, search |
/legislators/{slug}/ | legislator-detail.js | /api/legislators/{slug} | Individual legislator profile with scores, committees, bills |
/bills/ | bills-app.js | /api/bills | Browse federal bills, filter by taxonomy/score/congress |
/federal/senate/ | federal-chamber.js | /api/members + /api/chambers/stats | U.S. Senate members grid with party stats |
/federal/house/ | federal-chamber.js | /api/members + /api/chambers/stats | U.S. House members grid with party stats |
/states/{slug}/ | state-app.js | /api/state-members + /api/states/{abbr}/stats | State legislature hub with all legislators |
/states/{slug}/senate/ | state-app.js | /api/state-members | State senate members |
/states/{slug}/house/ | state-app.js | /api/state-members | State house members |
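Every data page is a shell whose content depends on its API endpoint. A post-deploy check can therefore call the endpoints from the inventory and confirm that each one returns JSON.

The standard-library Python sketch below does this, with these assumptions:

- The base URL is the FastAPI upstream named in the API section; point it at the public origin to test through Nginx instead.
- The query values (chamber spelling, state abbreviation) are illustrative.

```python
import json
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlencode

API_BASE = "http://127.0.0.1:8077"  # FastAPI upstream; swap in the public origin to test via Nginx

# (page, endpoint, query) triples from the page inventory.
# Query values are illustrative assumptions (e.g. the accepted chamber spelling).
CHECKS = [
    ("/legislators/", "/api/legislators", {}),
    ("/bills/", "/api/bills", {}),
    ("/federal/senate/", "/api/members", {"chamber": "senate"}),
    ("/federal/senate/", "/api/chambers/stats", {}),
    ("/states/{slug}/", "/api/state-members", {"state": "NY"}),
]

Fetch = Callable[[str], tuple[int, bytes]]


def build_url(path: str, params: dict) -> str:
    return API_BASE + path + (f"?{urlencode(params)}" if params else "")


def http_fetch(url: str) -> tuple[int, bytes]:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status, resp.read()


def smoke_test(fetch: Optional[Fetch] = None) -> list[str]:
    """Return one message per endpoint that fails or does not return JSON."""
    fetch = fetch or http_fetch
    failures = []
    for page, path, params in CHECKS:
        url = build_url(path, params)
        try:
            status, body = fetch(url)
            if status != 200:
                failures.append(f"{page}: {url} returned HTTP {status}")
                continue
            json.loads(body)  # the page shell expects a JSON body
        except Exception as exc:  # network errors, HTTP errors, invalid JSON
            failures.append(f"{page}: {url} failed: {exc}")
    return failures
```

An empty list from `smoke_test()` means every page shell in the inventory has a working data source.

The `fetch` parameter lets the check run against a stub, without a live server.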
Static Pages (not API-driven)
| URL Path | Generated By | Purpose |
|---|---|---|
/ | build_poliprism_hubs.py | Home page with navigation cards |
/states/ | build_poliprism_hubs.py | States grid with flags |
/federal/ | build_poliprism_hubs.py | Federal hub (links to Senate/House) |
/territories/ | build_poliprism_hubs.py | Territories hub |
/404.html | build_poliprism_hubs.py | Not found page |
6. File Taxonomy
poliprism/
| Path | Purpose |
|---|---|
api/app.py | FastAPI application (all /api/* endpoints) |
api/requirements.txt | Python dependencies |
assets/css/poliprism.css | Bundled CSS (generated by build_poliprism_css.py) |
assets/js/site.js | Global: search, nav, dark mode |
assets/js/legislators-app.js | /legislators/ page app |
assets/js/legislator-detail.js | /legislators/{slug}/ detail page |
assets/js/bills-app.js | /bills/ page app |
assets/js/federal-chamber.js | /federal/senate/ and /federal/house/ |
assets/js/state-app.js | /states/{slug}/ pages |
data/legislators-search.json | Fuse.js search index (generated) |
data/bills-search.json | Fuse.js search index (generated) |

scripts/
| Script | Stage | Purpose |
|---|---|---|
sync_federal_legislators.py | Ingest | unitedstates JSON → people |
sync_state_legislators.py | Ingest | Open States → people |
sync_prism_bills.py | Ingest | Congress.gov → prism_bills |
sync_openstates_bills.py | Ingest | Open States → openstates_bills |
sync_fec_data.py | Ingest | FEC API → raw_fec_* |
sync_usaspending.py | Ingest | USASpending → district spending |
sync_congress_bill_texts.py | Ingest | Congress.gov → raw bill texts |
backfill_entities.py | Ingest | Raw → entity tables |
populate_committees.py | Ingest | Build committee entities |
populate_bill_sponsorships.py | Ingest | Build sponsorship links |
promote_bill_texts.py | Process | Raw texts → canonical bill_texts |
score_prism_bills.py | Process | Ollama LLM bill scoring |
score_legislators_prism.py | Process | Legislator Prism score (bill avg) |
score_legislators_effectiveness.py | Process | Bills passed ratio |
score_legislators_bipartisan.py | Process | Cross-party cosponsor ratio |
score_legislators_donor_alignment.py | Process | PAC $ vs bill taxonomy match |
classify_organizations_industry.py | Process | Ollama org industry classification |
refresh_views.py | Process | REFRESH MATERIALIZED VIEW |
build_poliprism_css.py | Build | styles.css → poliprism.css |
build_poliprism_hubs.py | Build | Hub pages, territories, 404/50x |
build_search_index.py | Build | Legislator search JSON |
build_bills_page.py | Build | Bill search JSON |
build_sitemap_robots.py | Build | sitemap.xml + robots.txt |
generate_state_shells.py | Build | State page HTML shells |
html_layout.py | Shared | Render page chrome (header/footer) |
site_config.py | Shared | Branding, URLs, constants |
congress_gov_client.py | Shared | Congress.gov API helpers |
openstates_v3_client.py | Shared | Open States API client |
member_display.py | Shared | Member rendering helpers |
federal_chamber_data.py | Shared | Federal chamber helpers |

infra/deploy/
| Path | Purpose |
|---|---|
poliprism-pipeline.sh | Master runner (--ingest/--process/--build/--all) |
ingest/ | 13 ingestion wrapper scripts |
process/ | 8 processing wrapper scripts |
build/ | 4 build wrapper scripts |
poliprism-daily-bills.sh | Cron: daily federal bills (thin pipeline wrapper) |
poliprism-rebuild-docroot.sh | Full rebuild (migrations + pipeline --build) |

deploy/
| Path | Purpose |
|---|---|
publish-prism-updates-to-server.ps1 | Main deploy: Windows → VM (smart SCP + rebuild) |
publish-poliprism-minimal.ps1 | Minimal deploy: bills/API files only |
7. Deploy Flow & Cron
Deploy Flow
Windows repo → publish-prism-updates-to-server.ps1 (SCP files to VM) → labvm-deploy-swamp-www.sh (nginx config + docroot copy) → poliprism-rebuild-docroot.sh (migrations + pipeline --build) → Live Site (nginx reload)
Deploy Steps (Detail)
- Git analysis: the publish script checks which poliprism/ files changed (selective vs. full SCP)
- SCP to VM: `infra/`, `www/`, and `poliprism/` are staged to `~/SwampTechnology/`
- Nginx config: copy `*.conf` to `/etc/nginx/conf.d/`, then run a syntax test
- Docroot copy: `poliprism/` → `/usr/share/nginx/html/poliprism/`
- API restart: `pip install` + `systemctl restart poliprism-api`
- Migrations: all SQL migrations (idempotent, safe to re-run)
- Pipeline --build: CSS + hubs + state shells + search JSON + sitemap
- Ownership: `chown nginx:nginx` + SELinux `restorecon`
- Nginx reload: only if configs changed
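The real publish script is PowerShell, and its change-detection rules are not documented here. The Python sketch below illustrates the "selective vs. full SCP" decision from the Git analysis step, with these assumptions:

- The cutoff (`MAX_SELECTIVE`) is hypothetical.
- The default commit range is hypothetical.
- Staging of `infra/` and `www/` is not modeled.

```python
import subprocess

MAX_SELECTIVE = 25  # hypothetical cutoff; above this, copy the whole poliprism/ tree


def changed_files(base: str = "HEAD~1", head: str = "HEAD") -> list[str]:
    """Paths changed between two commits, as reported by git (forward slashes)."""
    out = subprocess.run(
        ["git", "diff", "--name-only", base, head],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def plan_poliprism_upload(paths: list[str]) -> tuple[str, list[str]]:
    """Decide how much of poliprism/ to send: ('skip' | 'selective' | 'full', paths)."""
    site = sorted(p for p in paths if p.startswith("poliprism/"))
    if not site:
        return "skip", []
    if len(site) > MAX_SELECTIVE:
        return "full", ["poliprism/"]
    return "selective", site
```

A typical call is `plan_poliprism_upload(changed_files())`. A small edit sends only the touched files, while a large change falls back to copying the whole tree.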
Cron Jobs
| Job | Schedule | What It Runs |
|---|---|---|
poliprism-daily-bills.sh | Daily ~02:15 UTC | Federal bills ingest → Ollama scoring → CSS + search rebuild |
poliprism-sync-federal-enrichment.sh | Weekly | Committees + sponsored bills from Congress.gov |
poliprism-sync-openstates-bills.sh | On demand | State bills from Open States API |