PoliPrism Architecture

Pipeline, schema, API, and deploy reference for the development team

Pipeline SOP: the Pipeline Architecture SOP covers the staging-first architecture, promote patterns, scoring, and admin UI design for all data sources.
1. Architecture Overview
External APIs (Congress.gov, Open States, FEC, USASpending)
  → INGEST (13 scripts) → Raw Tables (prism_bills, openstates_bills, raw_fec_*)
  → PROCESS (8 scripts) → Entity Tables + MVs (entities_*, mv_*)
  → API (FastAPI on :8077) → Browser (JS apps fetch /api/*)

BUILD (4 scripts) → Static Assets (CSS, hubs, search JSON, sitemap)
2. Data Pipeline (Three Layers)

Master Runner

poliprism-pipeline.sh --ingest              # Run all ingest scripts
poliprism-pipeline.sh --process             # Run all scoring/processing
poliprism-pipeline.sh --build               # Generate static assets
poliprism-pipeline.sh --all                 # Ingest → Process → Build
poliprism-pipeline.sh --all --only federal-bills,score-federal-bills,css,search   # Run only the listed steps
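The layer ordering and the `--only` filter can be pictured with the small Python sketch below. It is not the logic of poliprism-pipeline.sh. It assumes that step names mirror the wrapper scripts without the `.sh` suffix and that the order within each layer follows the Ingest, Process, and Build tables in this section. Only two orderings are documented: entities runs last in ingest, and refresh-views runs last in process. The `plan_steps` helper is hypothetical.

```python
"""Hypothetical sketch of how a layered runner could plan its steps.

Assumptions: step names mirror the wrapper scripts without ".sh", and the order
inside each layer follows the tables in this section. Only "entities runs last"
and "refresh-views always last" are documented orderings.
"""
from __future__ import annotations

LAYERS = {
    "ingest": [
        "federal-legislators", "state-legislators", "ny-legislators",
        "federal-bills", "federal-bill-texts", "federal-enrichment",
        "state-bills", "state-bill-texts", "fec", "usaspending",
        "state-assets", "profile-photos",
        "entities",  # DB transform, documented as running last
    ],
    "process": [
        "promote-bill-texts", "score-federal-bills", "score-state-bills",
        "score-legislators", "classify-organizations", "entity-resolver",
        "standardize-profiles",
        "refresh-views",  # documented as always last
    ],
    "build": ["css", "hubs", "search", "sitemap"],
}


def plan_steps(layers: list[str], only: set[str] | None = None) -> list[str]:
    """Return steps in execution order; `only` narrows the plan without reordering it."""
    known = {step for steps in LAYERS.values() for step in steps}
    unknown = (only or set()) - known
    if unknown:
        raise ValueError(f"unknown steps: {sorted(unknown)}")
    return [
        step
        for layer in layers
        for step in LAYERS[layer]
        if only is None or step in only
    ]


if __name__ == "__main__":
    # Equivalent of: poliprism-pipeline.sh --all --only federal-bills,score-federal-bills,css,search
    print(plan_steps(["ingest", "process", "build"],
                     {"federal-bills", "score-federal-bills", "css", "search"}))
    # → ['federal-bills', 'score-federal-bills', 'css', 'search']
```

Filtering inside the fixed layer order means `--only` can never run a score step before the ingest step that feeds it.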

Ingest Layer

External APIs/files → raw/staging database tables. Each script is idempotent and independently runnable.

Script | Source | Target Tables
federal-legislators.sh | unitedstates JSON + Congress.gov | people, people_offices
state-legislators.sh | Open States API / CSV | people, offices, people_offices
ny-legislators.sh | NY Open Legislation API | people, people_offices
federal-bills.sh | Congress.gov API v3 | prism_bills
federal-bill-texts.sh | Congress.gov (text fetch) | raw_congress_bill_texts
federal-enrichment.sh | Congress.gov + unitedstates YAML | people (committees, sponsored bills)
state-bills.sh | Open States API v3 | openstates_bills
state-bill-texts.sh | HTTP (bill text URLs) | openstates_bills.bill_text
fec.sh | FEC API (4-phase) | raw_fec_*, campaign_contributions
usaspending.sh | USASpending API v2 | raw_usaspending_by_district
state-assets.sh | Wikimedia Commons | state_reference + images
profile-photos.sh | HTTP (photo URLs) | people.image_path + images
entities.sh | DB transform (runs last) | entities_legislators, entities_bills, committees, sponsorships
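A common way to make an ingest script idempotent is to key each raw row on its upstream identifier and upsert, so a re-run updates existing rows instead of duplicating them. The sketch below uses Python's built-in sqlite3 so it runs anywhere. PostgreSQL (which pg_trgm and materialized views imply here) supports the same `INSERT ... ON CONFLICT ... DO UPDATE` form. The prism_bills columns shown are a hypothetical subset, and `ingest` is an illustrative helper, not one of the scripts above.

```python
import sqlite3

# Hypothetical subset of prism_bills; bill_id is assumed to be the upstream key.
SCHEMA = """
CREATE TABLE IF NOT EXISTS prism_bills (
    bill_id    TEXT PRIMARY KEY,
    title      TEXT NOT NULL,
    status     TEXT,
    updated_at TEXT
)
"""

# Upsert: insert new bills, overwrite changed fields on existing ones.
UPSERT = """
INSERT INTO prism_bills (bill_id, title, status, updated_at)
VALUES (:bill_id, :title, :status, :updated_at)
ON CONFLICT(bill_id) DO UPDATE SET
    title = excluded.title,
    status = excluded.status,
    updated_at = excluded.updated_at
"""


def ingest(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Write a batch so that re-running with the same input changes nothing."""
    conn.execute(SCHEMA)            # IF NOT EXISTS makes this safe to repeat
    with conn:                      # one transaction per batch; rolls back on error
        conn.executemany(UPSERT, records)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    batch = [{"bill_id": "hr1-119", "title": "Example Act",
              "status": "introduced", "updated_at": "2025-01-03"}]
    ingest(conn, batch)
    ingest(conn, batch)             # second run: still one row
    print(conn.execute("SELECT COUNT(*) FROM prism_bills").fetchone()[0])  # → 1
```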

Process Layer

Scoring, classification, and view refresh. Runs after ingest; reads from entity tables, writes scores and refreshes materialized views.

Script | Purpose
promote-bill-texts.sh | Promote raw bill texts → canonical bill_texts + enqueue scoring
score-federal-bills.sh | Ollama LLM scoring of federal bills (Prism score + taxonomy)
score-state-bills.sh | Ollama LLM scoring of state bills
score-legislators.sh | 4 legislator scores: Prism, effectiveness, bipartisanship, donor alignment
classify-organizations.sh | Ollama industry classification for PACs/organizations
entity-resolver.sh | Fuzzy-link entities to FEC/OpenStates IDs (pg_trgm)
standardize-profiles.sh | Backfill profile_page, external_ids, contact_json
refresh-views.sh | REFRESH MATERIALIZED VIEW CONCURRENTLY (always last)
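entity-resolver.sh relies on pg_trgm for fuzzy matching. To show what trigram similarity measures, the sketch below reimplements it approximately in plain Python. Each word is lowercased and padded with two leading spaces and one trailing space. Similarity is the number of shared trigrams divided by the number of distinct trigrams in either string. The 0.3 cutoff mirrors pg_trgm's default `pg_trgm.similarity_threshold`. The `resolve` helper, its best-match rule, and the candidate IDs are hypothetical; they are not the resolver's actual logic.

```python
from __future__ import annotations

import re


def trigrams(text: str) -> set[str]:
    """pg_trgm-style trigrams (ASCII-only approximation): per word, pad '  word '."""
    grams: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):  # punctuation splits words
        padded = f"  {word} "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a: str, b: str) -> float:
    """Shared trigrams / distinct trigrams in either string, like pg_trgm similarity()."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def resolve(name: str, candidates: dict[str, str], threshold: float = 0.3) -> str | None:
    """Return the ID of the best-matching candidate name, or None below threshold."""
    best_id, best_score = None, 0.0
    for cand_id, cand_name in candidates.items():
        score = similarity(name, cand_name)
        if score > best_score:
            best_id, best_score = cand_id, score
    return best_id if best_score >= threshold else None


if __name__ == "__main__":
    candidates = {"cand-001": "PELOSI, NANCY", "cand-002": "SMITH, JOHN"}
    print(similarity("Nancy Pelosi", "PELOSI, NANCY"))  # → 1.0 (word order is irrelevant)
    print(resolve("Nancy Pelosi", candidates))          # → cand-001
```

Because trigrams are computed per word, "Last, First" FEC-style names match "First Last" display names, which is why trigram matching suits this linking step.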

Build Layer

Generates non-data static artifacts only. Data pages are static shells whose content comes from the API at runtime.

Script | Output
css.sh | assets/css/poliprism.css (bundled from styles.css)
hubs.sh | Home, /federal/, /states/ grid, /territories/, 404/50x + state legislature shells
search.sh | legislators-search.json + bills-search.json (Fuse.js client-side search)
sitemap.sh | sitemap.xml + robots.txt
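search.sh emits flat JSON arrays that Fuse.js loads and searches in the browser. The Python sketch below shows one way to write such an index. It assumes input rows shaped like mv_legislator_profiles results. The field list (slug, name, state, chamber, party) and the `write_search_index` helper are hypothetical, not the actual output of build_search_index.py or build_bills_page.py.

```python
from __future__ import annotations

import json
from pathlib import Path

# Hypothetical fields; Fuse.js is told which keys to search in the browser, so the
# index only needs fields the client searches or displays.
INDEX_FIELDS = ("slug", "name", "state", "chamber", "party")


def write_search_index(rows: list[dict], out_path: Path) -> int:
    """Write a compact JSON array for client-side fuzzy search; return entry count."""
    entries = [
        {field: row.get(field) for field in INDEX_FIELDS}
        for row in rows
        if row.get("slug")                       # skip rows with no page to link to
    ]
    entries.sort(key=lambda e: e["slug"])        # deterministic output, small diffs
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file, then rename, so nginx never serves a half-written file.
    tmp = out_path.with_suffix(".tmp")
    tmp.write_text(json.dumps(entries, separators=(",", ":")), encoding="utf-8")
    tmp.replace(out_path)
    return len(entries)
```

Keeping the index to a few short fields matters because the whole file is downloaded before Fuse.js can search it.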
3. Database Schema

Data Flow

Raw Tables → Entity Tables → Relationship Tables → Materialized Views → API / Browser

Raw / Ingestion Tables

Table | Source | Purpose
prism_bills | Congress.gov | Federal bills with Prism scores, sponsors, text
openstates_bills | Open States API | State bills with classifications, sponsors, text
raw_congress_bill_texts | Congress.gov | Bill text versions (XML/HTML/PDF → plain text)
raw_fec_candidates | FEC API | FEC candidate master records
raw_fec_committees | FEC API | FEC committee records (PACs, party committees)
raw_fec_contributions | FEC API | Campaign contributions (partitioned by election cycle)
raw_usaspending_by_district | USASpending API | Federal spending by congressional district
people | Multiple sources | Base person records (name, bio, photo, terms)
offices | Seed script | Elected offices (federal + state + county)
people_offices | Sync scripts | Person ↔ office assignments (is_current flag)

Canonical Entity Tables

Table | Purpose
entities_legislators | One row per legislator + role. Holds all four scoring columns. Links to people via slug.
entities_bills | Unified bill record (federal + state + NY): Prism score, taxonomy, status.
entities_committees | Congressional and state committees.
entities_organizations | PACs, party committees, industry-classified organizations.
bill_taxonomy | LLM-scored taxonomy: primary/secondary categories, policy actions, flags.
bill_texts | Canonical bill text versions (promoted from raw tables).

Relationship Tables

Table | Links | Purpose
bill_sponsorships | legislator ↔ bill | Sponsor / cosponsor relationships
committee_memberships | legislator ↔ committee | Committee assignments and roles
campaign_contributions | legislator ↔ organization | FEC contribution records (PAC + individual)
bill_votes | legislator ↔ bill | Roll call vote positions (yes/no/present/absent)
usaspending_district_summary | legislator ↔ spending | Federal contracts/grants per district
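The file taxonomy gives a one-line definition for three legislator scores: Prism as an average of bill scores, effectiveness as a bills-passed ratio, and bipartisanship as a cross-party cosponsor ratio. The Python sketch below computes those three from rows shaped like bill_sponsorships joined to bills. The exact definitions are assumptions: which bills count, whether only sponsored bills feed the Prism average, and how "passed" is determined. The score_legislators_*.py scripts are the authority. Donor alignment is left out because its formula is not described here. The `Sponsorship` row type and `legislator_scores` helper are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean


@dataclass
class Sponsorship:
    """Hypothetical flattened row: bill_sponsorships joined to the bill and its sponsor."""
    legislator_id: str
    bill_id: str
    role: str                        # "sponsor" or "cosponsor"
    bill_prism_score: float | None   # None if the bill is not scored yet
    bill_passed: bool
    bill_sponsor_party: str          # party of the bill's primary sponsor


def legislator_scores(rows: list[Sponsorship], legislator_id: str,
                      party: str) -> dict[str, float | None]:
    """Compute three scores for one legislator; None means not enough data."""
    mine = [r for r in rows if r.legislator_id == legislator_id]
    sponsored = [r for r in mine if r.role == "sponsor"]
    cosponsored = [r for r in mine if r.role == "cosponsor"]

    # Prism (assumed): mean score of the legislator's scored, sponsored bills.
    scored = [r.bill_prism_score for r in sponsored if r.bill_prism_score is not None]
    prism = mean(scored) if scored else None

    # Effectiveness (assumed): share of sponsored bills that passed.
    effectiveness = (sum(r.bill_passed for r in sponsored) / len(sponsored)
                     if sponsored else None)

    # Bipartisanship (assumed): share of cosponsorships on other-party bills.
    bipartisanship = (sum(r.bill_sponsor_party != party for r in cosponsored)
                      / len(cosponsored) if cosponsored else None)

    return {"prism": prism, "effectiveness": effectiveness,
            "bipartisanship": bipartisanship}
```

Returning None instead of 0 keeps legislators with no sponsorship history from being ranked as the least effective or least bipartisan.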

Materialized Views (Single Source of Truth)

Both the API and the build scripts read from these pre-computed views. refresh_views.py refreshes them after any Process step.

View | Rows | Purpose
mv_legislator_profiles | ~8,100+ | Full legislator cards: scores, office, finance, sponsored bills, votes
mv_bills_list | ~400+ | Bills with sponsor info, taxonomy, text availability, CRS subjects
mv_bill_detail | ~400+ | Individual bill pages with vote counts, cosponsor counts, text format
mv_finance_ideology_gap | ~1,500+ | Donor industry vs. legislator ideology alignment analysis
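In PostgreSQL, REFRESH MATERIALIZED VIEW CONCURRENTLY keeps a view readable while it is rebuilt. Two conditions apply: the view needs at least one unique index on plain columns that covers all rows, and the view must already be populated. The Python sketch below only builds the statement list for the four views above. The fallback to a plain REFRESH for unpopulated views (for example on a fresh database) is an assumption about how one might handle that case, not necessarily what refresh_views.py does. The `refresh_statements` helper is hypothetical.

```python
from __future__ import annotations

# The four documented views, refreshed in the order listed above.
VIEWS = (
    "mv_legislator_profiles",
    "mv_bills_list",
    "mv_bill_detail",
    "mv_finance_ideology_gap",
)


def refresh_statements(populated: dict[str, bool]) -> list[str]:
    """Build one REFRESH statement per view.

    `populated` maps view name -> pg_matviews.ispopulated. CONCURRENTLY is only
    valid on a populated view with a suitable unique index, so unpopulated views
    fall back to a plain (locking) REFRESH.
    """
    stmts = []
    for view in VIEWS:
        if populated.get(view, False):
            stmts.append(f"REFRESH MATERIALIZED VIEW CONCURRENTLY {view};")
        else:
            stmts.append(f"REFRESH MATERIALIZED VIEW {view};")
    return stmts
```

The `populated` map could come from `SELECT matviewname, ispopulated FROM pg_matviews`.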
4. API Endpoints

All endpoints are served by FastAPI under /api/. Nginx reverse-proxies these requests to 127.0.0.1:8077.

Federal Members

Endpoint | Params | Data Source
GET /api/members | chamber, state, party, q, page, per_page | mv_legislator_profiles (level=federal)
GET /api/members/{bioguide_id} | (none) | mv_legislator_profiles + people (bio)
GET /api/chambers/stats | (none) | mv_legislator_profiles (level=federal)

All Legislators (Federal + State)

Endpoint | Params | Data Source
GET /api/legislators | level, chamber, state, party, q, score_min/max, sort, page | mv_legislator_profiles
GET /api/legislators/{slug} | (none) | mv_legislator_profiles + people
GET /api/legislators/stats | (none) | mv_legislator_profiles
GET /api/legislators/facets | (same filters as list) | mv_legislator_profiles
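GET /api/legislators combines optional filters, a sort key, and pagination over mv_legislator_profiles. The sketch below shows how a handler might turn those query parameters into parameterized SQL. Several details are assumptions, not app.py's code: the column names (level, chamber, state, party, full_name, prism_score), the sort whitelist, the default page size of 50 with a cap of 200, and psycopg-style `%(name)s` placeholders. The `build_legislators_query` helper is hypothetical. The whitelist matters because ORDER BY expressions cannot be passed as bind parameters.

```python
from __future__ import annotations

# Hypothetical ?sort= values mapped to ORDER BY clauses. Identifiers cannot be bound
# as parameters, so only these fixed strings ever reach the SQL text.
SORTS = {
    "name": "full_name ASC",
    "score": "prism_score DESC NULLS LAST",
    "-score": "prism_score ASC NULLS LAST",
}


def build_legislators_query(
    level: str | None = None, chamber: str | None = None,
    state: str | None = None, party: str | None = None, q: str | None = None,
    score_min: float | None = None, score_max: float | None = None,
    sort: str = "name", page: int = 1, per_page: int = 50,
) -> tuple[str, dict]:
    """Return (sql, params) with psycopg-style %(name)s placeholders."""
    where: list[str] = []
    params: dict = {}
    for col, val in (("level", level), ("chamber", chamber),
                     ("state", state), ("party", party)):
        if val is not None:
            where.append(f"{col} = %({col})s")   # col comes from the fixed tuple
            params[col] = val
    if q:
        where.append("full_name ILIKE %(q)s")
        params["q"] = f"%{q}%"
    if score_min is not None:
        where.append("prism_score >= %(score_min)s")
        params["score_min"] = score_min
    if score_max is not None:
        where.append("prism_score <= %(score_max)s")
        params["score_max"] = score_max

    order_by = SORTS.get(sort, SORTS["name"])    # unknown keys fall back to name
    page = max(page, 1)
    per_page = min(max(per_page, 1), 200)        # assumed page-size cap
    params.update(limit=per_page, offset=(page - 1) * per_page)

    sql = "SELECT * FROM mv_legislator_profiles"
    if where:
        sql += " WHERE " + " AND ".join(where)
    sql += f" ORDER BY {order_by} LIMIT %(limit)s OFFSET %(offset)s"
    return sql, params
```

A FastAPI route would declare these as optional query parameters and pass the returned pair to the database driver's `execute`.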

State Members

Endpoint | Params | Data Source
GET /api/states | (none) | mv_legislator_profiles (level=state)
GET /api/states/{abbr}/stats | (none) | mv_legislator_profiles (state filter)
GET /api/state-members | state (required), chamber, party, q, sort, page | mv_legislator_profiles (level=state)
GET /api/state-members/{slug} | (none) | mv_legislator_profiles + people

Bills (Federal)

Endpoint | Params | Data Source
GET /api/bills | congress, q, bill_q, bill_type, prism_category, crs_area, rating, sort, page | mv_bills_list (level=federal)
GET /api/bills/{bill_id} | include_text | mv_bill_detail
GET /api/bills/stats | (none) | mv_bills_list
GET /api/bills/congresses | (none) | mv_bills_list
GET /api/bills/facets | (same filters as list) | mv_bills_list
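The /facets endpoints accept the same filters as their list endpoints and return value counts that the JS apps can show next to each filter option. The in-memory Python sketch below illustrates the idea. In practice this would more likely be a GROUP BY over mv_bills_list, and the field names (congress, bill_type, prism_category) are assumptions. The sketch follows a common convention: each facet is counted with every filter except its own, so selecting one bill type does not hide the counts for the others. Whether the API does this is not stated here. The `facet_counts` helper is hypothetical.

```python
from __future__ import annotations

from collections import Counter

FACET_FIELDS = ("congress", "bill_type", "prism_category")  # hypothetical fields


def facet_counts(rows: list[dict], filters: dict[str, object]) -> dict[str, dict]:
    """Count values per facet, applying every active filter except the facet's own."""
    result: dict[str, dict] = {}
    for facet in FACET_FIELDS:
        others = {k: v for k, v in filters.items() if k != facet}
        counter = Counter(
            row.get(facet)
            for row in rows
            if all(row.get(k) == v for k, v in others.items())
        )
        result[facet] = dict(counter.most_common())  # highest counts first
    return result
```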

State Bills

Endpoint | Params | Data Source
GET /api/state-bills | state_abbr (required), session, q, bill_q, sort | openstates_bills
GET /api/state-bills/states | (none) | openstates_bills
GET /api/state-bills/stats | state_abbr | openstates_bills

Committees

Endpoint | Params | Data Source
GET /api/committees | chamber, q, page | entities_committees
GET /api/committees/{id} | (none) | entities_committees + memberships + bills
5. Site Pages & JS Apps

How Pages Work

Every data page is an API-driven shell: a static HTML file loads a JS app, which calls /api/ endpoints and renders the data client-side. No data is baked into the HTML at build time.

HTML Shell → JS App (fetch) → /api/* endpoint → Materialized View → Rendered Page

Page Inventory

URL Path | JS App | API Endpoint | Purpose
/legislators/ | legislators-app.js | /api/legislators | Browse all legislators (federal + state); filter, sort, search
/legislators/{slug}/ | legislator-detail.js | /api/legislators/{slug} | Individual legislator profile with scores, committees, bills
/bills/ | bills-app.js | /api/bills | Browse federal bills; filter by taxonomy/score/congress
/federal/senate/ | federal-chamber.js | /api/members + /api/chambers/stats | U.S. Senate members grid with party stats
/federal/house/ | federal-chamber.js | /api/members + /api/chambers/stats | U.S. House members grid with party stats
/states/{slug}/ | state-app.js | /api/state-members + /api/states/{abbr}/stats | State legislature hub with all legislators
/states/{slug}/senate/ | state-app.js | /api/state-members | State senate members
/states/{slug}/house/ | state-app.js | /api/state-members | State house members

Static Pages (not API-driven)

URL Path | Generated By | Purpose
/ | build_poliprism_hubs.py | Home page with navigation cards
/states/ | build_poliprism_hubs.py | States grid with flags
/federal/ | build_poliprism_hubs.py | Federal hub (links to Senate/House)
/territories/ | build_poliprism_hubs.py | Territories hub
/404.html | build_poliprism_hubs.py | Not found page
6. File Taxonomy
poliprism/
  api/
    app.py                    # FastAPI application (all /api/* endpoints)
    requirements.txt          # Python dependencies
  assets/
    css/
      poliprism.css           # Bundled CSS (generated by build_poliprism_css.py)
    js/
      site.js                 # Global: search, nav, dark mode
      legislators-app.js      # /legislators/ page app
      legislator-detail.js    # /legislators/{slug}/ detail page
      bills-app.js            # /bills/ page app
      federal-chamber.js      # /federal/senate/ and /federal/house/
      state-app.js            # /states/{slug}/ pages
    data/
      legislators-search.json # Fuse.js search index (generated)
      bills-search.json       # Fuse.js search index (generated)
  scripts/
    # --- Ingest ---
    sync_federal_legislators.py    # unitedstates JSON → people
    sync_state_legislators.py      # Open States → people
    sync_prism_bills.py            # Congress.gov → prism_bills
    sync_openstates_bills.py       # Open States → openstates_bills
    sync_fec_data.py               # FEC API → raw_fec_*
    sync_usaspending.py            # USASpending → district spending
    sync_congress_bill_texts.py    # Congress.gov → raw bill texts
    backfill_entities.py           # Raw → entity tables
    populate_committees.py         # Build committee entities
    populate_bill_sponsorships.py  # Build sponsorship links
    # --- Process ---
    promote_bill_texts.py          # Raw texts → canonical bill_texts
    score_prism_bills.py           # Ollama LLM bill scoring
    score_legislators_prism.py     # Legislator Prism score (bill avg)
    score_legislators_effectiveness.py  # Bills passed ratio
    score_legislators_bipartisan.py     # Cross-party cosponsor ratio
    score_legislators_donor_alignment.py # PAC $ vs bill taxonomy match
    classify_organizations_industry.py   # Ollama org industry classification
    refresh_views.py               # REFRESH MATERIALIZED VIEW
    # --- Build ---
    build_poliprism_css.py         # styles.css → poliprism.css
    build_poliprism_hubs.py        # Hub pages, territories, 404/50x
    build_search_index.py          # Legislator search JSON
    build_bills_page.py            # Bill search JSON
    build_sitemap_robots.py        # sitemap.xml + robots.txt
    generate_state_shells.py       # State page HTML shells
    # --- Shared ---
    html_layout.py                 # Render page chrome (header/footer)
    site_config.py                 # Branding, URLs, constants
    congress_gov_client.py         # Congress.gov API helpers
    openstates_v3_client.py        # Open States API client
    member_display.py              # Member rendering helpers
    federal_chamber_data.py        # Federal chamber helpers

infra/deploy/
  poliprism-pipeline.sh      # Master runner (--ingest/--process/--build/--all)
  ingest/               # 13 ingestion wrapper scripts
  process/              # 8 processing wrapper scripts
  build/                # 4 build wrapper scripts
  poliprism-daily-bills.sh   # Cron: daily federal bills (thin pipeline wrapper)
  poliprism-rebuild-docroot.sh # Full rebuild (migrations + pipeline --build)

deploy/
  publish-prism-updates-to-server.ps1      # Main deploy: Windows → VM (smart SCP + rebuild)
  publish-poliprism-minimal.ps1   # Minimal deploy: bills/API files only
7. Deploy Flow & Cron

Deploy Flow

Developer (Windows repo)
  → publish-prism-updates-to-server.ps1 (SCP files to VM)
  → labvm-deploy-swamp-www.sh (nginx config + docroot copy)
  → poliprism-rebuild-docroot.sh (migrations + pipeline --build)
  → Live Site (nginx reload)

Deploy Steps (Detail)

  1. Git analysis: the publish script checks which poliprism/ files changed to choose a selective or full SCP
  2. SCP to VM: infra/, www/, and poliprism/ are staged to ~/SwampTechnology/
  3. Nginx config: copy *.conf to /etc/nginx/conf.d/ and run a syntax test
  4. Docroot copy: poliprism/ → /usr/share/nginx/html/poliprism/
  5. API restart: pip install + systemctl restart poliprism-api
  6. Migrations: run all SQL migrations (idempotent, safe to re-run)
  7. Pipeline --build: CSS + hubs + state shells + search JSON + sitemap
  8. Ownership: chown nginx:nginx + SELinux restorecon
  9. Nginx reload: if configs changed
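Step 1 decides between a selective and a full copy based on which files changed. The real logic lives in publish-prism-updates-to-server.ps1 (PowerShell). The Python sketch below only illustrates one plausible rule, and every detail of that rule is an assumption. It falls back to a full SCP when anything outside poliprism/ changed or when more than 50 files changed. Otherwise it copies only the changed poliprism/ files. The input could be the output of `git diff --name-only <base>..HEAD`. The `plan_copy` helper and the threshold are hypothetical.

```python
from __future__ import annotations

FULL_COPY_THRESHOLD = 50  # hypothetical: beyond this many files, copy everything


def plan_copy(changed: list[str]) -> tuple[str, list[str]]:
    """Return ("full", []) or ("selective", files) for repo-relative changed paths."""
    # Normalize git output lines: drop blanks and newlines, use forward slashes.
    paths = [p.strip().replace("\\", "/") for p in changed if p.strip()]
    if not paths:
        return "selective", []                      # nothing changed, nothing to copy
    outside = [p for p in paths if not p.startswith("poliprism/")]
    if outside or len(paths) > FULL_COPY_THRESHOLD:
        return "full", []                           # infra/www changes or a large diff
    return "selective", sorted(set(paths))
```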

Cron Jobs

Job | Schedule | What It Runs
poliprism-daily-bills.sh | Daily ~02:15 UTC | Federal bills ingest → Ollama scoring → CSS + search rebuild
poliprism-sync-federal-enrichment.sh | Weekly | Committees + sponsored bills from Congress.gov
poliprism-sync-openstates-bills.sh | On demand | State bills from Open States API