Ballot Box: Show me the Ballot!

Methodology

Data Collection

Geographic Data Processing

  1. Geographic Boundary Files: Collected boundary files from the U.S. Census Bureau's TIGER/Line database (2024) for:
    • Counties
    • ZIP codes, via ZIP Code Tabulation Areas (ZCTAs)
    • Congressional districts
  2. Area Intersection Analysis:
    • Used GIS tools to compute intersections between county, ZIP code, and congressional district boundaries
    • Calculated minimum-width thresholds used to screen out areas too narrow to be valid
    • Generated unique identifiers for each intersection area
  3. Coordinate Generation:
    • Calculated centroid coordinates for each intersection area
    • Transformed coordinates from NAD83/Conus Albers (EPSG:5070) to WGS84 (EPSG:4326)
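The centroid and identifier steps above can be sketched in plain Python. This is a minimal illustration, not the project's code. It assumes each intersection piece is a single polygon ring whose vertices are already in EPSG:5070 meters. Because that CRS is projected and equal-area, the planar shoelace formula gives meaningful areas and centroids. The `intersection_id` format is a hypothetical scheme built from the Census GEOIDs of the three source layers.

```python
Point = tuple[float, float]


def ring_area_and_centroid(ring: list[Point]) -> tuple[float, Point]:
    """Area and centroid of a simple polygon ring via the shoelace formula.

    Coordinates are assumed to be planar (e.g. EPSG:5070 meters). The ring
    may be open or closed (first vertex repeated at the end), and either
    clockwise or counter-clockwise.
    """
    if len(ring) > 1 and ring[0] == ring[-1]:
        ring = ring[:-1]  # drop the closing vertex
    n = len(ring)
    if n < 3:
        raise ValueError("a polygon ring needs at least 3 distinct vertices")

    twice_area = 0.0
    cx_sum = cy_sum = 0.0
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]  # wrap around to close the ring
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx_sum += (x0 + x1) * cross
        cy_sum += (y0 + y1) * cross

    if twice_area == 0:
        raise ValueError("degenerate ring with zero area")
    signed_area = twice_area / 2
    # Dividing by the signed area makes the centroid orientation-independent.
    centroid = (cx_sum / (6 * signed_area), cy_sum / (6 * signed_area))
    return abs(signed_area), centroid


def intersection_id(county_geoid: str, zcta: str, cd_geoid: str) -> str:
    """Hypothetical ID scheme: join the three source GEOIDs.

    Assumes each county/ZCTA/district combination is stored as one
    (possibly multipart) area, so the joined key is unique per area.
    """
    return f"{county_geoid}_{zcta}_{cd_geoid}"
```

The resulting centroid is still in EPSG:5070. One common way to move it to WGS84 is pyproj, for example `Transformer.from_crs("EPSG:5070", "EPSG:4326", always_xy=True).transform(x, y)`, which returns `(lon, lat)` when `always_xy=True`. Computing the centroid before reprojecting avoids averaging coordinates in degrees, where distances are distorted.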

Ballot Data Collection

API Integration

Data Processing

Geographic Processing

  1. Area Classification:
    • Categorized areas by administrative boundaries
    • Generated district identifiers
  2. Visualization:
    • Generated interactive maps for split districts
    • Created area-specific visualizations using Folium
    • Applied distinct color schemes to differentiate districts
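The classification and coloring steps above can be sketched without any GIS library. This is an assumption-laden illustration: each intersection record is taken to be a dict with hypothetical `"county"`, `"zcta"` and `"cd"` fields, a "split" area is taken to mean a ZCTA (or county) that falls in more than one congressional district, and the palette is simply matplotlib's default `tab10` colors rather than the project's actual scheme.

```python
from collections import defaultdict

# Matplotlib's default "tab10" colors: ten visually distinct hues.
PALETTE = [
    "#1f77b4", "#ff7f0e", "#2ca02c", "#d62728", "#9467bd",
    "#8c564b", "#e377c2", "#7f7f7f", "#bcbd22", "#17becf",
]


def find_split_areas(pieces: list[dict[str, str]], key: str = "zcta") -> dict[str, list[str]]:
    """Return areas (ZCTAs by default) that fall in more than one congressional district.

    `pieces` is a list of intersection records with hypothetical fields
    "county", "zcta" and "cd" (congressional-district GEOID).
    """
    districts: dict[str, set[str]] = defaultdict(set)
    for piece in pieces:
        districts[piece[key]].add(piece["cd"])
    return {area: sorted(cds) for area, cds in districts.items() if len(cds) > 1}


def district_colors(district_ids: list[str]) -> dict[str, str]:
    """Map each distinct district to a palette color, cycling if there are more than ten."""
    ordered = sorted(set(district_ids))
    return {cd: PALETTE[i % len(PALETTE)] for i, cd in enumerate(ordered)}
```

Sorting the districts before assigning colors keeps the mapping stable between runs. With Folium, such a mapping would typically be applied through the `style_function` argument of `folium.GeoJson`, e.g. `style_function=lambda f: {"fillColor": colors[f["properties"]["cd"]]}`, where the `cd` property name is again an assumption.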

Ballot Information Processing

  1. Content Structuring:
    • Parsed API responses into a structured format
    • Organized races by jurisdiction level
    • Standardized office and candidate information
  2. Complexity Analysis:
    • Calculated ballot complexity scores based on:
      • Number of races and decisions
      • Information density
      • Language complexity (Flesch-Kincaid Grade Level)
      • Presence of non-partisan contests
      • Total options per decision
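The structuring and scoring steps above could be combined in many ways. The sketch below shows one possibility in plain Python, not the project's actual scoring. Several pieces are assumptions:

- the `Contest` fields and the jurisdiction-level labels;
- the `WEIGHTS` values;
- treating each contest as one decision;
- defining information density as words of ballot text per decision;
- the vowel-group syllable counter.

Only the Flesch-Kincaid Grade Level formula itself is standard.

```python
import re
from dataclasses import dataclass, field

WORD_RE = re.compile(r"[A-Za-z']+")

# Hypothetical weights: the factors come from the list above, the numbers do not.
WEIGHTS = {"decisions": 1.0, "density": 0.05, "grade": 0.5, "nonpartisan": 2.0, "options": 0.5}
LEVEL_ORDER = ("federal", "state", "county", "local")  # hypothetical level labels


@dataclass
class Contest:
    title: str
    options: list[str] = field(default_factory=list)
    level: str = "local"
    nonpartisan: bool = False
    text: str = ""  # office description or measure wording shown to voters


def group_by_level(contests: list[Contest]) -> dict[str, list[Contest]]:
    """Group contests by jurisdiction level, federal first; unknown levels go last."""
    grouped: dict[str, list[Contest]] = {level: [] for level in LEVEL_ORDER}
    for contest in contests:
        grouped.setdefault(contest.level, []).append(contest)
    return grouped


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, ignoring a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level: 0.39*(words/sentences) + 11.8*(syllables/words) - 15.59."""
    words = WORD_RE.findall(text)
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / sentences + 11.8 * syllables / len(words) - 15.59


def complexity_score(contests: list[Contest]) -> float:
    """Weighted sum of the five factors; each contest counts as one decision."""
    if not contests:
        return 0.0
    n = len(contests)
    all_text = " ".join(c.text for c in contests)
    density = len(WORD_RE.findall(all_text)) / n  # words of ballot text per decision
    grade = flesch_kincaid_grade(all_text)
    has_nonpartisan = float(any(c.nonpartisan for c in contests))
    options_per_decision = sum(len(c.options) for c in contests) / n
    return (
        WEIGHTS["decisions"] * n
        + WEIGHTS["density"] * density
        + WEIGHTS["grade"] * grade
        + WEIGHTS["nonpartisan"] * has_nonpartisan
        + WEIGHTS["options"] * options_per_decision
    )
```

Because the factors are on different scales (a count, a grade level, a 0/1 flag), the weights also act as unit conversions. In a real scoring scheme they would need to be chosen or fitted deliberately rather than set by hand as here.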

Technical Implementation

Development Stack

Source Code

The data processing pipeline is implemented in two main Jupyter notebooks:

Data sourced from Ballotpedia.