A structured diagnostic bibliography.

This reading list consolidates the foundational research, legal precedents, and technical methodologies necessary for rigorous institutional defense. It is designed as an independent reference document for researchers, civic practitioners, and legal analysts working to map structural exclusion at the community scale.

Curated by Kevin Matthews at Matthews Geographics, LLC.

Module 1: Architect and Builder

  • The essential theoretical anchor. Benjamin demonstrates how machine learning algorithms, computational models, and digital infrastructure effortlessly encode historic racial biases into seemingly "neutral" mathematical outputs. As we begin generating computational redistricting ensembles in later modules, this text reminds the builder that their algorithms can easily launder historical suppression if not carefully parameterized.
  • A critical analysis of opaque modeling. O'Neil defines a "Weapon of Math Destruction" (WMD) as a model that is widespread, highly damaging, and completely opaque to the people it affects. Many of the partisan mapping algorithms historically used by state legislatures fit this definition perfectly. The goal of this course is to teach you how to build "anti-WMDs"—models that are transparent, interpretable, and used defensively.
  • Crayton's work bridges the gap between the law, the data, and the community. By examining how quantitative expert testimony is actively prepared for trial, these readings highlight the tension between what the mathematics say and what community sovereignty demands. The builder must learn to align complex statistical findings with the lived, qualitative realities of the people on the map. [Community sovereignty lens]
  • GIS Certification Institute, GISCI Rules of Conduct / Code of Ethics
    The literal professional standards. Before running data analysis for litigation or policy work, you must adhere to a strict ethical framework. This document outlines the obligations of the geospatial analyst specifically regarding data integrity, the transparent reporting of statistical error, and the duty to society over the duty to an employer.

Module 2: Scale 1 - Census Infrastructure

  • Brennan Center for Justice / Michael Li, The Redistricting Data (PL 94-171) Release: What to Know
    A high-level primer on the single most important dataset in the American political system. It explains what specific variables the Census is legally required to deliver, the timeline for delivery, and how state legislatures instantly load this data into GIS software to begin drawing maps.
  • The official explanation directly from the source. The Bureau explains that traditional disclosure-avoidance techniques no longer hold up against modern database reconstruction and record-linkage attacks. To protect privacy, it injects calibrated statistical noise at the granular block level (e.g., reporting 5 people in a block that actually has 3). This concept is crucial for analysts defending data in court.
  • Kenny, Kuriwaki, McCartan, Rosenman, Simko, and Imai, The Use of Differential Privacy for Census Data and Its Impact on Redistricting: The Case of the 2020 U.S. Census (Science Advances, 2021)
    A highly technical, essential paper. The authors test how the injected "noise" propagates upward as blocks are aggregated into districts. Because congressional districts must be drawn with populations as nearly equal as practicable (in practice, often to within a single person), balancing districts on noisy block counts means the published totals may not match the true ones. The authors evaluate whether this noise changes the partisan and racial composition of the generated maps. [Scale lens]
  • The literal codebook. As an analyst, you are going to be interacting directly with the raw tabular files (e.g., Table P1: Race, Table P2: Hispanic or Latino, and Not Hispanic or Latino by Race). Practitioners must keep this documentation at hand to understand precisely how the variables are coded and how multiracial respondents are aggregated.
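
The way block-level noise behaves under aggregation can be illustrated with a minimal sketch. It assumes plain Laplace noise on hypothetical block counts; the Bureau's production system is considerably more elaborate, so treat this only as a stand-in for the general idea that individual blocks can be badly off while a district-sized total stays close to the truth.

```python
import random
import statistics

def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_counts(true_counts, scale, rng):
    # Round and clamp at zero so published values still look like counts.
    return [max(0, round(c + laplace_noise(scale, rng))) for c in true_counts]

rng = random.Random(2020)                            # fixed seed for repeatability
blocks = [rng.randint(0, 60) for _ in range(2000)]   # hypothetical block populations
published = noisy_counts(blocks, scale=2.0, rng=rng) # scale is an assumed setting

# Error at the block level versus error after summing blocks into one district.
mean_block_error = statistics.mean(abs(p - t) for p, t in zip(published, blocks))
district_error = abs(sum(published) - sum(blocks))
relative_district_error = district_error / sum(blocks)

print(f"mean absolute error per block: {mean_block_error:.2f} people")
print(f"error in the 2000-block total: {district_error} people "
      f"({relative_district_error:.3%} of the true total)")
```

The clamp at zero introduces a small upward bias in sparsely populated blocks, which is one reason post-processing choices matter when noisy counts are summed.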

Module 3: Scale 1 - Data Degradation and Reconstruction

  • The essential primer. The Bureau explains why the ACS exists and its primary tradeoff: the 1-year estimates are highly up-to-date but statistically noisy (and only published for areas with populations of 65,000 or more), while the 5-year estimates are more reliable and cover small geographies but structurally lag. Analysts must justify which tier they select based on the granular needs of their specific litigation.
  • Brian Amos, Michael P. McDonald, and Michael Herron, Estimating the Effects of Redistricting (2017)
    A high-level overview of the spatial mismatch problem. When the state draws a new State Senate map that splinters an old voting precinct down the middle, how do you know which half of the precinct's population lives in the new district? This text defines the necessity of mathematical estimation in dynamic election administration.
  • Metric Geometry and Gerrymandering Group (MGGG Lab), Approaches to Precinct Data Disaggregation / Data Prorating
    A highly applied methodological walkthrough. MGGG explains the mathematical workflows required to disaggregate aggregated vote counts back down to the block level, and then re-aggregate them up into new hypothetical districts to test for the Zonation Effect. This is the cornerstone of modern computational redistricting workflows. [Scale lens]
  • QGIS / Esri Documentation, Areal Interpolation Techniques
    The literal software operations. When performing spatial reconstruction, the analyst must choose between simple areal weighting (assuming population is evenly spread across a zone) or dasymetric mapping (using satellite/housing data to mask out lakes and uninhabited areas). This documentation is critical for justifying the chosen GIS geoprocessing tools to a judge.
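
The disaggregate-then-reaggregate workflow described in this module can be sketched in a few lines. This is a minimal illustration that assumes votes are split in proportion to each block's voting-age population (VAP); the precinct names, block records, and vote totals are all hypothetical, and real workflows may weight by registered voters or other ancillary data instead.

```python
from collections import defaultdict

# Hypothetical inputs: votes for one candidate, by old precinct.
precinct_votes = {"P1": 300, "P2": 120}
blocks = [
    # (block id, old precinct, voting-age population, new district)
    ("b1", "P1", 400, "D1"),
    ("b2", "P1", 100, "D2"),
    ("b3", "P2", 0, "D2"),     # uninhabited block: receives no votes
    ("b4", "P2", 250, "D2"),
]

def prorate(precinct_votes, blocks):
    """Split each precinct's votes across its blocks in proportion to VAP."""
    vap_by_precinct = defaultdict(int)
    for _, precinct, vap, _ in blocks:
        vap_by_precinct[precinct] += vap
    block_votes = {}
    for block_id, precinct, vap, _ in blocks:
        total = vap_by_precinct[precinct]
        share = vap / total if total else 0.0
        block_votes[block_id] = precinct_votes[precinct] * share
    return block_votes

def reaggregate(block_votes, blocks):
    """Sum block-level estimates up into the new districts."""
    district_votes = defaultdict(float)
    for block_id, _, _, district in blocks:
        district_votes[district] += block_votes[block_id]
    return dict(district_votes)

block_votes = prorate(precinct_votes, blocks)
district_votes = reaggregate(block_votes, blocks)
print(district_votes)
```

Precinct P1 is split by the new line, so its 300 votes are divided 80/20 between D1 and D2. The total is preserved, but the split is an estimate whose quality depends entirely on the weighting variable.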

Module 4: Scale 2 - Geography of Participation

  • Brady and McNulty, Turning Out to Vote: The Costs of Finding and Getting to the Polling Place (American Political Science Review, 2011)
    The essential behavioral baseline. The authors show that the act of voting involves measurable spatial costs. When Los Angeles County consolidated polling places for the 2003 gubernatorial recall election, they found a statistically significant drop in polling-place turnout associated with the increased distance voters had to travel and with the disruption of a changed location. Every extra mile functionally reduces the likelihood of participation.
  • Haspel and Knotts, Location, Location, Location: Precinct Placement and the Costs of Voting (Journal of Politics, 2005)
    Building on the overarching "calculus of voting," this piece analyzes how precinct placement specifically interacts with socioeconomic status. The distance penalty does not apply equally; a two-mile polling place move is structurally irrelevant to an affluent voter with a car, but devastating to a low-income voter relying on irregular public transit.
  • The transition to applied analytics. Review case studies that explicitly draw "catchment zones" or "isochrones" (areas reachable within a 15-minute drive or transit ride) around polling locations. Analysts use this method to visually and statistically isolate "turnout deserts"—populated neighborhoods abandoned by polling closures. [Scale lens]
  • OSRM (Open Source Routing Machine, built on OpenStreetMap data) / Esri Network Analyst, Network Routing Algorithms and API Documentation
    The tools required for the job. Straight-line Euclidean distance ("as the crow flies") is hard to defend in court because voters travel along roads. You must specify the use of a network routing engine (like OSRM or ArcGIS Network Analyst) and calculate network distance or drive time along the street graph to quantify the voting burden.
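
The gap between straight-line and network distance can be shown with a minimal sketch. It runs Dijkstra's shortest-path algorithm on a tiny hypothetical street graph; the node names, coordinates, and road segments are invented for illustration, and a production analysis would call a routing engine rather than hand-build the graph.

```python
import heapq
import math

# Hypothetical street network: node -> (x, y) position in miles.
coords = {"home": (0.0, 0.0), "a": (0.0, 2.0), "b": (3.0, 2.0), "poll": (3.0, 0.0)}
# Undirected road segments. Assume a river blocks the direct home-to-poll route.
roads = [("home", "a"), ("a", "b"), ("b", "poll")]

def build_graph(coords, roads):
    """Adjacency list with each segment weighted by its length."""
    graph = {node: [] for node in coords}
    for u, v in roads:
        length = math.dist(coords[u], coords[v])
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph

def network_distance(graph, source, target):
    """Dijkstra's shortest path length; returns inf if target is unreachable."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, math.inf):
            continue                      # stale queue entry
        for neighbor, weight in graph[node]:
            candidate = dist + weight
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return math.inf

graph = build_graph(coords, roads)
straight_line = math.dist(coords["home"], coords["poll"])
by_road = network_distance(graph, "home", "poll")
print(f"straight line: {straight_line} mi, by road: {by_road} mi")
```

In this toy network the polling place is 3 miles away as the crow flies but 7 miles away by road, which is the kind of discrepancy a Euclidean measure hides.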

Module 5: Scale 2 - Measuring Suppression

  • U.S. Department of Justice (Civil Rights Division), Investigation into the 2016 Maricopa County Presidential Preference Election
    We return to the DOJ report, but this time exclusively evaluating the technical methodology section. Study the dataset the DOJ used: they did not just measure the number of closed locations. They explicitly calculated the ratio of registered voters per physical polling location in white-heavy jurisdictions versus Latino-heavy jurisdictions, establishing a mathematical baseline for disparate impact.
  • Stephen Pettigrew, The Racial Gap in Wait Times: Why Minority Precincts Are Underserved by Local Election Officials (Political Science Quarterly, 2017)
    Bypassing the abstract entirely, focus on Pettigrew's technical appendix. This paper demonstrates the regression analysis required to control for variables like precinct density, ballot length, and income. It shows that, even after these factors are held equal, the racial makeup of a precinct statistically predicts wait time. [Scale lens]
  • A granular, precinct-level tracking model. Herron and Smith utilize timestamp data from electronic poll books to measure exactly when voters arrived and departed, analyzing the specific decay rate of the voting queue. This shows analysts how to exploit administrative metadata to prove suppression.
  • Charles Stewart III (Caltech/MIT Voting Technology Project), Managing Polling Place Resources
    The literal queueing theory manual. Stewart provides the mathematical formulas derived from operations research (similar to line management in supermarkets or server requests) used to allocate voting machines. If a county administrator deviates from these standard resource allocation formulas in a minority precinct, the documented shortfall can support an inference of discriminatory allocation, although it does not by itself establish intent to suppress.
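
The queueing logic behind resource allocation can be sketched with the textbook M/M/c model (Erlang C). This is an assumption made for illustration, not a formula taken from Stewart's report: it treats voters as arriving at random at a steady rate and each booth as serving them at a steady average rate. The arrival and service figures below are hypothetical.

```python
import math

def erlang_c(arrival_rate, service_rate, servers):
    """Probability that an arriving voter has to wait in an M/M/c queue."""
    offered_load = arrival_rate / service_rate
    utilization = offered_load / servers
    if utilization >= 1:
        return 1.0                        # demand meets or exceeds capacity
    top = offered_load ** servers / math.factorial(servers)
    partial = sum(offered_load ** k / math.factorial(k) for k in range(servers))
    return top / ((1 - utilization) * partial + top)

def mean_wait(arrival_rate, service_rate, servers):
    """Expected time in line (same time unit as the rates), excluding service."""
    capacity = servers * service_rate
    if arrival_rate >= capacity:
        return math.inf                   # the line grows without bound
    return erlang_c(arrival_rate, service_rate, servers) / (capacity - arrival_rate)

# Hypothetical precinct: 60 voters arrive per hour, each booth handles 12 per hour.
for booths in (5, 6, 8):
    wait_minutes = mean_wait(60, 12, booths) * 60
    print(f"{booths} booths: mean wait {wait_minutes:.1f} minutes")
```

The model is highly nonlinear near capacity: removing a single booth from a precinct that is already close to fully utilized can turn a short wait into an unbounded one, which is why per-precinct allocation decisions deserve scrutiny.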

Module 6: Scale 3 - Redistricting Fundamentals

  • Aaron Kaufman, Gary King, and Mayya Komisarchik, How to Measure Legislative District Compactness If You Only Know It When You See It (American Journal of Political Science, 2021)
    A critical theoretical foundation. The authors empirically test whether the mathematical measurements of "compactness" actually align with how humans visually perceive a compact shape. They find that standard mathematical scores can be highly misleading when applied to jagged coastlines or uneven river borders, warning analysts not to rely on geometry alone without human judgment and geographic context. [Scale lens]
  • Nicholas Stephanopoulos, The Spaces of Gerrymandering (Texas Law Review, 2018)
    Stephanopoulos bridges the gap between geometry and political consequence. He demonstrates how mapping software is used to dynamically calculate compactness scores while simultaneously tracking the Efficiency Gap. This illustrates exactly how state legislatures draw maps: by tweaking Polsby-Popper scores just enough to pass legal muster while maximizing the wasted votes of the opposition.
  • MGGG Lab / Metric Geometry and Gerrymandering Group, Compactness Metrics Documentation (Polsby-Popper, Reock, Convex Hull formulas)
    The literal mathematical formulas. The Polsby-Popper score compares a district's area A to the area of a circle with the same perimeter P, which works out to 4πA / P². The Reock score compares the district's area to the area of the smallest circle that encloses it. The Convex Hull ratio compares the district's area to the area of its convex hull, the shape formed by a rubber band stretched tightly around it. All three range from 0 to 1, with 1 being most compact. You must insert these exact formulas into your analytic tools.
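
Two of these scores can be computed with nothing but coordinate geometry. The sketch below is a minimal illustration on hypothetical polygons in planar coordinates; it omits the Reock score, which needs a minimum-enclosing-circle routine, and real district boundaries would first need to be projected to an equal-area coordinate system.

```python
import math

def area(poly):
    """Shoelace formula for a simple polygon given as [(x, y), ...]."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2

def perimeter(poly):
    n = len(poly)
    return sum(math.dist(poly[i], poly[(i + 1) % n]) for i in range(n))

def polsby_popper(poly):
    """District area divided by the area of a circle with the same perimeter."""
    return 4 * math.pi * area(poly) / perimeter(poly) ** 2

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def convex_hull_ratio(poly):
    """District area divided by the area of its convex hull."""
    return area(poly) / area(convex_hull(poly))

# Hypothetical districts: a square and an L-shaped district.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]
for name, poly in (("square", square), ("L-shape", l_shape)):
    print(f"{name}: Polsby-Popper {polsby_popper(poly):.3f}, "
          f"convex hull ratio {convex_hull_ratio(poly):.3f}")
```

Note that even a perfect square scores only π/4 on Polsby-Popper, so raw scores should be compared against other plans for the same geography rather than against 1.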

Module 7: Scale 3 - Computational Redistricting

  • A high-level explanation of the baseline problem. Duchin argues that you cannot simply compare a map to strict proportionality (e.g., "50% vote should equal 50% seats") because political geography limits what is possible. Instead, you must compare a map to the universe of possible maps for that specific state. Thus, the ensemble of algorithmically generated maps, not proportionality, acts as the baseline for fairness. [Scale lens]
  • Gregory Herschlag, Jonathan Mattingly, et al., Quantifying Gerrymandering in North Carolina
    An accessible summary from the Duke mathematics team that pioneered much of the use of MCMC in state supreme courts. This introduces the concept of the "bell curve" of maps. They plot 24,000 random district configurations and place the enacted NC legislature's map entirely off the far edge of the curve, visually demonstrating the extreme statistical unlikelihood of the result.
  • The hard mechanics. Older MCMC methods swapped single precincts one at a time on the border of a district, which often led to wildly non-compact shapes. The ReCom method addresses this by fusing two adjacent districts together, drawing a random spanning tree through the fused super-district, and cutting one edge of that tree so that the two resulting pieces are contiguous and balanced in population. Plans built this way tend to be compact. This is the industry standard for modern modeling.
  • GerryChain is the open-source Python ecosystem developed specifically for running ReCom ensembles. As a technical practitioner, you must review the documentation to understand the programmatic structure of a Markov Chain run: defining the initial partition (seed map), setting the constraints (population limits, VRA limits), and declaring the updaters (tracking partisan shifts).
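
A single ReCom step can be sketched without any library, which helps when reading the GerryChain documentation. The version below is a simplified illustration, not GerryChain's implementation: it uses a hypothetical 4x4 grid with one person per unit, draws the spanning tree by running Kruskal's algorithm over randomly ordered edges, and retries until some tree edge splits the merged population evenly.

```python
import random
from collections import Counter

def grid_graph(rows, cols):
    """Nodes and edges of a grid, standing in for a precinct adjacency graph."""
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    edges = [((r, c), (r + 1, c)) for r, c in nodes if r + 1 < rows]
    edges += [((r, c), (r, c + 1)) for r, c in nodes if c + 1 < cols]
    return nodes, edges

def random_spanning_tree(nodes, edges, rng):
    """Kruskal's algorithm over randomly ordered edges, using union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]    # path halving
            n = parent[n]
        return n

    shuffled = list(edges)
    rng.shuffle(shuffled)
    tree = {n: [] for n in nodes}
    for u, v in shuffled:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            tree[u].append(v)
            tree[v].append(u)
    return tree

def balanced_cut(tree, population, ideal, tolerance, rng):
    """Return the nodes on one side of a population-balancing tree edge, or None."""
    root = next(iter(tree))
    parent, order, stack = {root: None}, [], [root]
    while stack:                             # depth-first: parents before children
        node = stack.pop()
        order.append(node)
        for neighbor in tree[node]:
            if neighbor not in parent:
                parent[neighbor] = node
                stack.append(neighbor)
    below = dict(population)                 # population of each node's subtree
    for node in reversed(order):             # children are summed before parents
        if parent[node] is not None:
            below[parent[node]] += below[node]
    cuts = [n for n in order
            if parent[n] is not None and abs(below[n] - ideal) <= tolerance * ideal]
    if not cuts:
        return None
    side, stack = set(), [rng.choice(cuts)]
    while stack:                             # collect the subtree below the cut edge
        node = stack.pop()
        side.add(node)
        stack.extend(n for n in tree[node] if n != parent[node])
    return side

def recom_step(assignment, edges, population, tolerance, rng, max_tries=1000):
    """Merge two adjacent districts and re-split them along a random spanning tree."""
    u, v = rng.choice([e for e in edges if assignment[e[0]] != assignment[e[1]]])
    first, second = assignment[u], assignment[v]
    merged = [n for n in assignment if assignment[n] in (first, second)]
    inside = set(merged)
    sub_edges = [e for e in edges if e[0] in inside and e[1] in inside]
    sub_pop = {n: population[n] for n in merged}
    ideal = sum(sub_pop.values()) / 2
    for _ in range(max_tries):
        tree = random_spanning_tree(merged, sub_edges, rng)
        side = balanced_cut(tree, sub_pop, ideal, tolerance, rng)
        if side is not None:
            return {n: (first if n in side else second) if n in inside else d
                    for n, d in assignment.items()}
    return dict(assignment)                  # no balanced cut found: keep the plan

rng = random.Random(7)
nodes, edges = grid_graph(4, 4)
population = {n: 1 for n in nodes}           # hypothetical: one person per unit
plan = {n: n[0] for n in nodes}              # seed plan: each row is a district
new_plan = recom_step(plan, edges, population, tolerance=0.05, rng=rng)
print(Counter(new_plan.values()))
```

Because each side of the cut is a subtree, both new districts are contiguous by construction. An ensemble is produced by repeating this step many thousands of times and recording statistics from each plan along the way.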

Module 8: Scale 3 - Running Case Block I

  • A masterclass in presentation. Dr. Chen generated 1,000 completely random North Carolina maps using only the state's required non-partisan criteria (compactness, equal population, county preservation). He then plotted the Republican legislature's enacted map against the 1,000 random ones. The visual result showed that the enacted map was more hostile to Democratic voters than 99.9% of the random, neutrally drawn maps. Read to understand how algorithms answer legal questions of intent.
  • Analyzing how MCMC is uniquely applied to the scale of race. In Texas, the state claimed that it was impossible to draw a certain number of Hispanic-majority districts while remaining geographically compact. Dr. Duchin's ensemble proved the opposite: the algorithm easily generated thousands of maps that were both highly compact and proportional to the surging Hispanic population density, proving the state intentionally cracked the demographic. [Scale lens]
  • Various (MGGG / Princeton Gerrymandering Project), GitHub Repositories: Texas & North Carolina Ensembles
    The transparency requirement. In litigation, if you use an algorithm, you must provide your script and your seed files to the opposition so they can attempt to reproduce your test. Reviewing the open-source repositories from these trials shows exactly how the GerryChain parameters (discussed in Module 7) were hard-coded for the courtroom.
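
The outlier comparison at the heart of these expert reports reduces to a simple tail calculation once the ensemble exists. The sketch below uses a small, entirely hypothetical set of seat counts; the real reports use thousands of simulated plans and report several statistics, not just seats.

```python
from collections import Counter

def lower_tail_share(ensemble_values, enacted_value):
    """Share of ensemble plans whose statistic is at or below the enacted plan's."""
    at_or_below = sum(1 for value in ensemble_values if value <= enacted_value)
    return at_or_below / len(ensemble_values)

# Hypothetical: seats won by one party under each of 20 simulated 13-seat plans.
ensemble_seats = [5, 6, 5, 4, 6, 5, 5, 7, 6, 5, 4, 5, 6, 5, 6, 5, 4, 6, 5, 5]
enacted_seats = 3                          # hypothetical enacted-plan outcome

histogram = Counter(ensemble_seats)        # the "bell curve" of the ensemble
share = lower_tail_share(ensemble_seats, enacted_seats)

for seats in sorted(histogram):
    print(f"{seats} seats: {'#' * histogram[seats]}")
print(f"share of ensemble at or below the enacted plan: {share:.0%}")
```

An enacted plan that sits below every simulated plan is an outlier relative to that ensemble; how much weight the result carries depends on whether the ensemble's constraints match the state's actual legal criteria.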

Module 9: Scale 4 - VRA Framework

  • U.S. Supreme Court, Thornburg v. Gingles (1986)
    The legal anchor of modern voting rights analysis. Familiarize yourself with the three non-negotiable preconditions the court established: 1) The minority group must be sufficiently large and geographically compact to constitute a majority in a single-member district. 2) The minority group must be politically cohesive. 3) The white majority must vote sufficiently as a bloc to usually defeat the minority's preferred candidate.
  • NAACP Legal Defense Fund (LDF), The Gingles Preconditions: A Practitioner's Guide
    An accessible translation of the court's demands. LDF breaks down exactly how civil rights advocates must gather mapping data to prove Gingles 1 (drawing a hypothetical "demonstrator map") and statistical data to prove Gingles 2 and 3 (analyzing past elections).
  • A highly specific methodological constraint. To prove "Gingles 1," you cannot just look at total population (as discussed in Course 2, Module 3). You must show that the minority group constitutes over 50% of the Citizen Voting Age Population (CVAP) in a hypothetical district. Because citizenship is measured in the ACS, not the Decennial Census, CVAP is a survey estimate with a margin of error, published for block groups and larger areas. Analysts must interpolate it to the district's boundaries and carry the margins of error through the calculation. [Scale lens]
  • The foundational dataset required for any Section 2 lawsuit. Analysts must pull this specific dataset, which relies on 5-year American Community Survey estimates, to calculate whether a minority group meets the 50%+1 threshold required by Gingles 1. Holding this technical documentation is required to defend your data against opposing counsel.
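
Carrying margins of error through a CVAP share can be sketched with the approximation formulas the Census Bureau documents for derived ACS estimates: a root-sum-of-squares for sums, and a proportion formula for a subgroup share. The block-group figures below are hypothetical, and the root-sum-of-squares step treats the component estimates as independent, which is only an approximation.

```python
import math

def moe_sum(moes):
    """Approximate margin of error for a sum of ACS estimates."""
    return math.sqrt(sum(m * m for m in moes))

def moe_proportion(numerator, denominator, moe_num, moe_den):
    """Approximate margin of error for a share where numerator is part of denominator."""
    p = numerator / denominator
    radicand = moe_num ** 2 - (p ** 2) * (moe_den ** 2)
    if radicand < 0:
        # Documented fallback: use the ratio formula when the radicand is negative.
        radicand = moe_num ** 2 + (p ** 2) * (moe_den ** 2)
    return math.sqrt(radicand) / denominator

# Hypothetical block groups in a demonstration district:
# (minority CVAP, its MOE, total CVAP, its MOE)
block_groups = [
    (1200, 150, 2000, 180),
    (900, 120, 1900, 170),
    (1500, 160, 2500, 200),
]

minority_cvap = sum(bg[0] for bg in block_groups)
total_cvap = sum(bg[2] for bg in block_groups)
minority_moe = moe_sum(bg[1] for bg in block_groups)
total_moe = moe_sum(bg[3] for bg in block_groups)

share = minority_cvap / total_cvap
share_moe = moe_proportion(minority_cvap, total_cvap, minority_moe, total_moe)
print(f"minority CVAP share: {share:.1%} +/- {share_moe:.1%}")
print("lower bound clears 50%:", share - share_moe > 0.5)
```

Reporting the lower bound, not just the point estimate, anticipates the obvious cross-examination question of whether the district clears the majority threshold once sampling error is considered.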

Module 10: Scale 4 - Ecological Inference

  • We return to King's foundational text introduced in Course 2. Focus this time purely on the mathematics. King starts from deterministic bounds on the statistical possibilities (e.g., if a precinct's voters are 80% Black and Candidate A won 90% of the vote, then at least 87.5% of Black voters must have voted for Candidate A, because the other 20% of voters can account for at most 20 points of that 90%). By combining these bounds across hundreds of precincts with a statistical model, the method narrows in on each group's voting rates.
  • A study of the methodological evolution. Before King's EI models, courts accepted "Homogeneous Precinct Analysis," which looks only at precincts that are 90%+ of a single race and assumes the entire demographic votes that way. Today, courts typically expect EI estimates cross-checked against Ecological Regression (ER).
  • An applied guide to taking EI into court. The authors show how to structure the analysis so that the estimates are statistically defensible. They highlight the scale problem: if you attempt to run EI on a local city council district where there are only 4 precincts, the model will fail to produce reliable estimates because it has too few data points. The analyst must know the data constraints of their models. [Scale lens]
  • Loren Collingwood et al., eiCompare R Package Documentation
    The literal software used to win cases. eiCompare is an open-source R package designed to compare Ecological Inference estimates with Ecological Regression estimates when demonstrating racially polarized voting (RPV) for court. Data scientists must familiarize themselves with this codebase to process their CVAP data efficiently.
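
The deterministic bounds that EI starts from can be computed precinct by precinct before any model is fit. The sketch below is a minimal illustration of that method of bounds only, not of King's full model or of eiCompare; it assumes the group share is measured among actual voters, and the precinct figures are hypothetical.

```python
def support_bounds(group_share, candidate_share):
    """Bounds on the fraction of a group that voted for the candidate.

    group_share: the group's fraction of the precinct's voters (0 to 1).
    candidate_share: the candidate's fraction of the precinct's vote (0 to 1).
    """
    if group_share == 0:
        return 0.0, 1.0                   # the precinct says nothing about the group
    others = 1 - group_share
    # Lowest support: everyone outside the group voted for the candidate.
    lower = max(0.0, (candidate_share - others) / group_share)
    # Highest support: nobody outside the group voted for the candidate.
    upper = min(1.0, candidate_share / group_share)
    return lower, upper

# Hypothetical precincts: (Black share of voters, Candidate A vote share)
precincts = [(0.80, 0.90), (0.50, 0.55), (0.10, 0.30)]
for group_share, candidate_share in precincts:
    low, high = support_bounds(group_share, candidate_share)
    print(f"{group_share:.0%} Black, {candidate_share:.0%} for A: "
          f"Black support between {low:.1%} and {high:.1%}")
```

The bounds are tight in homogeneous precincts and nearly uninformative where the group is a small minority, which is why the statistical model is needed to pool information across precincts.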

Module 11: Scale 4 - Running Case Block II

  • U.S. Supreme Court, Allen v. Milligan (2023)
    Read the majority opinion authored by Chief Justice Roberts. Unusually for a Supreme Court decision, the opinion spends considerable time describing the specific mapping algorithms and data estimates presented in the case. Roberts forcefully defends the rigors of the Gingles test, validating the decades of statistical methodology we have studied in this course.
  • Dr. Baodong Liu, Expert Report of Baodong Liu in Milligan v. Merrill (District Court Filing)
    The Ecological Inference masterclass. Dr. Liu processed dozens of endogenous Alabama elections through EI algorithms to prove Racially Polarized Voting. The state attempted to claim that Black Alabamians were simply voting for Democrats, not "minority-preferred" candidates. Dr. Liu's statistical outputs rigorously destroyed the "party, not race" defense.
  • Dr. Moon Duchin, Expert Report of Moon Duchin in Milligan v. Merrill (District Court Filing)
    The demonstration-map masterclass. Dr. Duchin submitted illustrative plans showing that Alabama could draw two reasonably configured majority-Black congressional districts while respecting traditional districting criteria, which is what Gingles 1 requires. The state's enacted map contained only one such district and divided the Black Belt among several districts. Her computer-generated map sets are also discussed in the Supreme Court's opinion, so read the report alongside Module 7 to see what an ensemble can and cannot establish in a Section 2 case. [Scale lens]

Module 12: Synthesis

  • U.S. Supreme Court, Rucho v. Common Cause (2019)
    The ultimate limitation of method. In Rucho, the data scientists brought rigorous MCMC ensemble modeling to the Supreme Court, showing that North Carolina's map was heavily gerrymandered for partisan (not racial) gain. The Court did not dispute the math. It ruled that partisan gerrymandering claims present a nonjusticiable "political question" beyond the reach of the federal courts. The math was perfect, and it lost.
  • Catherine D'Ignazio and Lauren F. Klein, Data Feminism (Conclusion)
    We return to data feminism to close the series. The authors remind us that "data is a double-edged sword." The same mapping tools that state legislatures use to fracture communities (gerrymandering) are the tools we must use to reconstruct them. But the goal of democratic analysis is not just to run regressions—it is to hand those regressions back to the community so they can reclaim their sovereignty. [Community sovereignty lens]

Democratic Analysis Key Methods

What is the ReCom algorithm in computational redistricting?

ReCom is a Markov chain Monte Carlo algorithm that generates ensembles of legally valid redistricting plans by repeatedly merging adjacent districts and re-splitting them along spanning trees, producing a statistically representative sample of the universe of possible maps.

How does Gary King's Ecological Inference (EI) model work?

King's EI model uses Bayesian statistics to estimate racial group voting behavior from aggregate precinct data, producing posterior probability distributions. Because ballots are secret, how individuals of each race voted is never observed directly, so group behavior must be inferred from precinct totals. It is the primary quantitative method for demonstrating racially polarized voting in federal court.

What is the PL 94-171 redistricting data file?

The PL 94-171 file is the Census Bureau's official redistricting dataset, mandated by Public Law 94-171. It provides population counts by race, ethnicity, and voting age at the census block level and is the standard baseline dataset for drawing district boundaries.

What is the Efficiency Gap metric for measuring gerrymandering?

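The Efficiency Gap is the difference between the two parties' total wasted votes, divided by the total number of votes cast. A vote counts as wasted if it is cast for a losing candidate or if it is cast for the winner beyond what was needed to win. A large gap indicates systematic packing or cracking of one party's voters to reduce their electoral efficiency.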
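
The calculation can be sketched on a small hypothetical plan. The version below assumes two parties and the simple convention that the winner needs half of the two-party vote; other conventions exist, and a tied district is handled arbitrarily here.

```python
def wasted_votes(votes_a, votes_b):
    """Wasted votes for each party in one district (two-party contest)."""
    needed = (votes_a + votes_b) / 2      # simple convention: half the votes cast
    if votes_a > votes_b:
        return votes_a - needed, votes_b  # A's surplus, all of B's votes
    return votes_a, votes_b - needed      # all of A's votes, B's surplus

def efficiency_gap(districts):
    """(A's wasted votes - B's wasted votes) / total votes.

    Positive values mean party A wastes more votes, i.e. the plan favors party B.
    """
    wasted_a = wasted_b = total = 0
    for votes_a, votes_b in districts:
        waste_a, waste_b = wasted_votes(votes_a, votes_b)
        wasted_a += waste_a
        wasted_b += waste_b
        total += votes_a + votes_b
    return (wasted_a - wasted_b) / total

# Hypothetical plan: party A is packed into two districts and cracked across three.
plan = [(70, 30), (70, 30), (35, 65), (35, 65), (35, 65)]
gap = efficiency_gap(plan)
print(f"efficiency gap: {gap:.1%}")
```

In this example party A wins 49% of the statewide vote but only two of five seats, wasting 145 votes to party B's 105, for a gap of 8% of all votes cast.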