
Before attempting a replication of Gerrymandering: Maps and political representation, a lab from Human Geography with GIS (an introductory vector GIS course for undergraduates at Middlebury College, taught by Assistant Professor of Geography Niwaeli Kimambo in fall 2021), I spent some time thinking about threats to the validity of both my replication and the original lab.

In the original lab, Kimambo outlines several assumptions made:

1. We are simplifying voting results as votes cast at a polling station for the Democratic (dem) or Republican (rep) candidate for president in 2016.

This presents several threats to validity. First, using votes from a presidential election as a proxy for votes cast in a congressional election can be misleading because some voters, especially in swing states like North Carolina, might vote for a candidate from one party in the presidential election and for a candidate from another party in a congressional race. This may be particularly relevant for the 2016 presidential election, which was a contest between two extremely polarizing figures: Hillary Clinton (Dem) and Donald Trump (Rep). Second, this strategy ignores votes cast for independent or write-in candidates in the presidential election, so those voters are not assigned to a party in the congressional vote (which would, admittedly, be quite difficult). It also ignores independent or write-in candidates in congressional elections, who get no votes attributed to them under the simplified two-party model. However, for the purposes of teaching GIS to beginning researchers, this is a logical simplification. One way to investigate this threat to validity might be to compare the predicted party winner in each district with the party of the serving member of Congress. However, this will not be undertaken in this replication study.

2. We are assuming that voters are evenly distributed throughout voting precincts.

The original study uses area-weighted re-aggregation to assign votes cast in presidential voting precincts to congressional districts based on the percentage of precinct area contained by a given congressional district. This assumes that voters for both parties are evenly distributed throughout the precincts. However, there are multiple possible scenarios in which this would not be the case. For example, votes for each party may be evenly distributed throughout the voting population in a given precinct, while the population itself may not be evenly distributed spatially. This could result in votes cast by residents of one district being assigned to a neighboring district that shares the same precinct, thereby inflating the vote count in one district and decreasing it in the other. Depending on the makeup of the voters, this could change the composition of the voter base in affected districts. The opposite scenario could also be true: voters might be evenly distributed spatially in the precinct, but their votes are not. For example, there might be a precinct with a very Democratic area and a very Republican area. The methodology used in the original lab would blend these two areas together and proportionally assign partisan voters to each congressional district, which also runs the risk of changing the voter composition in affected districts. The two scenarios are not mutually exclusive: voters may be unevenly distributed both spatially and politically. In fact, this might be the most likely scenario, as Democratic voters tend to be clustered in cities and Republican voters in more rural areas. Area-weighted re-aggregation already aims to address the modifiable areal unit problem, where the scale of analysis significantly influences the results. However, it is an imperfect method. One way to improve this analysis further might be to create a population gradient based on census block or block group population data so that a population-weighted re-aggregation could be implemented.
However, this improvement would only address the uneven spatial distribution of voters and not the uneven political distribution. Even if this population gradient were not used to make an alternate weighted re-aggregation, it might at least be useful for identifying congressional districts that may have been affected by this threat to validity. However, this methodological change will not be implemented in this reproduction study.
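
To make the mechanics of area-weighted re-aggregation concrete, here is a minimal sketch in plain Python. The precinct names, district names, vote counts, and overlap fractions are all invented for illustration; in the actual lab these fractions would come from intersecting precinct and district polygons in GIS software. The function name `area_weighted_votes` is my own and is not part of the original lab.

```python
from collections import defaultdict


def area_weighted_votes(precinct_votes, overlaps):
    """Split each precinct's votes among districts by share of precinct area.

    precinct_votes: {precinct_id: {"dem": count, "rep": count}}
    overlaps: list of (precinct_id, district_id, fraction), where fraction
              is the share of the precinct's area inside that district.
    """
    totals = defaultdict(lambda: {"dem": 0.0, "rep": 0.0})
    for precinct_id, district_id, fraction in overlaps:
        for party in ("dem", "rep"):
            # Core assumption: voters are spread evenly across the precinct,
            # so a district holding 25% of the area receives 25% of the votes.
            totals[district_id][party] += precinct_votes[precinct_id][party] * fraction
    return {district_id: dict(votes) for district_id, votes in totals.items()}


# Hypothetical data: precinct P1 straddles two districts, P2 sits inside D2.
precinct_votes = {
    "P1": {"dem": 600, "rep": 400},
    "P2": {"dem": 200, "rep": 800},
}
overlaps = [
    ("P1", "D1", 0.75),
    ("P1", "D2", 0.25),
    ("P2", "D2", 1.0),
]

district_votes = area_weighted_votes(precinct_votes, overlaps)
for district_id, votes in sorted(district_votes.items()):
    print(district_id, votes)
```

If the fractions were instead computed from the share of each precinct's population (for example, from census blocks) falling in each district, the same function would perform the population-weighted variant described above. It would still treat each precinct's partisan mix as uniform.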

Additionally, there are several more threats to validity not discussed in the original lab:

3. Using a compactness score as a proxy for electoral fairness may be impacted by boundary effects.

Compactness scores in general compare the perimeter of a given polygon to its area. The formula used in the original study essentially measures how circular a district is, such that a very circular district would be considered the most fair. Gerrymandering is, as exemplified by numerous court cases and ongoing debate, inherently difficult to define and identify. So, the measure of compactness raises the question: what is the geometry of the ideal district? Whatever the answer to that question, measures of compactness are designed to flag districts with long, snaky, or complex borders. However, this means they may also flag otherwise ‘compact’ districts that have complex borders due to geographic features such as coastlines, islands, or rivers. In this case, the complexity of the border is not indicative of gerrymandering per se, but merely reflects the physical reality of the landscape. Some congressional maps may be drawn with borders that extend into the sea, for example, and are thus simpler than if they hugged the coastline, but this varies by map and is not always possible. Districts flagged as potentially gerrymandered on the basis of compactness score could be inspected by hand to determine what portion, if any, of the non-compactness may be attributable to an acceptable level of complexity due to physical geography. This is not a part of the existing methodology and, regardless, is a sub-optimal solution to this issue.
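
As an illustration of how a perimeter-to-area compactness score behaves, the sketch below computes the Polsby-Popper score, 4πA/P², which equals 1 for a perfect circle and approaches 0 for long or convoluted shapes. This is an assumption on my part: I am using Polsby-Popper as one common example of this family of measures, and it may not be the exact formula used in the original lab. The two polygons are invented and use planar coordinates.

```python
import math


def area_and_perimeter(coords):
    """Return (area, perimeter) of a simple polygon given as (x, y) vertices."""
    twice_area = 0.0
    perimeter = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1  # shoelace formula term
        perimeter += math.hypot(x2 - x1, y2 - y1)
    return abs(twice_area) / 2.0, perimeter


def polsby_popper(coords):
    """Polsby-Popper compactness: 4 * pi * area / perimeter ** 2."""
    area, perimeter = area_and_perimeter(coords)
    return 4.0 * math.pi * area / perimeter ** 2


# Two hypothetical districts with the same area (16 square units).
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
strip = [(0, 0), (16, 0), (16, 1), (0, 1)]

print("square:", round(polsby_popper(square), 3))
print("strip: ", round(polsby_popper(strip), 3))
```

The strip scores far lower than the square even though both cover the same area, which is the behavior that flags long, snaky districts. A square district whose border followed a jagged coastline would also see its perimeter grow and its score fall, with no change in who lives inside it.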

4. Theoretical grounds for a compactness score

Compactness scores in general pose a threat to the validity of gerrymandering research in that they are better at identifying packing than cracking. Packing occurs when a district is drawn such that a particular group of voters (be it based on race, partisanship, or another factor) is deliberately concentrated in that district. This may contribute to the creation of a district that is not compact. Packing thus crams potential voting blocs into one district so that the selected group can exert influence over fewer districts. Conversely, cracking occurs when a geographically compact or contiguous concentration of a particular group of voters is divided among multiple districts. Cracked districts may or may not be compact, which makes them harder for a compactness score to identify. Cracking dilutes the group’s voting power: where its members might have been the majority if placed in one district, they are now the minority in several different districts. Just like packing, cracking reduces the number of seats a particular voting group is likely to win; each type of gerrymandering just uses a different strategy.
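
A toy calculation can show how packing and cracking both reduce seats without changing the total number of voters. The numbers below are entirely hypothetical: 27 voters in three equal districts of 9, with 12 voters belonging to group A. The helper `seats_won` is my own name and simply counts the districts where group A holds a strict majority.

```python
def seats_won(group_counts, district_size):
    """Count districts where the group holds a strict majority of voters."""
    return sum(1 for count in group_counts if count > district_size / 2)


DISTRICT_SIZE = 9  # hypothetical: three districts of 9 voters each

# Each list gives group A's voters per district; every plan sums to 12.
plans = {
    "baseline": [5, 5, 2],  # A is a narrow majority in two districts
    "packed": [9, 2, 1],    # A's voters are crammed into one district
    "cracked": [4, 4, 4],   # A's voters are split so A is always a minority
}

for name, counts in plans.items():
    assert sum(counts) == 12  # same voters, different lines
    print(name, "->", seats_won(counts, DISTRICT_SIZE), "seat(s) for group A")
```

Note that nothing in this calculation depends on district shape. The cracked plan costs group A the most seats, yet it could be drawn with perfectly compact districts.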

5. We are assuming that a more circular district would be more fair.

By using compactness to identify potential gerrymanders, we run the risk of false positives: districts identified as having a compactness score indicative of gerrymandering, when there may be no observable difference in voter makeup whether the district is more compact or not. This is one manifestation of an edge effect, in which the state of the area bordering the study region influences the results attributed to that region. Joe Holler, Assistant Professor of Geography at Middlebury College, has suggested that the area-weighted re-aggregation of voters from the 2016 presidential election to the 2016 and 2019 congressional districts could be expanded so that, in addition to calculating the likely voter base in each district, one would also calculate the likely voter base in the ‘minimum bounding circle’ of each district from each year. Comparing the predicted election results for congressional districts with the predicted election results of the corresponding minimum bounding circles would help identify the districts in which compactness is a driving factor of election results (and thus a stronger indicator of a potential gerrymander). This additional analysis could help identify instances of both packing and cracking, though it is outside the scope of this reproduction study.
