Mini-Project #03: Do Proportional Electoral College Allocations Yield a More Representative Presidency?

Introduction

In this project, I will be writing a political fact-check and investigate the claim that the US Electoral College systematically biases election results away from the popular vote.

Before we begin, we must download the code here. This code shows us how many votes each candidate got througout the years, for both the Presidential Elections and the House of Representatives Elections.

Click here to see how the data was downloaded into R
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)
library(readr)

X1976_2022_house <- read_csv("1976-2022-house.csv")
Rows: 32452 Columns: 20
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): state, state_po, office, stage, candidate, party, mode
dbl (8): year, state_fips, state_cen, state_ic, district, candidatevotes, to...
lgl (5): runoff, special, writein, unofficial, fusion_ticket

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
X1976_2020_president <- read_csv("1976-2020-president.csv")
Rows: 4287 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (6): state, state_po, office, candidate, party_detailed, party_simplified
dbl (7): year, state_fips, state_cen, state_ic, candidatevotes, totalvotes, ...
lgl (2): writein, notes

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Now, I must download all the shape files into R.

Click here to see how the data was downloaded into R
download.file("https://cdmaps.polisci.ucla.edu/shp/districts097.zip", 
              destfile = "districts097.zip" , mode='wb')
unzip("districts097.zip", exdir = ".")
file.remove("districts097.zip")
district097 <- st_read("districtShapes/districts097.shp")

download.file("https://cdmaps.polisci.ucla.edu/shp/districts098.zip", 
              destfile = "districts098.zip" , mode='wb')
unzip("districts098.zip", exdir = ".")
file.remove("districts098.zip")
district098 <- st_read("districtShapes/districts098.shp")


download.file("https://cdmaps.polisci.ucla.edu/shp/districts101.zip", 
              destfile = "districts101.zip" , mode='wb')
unzip("districts101.zip", exdir = ".")
file.remove("districts101.zip")
district101 <- st_read("districtShapes/districts101.shp")


download.file("https://cdmaps.polisci.ucla.edu/shp/districts102.zip", 
              destfile = "districts102.zip" , mode='wb')
unzip("districts102.zip", exdir = ".")
file.remove("districts102.zip")
district102 <- st_read("districtShapes/districts102.shp")

download.file("https://cdmaps.polisci.ucla.edu/shp/districts103.zip", 
              destfile = "districts103.zip" , mode='wb')
unzip("districts103.zip", exdir = ".")
file.remove("districts103.zip")
district103 <- st_read("districtShapes/districts103.shp")

download.file("https://cdmaps.polisci.ucla.edu/shp/districts106.zip", 
              destfile = "districts106.zip" , mode='wb')
unzip("districts106.zip", exdir = ".")
file.remove("districts106.zip")
district106 <- st_read("districtShapes/districts106.shp")


download.file("https://cdmaps.polisci.ucla.edu/shp/districts108.zip", 
              destfile = "districts108.zip" , mode='wb')
unzip("districts108.zip", exdir = ".")
file.remove("districts108.zip")
district108 <- st_read("districtShapes/districts108.shp")



download.file("https://cdmaps.polisci.ucla.edu/shp/districts111.zip", 
              destfile = "districts111.zip" , mode='wb')
unzip("districts111.zip", exdir = ".")
file.remove("districts111.zip")
district111 <- st_read("districtShapes/districts111.shp")


download.file("https://cdmaps.polisci.ucla.edu/shp/districts112.zip", 
              destfile = "districts112.zip" , mode='wb')
unzip("districts112.zip", exdir = ".")
file.remove("districts112.zip")
district112 <- st_read("districtShapes/districts112.shp")



download.file("https://cdmaps.polisci.ucla.edu/shp/districts106.zip", 
              destfile = "districts106.zip" , mode='wb')
unzip("districts106.zip", exdir = ".")
file.remove("districts106.zip")
district106 <- st_read("districtShapes/districts106.shp")


download.file("https://www2.census.gov/geo/tiger/TIGER2020/CD/tl_2020_us_cd116.zip", 
              destfile = "districts116.zip" , mode='wb')
unzip("districts116.zip", exdir = ".")
file.remove("districts116.zip")
district116 <- st_read("tl_2020_us_cd116.shp")


download.file("https://www2.census.gov/geo/tiger/TIGER2016/CD/tl_2016_us_cd115.zip", 
              destfile = "districts115.zip" , mode='wb')
unzip("districts115.zip", exdir = ".")
file.remove("districts115.zip")
district115 <- st_read("tl_2016_us_cd115.shp")

Exploration of Vote Count Data

Which states have gained and lost the most seats in the US House of Representatives between 1976 and 2022?

Firstly, I want to find out which states have gained and lost the most seats in the US House of Representatives between 1976 and 2022. To do this, I need to filter the House Seats data set to only show 1976 and 2022, then group it by year and state. Once I do this, I can find out how many distinct districts are in each state. For the purpose of this project, I will asign one seat per district in each state. Then, I can use these numbers to find the difference between the two years. Since many states actually had no change in seat numbers, I decided to exlude them from the data set and only show the states we are interested in. Thus, below is a bar graph that depicts either the gain or loss of House Seats from 1976 to 2022, per state.

Click here to see how the data was manipulated
house_seats <- X1976_2022_house |>
  filter(year %in% c(1976, 2022)) |>
  group_by(year, state) |> 
  summarize(seats=n_distinct(district))
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
house_seats_spread <- house_seats |>
  pivot_wider(names_from = year, values_from = seats) |>
  mutate(difference = (`2022` - `1976`)) |>
  filter(difference != 0)


diff_seats <- house_seats_spread |>
  select(state, difference)

ggplot(diff_seats, aes(x = state, y = difference, fill = difference > 0)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_manual(values = c("tomato", "springgreen2")) +
  coord_flip() +
  labs(title = "Each State's Gain/Loss of Seats in the US House of Representatives from 1976 - 2022", x = "State", y = "Difference of Seats") +
  theme_minimal() +
  theme(plot.title = element_text(size = 10),
        axis.text.y = element_text(size = 8),
        plot.margin = margin(3, 3, 3))

Difference_in_seats <- ggplot(diff_seats, aes(x = state, y = difference, fill = difference > 0)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_manual(values = c("tomato", "springgreen2")) +
  coord_flip() +
  labs(title = "Each State's Gain/Loss of Seats in the US House of Representatives from 1976 - 2022", x = "State", y = "Difference of Seats") +
  theme_minimal() +
  theme(plot.title = element_text(size = 10),
        axis.text.y = element_text(size = 8),
        plot.margin = margin(3, 3, 3))
print(Difference_in_seats)

Looking at this graph, we can see that Texas has gained the most seats and New York has lost the most seats since 1976. Notably, Pennsylvania, Michigan, Ohio, and Illinois all also lost over 5 seats, while California, Florida, and Arizona gained 5 or more seats. It is also interesting to see that more states lost seats than states that gained seats, with 19 states losing seats and 15 states gaining seats.

Are there any elections in our data where the election would have had a different outcome if the “fusion” system was not used and candidates only received the votes their received from their “major party line” (Democrat or Republican) and not their total number of votes across all lines?

Before computing this, we should understand what “The Fusion System” is. This system means that one candidate is allowed to appear on multiple “lines” on a ballot and their total votes are counted, rather than just appearing once. This gives this candidate an advantage, as they are practically running against themselves in other parties, but their total votes is counted against other candidates.

Now, I want to know if there are any elections in our data set that would have had a different outcome had the “fusion system” not been used. To do this, I am going to evaluate our data set at two different points: the first being with each candidates total votes, and the second using only “major party lines,” thus “DEMOCRAT” or “REPUBLICAN”. Then, I will find the winner of each side, and see if they line up. If the do line up, then the candidate who won was the true winner of the election. However, if they do not, then the “fusion voting” is what helped this person win over their competitor. This can be seen in my code below.

Click here to see how the data was manipulated
fusion_voting <- X1976_2022_house |>
  group_by(year, state, district, candidate) |>
  summarize(total_votes = sum(candidatevotes), .groups = 'drop') |>
  group_by(year, state, district) |>
  filter(total_votes == max(total_votes)) |>
  ungroup()

colnames(fusion_voting)[colnames(fusion_voting) == "total_votes"] <- "total_fusion_votes"


no_fusion_voting <- X1976_2022_house |>
  filter(party %in% c("REPUBLICAN", "DEMOCRAT")) |>
  group_by(year, state, district, candidate) |>
  summarize(total_votes = sum(candidatevotes), .groups = 'drop') |>
  group_by(year, state, district) |>
  filter(total_votes == max(total_votes)) |>
  ungroup()

house_voting <- left_join(fusion_voting, no_fusion_voting, by = (c("state","district")))
Warning in left_join(fusion_voting, no_fusion_voting, by = (c("state", "district"))): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
house_voting <- fusion_voting |>
  left_join(no_fusion_voting, by = c("year", "state", "district", "candidate")) |>
  mutate(difference = (`total_fusion_votes` - `total_votes`)) |>
  filter(difference != 0) |>
  arrange(desc(difference))


instances <- nrow(house_voting)
print(instances)
[1] 601

Before showing the table, I want to note that there were 601 instances where this occurred! That is a lot of times that the fusion vote aided in a candidate’s win!

Below is a table that shows each of these instances.

library(DT)
datatable(house_voting) 

In this table, you can see the the difference in votes span from 61,938 to 559 votes over the 601 instances.

Chloropleth Visualization

Now, I will make a Chloropleth Visualization of the 2000 Presidential Election Electoral College Results.

Click here to see how the map was made
install.packages('plyr', repos = "http://cran.us.r-project.org")
Installing package into 'C:/Users/laure/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
package 'plyr' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\laure\AppData\Local\Temp\RtmpYlhbID\downloaded_packages
options(repos = list(CRAN="http://cran.rstudio.com/"))

library(utils)
install.packages("sf")
Installing package into 'C:/Users/laure/AppData/Local/R/win-library/4.4'
(as 'lib' is unspecified)
package 'sf' successfully unpacked and MD5 sums checked

The downloaded binary packages are in
    C:\Users\laure\AppData\Local\Temp\RtmpYlhbID\downloaded_packages
library(sf)
Warning: package 'sf' was built under R version 4.4.2
Linking to GEOS 3.12.2, GDAL 3.9.3, PROJ 9.4.1; sf_use_s2() is TRUE
library(sf)
s.sf <- st_read("districtShapes/districts106.shp")
Reading layer `districts106' from data source 
  `C:\Users\laure\OneDrive\Documents\STA9750-2024-FALL\districtShapes\districts106.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 436 features and 15 fields (with 1 geometry empty)
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1473 ymin: 18.9177 xmax: 179.7785 ymax: 71.35256
Geodetic CRS:  GRS 1980(IUGG, 1980)
head(s.sf, n=4)
Simple feature collection with 4 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -118.5992 ymin: 26.94532 xmax: -82.05435 ymax: 34.33793
Geodetic CRS:  GRS 1980(IUGG, 1980)
   STATENAME           ID DISTRICT STARTCONG ENDCONG DISTRICTSI COUNTY PAGE
1 California 006103107026       26       103     107       <NA>   <NA> <NA>
2 California 006103107029       29       103     107       <NA>   <NA> <NA>
3 California 006103107030       30       103     107       <NA>   <NA> <NA>
4    Florida 012105107013       13       105     107       <NA>   <NA> <NA>
   LAW NOTE BESTDEC RNOTE FROMCOUNTY                 LASTCHANGE
1 <NA> <NA>    <NA>  <NA>          F 2016-05-29 16:44:10.857626
2 <NA> <NA>    <NA>  <NA>          F 2016-05-29 16:44:10.857626
3 <NA> <NA>    <NA>  <NA>          F 2016-05-29 16:44:10.857626
4 <NA> <NA>    <NA>  <NA>          F 2016-05-29 16:44:10.857626
                   FINALNOTE                       geometry
1 {"From US Census website"} MULTIPOLYGON (((-118.5075 3...
2 {"From US Census website"} MULTIPOLYGON (((-118.354 34...
3 {"From US Census website"} MULTIPOLYGON (((-118.184 34...
4 {"From US Census website"} MULTIPOLYGON (((-82.42332 2...
president_2000 <- X1976_2020_president |>
  filter(year == '2000',
         candidate %in% c("BUSH, GEORGE W.", "GORE, AL")) |>
  group_by(state) |>
  summarize(
    highest_votes = max(candidatevotes),
    party = party_simplified[which.max(candidatevotes)]
  )


Shape_2000 <- s.sf |>
  mutate(STATENAME = toupper(trimws(STATENAME))) |>
  left_join(president_2000, join_by(STATENAME == state)) 
  

USA2000Map <- ggplot(Shape_2000, aes(geometry = geometry, fill = party),
       color = "black") +
  geom_sf() + 
  scale_fill_manual(values = c("REPUBLICAN" = "firebrick2", "DEMOCRAT" = "royalblue2")) +
  theme_minimal() +
  coord_sf(xlim = c(-180, -50), ylim = c(10,80), expand = FALSE) +
  labs(title = "2000 Presidential Election Electoral College Results", fill = "Winning Party") 
print(USA2000Map)

Advanced Chloropleth Visualization of Electoral College Results

Below is a faceted map of the presidential party winners per state from 1976 to 2020.

Click here to see how the map was made
election_results <- X1976_2020_president |>
  filter(party_simplified %in% c("DEMOCRAT", "REPUBLICAN")) |>
  group_by(year, state) |>
  summarize(
    highest_votes = max(candidatevotes),
    party = party_simplified[which.max(candidatevotes)]
  )
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
state_2020 <- s.sf |>
  mutate(NAME = toupper(STATENAME))

election_outcomes <- left_join(state_2020, election_results, join_by("NAME" == "state"))
Warning in sf_column %in% names(g): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 5 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
election_outcomes |>
  st_shift_longitude() |>
  ggplot(aes(geometry = geometry,
             fill = party)) +
  geom_sf() +
  coord_sf(xlim = c(170, 300)) +
  scale_fill_manual(name = "Party", values = c("DEMOCRAT" = "royalblue2", "REPUBLICAN" = "firebrick2")) +
  theme_bw() +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank(),
        axis.title.y=element_blank(),
        axis.text.y=element_blank(),
        axis.ticks.y=element_blank()) +
  labs(title = "Presidential Election Outcomes, 1976-2020") +
  facet_wrap(~year)

Comparing the Effects of the ECV Allocation Rules

Though things have changed over time due to amendment, statue, and technology, the basic outline of allocation has remained the same:

Notably, the Constitution sets no rules on how the electoral college votes for a particular state is allocated. In the past, states have:

To complete this fact check, we will compare the effects of all of the Electoral College Votes allocation rules. For this, we will assume that the District of Columbia has 3 Electoral College Votes.

First, let’s see who actually won each election throughout the years and how many electoral college votes they received to get this win.

Click here to see how the code was done
actual_results <- data.frame(
  year = c(1976, 1980, 1984, 1988, 1992, 1996, 2000, 2004, 2008, 2012, 2016, 2020),
  president = c("CARTER, JIMMY", "REAGAN, RONALD", "REAGAN, RONALD", "BUSH, GEORGE H.W.",
                "CLINTON, BILL", "CLINTON, BILL", "BUSH, GEORGE W.", "BUSH, GEORGE W.",
                "OBAMA, BARACK H.", "OBAMA, BARACK H.", "TRUMP, DONALD J.", "BIDEN, JOSEPH R. JR"),
  party = c("DEMOCRAT", "REPUBLICAN", "REPUBLICAN", "REPUBLICAN", "DEMOCRAT", "DEMOCRAT",
            "REPUBLICAN", "REPUBLICAN", "DEMOCRAT", "DEMOCRAT", "REPUBLICAN", "DEMOCRAT"),
  Electoral_college_votes = c(297, 489, 525, 426, 370, 379, 271, 286, 365, 332, 304, 306))
datatable(actual_results)

Allocation of all ECVs to winner of state-wide popular vote

A state-wide approach would be that all electoral college votes of each state goes to whoever has the popular vote within that state.

Click here to see how the code was done
state_ecv_win <- X1976_2022_house |>
  group_by(year, state) |>
  summarize(housereps = n_distinct(district)) |>
  mutate(ecv = ifelse(state == "DISTRICT OF COLUMBIA", 3, housereps + 2)) |>
  select(year, state, ecv)
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
state_winner<- X1976_2020_president |>
    group_by(year, state, candidate) |>
    summarize(votes_total = sum(candidatevotes)) |>
    ungroup() |>
    group_by(year, state) |>
    slice_max(order_by = votes_total, n = 1, with_ties = FALSE) 
`summarise()` has grouped output by 'year', 'state'. You can override using the
`.groups` argument.
winner_takeall<- state_winner |>
  left_join(state_ecv_win, join_by(year, state))

country_winner <- winner_takeall |>
  group_by(year, candidate) |>
  summarize(ecv_total = sum(ecv)) |>
  slice_max(order_by = ecv_total, n = 1, with_ties = FALSE) |>
  ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
datatable(country_winner)

Using this method, we can see that some elections would have turned out differently than what happened. For example, Gerald Ford would have won over Jimmy Carter in 1976, George H. W. Bush would have won over Bill Clinton in 1992, Robert Dole would have won over Bill Clinton in 1996, John McCain would have won over Barack Obama in 2008, and Mitt Romney would have won over Barack Obama in 2012. That is five elections over the past 46 years that would have been swayed due to this type of voting system.

I do not believe that this way of voting is fair, as it does not properly represent the voters of the minority party in each state. That is, if 51% of voters vote for Party A, while 49% vote for Party B, all of Party B’s voices will not be heard. That is a large majority of people who are being misheard. This also affects the election more, as the states who have less residents, but a good amount of electoral college votes can sway the election in their favor.

Thus, let’s go to the next voting option.

Allocation of all ECVs to winner of nation-wide popular vote 2

For the second option, this means that all Electoral College Votes are given to whoever wins the popular vote of the whole country.

Click here to see how the code was done
state_winner2 <- X1976_2020_president |>
  group_by(year, candidate) |>
  summarize(votes_total = sum(candidatevotes)) |>
  ungroup() |>
  group_by(year) |>
  slice_max(order_by = votes_total, n = 1, with_ties = FALSE) 
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
winner_country_takeall<- state_winner2 |>
  left_join(state_ecv_win, join_by(year))

country_winner2 <- winner_country_takeall |>
  group_by(year, candidate) |>
  summarize(ecv_total = sum(ecv)) |>
  slice_max(order_by = ecv_total, n = 1, with_ties = FALSE) |>
  ungroup()
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
datatable(country_winner2)

Using this method, there were only two instances where the winners here differ from the actually winners of each elections. The two scenarios are as follows: Al Gore would have won over George W. Bush in 2000 and Hillary Clinton would have won over Donald Trump in 2020.

Though this voting process produces similar outcomes to the actual election, I believe it is unfair due to the same reasoning as the first voting process. That is, again, if 51% of voters vote for Party A, while 49% vote for Party B, all of Party B’s voices will not be heard. That is a large majority of people who are being misheard.

Let’s go to the final method.

Allocation of R ECVs to popular vote winner by congressional district + allocation of remaining 2 ECVs to the state-wide popular vote winner

Click here to see how the code was done
district_winners <- X1976_2022_house |>
  group_by(year, state, district) |>
  slice_max(order_by = candidatevotes, n = 1, with_ties = FALSE) |>
  ungroup() |>
  group_by(year, state, candidate, party) |>
  summarize(ecv_district = n(), .groups = "drop")


state_winners <- X1976_2022_house |>
  group_by(year, state, candidate, party) |>
  summarize(total_votes = sum(candidatevotes), .groups = "drop") |>
  group_by(year, state) |>
  slice_max(order_by =  total_votes, n = 1, with_ties = FALSE) |>
  mutate(at_large = 2)


combined <- district_winners |>
  left_join(state_winners, by = c("year", "state", "candidate", "party")) |>
  mutate(at_large = replace_na(at_large, 0)) |>
  mutate(total_ecv = ecv_district + at_large) |>
  select(-total_votes, -candidate)



final_method_winners <- combined |>
  left_join(X1976_2020_president, by = c("year", "state", "party" = "party_simplified")) |>
  select(year, state, candidate, total_ecv) |>
  group_by(year, candidate) |>
  summarize(total_ecv = sum(total_ecv)) |>
  slice_max(order_by = total_ecv, n = 1, with_ties = FALSE) |>
  filter(!is.na(candidate)) |>
  ungroup()
Warning in left_join(combined, X1976_2020_president, by = c("year", "state", : Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 6271 of `x` matches multiple rows in `y`.
ℹ Row 2 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
`summarise()` has grouped output by 'year'. You can override using the
`.groups` argument.
datatable(final_method_winners)

Using this method, we can see six elections would have had different outcomes than what would have actually occurred. These elections are as follows: Jimmy Carter would have won over Ronald Reagan in 1980, Walter Mondale would have won over Ronald Reagan in 1984, Michael Dukakis would have won over George H. W. Bush in 1988, Robert Dole would have won over Bill Clinton in 1996, Mitt Romney would have won over Barrack Obama in 2012, and Donald Trump would have won over Joseph Biden in 2020.

I do not believe this is fair, as it still gives the majority voters a competitive advantage in the voting process. Not only are they getting the majority of the votes from being the popular vote, but they are also getting two electoral college votes on top of that. Thus, I do not believe this is fair.

So What is the Best?

I believe the most fair way to divide the Electoral College Votes up in each state is on a proportional scale. Thus, each party is being accurately represented and no one party’s voice is being heard more than the other.


Citations:
  • MIT Election Data and Science Lab, 2017, “U.S. House 1976–2022”, Harvard Dataverse, V13

  • MIT Election Data and Science Lab, 2017, “U.S. President 1976–2020”, Harvard Dataverse, V8