Final Project by Garrett Burgess

Problem 1

Create a tibble that shows how many respondents are in each wave of the survey.

# Load required packages
library(dplyr)

# Load ANES data
library(readr)
ANES <- read_csv("~/ANES.csv")
Warning: One or more parsing issues, call `problems()` on your data frame for details.
Rows: 68224 Columns: 1030
View(ANES)

# Group data by wave and count observations in each group
wave_counts <- ANES %>% 
  filter(!is.na(VCF0004)) %>% 
  group_by(VCF0004) %>% 
  summarize(count = n())

# Display results
wave_counts
# A tibble: 32 × 2
   VCF0004 count
     <dbl> <int>
 1    1948   662
 2    1952  1899
 3    1954  1139
 4    1956  1762
 5    1958  1450
 6    1960  1181
 7    1962  1297
 8    1964  1571
 9    1966  1291
10    1968  1557
# … with 22 more rows

We can now see the output with two distinct columns: VCF0004 (the survey wave year) and count (the number of respondents in that wave).
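As an aside, dplyr's count() collapses the group_by()/summarize() pair above into a single call; an equivalent way to build the same tibble:

# count() groups and tallies in one step
wave_counts <- ANES %>% 
  filter(!is.na(VCF0004)) %>% 
  count(VCF0004, name = "count")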

I decided to use the kable() function to clean up the table.

# Load the knitr package, which provides the kable() function
library(knitr)

# Rename columns
colnames(wave_counts) <- c("Wave", "Respondents")

# Print
kable(wave_counts, align = "c")
Wave Respondents
1948 662
1952 1899
1954 1139
1956 1762
1958 1450
1960 1181
1962 1297
1964 1571
1966 1291
1968 1557
1970 1507
1972 2705
1974 1575
1976 2248
1978 2304
1980 1614
1982 1418
1984 2257
1986 2176
1988 2040
1990 1980
1992 2485
1994 1795
1996 1714
1998 1281
2000 1807
2002 1511
2004 1212
2008 2322
2012 5914
2016 4270
2020 8280

Problem 2

Python

How are survey respondents distributed across the major geographic regions of the US in the 1996 wave of the survey? (i.e., how many respondents per region)?

import pandas as pd

# read the data file and coerce all values to numeric
anes = pd.read_csv("~/ANES.csv")
<string>:3: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False. [column list truncated]
anes2 = anes.apply(pd.to_numeric, errors="coerce")  # note: anes2 is never used below

# keep only rows where VCF0009x equals 1 or 2; VCF0009x appears to be a
# weight variable in the cumulative file, not a "Don't know"/"Refused" flag,
# which is what depresses several of the per-wave counts below
anes = anes[anes["VCF0009x"].isin([1, 2])]

# group by year of study (VCF0004) and count respondents; despite the name,
# this tallies respondents per wave, not per region
region_counts = anes.groupby("VCF0004").size()

# Print the result
print(region_counts)
VCF0004
1948     662
1952    1899
1954    1139
1956    1762
1958    1450
1960    1009
1962    1297
1964    1571
1966    1291
1968    1557
1970    1507
1972    2705
1974    1101
1976    1005
1978    2304
1980    1614
1982    1418
1984    2257
1986    2176
1988    2040
1990    1980
1992    1153
2012       1
2016       2
dtype: int64
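Note that the question asks for respondents per region in the 1996 wave, while the code above filters on VCF0009x and tallies by year, which is what depresses several of the counts. A sketch that targets the question directly, assuming VCF0112 is the census-region variable (as listed in Part 2 of this project, with 1 = Northeast, 2 = North Central, 3 = South, 4 = West):

import pandas as pd

anes = pd.read_csv("~/ANES.csv", low_memory=False)
anes = anes.apply(pd.to_numeric, errors="coerce")

# restrict to the 1996 wave and count respondents per census region
wave_1996 = anes[anes["VCF0004"] == 1996]
region_counts = wave_1996.groupby("VCF0112").size()
print(region_counts)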

Problem 3

R

Considering the 2008 wave and subsequent waves, what percent of these interviews in each wave were partially or entirely translated to Spanish? Don’t forget to account for both pre and post election interviews (ANES surveys include pre or post election interviews).

# Subset the data for the 2008 wave and subsequent waves
ANES_sub <- subset(ANES, VCF0004 >= 2008)

# Create frequency tables for pre and post election interviews separately
pre_freq <- table(ANES_sub$VCF0018a)
post_freq <- table(ANES_sub$VCF0018b)

# Take the share of the first table level in each case (positional
# indexing; see the note below, as this first level is likely the
# untranslated category rather than the Spanish one)
pre_spanish_pct <- round(pre_freq[1]/sum(pre_freq) * 100, 2)
post_spanish_pct <- round(post_freq[1]/sum(post_freq) * 100, 2)

# Print the results
cat("Percentage of interviews translated to Spanish (PRE):", pre_spanish_pct, "%\n")
Percentage of interviews translated to Spanish (PRE): 97.33 %
cat("Percentage of interviews translated to Spanish (POST):", post_spanish_pct, "%\n")
Percentage of interviews translated to Spanish (POST): 85.86 %
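Given that roughly 97% and 86% of interviews fall into the first table level, that level is almost certainly the untranslated (English) category, so the printed figures likely report the complement of what the question asks. A safer sketch indexes the table by the response code rather than by position; the code that actually denotes a Spanish translation should be confirmed in the codebook ("2" below is a placeholder assumption):

# inspect which codes occur before computing shares
table(ANES_sub$VCF0018a)
table(ANES_sub$VCF0018b)

# index by code name, not position; replace "2" with the documented
# code for Spanish-translated interviews (assumption)
spanish_code <- "2"
pre_spanish_pct  <- round(pre_freq[spanish_code]  / sum(pre_freq)  * 100, 2)
post_spanish_pct <- round(post_freq[spanish_code] / sum(post_freq) * 100, 2)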

Problem 4

Python

One of the questions on this survey has the interviewer read a list of words and phrases that people use to describe political figures. Then, the interviewer asks the interviewee to think about a given political figure, and the interviewer asks whether a given phrase describes that political figure extremely well, quite well, not too well or not well at all. So, for example, the interviewer might say, “Think about Ronald Reagan. In your opinion, does the phrase or word ‘intelligent’ describe Ronald Reagan extremely well, quite well, not too well, or not well at all?” Based on the survey data between 1980 and 2008, which president did women under the age of 40 think was the most knowledgeable? Which president was the least knowledgeable in the eyes of this group? You can average across all surveys during a president’s term. Some presidents will be included in more waves than others - that’s fine, use the average regardless of the number of terms.

import pandas as pd
 
# read the data file and coerce all values to numeric
anes = pd.read_csv("~/ANES.csv")
<string>:3: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False. [column list truncated]
anes2 = anes.apply(pd.to_numeric,errors="coerce")

# filter the data to women aged 18-39 in the 1980-2008 waves
df_women_under_40 = anes2[
    (anes2['VCF0101'] >= 18) & (anes2['VCF0101'] <= 39) &
    (anes2['VCF0004'] >= 1980) & (anes2['VCF0004'] <= 2008) &
    (anes2['VCF0534'] == 1)
]

# map president names to the columns holding their "knowledgeable" ratings
president_columns = {
    'Reagan': 'VCF0841',
    'Bush, G.H.W.': 'VCF0842',
    'Clinton': 'VCF0843',
    'Bush, G.W.': 'VCF0844',
    'Obama': 'VCF0845'
}

# create a DataFrame to hold the results 
df_president = pd.DataFrame(columns=['president', 'knowledgeable'])

# compute the average knowledge level
for president, column in president_columns.items():
    # filter the data for responses to this question for this president
    df_president_responses = df_women_under_40[df_women_under_40[column].isin([1, 2, 3, 4])]
    
    # compute the average knowledge level for this president
    avg_knowledge = df_president_responses[column].mean()
    
    # add the result
    df_president_result = pd.DataFrame({'president': president, 'knowledgeable': avg_knowledge}, index=[0])
    df_president = pd.concat([df_president, df_president_result], ignore_index=True)

# sort the DataFrame by knowledge level in descending order
df_president = df_president.sort_values(by='knowledgeable', ascending=False)

# print the results
print(f"Women under 40 thought {df_president.iloc[0]['president']} was the most knowledgeable president.")
Women under 40 thought Reagan was the most knowledgeable president.
print(f"Women under 40 thought {df_president.iloc[-1]['president']} was the least knowledgeable president.")
Women under 40 thought Obama was the least knowledgeable president.
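One caution on the direction of the scale: these "describes well" items are conventionally coded 1 = extremely well through 4 = not well at all, so a lower mean is a more favorable "knowledgeable" rating, and sorting in descending order puts the least favorably rated president first. If that coding holds here (worth verifying in the codebook), the two printed conclusions are swapped. A sketch that reverse-codes before averaging, so that higher always means more knowledgeable:

# reverse-code so higher = more knowledgeable
# ASSUMPTION: items are coded 1 = extremely well ... 4 = not well at all
for president, column in president_columns.items():
    responses = df_women_under_40[df_women_under_40[column].isin([1, 2, 3, 4])]
    avg = (5 - responses[column]).mean()
    print(president, round(avg, 2))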

Problem 5

R

These days, the evidence suggests that higher levels of education are associated with more liberal political attitudes, as measured on a traditional seven-point ideology scale. Track this pattern over time. Use respondents from 1980, 1992, 2000, and 2020. What is the average political ideology of survey respondents with a college degree or greater vs. the political ideology of respondents without a college degree? (Note: some college doesn’t count.) In addition, repeat this, but compare how this breaks down along racial lines. Is the pattern the same for whites and non-whites?

library(readr)
ANES <- read_csv("~/ANES.csv")
View(ANES)


# select data for the years 1980, 1992, 2000, and 2020
anes_selected <- ANES %>% 
  filter(VCF0004 %in% c(1980, 1992, 2000, 2020))

# create a labeled education factor (note: this factor is not used below;
# the filters work directly on VCF0110)
anes_selected$education <- factor(anes_selected$VCF0110, levels = c(1, 2, 3, 4),
                                  labels = c("Grade school or less", "High school",
                                             "Some college", "College or advanced degree"))

# filter for respondents with college&advanced degree
anes_selected_college <- anes_selected %>% 
  filter(VCF0110 == 4) %>% 
  group_by(VCF0004) %>% 
  summarise(avg_ideology = mean(VCF0503, na.rm = TRUE)) 

# filter for respondents w/o college degree
anes_selected_no_college <- anes_selected %>% 
  filter(VCF0110 != 4) %>% 
  group_by(VCF0004) %>% 
  summarise(avg_ideology = mean(VCF0503, na.rm = TRUE)) 

# filter for white respondents
anes_selected_white <- anes_selected %>% 
  filter(VCF0071a == 1 & VCF0071b == 1) %>% 
  group_by(VCF0004) %>% 
  summarise(avg_ideology = mean(VCF0503, na.rm = TRUE))

# filter for non-white respondents
anes_selected_nonwhite <- anes_selected %>% 
  filter(VCF0071a != 1 & VCF0071b != 1) %>% 
  group_by(VCF0004) %>% 
  summarise(avg_ideology = mean(VCF0503, na.rm = TRUE)) 

# categorize the averages by political ideology
ideology_breaks <- c(-Inf, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, Inf)
ideology_labels <- c("Extremely liberal", "Liberal", "Slightly liberal", "Moderate",
                     "Slightly conservative", "Conservative", "Extremely conservative")

anes_selected_college$category    <- cut(anes_selected_college$avg_ideology,
                                         breaks = ideology_breaks, labels = ideology_labels)
anes_selected_no_college$category <- cut(anes_selected_no_college$avg_ideology,
                                         breaks = ideology_breaks, labels = ideology_labels)
anes_selected_white$category      <- cut(anes_selected_white$avg_ideology,
                                         breaks = ideology_breaks, labels = ideology_labels)
anes_selected_nonwhite$category   <- cut(anes_selected_nonwhite$avg_ideology,
                                         breaks = ideology_breaks, labels = ideology_labels)

#Print
anes_selected_college
# A tibble: 4 × 3
  VCF0004 avg_ideology category        
    <dbl>        <dbl> <fct>           
1    1980         2.75 Slightly liberal
2    1992         2.83 Slightly liberal
3    2000         2.54 Slightly liberal
4    2020         2.22 Liberal         
anes_selected_no_college
# A tibble: 4 × 3
  VCF0004 avg_ideology category        
    <dbl>        <dbl> <fct>           
1    1980         2.22 Liberal         
2    1992         3.45 Slightly liberal
3    2000         3.27 Slightly liberal
4    2020         2.52 Slightly liberal
anes_selected_white
# A tibble: 2 × 3
  VCF0004 avg_ideology category        
    <dbl>        <dbl> <fct>           
1    1992         3.31 Slightly liberal
2    2000         3.05 Slightly liberal
anes_selected_nonwhite
# A tibble: 2 × 3
  VCF0004 avg_ideology category        
    <dbl>        <dbl> <fct>           
1    1992         3.30 Slightly liberal
2    2000         2.81 Slightly liberal
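A logic note on the race split: the complement of VCF0071a == 1 & VCF0071b == 1 is VCF0071a != 1 | VCF0071b != 1, so the non-white filter as written silently drops anyone for whom exactly one of the two fields equals 1. A sketch of the corrected filter (the white and non-white tables also only surface 1992 and 2000, which suggests these particular race fields are unpopulated in the other selected waves; worth confirming in the codebook):

# take the logical complement of the "white" filter
anes_selected_nonwhite <- anes_selected %>% 
  filter(!(VCF0071a == 1 & VCF0071b == 1)) %>% 
  group_by(VCF0004) %>% 
  summarise(avg_ideology = mean(VCF0503, na.rm = TRUE))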

Problem 6

Python

Several questions on this survey are related to social trust. I’m talking here about questions VCF0619-VCF0621. Let’s just look at the 2004 survey responses. Construct a scale that adds together the responses to these three questions so that higher values indicate greater social trust. Set the scale so it runs from zero (the minimum amount of trust). Now, consider how this scale relates to respondents’ partisan identity (strong Democrat to strong Republican). Do you see any evidence that greater social trust is associated with partisan identity?

import pandas as pd
import numpy as np

anes = pd.read_csv("~/ANES.csv")
<string>:2: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False. [column list truncated]
anes2 = anes.apply(pd.to_numeric, errors="coerce")

# subset the 2004 wave and the three trust items; .copy() avoids pandas'
# SettingWithCopyWarning when new columns are added below
anes_2004 = anes2[anes2["VCF0004"] == 2004].copy()
anes_2004_socialtrust = anes_2004[["VCF0619", "VCF0620", "VCF0621"]].copy()

# treat the code 9 (used here as missing data) as NaN
anes_2004_socialtrust.replace([9], np.nan, inplace=True)

# sum the three items into a raw social trust score
anes_2004_socialtrust["social_trust"] = anes_2004_socialtrust.sum(axis=1, skipna=True)
anes_2004["party"] = anes_2004["VCF0302"].replace({
    1: "Republican",
    2: "Independent",
    3: "No preference",
    4: "Other",
    5: "Democrat"
})
<string>:2: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
anes_2004_grouped = anes_2004_socialtrust.groupby(anes_2004["party"]).mean()
print(anes_2004_grouped)
                VCF0619   VCF0620   VCF0621  social_trust
party                                                    
8.0            1.250000  1.625000  1.500000      4.375000
9.0            1.500000  2.000000  1.500000      5.000000
Democrat       1.238220  1.396825  1.417989      4.023560
Independent    1.239899  1.395939  1.378788      3.977444
No preference  1.033333  1.254237  1.186441      3.433333
Other          1.285714  1.642857  1.500000      4.428571
Republican     1.375723  1.555233  1.542029      4.446686
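The question asks for a scale that starts at zero with higher values indicating greater trust. The raw sum above does neither by construction, and its direction depends on how the three items are coded. A sketch of a zero-based version, under the assumption (worth checking against the codebook) that each item is coded 1 = trusting response, 2 = distrusting response:

# recode each item to 0/1 (1 = trusting) and sum into a 0-3 scale
# ASSUMPTION: VCF0619-VCF0621 are coded 1 = trusting, 2 = distrusting
trust_items = anes_2004_socialtrust[["VCF0619", "VCF0620", "VCF0621"]]
anes_2004_socialtrust["social_trust_0to3"] = (2 - trust_items).sum(axis=1, skipna=True)

Re-running the groupby on social_trust_0to3 would then give group means on a scale where 0 is the minimum amount of trust and 3 the maximum.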

The table presents some evidence that partisan identity is connected with social trust, at least as measured by these three questions. Nevertheless, the analysis has certain limitations given the information provided.

First, the only information in the table is each party’s or group’s mean score on the social trust scale. While comparing these averages is helpful, it is just as essential to consider the diversity that exists within political groups. Some Republicans, for example, could have low levels of social trust, while some Democrats might have high levels. To fully grasp the connection between political identification and social trust, we would need to examine the entire distribution of social trust scores within each group and evaluate how those distributions compare.

Second, while the mean social trust scores for Republicans and Democrats are greater than those of Independents and respondents with no preference, the gaps between the groups are not very large. Republicans have the highest mean score at 4.45, and the no-preference group has the lowest at 3.43. These differences may well be insignificant in the grand scheme of things, particularly considering the small range of possible scores.

Lastly, the table provides no information on the statistical significance of the differences in social trust between the groups. A statistical test would need to be carried out to determine whether the differences in mean social trust are statistically significant.

In general, while the table presents some evidence of a connection between political identification and social trust, further analysis would be necessary to fully understand the relationship between these variables.

Problem 7

R

A common type of question on political surveys is the “feeling thermometer” where respondents are asked how warm/cold they feel about certain topics or political groups or figures. It is widely believed that political polarization today is worse than in the past - Republicans have more negative feelings about Democrats today than they did in years past, and vice versa. Use these survey data to assess this claim.

library(dplyr)

# recode the thermometers: values of 97 and above are missing-data codes,
# so set them to NA; keep only respondents whose initial party ID was
# Republican (1) or Democrat (5)
anes_clean <- ANES %>%
  mutate(lib_thermo = ifelse(VCF0211 < 97, VCF0211, NA),
         con_thermo = ifelse(VCF0212 < 97, VCF0212, NA)) %>%
  filter(VCF0302 %in% c(1, 5))
#aggregate the data 
anes_agg <- anes_clean %>%
  group_by(VCF0004, VCF0302) %>%
  summarise(mean_lib_thermo = mean(lib_thermo, na.rm = TRUE),
            mean_con_thermo = mean(con_thermo, na.rm = TRUE))
`summarise()` has grouped output by 'VCF0004'. You can override using the
`.groups` argument.
#plot it out 
library(ggplot2)

# Plot one: feelings towards "Liberals"
ggplot(anes_agg, aes(x = VCF0004, y = mean_lib_thermo, color = factor(VCF0302))) +
  geom_line() +
  scale_color_manual(values = c("#D7191C", "#2C7BB6"),
                     labels = c("Republicans", "Democrats")) +
  labs(x = "Year", y = "Mean Thermometer Rating",
       title = "Thermometer Ratings for Liberals by Party Identification",
       color = "Party Identification") +
  theme_minimal()

# Plot two: feelings towards "Conservatives"
ggplot(anes_agg, aes(x = VCF0004, y = mean_con_thermo, color = factor(VCF0302))) +
  geom_line() +
  scale_color_manual(values = c("#D7191C", "#2C7BB6"),
                     labels = c("Republicans", "Democrats")) +
  labs(x = "Year", y = "Mean Thermometer Rating",
       title = "Thermometer Ratings for Conservatives by Party Identification",
       color = "Party Identification") +
  theme_minimal()
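The plots show the trends, but the claim can also be summarized numerically. One possible sketch (not the only way): for each wave, take each party's mean rating of the opposing ideological group and check whether it trends downward. This reuses anes_agg from above and assumes, as in the recodes elsewhere in this document, that VCF0302 code 1 = Republican and 5 = Democrat:

library(tidyr)

# each party's mean thermometer rating of the opposing ideological group
out_group <- anes_agg %>%
  ungroup() %>%
  mutate(party = ifelse(VCF0302 == 1, "Republican", "Democrat"),
         out_group_thermo = ifelse(party == "Republican",
                                   mean_lib_thermo, mean_con_thermo)) %>%
  select(VCF0004, party, out_group_thermo) %>%
  pivot_wider(names_from = party, values_from = out_group_thermo)

# if polarization has worsened, both columns should trend downward over time
out_group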

Part 2: Exploration

R

As a gay man in the military, I have a particular interest in how views of LGBT+ veterans have evolved over time, and in the degree to which Americans held these views. Don’t Ask, Don’t Tell was officially repealed by the Obama administration in 2011. This next section compares the following data points:

- VCF0877 FAVOR OR OPPOSE GAYS IN THE MILITARY

- VCF0877a STRENGTH OF POSITION ON GAYS IN THE MILITARY

- VCF0004 YEAR OF STUDY

- VCF0302 PARTY IDENTIFICATION OF RESPONDENT - INITIAL PARTY ID RESPONSE

library(dplyr)
library(ggplot2)

# Recode variables: favor (1) -> 1, oppose (5) -> 0, everything else -> NA
anes_cleanup <- ANES %>%
  mutate(gay_military = ifelse(VCF0877 == 1, 1, ifelse(VCF0877 == 5, 0, NA)),
         party_id = case_when(
           VCF0302 == 1 ~ "Republican",
           VCF0302 == 2 ~ "Independent",
           VCF0302 == 5 ~ "Democrat",
           TRUE ~ NA_character_
         )) %>%
  filter(!is.na(gay_military) & !is.na(party_id))

# Aggregate the data
anes_agg_military <- anes_cleanup %>%
  group_by(VCF0004, party_id) %>%
  summarise(supportive = mean(gay_military, na.rm = TRUE))
`summarise()` has grouped output by 'VCF0004'. You can override using the
`.groups` argument.
# Create the plot
ggplot(anes_agg_military, aes(x = VCF0004, y = supportive, color = party_id)) +
  geom_line() +
  scale_color_manual(values = c("#2C7BB6", "#F0E442", "#D7191C"),
                     labels = c("Democrats", "Independents", "Republicans")) +
  labs(x = "Year", y = "Supportiveness for gays in the military",
       title = "Support for gays in the military over time by party identification",
       color = "Party Identification") +
  theme_minimal() +
  scale_y_continuous(limits = c(0, 1),
                     breaks = c(0, 0.25, 0.5, 0.75, 1),
                     labels = c("Least supportive", " ", " ", " ", "Most supportive"))

What we gather from this first graphic is the relative support for gay servicemembers by political party and year. Within every year group, Republican-identifying respondents held the least supportive views of gay servicemembers, while Democrats held the most supportive. Notably, all three groups show a positive slope in their support over time, and Independents tended to track Democrats’ views fairly closely, though by a smaller margin (2008 being the exception). The next graphic shows the relative strength with which members of each party felt about the issue.

Now let’s take a look at how “strongly” Americans held their positions on the subject, using the variable below and a stacked bar chart.

- VCF0877a STRENGTH OF POSITION ON GAYS IN THE MILITARY

library(tidyr)
library(dplyr)
library(ggplot2)
anes_filtered <- ANES %>%
  filter(VCF0302 %in% c(1, 2, 5)) %>%
  mutate(VCF0877a = ifelse(VCF0877a == 7, NA, VCF0877a)) %>%
  drop_na(VCF0877a) %>%
  mutate(Party_ID = case_when(VCF0302 == 1 ~ "Republican",
                              VCF0302 == 2 ~ "Independent",
                              VCF0302 == 5 ~ "Democrat")) 

anes_filtered$VCF0877a <- factor(anes_filtered$VCF0877a,
                                 labels = c("Strongly Allow",
                                            "Not Strongly Allow",
                                            "Not Strongly Disallow",
                                            "Strongly Disallow",
                                            "Don't Know / Depends"))
ggplot(anes_filtered, aes(x = VCF0004, fill = VCF0877a)) +
  geom_bar() +
  labs(x = "Year", y = "Count",
       title = "Strength of Position on Gays in the Military by Party Identification",
       fill = "Strength of Position") +
  facet_wrap(~ Party_ID, ncol = 1, scales = "free_y") +
  theme_minimal()
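Because the three parties have very different respondent counts (note the free y scales), raw counts can obscure shifts within each party. One possible variant uses ggplot2's position = "fill" to plot within-year shares instead:

# same chart, but stacked to proportions within each year
ggplot(anes_filtered, aes(x = factor(VCF0004), fill = VCF0877a)) +
  geom_bar(position = "fill") +
  labs(x = "Year", y = "Share of respondents",
       title = "Strength of Position on Gays in the Military (within-party shares)",
       fill = "Strength of Position") +
  facet_wrap(~ Party_ID, ncol = 1) +
  theme_minimal()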

What we learn from this graphic is that all three party groups saw significant growth in the “Strongly Allow” category. The most surprising aspect of the data is the growth of “Not Strongly Allow” responses among Republican and Independent respondents. The last survey to include this question was in 2012, with large gains in “Not Strongly Allow”, meaning many respondents in these parties settled into a lukewarm attitude toward the subject. What I found most fascinating was the stark decrease in “Strongly Disallow” responses across all three parties. This shows how far the nation has come in its view and treatment of LGBTQ servicemembers. While there is still a way to go, it is progress.

Of note: These questions did not account for Trans and other queer categories of servicemembers. I would be incredibly interested in viewing that data.

Python:

The second part of the project examines American support for LGBTQ+ protections. I wanted to look at support for a law that would protect LGBTQ+ Americans from discrimination through two main factors: religion and region. A common assumption in the US is that the people most likely to hold negative views of LGBTQ protections are from the South and tend to be religious. I wanted to test those assumptions using the variables below:

- VCF0112 CENSUS REGION (U.S. Census)

- VCF0846 IS RELIGION IMPORTANT TO RESPONDENT

- VCF0876 LAW TO PROTECT HOMOSEXUALS AGAINST DISCRIMINATION

Outline of regions/states:

1. Northeast (CT, ME, MA, NH, NJ, NY, PA, RI, VT)

2. North Central (IL, IN, IA, KS, MI, MN, MO, NE, ND, OH, SD, WI)

3. South (AL, AR, DE, D.C., FL, GA, KY, LA, MD, MS, NC, OK, SC,TN, TX, VA, WV)

4. West (AK, AZ, CA, CO, HI, ID, MT, NV, NM, OR, UT, WA, WY)

import pandas as pd

# read the data file and coerce all values to numeric
anes = pd.read_csv("~/ANES.csv")
<string>:3: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False. [column list truncated]
anes2 = anes.apply(pd.to_numeric,errors="coerce")

# Keep the variables of interest and remove missing data
df = anes2[['VCF0112', 'VCF0876', 'VCF0004']]
df = df.dropna()
df = df[df['VCF0876'] != 8]  # drop code 8 ("Don't know / Depends")

# Group by census region and year and take the mean response code.
# Note: VCF0876 is coded 1 = favor, 5 = oppose, so this mean is a
# favor/oppose balance rather than a percentage of supporters; the plot
# below does not use it, and instead recomputes means from the unfiltered frame.
df_grouped = df.groupby(['VCF0112', 'VCF0004']).agg({'VCF0876': 'mean'})
df_grouped = df_grouped.reset_index()
df_grouped['VCF0876'] = df_grouped['VCF0876'] * 100

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 6))

# Define colors for each region
colors = {
    1: 'blue',
    2: 'green',
    3: 'red',
    4: 'purple'
}

# Iterate over regions and plot line for each
for region, color in colors.items():
    region_data = anes2[anes2['VCF0112'] == region].groupby('VCF0004')['VCF0876'].mean().reset_index()
    ax.plot(region_data['VCF0004'], region_data['VCF0876'], color=color, label=f'Region {region}')

# Set axis labels and legend; the y-axis is the mean response code,
# so lower values indicate more support (1 = favor, 5 = oppose)
ax.set_xlabel('Year')
ax.set_ylabel('Mean response code (1 = favor, 5 = oppose)')
ax.set_title('Views on a Law to Protect Homosexuals by Census Region and Year')
ax.legend()
plt.show()
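Since VCF0876 is coded 1 = favor and 5 = oppose (the same mapping used for the religion plot below), the plotted mean of the raw codes is hard to read directly: higher lines mean more opposition. A sketch that instead plots the percentage favoring the law among substantive favor/oppose responses, reusing the colors dictionary above:

# percentage favoring the law, by census region and year
substantive = anes2[anes2['VCF0876'].isin([1, 5])]
pct_favor = (substantive.assign(favor=substantive['VCF0876'] == 1)
             .groupby(['VCF0112', 'VCF0004'])['favor'].mean() * 100)

fig, ax = plt.subplots(figsize=(10, 6))
for region, color in colors.items():
    series = pct_favor.loc[region]
    ax.plot(series.index, series.values, color=color, label=f'Region {region}')
ax.set_xlabel('Year')
ax.set_ylabel('% favoring the law')
ax.set_title('Support for Law to Protect Homosexuals by Census Region and Year')
ax.legend()
plt.show()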

What we can discern from the graphic above is a confirmation of our regional assumptions, to a certain extent. In the data, the majority of those who opposed the passage of LGBTQ protections were from the South. Second place went to region 2, the North Central. Regions 1 and 4 (the Northeast and West, respectively) were the most supportive of the legislation, intersecting multiple times across the 2005-2020 year groups. All regions saw a decrease in the number of non-supporters, indicating a national rise in support for these protections.

Now let’s take a look at religion and how it factors in to support for LGBTQ+ protections:

import matplotlib.pyplot as plt
import pandas as pd

# read the data file and coerce all values to numeric
anes = pd.read_csv("~/ANES.csv")
<string>:3: DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False. [column list truncated]
anes2 = anes.apply(pd.to_numeric,errors="coerce")

# Extract the relevant columns; pd.crosstab below aligns the two series
# by row index, so rows missing either value drop out of the table
religion_guidance = anes2["VCF0847"].dropna()
homosexual_law_support = anes2["VCF0876"].dropna()

# Convert column values to meaningful labels
religion_guidance = religion_guidance.replace({1: "Some", 2: "Quite a bit", 3: "A great deal", 5: "Religion not important"})
homosexual_law_support = homosexual_law_support.replace({1: "Favor", 5: "Oppose", 8: "DK/Depends"})

# Compute crosstab and plot
ct = pd.crosstab(religion_guidance, homosexual_law_support)
ct.plot(kind="bar", stacked=True)
plt.xlabel("Guidance from Religion")
plt.ylabel("Number of respondents")
plt.show()
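Because the religiosity groups differ in size, the row-normalized version of the same crosstab (pandas' normalize="index" option) shows the share of each group favoring or opposing the law, which maps more directly onto the comparisons below:

# share of each religiosity group favoring/opposing the law
ct_share = pd.crosstab(religion_guidance, homosexual_law_support, normalize="index")
ct_share.plot(kind="bar", stacked=True)
plt.xlabel("Guidance from Religion")
plt.ylabel("Share of respondents")
plt.show()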

To make the graphic easier to read: respondents were asked how much guidance religion provides in their day-to-day living. An important distinction is that no specific religion was measured for this question; the purpose of this exploration is to see how the degree of someone’s religiosity, across all religions, affects their views of LGBTQ protections. The results show that a majority of those who are religious still supported LGBTQ protections, while those who did not support them were almost three times as likely to say religion provides a “great deal” of guidance in their daily lives.

Altogether, the purpose of this section was to explore how religion and geography can predispose an American against supporting codified LGBTQ+ rights. That said, in both cases the majority of the data reflects an overall positive reception to the idea of codified rights.