Final Project Overview and Motivation

As the presidential election cycle starts to ramp up again, we thought it would be interesting to take a look back at the election data from 2016 in order to think more about potential factors that could affect the outcome of next year’s vote. Labeled by Politico as the “biggest upset in U.S. history”, a large narrative about a “divided” America continued to develop in the days following the 2016 presidential election. Many would argue that this narrative continues to dominate current news headlines and will be an influential factor in the way candidates run their campaigns over the next 12 months.

For our project, we are curious about what factors seem to “divide” America, and we explore several questions along those lines in the analysis below.


Our Datasets

We’ll be utilizing two different datasets for this project.


Dataset 1 – 2016 Presidential Election Data

The first contains the 2016 presidential election results for every United States county, along with data from several past elections.

knitr::opts_chunk$set(echo = TRUE)
# Load plyr before the tidyverse so that dplyr's verbs, attached later, take precedence
require(plyr)
require(tidyverse)  # attaches dplyr and tidyr, among other packages
require(knitr)
require(kableExtra)
# County-level election results, mirrored in our project repository
election_data <- read.csv('https://raw.githubusercontent.com/zachalexander/data607_cunysps/master/FinalProject/election_data.csv')
kable(head(election_data, n = 15L), align = rep('c', 5)) %>% 
  kable_styling(bootstrap_options = c("striped", "responsive", "condensed"), full_width = FALSE) %>% 
  kableExtra::scroll_box(width = "100%", height = "500px")
(Scrollable table output: the first 15 rows of election_data, one row per county (all Alabama counties here), with columns for state, county, FIPS code, 2016 and 2012 presidential vote totals (trump16, clinton16, otherpres16, romney12, obama12), 2016 Senate, House, and gubernatorial totals, and county demographics such as total_population, white_pct, black_pct, hispanic_pct, median_hh_inc, and rural_pct.)

The dataset was found on GitHub, here: https://github.com/tonmcg/US_County_Level_Election_Results_08-16

Information about this dataset: The 2016 election results were obtained from Townhall.com using a web-scraping package, Beautiful Soup, which is referenced in the readme file of the GitHub user above. Beautiful Soup is a Python HTML parser that assists in scraping data from websites. In this case, because Townhall.com published the results in an easy-to-scrape HTML table format, the GitHub user was able to obtain them quite easily with this package. Websites like Townhall.com have designated data teams that produce the real-time election results on the night of the election.
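
For context, the same kind of table scrape can be written in R with the rvest package. Here is a minimal sketch, assuming a hypothetical results-page URL (not the exact page the GitHub user scraped):

require(rvest)

# Hypothetical example: parse the first HTML <table> on a results page into a data frame
page <- read_html("https://townhall.com/election/2016/president/")
results <- page %>%
  html_element("table") %>%
  html_table()
head(results)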


Dataset 2 – Public Religion Research Institute (PRRI)

The second dataset that we’ll work with contains data related to values, along with a respondent identifier that captures each person's state of residence, which we can use to connect back to our election results. This dataset was found on the Public Religion Research Institute (PRRI) website and contains a large number of value-related questions, covering respondents’ views on immigration, gun control laws, health care, and much more.

Information about this dataset: This survey was designed and conducted entirely by PRRI. Their website states that it is the “eighth annual multi-issue survey of its kind”.

Survey Methodology

  • The survey was conducted between October 18th and October 30th, 2017.
  • Data collection was based on stratified, single-stage, random-digit-dialing (RDD) sampling of landline telephone households and randomly generated cell phone numbers.
  • In the end, they were able to survey 2,019 individuals (810 via landline and 1,209 via cell phone).
  • The sample represents responses from adults 18 years or older living in the United States.

Data Wrangling


Reading data files from GitHub

require(haven)  # read_sav() reads SPSS .sav files
# Pull the PRRI American Values Survey file directly from GitHub
spss_file <- file.path('https://github.com/zachalexander/data607_cunysps/blob/master/FinalProject/PRRI-2017-American-Values-Survey.sav?raw=true')
avs <- read_sav(spss_file)
kable(head(avs, n = 15L), align = rep('c', 5)) %>% 
  kable_styling(bootstrap_options = c("striped", "responsive", "condensed"), full_width = FALSE) %>% 
  kableExtra::scroll_box(width = "100%", height = "500px")
(Scrollable table output: the first 15 rows of the American Values Survey microdata, one row per respondent, with identifier and demographic columns such as case_id, weight, state, region, age, educ, income, race, and party, followed by coded responses to survey items q1 through q25.)

Subsetting the data

To start, we subsetted the election data, selecting the columns relevant to the 2016 election: the state column and the county-level vote totals for the Republican, Democratic, and other candidates. Then we created a total column that sums the votes per county. Next, we used the group_by() and summarise_all() functions to aggregate the votes by state, leaving a data frame with one row of voter information per state. Finally, we added a percentage column with the percent of voters in each state who voted for Donald Trump, dividing the number of Trump votes by the total number of votes in each state.

# Keep the state column plus the 2016 presidential vote totals
Trump_sub <- election_data[, c(2, 5, 6, 7)]
# Total votes cast per county
Trump_sub$total16 <- Trump_sub$trump16 + Trump_sub$clinton16 + Trump_sub$otherpres16
# Aggregate the county-level votes up to the state level
Trump_map <- Trump_sub %>% group_by(state) %>% summarise_all(sum)
# Trump's share of each state's total vote
Trump_map$trump_per <- Trump_map$trump16 / Trump_map$total16
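
For reference, the same wrangling could be condensed into a single dplyr pipeline; here is a sketch equivalent to the steps above:

# Equivalent one-pipeline version: select the 2016 columns, aggregate to the
# state level, then compute each state's total votes and Trump's share
Trump_map <- election_data %>%
  dplyr::select(state, trump16, clinton16, otherpres16) %>%
  group_by(state) %>%
  summarise_all(sum) %>%
  dplyr::mutate(total16 = trump16 + clinton16 + otherpres16,
                trump_per = trump16 / total16)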

For our next analyses and plots, we had to load in a few mapping packages, as well as the tidyverse package for more data wrangling.

require(sf)
require(maps)
require(RColorBrewer)
require(tools)

In order to create our first map, we used the maps package to load the geometry data needed to draw a map of the United States, including the state boundaries, and projected the map to the correct coordinate plane. Then we subset our election data a bit further to remove the vote totals for Alaska and Hawaii: very few American Values Survey respondents came from these states, so we felt that any analyses and comparisons would not be representative of them.

# Load state boundary geometries from the maps package as an sf object
states <- st_as_sf(map("state", plot = FALSE, fill = TRUE))
states$ID <- toTitleCase(states$ID)
# Drop Alaska and Hawaii to match the continental-U.S. map geometries
USA_votes <- Trump_map[which(Trump_map$state != "Alaska" & Trump_map$state != "Hawaii"),]
# Note: this assignment relies on both tables being sorted alphabetically by state
states$trump_per <- USA_votes$trump_per

# Bin each state's Trump share into one of four groups; the else-if chain
# ensures every value, including exact boundary cases, lands in a bucket
for(i in 1:length(states$trump_per)){
  if(states$trump_per[i] > 0.55){
    states$color[i] <- "1 - Greater than 55% voted for Trump"
  } else if(states$trump_per[i] > 0.50){
    states$color[i] <- "2 - Between 50% and 55% voted for Trump"
  } else if(states$trump_per[i] > 0.45){
    states$color[i] <- "3 - Between 45% and 50% voted for Trump"
  } else {
    states$color[i] <- "4 - Less than 45% voted for Trump"
  }
}
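
The same binning can be written more compactly with base R's cut(), which makes the interval boundaries explicit; this sketch matches the loop above:

# Right-closed intervals: (0, 0.45], (0.45, 0.5], (0.5, 0.55], (0.55, 1]
states$color <- cut(states$trump_per,
                    breaks = c(0, 0.45, 0.50, 0.55, 1),
                    labels = c("4 - Less than 45% voted for Trump",
                               "3 - Between 45% and 50% voted for Trump",
                               "2 - Between 50% and 55% voted for Trump",
                               "1 - Greater than 55% voted for Trump"))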

2016 Presidential Election Data


After getting the data into a state where we could start plotting the percentages, we used ggplot to create a map showing the percentage of voters who voted for Donald Trump in the 2016 election. Since we were focusing on Trump votes, our initial pass showed how far above or below 50% Donald Trump's vote share fell in each state. At this point we saw that the differences in many states were fairly small, so the contrast in our image was not ideal. To improve this, we created a second plot that bins the percent of Trump votes into 4 groups: ‘more than 55% for Trump’, ‘between 50% and 55% for Trump’, ‘between 45% and 50% for Trump’, and ‘less than 45% for Trump’. This map provided a much better picture of the broader statewide trends. Click between the two tabs below to see both options.

From these maps, we determined that the percentage of votes for Trump varied drastically across states and regions of the United States. In the next phase, we thought it would be interesting to see whether certain values could help explain these differences in voting percentages for Donald Trump.

Percentages

Broader Trend


American Values Survey


Next, we needed to do some data wrangling in order to isolate the responses used later for data analysis and mapping. We wrote a for() loop that builds a data frame for each question, containing each respondent's state and numerical answer. Additionally, we subset these data frames to include only definitive answers; in other words, we removed responses of “Don’t know” or “Refused to answer”.

question_list <- c("q17a", "q17b", "q17c", "q17d", "q17e", "q17f", "q17g", "q17h")
df_list <- c("Temp_df_a", "Temp_df_b", "Temp_df_c", "Temp_df_d", "Temp_df_e", "Temp_df_f", "Temp_df_g", "Temp_df_h")
q_num <- c(90, 92, 91, 93, 94, 95, 96, 97)  # column positions of q17a-h in avs

# For each part of question 17, keep only definitive answers (coded < 5)
# along with the respondent's state (column 4)
for(i in 1:length(q_num)){
  assign(df_list[i], avs[which(avs[[question_list[i]]] < 5), c(4, q_num[i])])
}
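
An alternative sketch, for what it's worth: the same subsets could live together in a named list rather than being assign()-ed into the global environment (answer_dfs is our own illustrative name):

# Keep the per-question subsets together in one named list
answer_dfs <- lapply(question_list, function(q) {
  avs[which(avs[[q]] < 5), c("state", q)]
})
names(answer_dfs) <- question_list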

After the data had been tidied into temporary dataframes and cleaned up, we were able to use some of the tidyverse functions to group and calculate average response values for each question by state. We did this using the lapply(), group_by(), and summarise_all() functions.

df_state_list <- c("state_q17a", "state_q17b", "state_q17c", "state_q17d", "state_q17e", "state_q17f", "state_q17g", "state_q17h")
dfs <- list(Temp_df_a, Temp_df_b, Temp_df_c, Temp_df_d, Temp_df_e, Temp_df_f, Temp_df_g, Temp_df_h)

grouped_data <- lapply(dfs, function(x){ x %>% group_by(state) %>% summarise_all(mean)})

for(i in 1:length(grouped_data)){
  assign(df_state_list[i], as.data.frame(grouped_data[[i]]))
}

Next, we merged all of the answers to question #17 into one dataframe in order to set up our file for mapping. By then joining these answers to the coordinate and geometry information, we were able to map the responses to these questions broken out by U.S. state.

library(openintro)  # provides abbr2state() for converting state abbreviations to full names

# Merge the eight per-question dataframes into one, keyed on state
q17_df <- Reduce(function(x, y) merge(x, y, all = TRUE), list(state_q17a, state_q17b, state_q17c, state_q17d, state_q17e, state_q17f, state_q17g, state_q17h))

q17_df <- q17_df[which(q17_df$state != "HI" & q17_df$state != "AK"),]
q17_df$state <- abbr2state(q17_df$state)
names(q17_df) <- c("ID", "q17a", "q17b", "q17c", "q17d", "q17e", "q17f", "q17g", "q17h")

# Attach the state-level survey averages to the map geometries
states <- merge(states, q17_df, by = "ID")

Interpretation of American Values Survey Questions

So, after all of the data wrangling we performed, how can we interpret the results? We decided to take a look at a few questions on pertinent topics to see if responses to them also varied across states. The first question we looked at was Question 17.


Question 17 – Do you strongly favor, favor, oppose or strongly oppose the following?

A

B

C

D

E

F

G

H

First analysis and regression model


After creating maps for each component of question #17, we could see that some items drew starker contrasts across states than others. For instance, there was generally broad support for allowing DACA recipients brought to the U.S. as children to gain legal resident status if they attend college or join the military. However, there was more disagreement across states over withdrawing from the nuclear agreement with Iran and over temporarily preventing people from some majority-Muslim countries from entering the United States.

At this point we decided to build our first regression model using the parts of question 17 as the variables and the percentage of votes won by Trump as the response.

elec_regr1 <- lm(trump_per ~  q17a + q17b + q17c + q17d + q17e + q17f + q17g + q17h , data = states)
summary(elec_regr1)
## 
## Call:
## lm(formula = trump_per ~ q17a + q17b + q17c + q17d + q17e + q17f + 
##     q17g + q17h, data = states)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.192144 -0.057717 -0.006362  0.040472  0.185307 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.10387    0.35658   0.291   0.7723  
## q17a         0.02848    0.07726   0.369   0.7144  
## q17b         0.05903    0.07537   0.783   0.4381  
## q17c         0.17792    0.06744   2.638   0.0118 *
## q17d        -0.15185    0.07908  -1.920   0.0620 .
## q17e        -0.10146    0.06234  -1.628   0.1115  
## q17f         0.07681    0.05615   1.368   0.1790  
## q17g         0.04865    0.07621   0.638   0.5269  
## q17h         0.07845    0.05163   1.519   0.1366  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08921 on 40 degrees of freedom
## Multiple R-squared:  0.529,  Adjusted R-squared:  0.4348 
## F-statistic: 5.616 on 8 and 40 DF,  p-value: 8.602e-05

The first model resulted in an \(R^{2}\) value of 0.529. Not too shabby for a first pass, but something that could hopefully be improved on. Before moving on to additional variables, we decided to apply some backward elimination to reduce our focus. This resulted in the following model with a slightly lower \(R^{2}\) of 0.5173, but fewer variables and a higher adjusted \(R^{2}\).
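
For reference, this kind of backward pass can also be automated with step(), though it selects on AIC rather than p-values, so it won't necessarily match a manual elimination exactly. A sketch:

# Automated backward elimination on the full question-17 model (AIC-based)
elec_regr1_reduced <- step(elec_regr1, direction = "backward", trace = FALSE)
summary(elec_regr1_reduced)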

elec_regr2 <- lm(trump_per ~  q17c + q17d + q17e + q17f + q17h , data = states)
summary(elec_regr2)
## 
## Call:
## lm(formula = trump_per ~ q17c + q17d + q17e + q17f + q17h, data = states)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.203295 -0.045223 -0.009582  0.043034  0.205947 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.33976    0.20697   1.642   0.1080   
## q17c         0.17803    0.05465   3.258   0.0022 **
## q17d        -0.11308    0.05841  -1.936   0.0595 . 
## q17e        -0.09842    0.05951  -1.654   0.1055   
## q17f         0.08291    0.05149   1.610   0.1147   
## q17h         0.07806    0.04322   1.806   0.0779 . 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0871 on 43 degrees of freedom
## Multiple R-squared:  0.5173, Adjusted R-squared:  0.4612 
## F-statistic: 9.217 on 5 and 43 DF,  p-value: 5.102e-06

At this point we decided to expand our field of variables and look at more questions. The next step was scrubbing the data from questions 22, 24 and 25 and incorporating them into our regression model.

Question 22 – Now as I read a few statements please tell me whether you completely agree, mostly agree, mostly disagree or completely disagree with each one?

A

B

C

D

E

F

G

Adding results from Q22 and Q25


As you can see from above, we created maps for each component of question #22 as well. Again, there were noticeable contrasts in responses to these value-based questions across states. For instance, there was generally broad agreement that people are not bothered when they come into contact with immigrants who speak little or no English. However, there was more disagreement across states over attributing the severity of recent natural disasters to global climate change, and over whether professional athletes should be required to stand during the national anthem at sporting events.

These responses were also helpful when building out our regression models later in our investigation. We decided to map out one more question from the survey (Q24) and to add the data for another (Q25) before creating another model.

Question 25:

“When you think about what it means to be American, which of the following comes closer to your own view?” (we ignored responses of “neither”, “both” and “Don’t Know/Refused”)

  1. Having a mix of different cultures and values from around the world
  2. Having a single culture grounded in Christian values

Question 24 – Do you think recent stories about women being sexually harassed and assaulted in the workplace are isolated incidents or are they part of a broader pattern of how women are often treated?

Second regression model set

Once we’d aggregated the data from these questions, we decided to take a look at an expanded regression model incorporating our new variables.

elec_regr3 <- lm(trump_per ~  q17c + q17d + q17e + q17f + q17h + q22a + q22b + q22c + q22d + q22e + q22f + q24 + q25, data = states)
summary(elec_regr3)
## 
## Call:
## lm(formula = trump_per ~ q17c + q17d + q17e + q17f + q17h + q22a + 
##     q22b + q22c + q22d + q22e + q22f + q24 + q25, data = states)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.22147 -0.04476  0.01023  0.04069  0.18626 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.60758    0.44145   1.376   0.1775  
## q17c         0.17187    0.06492   2.647   0.0121 *
## q17d        -0.18641    0.09576  -1.947   0.0596 .
## q17e        -0.08579    0.07242  -1.185   0.2441  
## q17f         0.03859    0.06500   0.594   0.5566  
## q17h         0.05032    0.05260   0.957   0.3453  
## q22a        -0.01970    0.04869  -0.404   0.6883  
## q22b         0.02579    0.04730   0.545   0.5890  
## q22c         0.01259    0.05970   0.211   0.8342  
## q22d        -0.03885    0.04599  -0.845   0.4039  
## q22e         0.03615    0.05672   0.637   0.5280  
## q22f         0.03407    0.06094   0.559   0.5797  
## q24         -0.17640    0.17272  -1.021   0.3141  
## q25          0.14444    0.15583   0.927   0.3603  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0888 on 35 degrees of freedom
## Multiple R-squared:  0.5917, Adjusted R-squared:   0.44 
## F-statistic: 3.901 on 13 and 35 DF,  p-value: 0.0006468

Adding our 8 new variables yielded a slight increase in the \(R^{2}\) value (about 0.07 over the previous model), but many of the variables showed high p-values and could therefore be eliminated from the model without significant loss to the \(R^{2}\) value. Our backward elimination here led us to remove all of the responses to the parts of question 22; they appear not to have been strong predictors of a vote for Trump in 2016, though question 24 did show a significant effect once the weaker variables were removed.

elec_regr4 <- lm(trump_per ~  q17c + q17d + q17e + q24 + q25, data = states)
summary(elec_regr4)
## 
## Call:
## lm(formula = trump_per ~ q17c + q17d + q17e + q24 + q25, data = states)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.249979 -0.050020  0.007008  0.050064  0.180692 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  0.80889    0.26212   3.086  0.00354 **
## q17c         0.18846    0.05385   3.500  0.00110 **
## q17d        -0.08684    0.05515  -1.575  0.12267   
## q17e        -0.11323    0.05644  -2.006  0.05113 . 
## q24         -0.33525    0.11133  -3.011  0.00434 **
## q25          0.21083    0.12797   1.647  0.10675   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.08329 on 43 degrees of freedom
## Multiple R-squared:  0.5586, Adjusted R-squared:  0.5072 
## F-statistic: 10.88 on 5 and 43 DF,  p-value: 8.307e-07

At this point we took another look at the questions from the American Values Survey. We decided to avoid questions directly related to Donald Trump and party affiliation due to the potential bias they may have introduced. From what remained we decided to incorporate the responses to question 6 into our model as well. Below are plots of the responses organized by state.

Question 6 – Should the following topic be the highest priority, high but not the highest, or a lower priority?

A

B

C

D

E

F

Incorporating Q6 into the regression

This next regression model incorporated all of the responses to questions 6a-f.

elec_regr5 <- lm(trump_per ~  q17c + q17d + q17e + q24 + q25 + q6a + q6b + q6c + q6d + q6e + q6f, data = states)
summary(elec_regr5)
## 
## Call:
## lm(formula = trump_per ~ q17c + q17d + q17e + q24 + q25 + q6a + 
##     q6b + q6c + q6d + q6e + q6f, data = states)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.165018 -0.042201  0.000819  0.034778  0.206267 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.681799   0.448701   1.519 0.137621    
## q17c         0.197609   0.054766   3.608 0.000953 ***
## q17d        -0.077017   0.064896  -1.187 0.243304    
## q17e        -0.046973   0.066633  -0.705 0.485498    
## q24         -0.158915   0.127393  -1.247 0.220519    
## q25          0.034444   0.138029   0.250 0.804401    
## q6a          0.033999   0.049307   0.690 0.495024    
## q6b          0.149202   0.050256   2.969 0.005366 ** 
## q6c         -0.123089   0.055270  -2.227 0.032474 *  
## q6d         -0.076825   0.061716  -1.245 0.221472    
## q6e         -0.077962   0.045761  -1.704 0.097304 .  
## q6f         -0.004134   0.044065  -0.094 0.925787    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.0752 on 35 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.6951, Adjusted R-squared:  0.5992 
## F-statistic: 7.253 on 11 and 35 DF,  p-value: 2.925e-06

Finally, a respectable \(R^{2}\) value! We cleaned up the model a little by removing some of the less impactful variables and generated some new models based on the results of our regression analyses. At this point we added the responses to two additional questions:

  1. Q20 - Do you see monuments to Confederate soldiers more as symbols of Southern pride or more as symbols of racism?
  2. Q21 - Do you think most reporters have a personal or political agenda, OR are most reporters trying to report the news fairly and accurately?

Additionally, based on our highest-performing model, we calculated predicted values of the percentage that voted for Trump in 2016 by state, and took a look at the residuals.

require(ggiraph)
require(ggiraphExtra)

states <- states %>% dplyr::select(ID, trump_per, q17a, q17b, q17d, q17f, q22e, q22f, q6b, q6c, q6e, q6f, q20, q21)

# compute predicted values by hand from the coefficients of our highest-performing lm (fit below)
for(i in 1:length(states$ID)){
  states$model[i] <- 0.83630 + 
    (0.07863 * states$q17a[i]) + 
    (-0.08399 * states$q17b[i]) + 
    (-0.14377 * states$q17d[i]) + 
    (0.09759 * states$q17f[i]) + 
    (-0.06089 * states$q22e[i]) + 
    (0.07426 * states$q22f[i]) + 
    (0.14610 * states$q6b[i]) +
    (-0.12655 * states$q6c[i]) +
    (-0.08417 * states$q6e[i]) +
    (-0.04930 * states$q6f[i]) +
    (-0.41984 * states$q20[i]) +
    (0.40825 * states$q21[i])
  
  states$diff[i] <- states$model[i] - states$trump_per[i]
}

model <- lm(formula = trump_per ~ q17a + q17b + q17d + q17f+ q22e+ q22f+ q6b+ q6c+ q6e+ q6f + q20+ q21, data = states)
summary(model)
## 
## Call:
## lm(formula = trump_per ~ q17a + q17b + q17d + q17f + q22e + q22f + 
##     q6b + q6c + q6e + q6f + q20 + q21, data = states)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.125656 -0.033505 -0.005109  0.027585  0.107003 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.50031    0.29191   1.714 0.095643 .  
## q17a         0.09861    0.04776   2.065 0.046642 *  
## q17b        -0.08374    0.05652  -1.482 0.147676    
## q17d        -0.11875    0.05411  -2.194 0.035132 *  
## q17f         0.11878    0.04961   2.394 0.022312 *  
## q22e        -0.06999    0.03854  -1.816 0.078203 .  
## q22f         0.08089    0.04001   2.022 0.051139 .  
## q6b          0.16623    0.03726   4.462 8.47e-05 ***
## q6c         -0.11416    0.04553  -2.508 0.017096 *  
## q6e         -0.08910    0.03589  -2.482 0.018146 *  
## q6f         -0.01926    0.03247  -0.593 0.556898    
## q20         -0.30816    0.07645  -4.031 0.000296 ***
## q21          0.36764    0.10182   3.611 0.000973 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05765 on 34 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.8259, Adjusted R-squared:  0.7645 
## F-statistic: 13.44 on 12 and 34 DF,  p-value: 1.33e-09
step(model, direction = "both", trace=FALSE ) 
## 
## Call:
## lm(formula = trump_per ~ q17a + q17b + q17d + q17f + q22e + q22f + 
##     q6b + q6c + q6e + q20 + q21, data = states)
## 
## Coefficients:
## (Intercept)         q17a         q17b         q17d         q17f         q22e         q22f          q6b          q6c          q6e          q20          q21  
##     0.41340      0.10186     -0.08226     -0.10941      0.11533     -0.06833      0.07729      0.16358     -0.11869     -0.08723     -0.28826      0.36884

Guided by the stepwise elimination above, we removed one more variable (q6f) from the model to boost the adjusted \(R^{2}\) a little more.

# refit the model with q6f removed, per the stepwise results above
model <- lm(formula = trump_per ~ q17a + q17b + q17d + q17f+ q22e+ q22f+ q6b+ q6c+ q6e + q20+ q21, data = states)
summary(model)
## 
## Call:
## lm(formula = trump_per ~ q17a + q17b + q17d + q17f + q22e + q22f + 
##     q6b + q6c + q6e + q20 + q21, data = states)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.12519 -0.02898 -0.00432  0.02730  0.10780 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.41340    0.25014   1.653 0.107344    
## q17a         0.10186    0.04700   2.167 0.037129 *  
## q17b        -0.08226    0.05594  -1.470 0.150364    
## q17d        -0.10941    0.05129  -2.133 0.040005 *  
## q17f         0.11533    0.04881   2.363 0.023825 *  
## q22e        -0.06833    0.03808  -1.794 0.081443 .  
## q22f         0.07729    0.03918   1.973 0.056482 .  
## q6b          0.16358    0.03665   4.464 8.01e-05 ***
## q6c         -0.11869    0.04446  -2.669 0.011440 *  
## q6e         -0.08723    0.03542  -2.463 0.018857 *  
## q20         -0.28826    0.06806  -4.235 0.000157 ***
## q21          0.36884    0.10085   3.657 0.000831 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05711 on 35 degrees of freedom
##   (2 observations deleted due to missingness)
## Multiple R-squared:  0.8241, Adjusted R-squared:  0.7688 
## F-statistic: 14.91 on 11 and 35 DF,  p-value: 3.914e-10
for(i in 1:length(states$ID)){
  states$model_two[i] <- 0.41340 + 
    (0.10186 * states$q17a[i]) + 
    (-0.08226 * states$q17b[i]) + 
    (-0.10941 * states$q17d[i]) + 
    (0.11533 * states$q17f[i]) + 
    (-0.06833 * states$q22e[i]) + 
    (0.07729 * states$q22f[i]) + 
    (0.16358 * states$q6b[i]) +
    (-0.11869 * states$q6c[i]) +
    (-0.08723 * states$q6e[i]) +
    (-0.28826 * states$q20[i]) +
    (0.36884 * states$q21[i])
  
  states$diff_two[i] <- states$model_two[i] - states$trump_per[i]
}
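
As an aside, hard-coding the coefficients isn't strictly necessary; the fitted lm object can generate the same predictions directly, and it returns NA for the two states with missing responses. A sketch (the 0.08 residual threshold is purely illustrative):

# Equivalent predictions straight from the fitted model object
states$model_two <- predict(model, newdata = states)
states$diff_two  <- states$model_two - states$trump_per

# Flag states whose predictions land far from the actual percentages
states$ID[which(abs(states$diff_two) > 0.08)]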

After calculating the predicted values with the updated model, we noticed that the predictions for a few states were still quite far from the actual values. To adjust for this, we decided to remove those states and re-run the model one final time.

## the residuals show that these were very far from 0, so decided to remove them to see if it would improve the model
states_out <- states %>% 
  filter(ID != "California" & ID != "Maryland" & ID != "Oklahoma" & ID != "Montana" & ID != "North Dakota")

## last model
model_three <- lm(formula = trump_per ~ q17a + q17b + q17d + q17f+ q22e+ q22f+ q6b+ q6c+ q6e + q20+ q21, data = states_out)
summary(model_three)
## 
## Call:
## lm(formula = trump_per ~ q17a + q17b + q17d + q17f + q22e + q22f + 
##     q6b + q6c + q6e + q20 + q21, data = states_out)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.06845 -0.02835 -0.01096  0.03545  0.08189 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.35615    0.20578   1.731  0.09313 .  
## q17a         0.08077    0.03892   2.075  0.04608 *  
## q17b        -0.07477    0.04593  -1.628  0.11336    
## q17d        -0.12467    0.04261  -2.926  0.00627 ** 
## q17f         0.12577    0.04121   3.052  0.00455 ** 
## q22e        -0.08823    0.03179  -2.776  0.00913 ** 
## q22f         0.06634    0.03237   2.049  0.04869 *  
## q6b          0.15742    0.03008   5.233 1.01e-05 ***
## q6c         -0.09137    0.03711  -2.462  0.01938 *  
## q6e         -0.06588    0.02989  -2.204  0.03484 *  
## q20         -0.26037    0.05611  -4.640 5.64e-05 ***
## q21          0.42384    0.08423   5.032 1.81e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.04671 on 32 degrees of freedom
## Multiple R-squared:  0.8778, Adjusted R-squared:  0.8357 
## F-statistic: 20.89 on 11 and 32 DF,  p-value: 1.452e-11
for(i in 1:length(states_out$ID)){
  states_out$model_three[i] <- 0.35615 + 
    (0.08077 * states_out$q17a[i]) + 
    (-0.07477 * states_out$q17b[i]) + 
    (-0.12467 * states_out$q17d[i]) + 
    (0.12577 * states_out$q17f[i]) + 
    (-0.08823 * states_out$q22e[i]) + 
    (0.06634 * states_out$q22f[i]) + 
    (0.15742 * states_out$q6b[i]) +
    (-0.09137 * states_out$q6c[i]) +
    (-0.06588 * states_out$q6e[i]) +
    (-0.26037 * states_out$q20[i]) +
    (0.42384 * states_out$q21[i])
  
  states_out$diff_three[i] <- states_out$model_three[i] - states_out$trump_per[i]
}

model_data <- as.data.frame(states_out) %>% dplyr::select(ID, trump_per, model_three, diff_three)

With our final model yielding a relatively strong \(R^{2}\), we mapped out the residuals between the actual and predicted values for states with enough data:

Model against data for all of the USA

Finally, we were able to run our model against the data as a whole and come up with an estimate of Donald Trump's share of the popular vote in the 2016 election:

## [1] 0.4895869

48.96% is not far off from the 46.1% of the popular vote Trump received in the 2016 presidential election (https://en.wikipedia.org/wiki/2016_United_States_presidential_election).
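
The code behind this estimate isn't shown above. As a hypothetical reconstruction (we can't confirm the exact weighting used), one natural approach is to weight each state's predicted Trump share by its total 2016 vote count:

# Hypothetical sketch: approximate the national popular-vote share by weighting
# each state's predicted percentage (model_three) by its total 2016 votes.
# Assumes USA_votes (state vote totals) and states_out (predictions) from above.
national_est <- states_out %>%
  as.data.frame() %>%
  dplyr::left_join(USA_votes, by = c("ID" = "state")) %>%
  dplyr::summarise(trump_est = weighted.mean(model_three, total16, na.rm = TRUE))
national_est$trump_est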


Discussion

When we started building our regression model, it was interesting to see that certain values and questions from the survey seemed to be more predictive of the share of votes for Trump than others. Here are the questions, and their corresponding values, that made it into our final model:

Do you strongly favor, favor, oppose, or strongly oppose…

Do you completely agree, mostly agree, mostly disagree, or completely disagree…

Do you think that the following should be the highest priority, high but not highest priority, or lower priority?…

Do you see monuments to Confederate soldiers more as a symbol of Southern pride or more as symbols of racism?

Do you think most reporters have a personal or political agenda, OR are most reporters trying to report the news fairly and accurately?

As you can see, these value-based questions span a wide range of topics rather than centering on a single polarizing issue. This suggests that there truly was strong polarization across many different types of values across the country at the time of the 2016 election.


Conclusion

Elections often come down to the wire and are decided by a relatively small number of votes. Candidates spend millions of dollars on polling to get an idea of just how tight these margins are. We aimed to create a model that could predict polling preferences based on a select number of questions pertaining to major policy points and diversity. From a statistical point of view, the final result was a fairly strong model, with an \(R^{2}\) above 0.85. This is a prime example, however, of a case where the margins in the real world are much finer than in the mathematical world. The residuals of our model at times approached 0.1, which would mean misestimating Trump's share of the vote by 10 percentage points, and 10 points is the difference between a close race and a landslide.

While working through this process, we were able to examine values that seemed to be quite predictive of favoring votes for a particular candidate. For instance, it is interesting that responses to questions related to gun control, health care, our tax system, immigration, and building a wall along the southern border with Mexico all made it into our model. These topics have been at the forefront of the Trump campaign and his time in office.

When thinking about building out a similar model to predict the votes garnered by Hillary Clinton, we could re-examine these values that showed to be quite predictive of votes for Trump, and see if values outside the scope of this project could be added in later on that tend to hold more weight among left-leaning voters than right-leaning voters.

As mentioned at the outset, this election proved to be a very polarizing one, with people at times claiming that they would vote against a particular candidate rather than for one. The long list of values that make up our final model supports this, given that there were ample opportunities for voters to express why they felt their candidate was the right choice to lead the United States.


Addendum – Utilizing JSON Data for this Project

To utilize more types of data for our project, we also decided to make a webpage that shows a representation of the performance of our model relative to the actual percentage of votes for Trump by state in the 2016 election.

This webpage was made using a JavaScript framework (Angular), and the visualization was created using d3.js – a JavaScript charting library.

We converted the data to JSON format using the toJSON() function from the jsonlite library:

library(jsonlite)  # provides the toJSON() and prettify() functions used here

# Convert each row of model_data into its own JSON object, collected in an array
webdata <- toJSON(unname(split(model_data, 1:nrow(model_data))))

# # to view the json data uncomment below
# prettify(webdata)
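
To hand the data to the Angular app, the JSON string just needs to be written to disk. A minimal sketch (the file name "model_data.json" is illustrative, and the app's actual asset path may differ):

# Write the JSON string out as a static asset for the web app to load
write(webdata, file = "model_data.json")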

Then, we were able to utilize this dataset to create a d3.js graphic and webpage. Here is the final product.