First, we read the comma-separated data file into R as a dataframe, which we'll call “nes”. Make sure the file is saved in whatever folder you have designated in R as your “working directory.”
nes <- read.csv("anes2008.csv")
In RStudio, you'll know it worked if “nes” pops up in your “Workspace” area. In plain R, it's less obvious. You can find the data editor in the drop-down menu at the top, or, to check what's in your workspace, just type:
ls()
## [1] "nes"
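If read.csv() instead complains that it cannot open the file, the usual culprit is that the file isn't in the working directory. Here is a minimal check using only base R; the folder path in the last comment is a made-up placeholder, not a real location.

```r
# Where is R currently looking for files?
getwd()

# Is the data file actually there? TRUE means read.csv() will find it.
file.exists("anes2008.csv")

# If not, point R at the right folder (hypothetical path, substitute your own):
# setwd("path/to/your/folder")
```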
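Now, to begin, let's summarize the respondents' “education” variable in our dataframe “nes”. It's a factor (or categorical) variable, so summary() displays the frequency of each level rather than the mean, median, etc. (In R 4.0.0 and later, read.csv() no longer converts text columns to factors by default, so you would need to add stringsAsFactors = TRUE to the read.csv() call to get this output.)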
summary(nes$education)
## 0. No degree earned
## 460
## 1. Bachelor's degree
## 333
## 2. Master's degree
## 121
## 3. PhD, LIT, SCD, DFA, DLIT, DPH, DPHIL, JSC, SJD
## 16
## 4. LLB, JD
## 5
## 5. MD, DDS, DVM, MVSA, DSC, DO
## 8
## 6. JDC, STD, THD
## 2
## 7. Associate degree (AA)
## 260
## NA's
## 1117
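To see what summary() is doing with a factor, here is a small sketch on an invented toy factor (the values are made up and not taken from the ANES file). It also shows table(), which gives the same counts and makes it easy to get proportions.

```r
# Toy factor (invented values, not from the ANES file)
degree <- factor(c("BA", "MA", "BA", NA, "None", "None", "None"))

# summary() on a factor returns a count per level, with NA's counted last
s <- summary(degree)
s

# table() gives the same counts; useNA = "ifany" keeps missing values visible
table(degree, useNA = "ifany")

# Proportions instead of raw counts (missing values excluded)
prop.table(table(degree))
```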
The feeling thermometer for attitudes toward big business, the variable “bigbiz” in our dataframe “nes”, is numeric, however, so we do get the mean, median, quartiles, etc.
summary(nes$bigbiz)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 40.0 50.0 55.3 70.0 100.0 259
Get the standard deviation as well. The argument na.rm = TRUE drops the missing values first; without it, the result would be NA.
sd(nes$bigbiz, na.rm = TRUE)
## [1] 22.58
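As a quick sanity check on what sd() computes, the sketch below uses a handful of invented thermometer scores (not real ANES values) and reproduces the sample standard deviation by hand, with n - 1 in the denominator.

```r
# Invented thermometer scores, one of them missing
x <- c(40, 50, 70, NA, 60)

sd(x)                 # NA, because of the missing value
sd(x, na.rm = TRUE)   # missing value dropped before computing

# The same thing by hand: square root of the sum of squared
# deviations from the mean, divided by (n - 1)
obs <- x[!is.na(x)]
by_hand <- sqrt(sum((obs - mean(obs))^2) / (length(obs) - 1))

all.equal(by_hand, sd(x, na.rm = TRUE))
```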
Visualize the whole distribution of attitudes toward big business by generating a boxplot.
boxplot(nes$bigbiz)
Alternatively, visualize this distribution using a histogram.
hist(nes$bigbiz)
As a third option, visualize the distribution using a kernel density plot.
plot(density(nes$bigbiz, na.rm = TRUE))
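If you'd like to compare the three views side by side, here is a rough sketch on simulated data (the “toy” vector is invented for illustration and only loosely imitates a 0 to 100 thermometer). Note that boxplot() and hist() quietly ignore missing values, while density() needs na.rm = TRUE.

```r
set.seed(42)

# Simulated 0-100 scores plus two missing values (made-up data)
toy <- c(round(pmin(pmax(rnorm(200, mean = 55, sd = 22), 0), 100)), NA, NA)

# Three panels in one row; par() returns the old settings so we can restore them
old_par <- par(mfrow = c(1, 3))

boxplot(toy, main = "Boxplot")
hist(toy, main = "Histogram")

d <- density(toy, na.rm = TRUE)   # errors without na.rm = TRUE
plot(d, main = "Kernel density")

# Put the plotting layout back the way it was
par(old_par)
```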
Perhaps most usefully and definitely most attractively, let's use juxtaposed violin plots to compare how people feel about poor people to how they feel about people on welfare.
library(wvioplot)
## Loading required package: Hmisc
## Loading required package: survival
## Loading required package: splines
## Hmisc library by Frank E Harrell Jr
##
## Type library(help='Hmisc'), ?Overview, or ?Hmisc.Overview') to see overall
## documentation.
##
## NOTE:Hmisc no longer redefines [.factor to drop unused levels when
## subsetting. To get the old behavior of Hmisc type dropUnusedLevels().
## Attaching package: 'Hmisc'
## The following object(s) are masked from 'package:survival':
##
## untangle.specials
## The following object(s) are masked from 'package:base':
##
## format.pval, round.POSIXt, trunc.POSIXt, units
par(mfrow = c(1, 2)) #these are graphical parameters for 1 row, 2 columns
wvioplot(nes$poorppl, col = "magenta", names = "Poor People") #violin plot
wvioplot(nes$welfareppl, col = "cyan", names = "People on Welfare") #violin plot
Now, let's explore the relationship between two variables. As an easy example, there are obvious reasons we might expect there to be a positive relationship between feelings toward Christians and Christian fundamentalists. If someone really likes Christians in general, they probably feel warmer toward Christian fundamentalists than those who have cold feelings toward Christians in general. We generate the correlation coefficient, r, with
cor(nes$christians, nes$cfundamentals, use = "complete.obs")
## [1] 0.5477
We see there is a positive correlation, although it's not extremely strong. We have better tools for deciding what's a systematic relationship and what's just random noise, but the little r isn't one of them. All we can say is that, yeah, sure, one variable tends to move somewhat in the same direction as the other… but we can't say too much more than that. Not yet…
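A note on the use = "complete.obs" argument in the cor() call above: it tells R to keep only the respondents who answered both questions. The sketch below shows this on two invented vectors (not ANES data) by doing the same filtering by hand.

```r
# Two made-up variables, each with a missing value in a different row
a <- c(10, 20, 30, 40, NA, 60)
b <- c(15, 18, 35, NA, 50, 55)

cor(a, b)   # NA, because of the missing values

# Let cor() drop the incomplete pairs for us
r <- cor(a, b, use = "complete.obs")

# The same thing by hand: keep only rows where both values are present
keep <- complete.cases(a, b)
r_manual <- cor(a[keep], b[keep])

all.equal(r, r_manual)
```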
Now we load the package “car”, which you'll need to install if you haven't already. This package lets us make a real fancy scatterplot to visualize the correlation. We'll learn more about what's going on here later…
library(car)
## Warning: package 'car' was built under R version 2.15.1
## Loading required package: MASS
## Loading required package: nnet
## Attaching package: 'car'
## The following object(s) are masked from 'package:Hmisc':
##
## recode
scatterplot(nes$christians, nes$cfundamentals)
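If you don't have “car” installed, a plainer version of the same idea can be pieced together in base R. This is only a rough sketch on simulated data (x and y are invented stand-ins for the two thermometers), and it is not a reproduction of what scatterplot() draws: it just plots the points, adds a least-squares line, and overlays a lowess smoother.

```r
set.seed(7)

# Simulated stand-ins for two 0-100 feeling thermometers (made-up data)
x <- runif(150, min = 0, max = 100)
y <- pmin(pmax(0.6 * x + rnorm(150, mean = 10, sd = 20), 0), 100)

# Basic scatterplot
plot(x, y, xlab = "Thermometer A (simulated)", ylab = "Thermometer B (simulated)")

# Straight least-squares line, dashed
fit <- lm(y ~ x)
abline(fit, lty = 2)

# Lowess smoother, to see whether the relationship bends
lines(lowess(x, y), lwd = 2)
```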