Statistical inference with the GSS data

Setup

Load packages

Load data

Since 1972, the General Social Survey (GSS) has been monitoring societal change and studying the growing complexity of American society. The GSS aims to gather data on contemporary American society in order to monitor and explain trends and constants in attitudes, behaviors, and attributes; to examine the structure and functioning of society in general as well as the role played by relevant subgroups; to compare the United States to other societies in order to place American society in comparative perspective and develop cross-national models of human society; and to make high-quality data easily accessible to scholars, students, policy makers, and others, with minimal cost and waiting. Source: http://www.norc.org/Research/Projects/Pages/general-social-survey.aspx

The data we will be using is an extract of the General Social Survey (GSS) Cumulative File 1972-2012, set aside to provide a convenient data resource for students learning statistical reasoning using the R language. The data is available for download to Coursera students taking the inferential statistics course.

load("gss.RData")

We find that the dataset provided consists of 57061 rows and 114 column variables. In order not to distract the reader from the analysis, I have placed all the code in the appendix section, both in the interest of reproducible research and as a learning tool for others who, like myself, have little background in computer programming. In some instances the code is shown in the main text to help in understanding the output.

Part 1: Data

When examining our data, we must be mindful that random allocation was not performed. A survey is an observational study. Causality cannot be inferred because the conditions present do not afford an experimental design that creates equal conditions for a control group and an experimental group.

The GSS is an area-probability sample that uses the National Opinion Research Center (NORC) National Sampling Frame for an equal probability multi-stage cluster sample of housing units for the entire United States. The results of the survey can therefore be generalized to the general population. To develop good research questions, let us take a brief look at the data.

We now examine the data provided for trends in education.

The plot above shows the declining confidence in educational institutions from 1972 to 2012. There was a 10% decline among those who say they have a great deal of confidence. This is mirrored by a reciprocal, almost symmetrical increase in those who say they hardly have any confidence in educational institutions. The share of respondents who say that they have only some confidence has remained steady at around 55-60%.

On the other hand, there is an increasing trend in respondents who believe that too little money is spent by the government on education, rising from about 50% in 1972 to almost 75% in 2012. Those who think that the amount of spending on education is about right decreased from 40% to less than 20%. Those who think spending on education is too much remained steady at around 10%.

It seems that the decline in confidence in educational institutions is associated with the rise in the belief that government spending on education is too little.

We will limit our analysis to the following variables:

Brief Description of Variables in our subset data frame

Table 1. Selected variables

Variable	Question	DataType
`year`	GSS year for this respondent	Numeric
`coninc`	Total family income in constant dollars	Numeric
`coneduc`	confidence in educational institutions	Factor
`nateduc`	spending to improve nation’s education system	Factor

While it is tempting to analyze the data across the years to have more observations to work with, lumping data from the early years together with data from recent years might introduce errors in the analysis. Surveys are like snapshots in time, and therefore the results are specific to the time period in which the survey was taken. Furthermore, according to the GSS website, the survey methodology changed across the years. We will limit our analysis to the year 2012.

The data from the year 2012 show that the moderates, or those who are in the middle of the two ends, dominate the extreme views with regard to confidence in educational institutions.

In terms of spending for education, the extreme view that too little money is spent by government for education is the dominant group. Its dominance is more marked compared to the dominance of the moderate view with regard to confidence in educational institutions.

We briefly take a look at the distribution of constant income among the groups of responses to the spending for education query.

We see that constant income seems to have roughly the same distribution among the different opinions on spending for education. The red line represents the mean of each category.

Part 2: Research question

Education has been touted as an avenue to escape poverty. It has been referred to as an inheritance that parents can bequeath to their children which cannot be stolen or reduced by decay. However, when access to good education is restricted by financial capacity, it becomes yet another structure that promotes oppression and inequality in society.

This is why the efforts of those behind Coursera are praiseworthy: for the almost quixotic ideals that they foster and for the courage to take on the challenge of reversing the trend in education, not only in the United States but in the whole world.

We ask the questions:

Are opinions on government spending for education associated with mean total family income?
Are levels of confidence in educational institutions and opinions on whether enough money is spent on education independent?

Part 3: Exploratory data analysis

## # A tibble: 3 x 8
##       nateduc  sum_inc mean_inc   sd_inc max_inc iqr_inc n_inc median_inc
##        <fctr>    <int>    <dbl>    <dbl>   <int>   <dbl> <int>      <int>
## 1  Too Little 21219756 47050.46 44509.77  178712   46917   451      34470
## 2 About Right  3534847 36441.72 37501.02  178712   41172    97      21065
## 3    Too Much  2678918 56998.26 51350.20  178712   56971    47      42130

We compare the summary statistics for total family income among the different opinions on spending for education. There is a large discrepancy in the number of observations per group, with the group too little having 4-10 times the number of observations compared to the other groups.

We visually compare the spread of observations among the different opinions on spending for education by comparing the size of the box in the boxplots, which represents the inter-quartile range. The inter-quartile range of those who say too much is spent appears slightly larger compared to the other two, in line with the iqr_inc column of the summary table. Outliers are present in all three and appear to be of the same magnitude.

Although the point estimates for the means are quite separated from each other, the respective 95% confidence intervals show considerable overlap. We next perform statistical inference to determine whether the means are really different.

Part 4: Inference

To answer our questions we set up a hypothesis test.

Hypothesis testing for question 1

Our null hypothesis: The mean total family income is the same among those who believe that there is: too little, about right, and too much spending for education.

Our alternative hypothesis: The mean total constant family income is different in at least one group among those who believe that there is: too little, about right, and too much spending for education.

Since we are dealing with a numeric response variable and a categorical explanatory variable with 3 levels, we will use ANOVA (analysis of variance) to test our hypothesis. ANOVA is the preferred approach for our data because it controls the so-called “study-wide error rate”: running multiple separate tests increases the chance of obtaining a statistically significant result by chance alone.

Let us first see if the conditions for performing ANOVA are met.

Conditions for Anova

Independence - Since random sampling without replacement was performed, we can assume the requirement for independence both within and between groups is met. The data is not paired.
Approximate normality - Distributions should be nearly normal within each group. The histograms of constant income above show that each group is moderately right-skewed. Our sample sizes, particularly in the two smaller groups, may be too small to overcome the skewness of the data. We will however proceed with the test for the sake of this exercise.
Equal variance - Groups should have roughly equal variability. Based on the boxplots of the different groups, this condition is roughly met.

We use the aov function in R to perform the anova test.

##              Df    Sum Sq   Mean Sq F value Pr(>F)  
## nateduc       2 1.504e+10 7.519e+09   3.878 0.0212 *
## Residuals   592 1.148e+12 1.939e+09                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion for the Anova Test

The resulting p-value of our ANOVA test, 0.0212, is less than the 5% significance level. We therefore reject the null hypothesis. Conclusion: The data provide convincing evidence that at least one pair of population means is different among the groups: too little, about right, and too much spending for education.
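As a quick check on the ANOVA table above, the F statistic and its p-value can be recomputed from the printed mean squares and degrees of freedom. This is only a sketch: it uses the rounded values shown in the output, so the last digits may differ slightly from what aov reports, and the variable names are my own.

```r
# Values copied (rounded) from the ANOVA table above
ms_group <- 7.519e+09   # mean square for nateduc
ms_resid <- 1.939e+09   # mean square for the residuals
df_group <- 2
df_resid <- 592

# F is the ratio of between-group to within-group variability
f_stat <- ms_group / ms_resid

# The upper-tail area of the F distribution gives the p-value
p_val <- pf(f_stat, df_group, df_resid, lower.tail = FALSE)

round(f_stat, 3)
round(p_val, 4)
```

The ratio and tail area should land close to the F value and Pr(>F) columns of the table.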

We need to perform pairwise tests to determine which of the means are different. First we have to adjust our significance level. Each test that we perform at the 5% level has a 5% chance of producing a statistically significant result when the null hypothesis is actually true.

Conditions for the t-test

Independence - Since random sampling without replacement was performed, we can assume the requirement for independence both within and between groups is met. The data is not paired.
Sample size/skew - n ≥ 30. The number of observations in each group is greater than 30. However, based on the histograms we saw earlier, the distribution of each group of observations is moderately right-skewed. This condition is not met.

By performing many tests, we greatly increase the chances of finding at least one statistically significant result in our data that, in fact, is a chance result. To reduce the possibility of this error we adjust our significance level.

We compute our adjusted significance level from what was taught in class.

number_of_tests <- (3*2)/2
number_of_tests

## [1] 3

new_significance_level <- 0.05/number_of_tests
new_significance_level

## [1] 0.01666667

round(new_significance_level, 3)

## [1] 0.017

We compute the number of tests we need to perform by multiplying the number of levels in spending for education by the same number less 1 and dividing the result by 2.

Next we divide our original significance level of 0.05 by the number of tests we will perform to find our adjusted significance level.

The results of our t-tests are shown in the table below. With an adjusted significance level of 0.017, we find two p-values that are barely smaller than our adjusted significance level. Rounding these numbers may cause inaccuracy in our conclusions.

##                  comparison    P.Value   lower_int  mean_diff   upper_int
## 1 Too Little vs About Right 0.01574538    125.4894  10608.735 21091.98087
## 2    Too Little vs Too Much 0.20642813 -29110.5374  -9947.799  9214.94028
## 3   About Right vs Too Much 0.01691875 -41097.2885 -20556.534   -15.77881
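An equivalent way to apply the Bonferroni correction is to inflate the p-values instead of shrinking the significance level, which base R offers through p.adjust. The sketch below uses the p-values copied from the table above and compares the adjusted values with the original 5% level; this is the same as comparing the raw p-values with the unrounded level 0.05/3. The object names are my own.

```r
# P-values copied from the table above
p_values <- c("Too Little vs About Right" = 0.01574538,
              "Too Little vs Too Much"    = 0.20642813,
              "About Right vs Too Much"   = 0.01691875)

# Bonferroni: multiply each p-value by the number of tests (capped at 1)
p_bonf <- p.adjust(p_values, method = "bonferroni")
round(p_bonf, 4)

# Decision for each comparison at the original 5% level
p_bonf < 0.05
```

Because the third comparison sits so close to the threshold, its outcome depends on whether the adjusted level is rounded to 0.017 or left at 0.05/3.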

Comparing the means between Too Little vs About Right

null hypothesis: The population mean total family income in the group too little and the group about right are equal.

alternative hypothesis: The population mean total family income in the group too little and the group about right are not equal.

conclusion: If there were no difference in population mean between the group too little and the group about right, the chance of getting a mean difference of 10608.74 (or a difference more extreme) would be 16 in 1,000. Since our p-value of 0.016 is less than our significance level of 0.017, we reject the null hypothesis. The mean of the group too little is significantly greater than the mean of the group about right.

The confidence interval does not contain the null value of 0. Because the interval was computed at the adjusted confidence level of 98.3% (see the t.test calls in the appendix), we are 98.3% confident that the difference between the population mean total family income of the group too little and that of the group about right is between 125.49 and 21091.98. The confidence interval and the p-value are in agreement.
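To make the link between the summary table and this test explicit, the Welch t-test for Too Little vs About Right can be rebuilt from the group means, standard deviations and sizes reported in Part 3. This is a sketch using those rounded summary statistics, so small differences from the t.test output are expected; the 98.3% level mirrors the conf.level used in the appendix, and the variable names are my own.

```r
# Summary statistics copied (rounded) from the Part 3 table
m1 <- 47050.46; s1 <- 44509.77; n1 <- 451   # Too Little
m2 <- 36441.72; s2 <- 37501.02; n2 <- 97    # About Right

mean_diff <- m1 - m2
v1 <- s1^2 / n1          # squared standard error of the first mean
v2 <- s2^2 / n2          # squared standard error of the second mean
se_diff <- sqrt(v1 + v2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

t_stat <- mean_diff / se_diff
p_val <- 2 * pt(abs(t_stat), df_welch, lower.tail = FALSE)

# Two-sided interval at the adjusted 98.3% confidence level
t_crit <- qt(1 - 0.017 / 2, df_welch)
ci_983 <- mean_diff + c(-1, 1) * t_crit * se_diff

p_val
ci_983
```

The p-value and interval should come out close to the first row of the comparison table.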

Comparing the means between Too Little vs Too Much

null hypothesis: The population mean total family income in the group too little and the group too much are equal.

alternative hypothesis: The population mean total family income in the group too little and the group too much are not equal.

conclusion: If there were no difference in population mean between the group too little and the group too much, the chance of getting a mean difference of -9947.8 (or a difference more extreme) would be 206 in 1,000. Since our p-value of 0.206 is greater than our significance level of 0.017, we fail to reject the null hypothesis. The mean of the group too little is not significantly different from the mean of the group too much.

The confidence interval contains the null value of 0. Because the interval was computed at the adjusted confidence level of 98.3%, we are 98.3% confident that the difference between the population mean total family income of the group too little and that of the group too much is between -29110.54 and 9214.94. Because the interval contains the null value of 0, the difference is not statistically significant. The confidence interval and the p-value are in agreement.

Comparing the means between About Right vs Too Much

null hypothesis: The population mean total family income in the group about right and the group too much are equal.

alternative hypothesis: The population mean total family income in the group about right and the group too much are not equal.

conclusion: If there were no difference in population mean between the group about right and the group too much, the chance of getting a mean difference of -20556.53 (or a difference more extreme) would be 169 in 10,000. Since our p-value of 0.0169188 is less than our rounded significance level of 0.017, we reject the null hypothesis. The mean of the group about right is significantly less than the mean of the group too much.

The confidence interval does not contain the null value of 0. Because the interval was computed at the adjusted confidence level of 98.3%, we are 98.3% confident that the difference between the population mean total family income of the group about right and that of the group too much is between -41097.29 and -15.78. The confidence interval and the p-value are in agreement.

It is important to point out to the reader that if we had rounded the p-value 0.0169188 to 0.017, our p-value would be equal to our rounded significance level of 0.017. We would therefore fail to reject the null hypothesis in this instance, and our p-value would be in disagreement with our confidence interval. The same happens if the significance level is left unrounded: 0.05/3 is approximately 0.01667, which is smaller than 0.0169188.

The number of observations for the group too much was only 47 and the distribution is moderately skewed. A failure to reject the null in that case, with the p-value equal to the significance level, could be due to a lack of power from the inadequate number of observations, to the adjustment of the significance level that decreases the possibility of a type 1 error, and to error from rounding the results to three decimal places. We might commit a type 2 error in comparisons involving the group too much.

Calculating for power

Following what was taught in class, we compute the number of observations needed per group for a test with 80% power.

n <- ((qnorm(0.8) + 1.96)^2/ci$mean_diff[3]^2) *(sd(about_right$coninc)^2 + sd(too_much$coninc)^2)
n

## [1] 75.10018

We need at least 76 observations in each group.
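The calculation above depends on objects created in the appendix. The same arithmetic can be followed with the rounded standard deviations from the Part 3 table and the mean difference from the comparison table, as in this sketch (variable names are my own). Note that the formula uses 1.96, the two-sided critical value for an unadjusted 5% level, and a normal approximation.

```r
# Rounded values copied from the summary and comparison tables
sd_about_right <- 37501.02
sd_too_much    <- 51350.20
mean_diff      <- -20556.534   # About Right minus Too Much

z_power <- qnorm(0.80)   # quantile corresponding to 80% power
z_alpha <- 1.96          # two-sided 5% critical value, as in the original formula

# Required observations per group to detect a difference of this size
n_needed <- (z_power + z_alpha)^2 * (sd_about_right^2 + sd_too_much^2) / mean_diff^2

n_needed
ceiling(n_needed)   # round up to whole observations
```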

The plot above shows how wide our confidence intervals are; two of them are very close to the null value of 0.

Another way to compare the means for the levels of a factor is to use the function TukeyHSD (Tukey Honest Significant Differences) in R. According to the help file in R, John Tukey introduced intervals based on the range of the sample means rather than the individual differences. The intervals returned by this function are based on the Studentized range statistic.

The intervals constructed in this way would only apply exactly to balanced designs where there are the same number of observations made at each level of the factor. This function incorporates an adjustment for sample size that produces sensible intervals for mildly unbalanced designs.

par(mar=c(5,10,4,2))
TukeyHSD(aovfit, conf.level = 0.983)

##   Tukey multiple comparisons of means
##     98.3% family-wise confidence level
## 
## Fit: aov(formula = coninc ~ nateduc, data = data_2012)
## 
## $nateduc
##                              diff         lwr      upr     p adj
## About Right-Too Little -10608.735 -24149.3905  2931.92 0.0804091
## Too Much-Too Little      9947.799  -8596.0988 28491.70 0.3042208
## Too Much-About Right    20556.534   -945.0275 42058.09 0.0239798

plot(TukeyHSD(aovfit, conf.level = 0.983), las=1, col.axis = "dark blue", col = "red", ylim = c(0, 4))

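Based on the result of the TukeyHSD function, the differences in the means between the groups are not statistically significant given an adjusted significance level of 0.017. The confidence intervals generated for all the differences in means among the different groups contain the null value of 0. Note that the p adj values reported by TukeyHSD are already adjusted for multiple comparisons, so comparing them with 0.017 is a conservative choice; against the unadjusted 5% level, the Too Much vs About Right difference (p adj = 0.024) would be significant.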
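To see where a p adj value comes from, the Too Much vs About Right comparison can be approximately reproduced from the residual mean square of the ANOVA table and the two group sizes, using the Tukey-Kramer standard error and the studentized range distribution (ptukey in base R). This is a sketch with rounded inputs and my own variable names, so only the first few digits should be expected to agree with the output above.

```r
# Values copied (rounded) from the ANOVA table and the summary table
ms_resid      <- 1.939e+09   # residual mean square
df_resid      <- 592         # residual degrees of freedom
n_groups      <- 3           # number of levels of nateduc
n_about_right <- 97
n_too_much    <- 47
mean_diff     <- 20556.534   # Too Much minus About Right

# Tukey-Kramer standard error for groups of unequal size
se_tk <- sqrt(ms_resid / 2 * (1 / n_about_right + 1 / n_too_much))

# Studentized range statistic for this pair
q_stat <- abs(mean_diff) / se_tk

# Upper tail of the studentized range distribution gives the adjusted p-value
p_adj <- ptukey(q_stat, nmeans = n_groups, df = df_resid, lower.tail = FALSE)
p_adj
```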

To further practice what we have learned from this course, we also ask the question whether levels of confidence in educational institutions are independent of opinions on whether enough money is spent on education.

Hypothesis testing for question 2

null hypothesis: Levels of confidence in educational institutions are independent of opinions on whether enough money is spent on education.

alternative hypothesis: Levels of Confidence in educational institutions are not independent of opinions on whether enough money is spent on education.

Conditions for the Chi-Squared Test

Since we are dealing with two categorical variables with 3 levels each, we will use the chi-squared test of independence to test our hypothesis. The chi-squared test quantifies how different the observed counts are from the expected counts. Large deviations from what would be expected based on sampling variation (chance) alone provide strong evidence for the alternative hypothesis.

The conditions for the chi-squared test are:

1. Independence - Since random sampling without replacement was performed, we can assume the requirement for independence is met. Each observation contributes to only one cell in the table.

2. Sample size - Each cell contains more than 5 expected observations, as seen below. This condition is also met.

##               
##                Too Little About Right Too Much Sum
##   A Great Deal         96          42        5 143
##   Only Some           272          47       21 340
##   Hardly Any           83           8       21 112
##   Sum                 451          97       47 595

##               
##                 Too Little About Right    Too Much         Sum
##   A Great Deal 0.161344538 0.070588235 0.008403361 0.240336134
##   Only Some    0.457142857 0.078991597 0.035294118 0.571428571
##   Hardly Any   0.139495798 0.013445378 0.035294118 0.188235294
##   Sum          0.757983193 0.163025210 0.078991597 1.000000000

##          Groups Too.Little About.Right Too.Much
## 1  A Great Deal        108          23       11
## 2     Only Some        258          55       27
## 3    Hardly Any         85          19        9

We prepared the table containing the observations, the percentages, and the expected observations as seen above.
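The expected counts do not have to be typed in by hand: under independence, each one is the row total times the column total divided by the grand total. The sketch below rebuilds them from the observed counts in the first table above (the matrix and its names are my own construction) and checks the sample-size condition.

```r
# Observed counts copied from the first table above (without the Sum margins)
observed <- matrix(c( 96, 42,  5,
                     272, 47, 21,
                      83,  8, 21),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(coneduc = c("A Great Deal", "Only Some", "Hardly Any"),
                                   nateduc = c("Too Little", "About Right", "Too Much")))

# Expected count = (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 2)

# Sample-size condition: every expected count should be at least 5
all(expected >= 5)
```

Rounding these values to whole numbers gives approximately the third table shown above.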

df <- (3-1)*(3-1)
df

## [1] 4

chi_2 <- ((96-108)^2/108) + ((272-258)^2/258) + ((83-85)^2/85) + ((42-23)^2/23) + ((47-55)^2/55) + ((8-19)^2/19) + ((5-11)^2/11) + ((21-27)^2/27) + ((21-9)^2/9)
chi_2

## [1] 45.97385

pchisq(chi_2, df, lower.tail = FALSE)

## [1] 2.493902e-09

We computed the degrees of freedom (df) and the chi-squared (chi_2) statistic as taught in class. We then compute our p-value using the pchisq function in R and compare it with a 5% significance level.

The large deviations from what would be expected based on sampling variation (chance) provide strong evidence for the alternative hypothesis.

Conclusion for the Chi-Squared Test

Since our p-value of 2.4939022 × 10^-9 is less than 0.05, we reject the null hypothesis. The data provide convincing evidence that the observed counts are different from the expected counts.

Confidence in educational institutions and opinions whether enough money is spent on education are dependent.

## 
##  Pearson's Chi-squared test
## 
## data:  data_2012$coneduc and data_2012$nateduc
## X-squared = 45.757, df = 4, p-value = 2.767e-09

We compare our manually computed p-value, 2.4939022 × 10^-9, with the result of the chisq.test function in R above, 2.7672696 × 10^-9, and the results are quite similar.
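The small gap between the two results is consistent with the manual computation using expected counts rounded to whole numbers. As a sketch of this, chisq.test can be run directly on the observed counts from the contingency table (entered here by hand, with my own object names), and the statistic can be recomputed from the unrounded expected counts that the function returns.

```r
# Observed counts from the contingency table (rows: coneduc, columns: nateduc)
observed <- matrix(c( 96, 42,  5,
                     272, 47, 21,
                      83,  8, 21),
                   nrow = 3, byrow = TRUE)

# Pearson chi-squared test of independence on the table of counts
res <- chisq.test(observed)
res$statistic
res$parameter
res$p.value

# Same statistic by hand, using the unrounded expected counts
chi_exact <- sum((observed - res$expected)^2 / res$expected)
chi_exact
```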

References

Smith, Tom W., Michael Hout, and Peter V. Marsden. General Social Survey, 1972-2012 [Cumulative File]. ICPSR34802-v1. Storrs, CT: Roper Center for Public Opinion Research, University of Connecticut / Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributors], 2013-09-11. doi:10.3886/ICPSR34802.v1 (http://doi.org/10.3886/ICPSR34802.v1)

Appendix

Codes

code for global options

knitr::opts_chunk$set(warning=FALSE, message=FALSE)

codes for loading the packages

library(ggplot2)
library(gplots)
library(dplyr)
library(statsr)

code for loading the data

load("gss.RData")

codes for part 1

# Percentage of each confidence response per survey year
by_coneduc <- gss %>%
  filter(!is.na(year), !is.na(coneduc)) %>%
  group_by(year, coneduc) %>%
  summarise(n_coneduc = n())
byconeduc_year <- gss %>%
  filter(!is.na(year), !is.na(coneduc)) %>%
  group_by(year) %>%
  summarise(total_confid = n())
merged_coneduc <- merge(byconeduc_year, by_coneduc) %>%
  mutate(perc_conf = round(n_coneduc / total_confid * 100, 1))
# Response levels in the order they should appear in the legend
conf_levels <- c("A Great Deal", "Only Some", "Hardly Any")
merged_coneduc %>%
  ggplot(aes(x = year, y = perc_conf, colour = coneduc, shape = coneduc)) +
  geom_line(size = 1) +
  geom_point(size = 2) +
  ylab("Percent") +
  ggtitle("Confidence in Educational Institutions") +
  scale_shape_discrete(name = "Response", breaks = conf_levels, labels = conf_levels) +
  scale_colour_discrete(name = "Response", breaks = conf_levels, labels = conf_levels)

# Percentage of each spending response per survey year
by_nateduc <- gss %>%
  filter(!is.na(year), !is.na(nateduc)) %>%
  group_by(year, nateduc) %>%
  summarise(n_nateduc = n())
bynateduc_year <- gss %>%
  filter(!is.na(year), !is.na(nateduc)) %>%
  group_by(year) %>%
  summarise(total_confid = n())
merged_nateduc <- merge(bynateduc_year, by_nateduc) %>%
  mutate(perc_conf = round(n_nateduc / total_confid * 100, 1))
# Response levels in the order they should appear in the legend
spend_levels <- c("Too Little", "About Right", "Too Much")
merged_nateduc %>%
  ggplot(aes(x = year, y = perc_conf, colour = nateduc, shape = nateduc)) +
  geom_point(size = 2) +
  geom_line(size = 1) +
  ylab("Percent") +
  ggtitle("Are we spending enough to Improve the Educational System") +
  scale_shape_discrete(name = "Response", breaks = spend_levels, labels = spend_levels) +
  scale_colour_discrete(name = "Response", breaks = spend_levels, labels = spend_levels)

merged_coneduc <- merged_coneduc %>% filter(year == 2012) %>% mutate(category = rep("confidence", 3)) %>% rename(response = coneduc, n = n_coneduc)
merged_nateduc <- merged_nateduc %>% filter(year == 2012) %>% mutate(category = rep("spending", 3)) %>% rename(response = nateduc, n = n_nateduc)
educ_df <-  rbind(merged_coneduc, merged_nateduc)
educ_df %>% group_by(category) %>% ggplot(aes(y = perc_conf, x = response, fill = response)) + geom_bar(stat = "identity") + ylab("Percent") + xlab("Response") + ggtitle("Confidence and Spending in Education in 2012") +  scale_fill_hue(name="Response") + facet_wrap( ~ category) + theme(axis.ticks = element_blank(), axis.text.x = element_blank())

rm(by_nateduc, bynateduc_year, merged_nateduc, by_coneduc, byconeduc_year, merged_coneduc)

# Mean constant income per spending opinion in 2012 (drawn as red lines below)
mean_income <- gss %>%
  filter(year == 2012, !is.na(coninc), !is.na(nateduc)) %>%
  group_by(nateduc) %>%
  summarise(mean_inc = mean(coninc))
# Density histograms of constant income, one panel per spending opinion
gss %>%
  filter(year == 2012, !is.na(coninc), !is.na(nateduc)) %>%
  ggplot(aes(x = coninc, fill = nateduc)) +
  geom_histogram(binwidth = 18500, colour = "black", aes(y = ..density..)) +
  facet_wrap(~ nateduc) +
  geom_vline(aes(xintercept = mean_inc), mean_income, colour = "red") +
  # fill, not colour, is the mapped aesthetic, so the legend title is set on the fill scale
  scale_fill_discrete(name = "Response")

codes for part 3

# 2012 respondents with complete answers on all variables of interest
data_2012 <- gss %>%
  filter(year == 2012, !is.na(nateduc), !is.na(coninc), !is.na(coneduc)) %>%
  select(coneduc, nateduc, coninc, year)
# Summary statistics of constant income per spending opinion
data_2012 %>%
  group_by(nateduc) %>%
  summarise(sum_inc = sum(coninc), mean_inc = mean(coninc), sd_inc = sd(coninc),
            max_inc = max(coninc), iqr_inc = IQR(coninc), n_inc = n(),
            median_inc = median(coninc))

data_2012 %>% ggplot(aes(x=nateduc, y=coninc, fill=nateduc)) + geom_boxplot() + ylab("Total family income in constant dollars") + xlab("Response") + ggtitle("Family Income VS Spending for Education") +  scale_fill_hue(name="Response")

plotmeans(coninc~nateduc, connect= 1, col="red", main = "Plot of means with 95% CIs", xlab="Opinion on spending for education", ylab="Total family income in constant dollars", mean.labels = TRUE, ci.label = TRUE, ylim = c(20000, 80000), cex = 0.75, data = data_2012)

codes for part 4

aovfit <- aov(coninc~nateduc, data = data_2012)
summary.aov(aovfit)

number_of_tests <- (3*2)/2
number_of_tests
new_significance_level <- round(0.05/number_of_tests, 3)
new_significance_level

too_little <- data_2012 %>% filter(nateduc == "Too Little") %>% select(coninc)
about_right <- data_2012 %>% filter(nateduc == "About Right") %>% select(coninc)
too_much <- data_2012 %>% filter(nateduc == "Too Much") %>% select(coninc)
tl_ar <- t.test(too_little$coninc, about_right$coninc, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.983)
tl_tm <- t.test(too_little$coninc, too_much$coninc, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.983)
ar_tm <- t.test(about_right$coninc, too_much$coninc, alternative = "two.sided", mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.983)
comparison <- c("Too Little vs About Right", "Too Little vs Too Much", "About Right vs Too Much")
P.Value <- c(tl_ar$p.value, tl_tm$p.value, ar_tm$p.value)
lower_int <- c(tl_ar$conf.int[1], tl_tm$conf.int[1], ar_tm$conf.int[1])
mean_diff <- c(tl_ar$estimate[1]-tl_ar$estimate[2], tl_tm$estimate[1]-tl_tm$estimate[2], ar_tm$estimate[1]-ar_tm$estimate[2])
upper_int <- c(tl_ar$conf.int[2], tl_tm$conf.int[2], ar_tm$conf.int[2])
ci <- data.frame(comparison, P.Value, lower_int, mean_diff, upper_int)
ci
ggplot(data = ci, aes(y = comparison, x = mean_diff, colour = comparison)) + geom_errorbarh(aes(xmin = lower_int, xmax = upper_int, height = .2)) + geom_point(shape = 21, size = 3, fill = "green") + geom_vline(xintercept = 0)

TukeyHSD(aovfit, conf.level = 0.983)
plot(TukeyHSD(aovfit, conf.level = 0.983))

addmargins(table(data_2012$coneduc, data_2012$nateduc))
addmargins(prop.table(table(data_2012$coneduc, data_2012$nateduc)))
# Expected counts rounded to whole numbers, matching the table and computation in Part 4
Groups <- c("A Great Deal", "Only Some", "Hardly Any")
Too.Little <- c(108, 258, 85)
About.Right <- c(23, 55, 19)
Too.Much <- c(11, 27, 9)
exp_val <- data.frame(Groups, Too.Little, About.Right, Too.Much)
exp_val

df <- (3-1)*(3-1)
df
chi_2 <- ((96-108)^2/108) + ((272-258)^2/258) + ((83-85)^2/85) + ((42-23)^2/23) + ((47-55)^2/55) + ((8-19)^2/19) + ((5-11)^2/11) + ((21-27)^2/27) + ((21-9)^2/9)
chi_2
pchisq(chi_2, df, lower.tail = FALSE)

chisq.test(data_2012$coneduc, data_2012$nateduc)

sessionInfo()

## R version 3.2.4 (2016-03-10)
## Platform: i386-w64-mingw32/i386 (32-bit)
## Running under: Windows 10 (build 10586)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] statsr_0.1-1  dplyr_0.5.0   gplots_2.17.0 ggplot2_2.1.0
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.6        knitr_1.13         magrittr_1.5      
##  [4] munsell_0.4.3      xtable_1.8-2       colorspace_1.2-6  
##  [7] R6_2.1.2           stringr_1.0.0      plyr_1.8.4        
## [10] caTools_1.17.1     tools_3.2.4        grid_3.2.4        
## [13] gtable_0.2.0       KernSmooth_2.23-15 DBI_0.4-1         
## [16] htmltools_0.3.5    gtools_3.5.0       lazyeval_0.2.0    
## [19] assertthat_0.1     yaml_2.1.13        digest_0.6.10     
## [22] tibble_1.1         shiny_0.13.2       formatR_1.4       
## [25] bitops_1.0-6       mime_0.5           evaluate_0.9      
## [28] rmarkdown_1.0      labeling_0.3       gdata_2.17.0      
## [31] stringi_1.1.1      scales_0.4.0       httpuv_1.3.3