Introduction

This is an RMarkdown document showing R code that draws four samples from a hypothetical population distribution. Each sample is then plotted as a histogram with an overlaid density curve.

The plots are intended to demonstrate how samples drawn from the same population differ from one another. The hypothetical distribution represents self-reported anxiety about statistics coursework, reflecting a self-report survey item given to the class before the course.

This document was created to supplement lecture notes on the distinction between a sample and a population.

Constructing Population Distribution of Anxiety Scores and Data Frame Preparation

The first block of code accomplishes the following:

  1. Create a hypothetical population distribution of anxiety scores.
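  2. Convert the population distribution into an appropriate data frame.

# Load packages for plotting (ggplot2), data manipulation (dplyr), and plot layout (gridExtra)
library(ggplot2)
library(dplyr)
library(gridExtra)

# Hypothetical population: each score from 1 to 10 is repeated a fixed number of times
anxiety <- c(rep(1,25), rep(2,50), rep(3,70), rep(4,90), rep(5,100), rep(6,100), rep(7,90), rep(8,70), rep(9,50), rep(10,25))

# Store the scores as a one-column data frame
anxiety <- as.data.frame(anxiety)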
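
As a quick check on the construction above, the following sketch tabulates the population and computes its size, mean, and standard deviation. It rebuilds the same scores using the `times` argument of `rep()`; the names `counts`, `scores`, `pop_n`, `pop_mean`, and `pop_sd` are illustrative and do not appear in the original code.

```r
# Rebuild the hypothetical population: score k appears counts[k] times.
counts <- c(25, 50, 70, 90, 100, 100, 90, 70, 50, 25)
anxiety <- data.frame(anxiety = rep(1:10, times = counts))

scores <- anxiety$anxiety

# Frequency of each anxiety score in the population.
print(table(scores))

# Population size and mean.
pop_n <- length(scores)
pop_mean <- mean(scores)

# Population standard deviation (divide by N, not N - 1).
pop_sd <- sqrt(mean((scores - pop_mean)^2))

cat("N =", pop_n, " mean =", pop_mean, " sd =", round(pop_sd, 2), "\n")
```

The counts sum to 670 scores, and because they are symmetric around the middle of the scale, the population mean is 5.5. These are the values each sample is trying to approximate.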

Simulating Four Samples of Size n = 50

Four samples of size 50 are drawn at random from the constructed population, without replacement (the default for sample()), and coerced into data frames.

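# Draw four independent random samples of 50 scores from the population column
sim1 <- sample(anxiety[,1],50)
sim2 <- sample(anxiety[,1],50)
sim3 <- sample(anxiety[,1],50)
sim4 <- sample(anxiety[,1],50)

# Convert each sample to a one-column data frame for ggplot2
sim1 <- as.data.frame(sim1)
sim2 <- as.data.frame(sim2)
sim3 <- as.data.frame(sim3)
sim4 <- as.data.frame(sim4)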
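
The four draws can also be written more compactly, and fixing a seed makes the results reproducible across runs. The sketch below is an alternative, not the code used for the plots in this document; the seed value and the names `population`, `samples`, and `sample_means` are arbitrary choices. It also compares each sample mean with the population mean to show sample-to-sample variation numerically.

```r
# Rebuild the population so this block runs on its own.
counts <- c(25, 50, 70, 90, 100, 100, 90, 70, 50, 25)
population <- rep(1:10, times = counts)

# Fix the random seed (arbitrary value) so the same samples are drawn each run.
set.seed(123)

# Draw four samples of size 50 without replacement; each column is one sample.
samples <- replicate(4, sample(population, 50))
colnames(samples) <- paste0("sim", 1:4)

# Each sample mean generally differs from the others and from the population mean.
sample_means <- colMeans(samples)
print(round(sample_means, 2))
cat("Population mean:", mean(population), "\n")
```

Removing the `set.seed()` call restores the behavior of the original code, where every run produces different samples.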

Create a Grid of Histograms with Overlaid Density Curves

A histogram is constructed for each of the four samples. Each histogram has a lightly shaded, smoothed density curve overlaid on it.

# Each plot: histogram on the density scale with a shaded density curve on top
p1 <- ggplot(sim1, aes(x = sim1)) +
  xlab("Anxiety Level") + ylab("Density") + ggtitle("Sample 1") +
  theme(plot.title = element_text(hjust = .5)) +
  scale_x_continuous(breaks = seq(1, 10, 1), limits = c(0, 10.5)) +
  geom_histogram(aes(y = ..density..), fill = "red", col = "black", binwidth = 1) +
  geom_density(fill = "black", alpha = .2)

p2 <- ggplot(sim2, aes(x = sim2)) +
  xlab("Anxiety Level") + ylab("Density") + ggtitle("Sample 2") +
  theme(plot.title = element_text(hjust = .5)) +
  scale_x_continuous(breaks = seq(1, 10, 1), limits = c(0, 10.5)) +
  geom_histogram(aes(y = ..density..), fill = "red", col = "black", binwidth = 1) +
  geom_density(fill = "black", alpha = .2)

p3 <- ggplot(sim3, aes(x = sim3)) +
  xlab("Anxiety Level") + ylab("Density") + ggtitle("Sample 3") +
  theme(plot.title = element_text(hjust = .5)) +
  scale_x_continuous(breaks = seq(1, 10, 1), limits = c(0, 10.5)) +
  geom_histogram(aes(y = ..density..), fill = "red", col = "black", binwidth = 1) +
  geom_density(fill = "black", alpha = .2)

p4 <- ggplot(sim4, aes(x = sim4)) +
  xlab("Anxiety Level") + ylab("Density") + ggtitle("Sample 4") +
  theme(plot.title = element_text(hjust = .5)) +
  scale_x_continuous(breaks = seq(1, 10, 1), limits = c(0, 10.5)) +
  geom_histogram(aes(y = ..density..), fill = "red", col = "black", binwidth = 1) +
  geom_density(fill = "black", alpha = .2)

# Arrange the four plots in a 2 x 2 grid
grid.arrange(p1,p2,p3,p4,ncol=2)
## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_bar).

## Warning: Removed 1 rows containing missing values (geom_bar).
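
Because the four plotting calls differ only in their data and title, they could be generated by one small function. The sketch below is one possible refactoring, not the code that produced the figure above; `make_hist` and its arguments are hypothetical names, and `after_stat(density)` assumes ggplot2 3.3.0 or later, where it replaces the older `..density..` notation. It also zooms with `coord_cartesian()` instead of setting scale limits, an assumption made here to keep every histogram bin in the data.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: histogram of one sample with a shaded density overlay.
make_hist <- function(x, title) {
  ggplot(data.frame(score = x), aes(x = score)) +
    geom_histogram(aes(y = after_stat(density)),
                   fill = "red", col = "black", binwidth = 1) +
    geom_density(fill = "black", alpha = 0.2) +
    scale_x_continuous(breaks = 1:10) +
    # Zoom the view without dropping bins that touch the edge of the range.
    coord_cartesian(xlim = c(0, 10.5)) +
    labs(x = "Anxiety Level", y = "Density", title = title) +
    theme(plot.title = element_text(hjust = 0.5))
}

# Rebuild the population so this block runs on its own.
counts <- c(25, 50, 70, 90, 100, 100, 90, 70, 50, 25)
population <- rep(1:10, times = counts)

# Build one plot per sample, then arrange them in a 2 x 2 grid.
plots <- lapply(1:4, function(i) {
  make_hist(sample(population, 50), paste("Sample", i))
})

grid.arrange(grobs = plots, ncol = 2)
```

Since `coord_cartesian()` only changes the visible region, this version should avoid the "Removed 1 rows containing missing values" warnings, which arise when a bar extends past limits set on the scale itself.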