Introduction

This RMarkdown document presents R code for generating two versions of an animated plot that shows how a histogram builds up gradually as the sample size grows. The simulated data represent a distribution of IQ scores.

Further animated plots visualize how a distribution changes when every value in a dataset is transformed by adding a constant or multiplying by a constant.

The document is intended to supplement lecture notes on frequency distributions and transformations.

Initial IQ Score Simulation and Data Frame Preparation

The first block of code accomplishes the following:

  1. Load appropriate R packages.
  2. Simulate 2,635 IQ scores from a hypothetical normal distribution.
  3. Create a data frame that contains the IQ scores together with a "dataset" variable identifying the sample each score belongs to, which drives the gradual animation.

library(ggplot2)
library(dplyr)
library(gganimate)
library(transformr)

z1 <- rnorm(10,100,15)
z2 <- rnorm(25,100,15)
z3 <- rnorm(100,100,15)
z4 <- rnorm(500,100,15)
z5 <- rnorm(2000,100,15)

# Convert each vector of scores to a one-column matrix
z1 <- as.matrix(z1)
z2 <- as.matrix(z2)
z3 <- as.matrix(z3)
z4 <- as.matrix(z4)
z5 <- as.matrix(z5)

z1 <- as.data.frame(z1)
z2 <- as.data.frame(z2)
z3 <- as.data.frame(z3)
z4 <- as.data.frame(z4)
z5 <- as.data.frame(z5)

colnames(z1) <- "IQ"
colnames(z2) <- "IQ"
colnames(z3) <- "IQ"
colnames(z4) <- "IQ"
colnames(z5) <- "IQ"

dataset <- c(rep(1,10),rep(2,25),rep(3,100),rep(4,500),rep(5,2000))

fullmat <- matrix(0,2635,2) 
fullmat[,1] <- dataset
partmat <- rbind(z1,z2,z3,z4,z5)
fullmat[,2] <- partmat[,1]
colnames(fullmat) <- c("dataset","IQ")
fullmat <- as.data.frame(fullmat)
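
The same kind of data frame can be built more compactly. The sketch below is an alternative, not the code used to produce the plots in this document; the names `sizes` and `iq_df` and the call to `set.seed()` are assumptions added for illustration. It draws all scores in a single `rnorm()` call and labels them with `rep()`.

```r
# Hypothetical compact alternative to the step-by-step construction above.
set.seed(1)  # assumption: fixed seed so the example is reproducible

sizes <- c(10, 25, 100, 500, 2000)  # sample sizes of the five datasets

iq_df <- data.frame(
  dataset = rep(seq_along(sizes), times = sizes),  # label 1 to 5, repeated per sample size
  IQ      = rnorm(sum(sizes), mean = 100, sd = 15)
)

nrow(iq_df)           # total number of simulated scores
table(iq_df$dataset)  # number of scores per dataset
```

Because every score comes from the same normal distribution, drawing them in one call and labelling them afterwards gives samples with the same distribution as five separate `rnorm()` calls.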

Two Ways of Animating the Gradual Building of a Histogram

The next two blocks of code demonstrate two versions of animating the gradual construction of a histogram:

  1. The first plot keeps the x-axis fixed but lets the y-axis rescale dynamically as the sample size grows. This provides a closer view of the shape of the distribution at smaller sample sizes.

  2. The second plot uses a fixed set of axes that do not change as the sample size grows. This gives a better sense of how the histogram changes as the sample size increases.

anim_plot1 <- ggplot(fullmat, aes(IQ)) +
  geom_histogram(col = "black", fill = "blue") +
  transition_states(dataset, transition_length = 2, state_length = 4) +
  view_follow(fixed_x = TRUE)

anim_plot1
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

anim_plot2 <- ggplot(fullmat, aes(IQ)) +
  geom_histogram(col = "black", fill = "blue") +
  transition_states(dataset, transition_length = 2, state_length = 4)

anim_plot2
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
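
The `stat_bin()` message above appears because no bin width was specified. A minimal sketch of setting one explicitly follows; the width of 5 IQ points, the seed, and the names `demo`, `p_static`, and `bins` are assumptions chosen for illustration, and the plot is static so the bins can be inspected without rendering an animation. The same `binwidth` argument could be passed to `geom_histogram()` in the animated plots.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed for reproducibility
demo <- data.frame(IQ = rnorm(500, mean = 100, sd = 15))

# binwidth = 5 is an illustrative choice, not a value from the original plots
p_static <- ggplot(demo, aes(IQ)) +
  geom_histogram(binwidth = 5, col = "black", fill = "blue")

# Inspect the bins ggplot2 computed for the histogram layer
bins <- layer_data(p_static, 1)
range(bins$xmax - bins$xmin)  # width of the narrowest and widest bin
```

With an explicit `binwidth`, the bins stay the same width in every animation state, which makes histograms of different sample sizes easier to compare.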

Second Simulation of IQ Scores to Demonstrate Data Transformations (Adding a Constant)

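The simulation and data-preparation steps above are repeated for a second set of IQ scores. An initial sample of 1,000 scores is drawn from a normal distribution with mean 70 and standard deviation 15. Four transformed copies are then created by repeatedly adding 15, and all five versions are stacked into one data frame for use in the animation.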

t1 <- rnorm(1000,70,15)
t2 <- t1 + 15
t3 <- t2 + 15
t4 <- t3 + 15
t5 <- t4 + 15

# Convert each vector of scores to a one-column matrix
t1 <- as.matrix(t1)
t2 <- as.matrix(t2)
t3 <- as.matrix(t3)
t4 <- as.matrix(t4)
t5 <- as.matrix(t5)

t1 <- as.data.frame(t1)
t2 <- as.data.frame(t2)
t3 <- as.data.frame(t3)
t4 <- as.data.frame(t4)
t5 <- as.data.frame(t5)

colnames(t1) <- "IQ"
colnames(t2) <- "IQ"
colnames(t3) <- "IQ"
colnames(t4) <- "IQ"
colnames(t5) <- "IQ"

dataset <- c(rep(1,1000),rep(2,1000),rep(3,1000),rep(4,1000),rep(5,1000))
fullmat <- matrix(0,5000,2) 
fullmat[,1] <- dataset
partmat <- rbind(t1,t2,t3,t4,t5)
fullmat[,2] <- partmat[,1]
colnames(fullmat) <- c("dataset","IQ")
fullmat <- as.data.frame(fullmat)

Animated Plots of Data Transformation (Addition of a Constant)

Density curve plots are constructed for the initial and transformed IQ score distributions.

anim_plot3 <- ggplot(fullmat, aes(IQ)) +
  geom_density(col = "black", fill = "blue") +
  transition_states(dataset, transition_length = 2, state_length = 4) +
  shadow_mark(alpha = .3)

anim_plot3
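
Adding a constant shifts every score by the same amount, so the mean moves by that constant while the spread is unchanged; this is why the density curve slides along the axis without changing shape. A quick numerical check, using hypothetical names (`base_scores`, `shifted_scores`) and an assumed seed:

```r
set.seed(1)  # assumption: fixed seed for reproducibility

base_scores    <- rnorm(1000, mean = 70, sd = 15)
shifted_scores <- base_scores + 15  # add a constant to every score

mean(shifted_scores) - mean(base_scores)  # change in the mean
sd(shifted_scores) - sd(base_scores)      # change in the standard deviation
```

The first difference equals the added constant and the second is zero, up to floating-point rounding.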

Third Simulation of IQ Scores to Demonstrate Data Transformations (Multiplying by a Constant)

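The simulation and data-preparation steps above are repeated for a third set of IQ scores. An initial sample of 1,000 scores is drawn from a normal distribution with mean 100 and standard deviation 5. Two transformed copies are then created by repeatedly multiplying by 2, so the third version is four times the original, and all three versions are stacked into one data frame for use in the animation.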

v1 <- rnorm(1000,100,5)
v2 <- v1*2
v3 <- v2*2

# Convert each vector of scores to a one-column matrix
v1 <- as.matrix(v1)
v2 <- as.matrix(v2)
v3 <- as.matrix(v3)

v1 <- as.data.frame(v1)
v2 <- as.data.frame(v2)
v3 <- as.data.frame(v3)

colnames(v1) <- "IQ"
colnames(v2) <- "IQ"
colnames(v3) <- "IQ"

dataset <- c(rep(1,1000),rep(2,1000),rep(3,1000))
fullmat <- matrix(0,3000,2) 
fullmat[,1] <- dataset
partmat <- rbind(v1,v2,v3)
fullmat[,2] <- partmat[,1]
colnames(fullmat) <- c("dataset","IQ")
fullmat <- as.data.frame(fullmat)

Animated Plots of Data Transformation (Multiplication by a Constant)

Density curve plots are constructed for the initial and transformed IQ score distributions.

anim_plot4 <- ggplot(fullmat, aes(IQ)) +
  geom_density(col = "black", fill = "blue") +
  transition_states(dataset, transition_length = 2, state_length = 4) +
  shadow_mark(alpha = .3)

anim_plot4
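
Multiplying by a constant rescales both the mean and the standard deviation by that constant, so the density curve moves along the axis and also becomes wider and flatter. A quick numerical check, using hypothetical names (`base_scores`, `doubled_scores`) and an assumed seed:

```r
set.seed(1)  # assumption: fixed seed for reproducibility

base_scores    <- rnorm(1000, mean = 100, sd = 5)
doubled_scores <- base_scores * 2  # multiply every score by a constant

mean(doubled_scores) / mean(base_scores)  # ratio of the means
sd(doubled_scores) / sd(base_scores)      # ratio of the standard deviations
```

Both ratios equal the multiplier, 2, up to floating-point rounding.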