Scenario

We are interested in the Gapminder data set, which records measurements such as life expectancy, GDP per capita, and population for different countries over different years. Specifically, we will focus on the values from 2007, so we first create a new data set, gap_2007, that keeps only the 2007 rows:

library(dplyr)  # provides %>% and filter()
gap_2007 <- gap %>% filter(year == 2007)  # keep only the rows from 2007
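As a cross-check on the subsetting step, the sketch below rebuilds the subset with base-R indexing instead of filter() and confirms that each country appears exactly once in 2007. It assumes that gap is the gapminder tibble from the gapminder package. The variable names and the 1704 × 6 dimensions reported below match that tibble. The name gap_2007_base is hypothetical and chosen only for this illustration.

```r
# Assumption: `gap` is the `gapminder` tibble from the gapminder package
library(gapminder)
gap <- gapminder

# Base-R equivalent of gap %>% filter(year == 2007): keep only rows whose year is 2007
gap_2007_base <- gap[gap$year == 2007, ]

# Sanity checks: every remaining row is from 2007, and no country is repeated
all(gap_2007_base$year == 2007)
anyDuplicated(gap_2007_base$country) == 0
nrow(gap_2007_base)  # 142: the 1704 rows cover 12 survey years, 1952 to 2007
```

Here both approaches should return the same rows. filter() also drops rows where the condition is NA, but the year column has no missing values.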

Exploring the Data

Before working with gap_2007, we explore the full gap data set (all years). We list its variable names, its dimensions, its first six rows, and its structure. The results are recorded below:

names(gap)  # variable names
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"
dim(gap)    # number of rows and columns
## [1] 1704    6
head(gap)   # first six rows
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
str(gap)    # type of each column plus a preview of its values
## tibble [1,704 × 6] (S3: tbl_df/tbl/data.frame)
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int [1:1704] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num [1:1704] 28.8 30.3 32 34 36.1 ...
##  $ pop      : int [1:1704] 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num [1:1704] 779 821 853 836 740 ...
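The output above describes the full gap data set rather than the 2007 subset. The sketch below runs the same kind of checks on gap_2007. It again assumes that gap is gapminder::gapminder, and it rebuilds the subset so that the block runs on its own.

```r
# Assumption: gap is gapminder::gapminder; gap_2007 is rebuilt so this block is standalone
library(gapminder)
library(dplyr)
gap_2007 <- gapminder %>% filter(year == 2007)

dim(gap_2007)                 # rows and columns of the 2007 subset
n_distinct(gap_2007$country)  # number of countries observed in 2007
table(gap_2007$continent)     # how those countries split across continents
```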

Next, we look at a histogram of each quantitative variable in gap_2007 (gdpPercap, pop, and lifeExp) to see the shape of each distribution. The histograms are produced by the code below:

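library(ggplot2)

# GDP per capita
ggplot(gap_2007, aes(x = gdpPercap)) +
  geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
  labs(x = "GDP per capita (US$)", y = "Number of countries") +
  theme_minimal()

# Population
ggplot(gap_2007, aes(x = pop)) +
  geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
  labs(x = "Population", y = "Number of countries") +
  theme_minimal()

# Life expectancy
ggplot(gap_2007, aes(x = lifeExp)) +
  geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
  labs(x = "Life expectancy (years)", y = "Number of countries") +
  theme_minimal()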
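The three histogram calls differ only in the variable they plot. The sketch below uses a small helper to remove the repetition. plot_hist() is a hypothetical name introduced here. The optional log10 axis for pop is also an addition: it is a common choice when values span several orders of magnitude, as country populations do. As before, gap is assumed to be gapminder::gapminder.

```r
library(gapminder)
library(dplyr)
library(ggplot2)
gap_2007 <- gapminder %>% filter(year == 2007)

# Hypothetical helper: histogram of the column named by `var`,
# with an optional log10 x-axis for strongly right-skewed variables
plot_hist <- function(data, var, log_x = FALSE) {
  p <- ggplot(data, aes(x = .data[[var]])) +
    geom_histogram(bins = 30, alpha = 0.8, fill = "lightblue", color = "black") +
    labs(x = var, y = "Number of countries") +
    theme_minimal()
  if (log_x) p <- p + scale_x_log10()  # equal spacing per factor of 10
  p
}

# One plot object per quantitative variable
plots <- list(
  gdpPercap = plot_hist(gap_2007, "gdpPercap"),
  pop       = plot_hist(gap_2007, "pop", log_x = TRUE),
  lifeExp   = plot_hist(gap_2007, "lifeExp")
)
```

Calling print(plots$lifeExp) draws the life expectancy histogram. The .data[[var]] pronoun lets ggplot2 look up a column from a character string.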

Calculating Statistics for One Variable

We decide to focus on one of our variables, lifeExp (life expectancy in years). For this variable, we calculate the mean, median, range, IQR, and standard deviation in the space below:

#mean
mean(gap_2007$lifeExp)
## [1] 67.00742
#median
median(gap_2007$lifeExp)
## [1] 71.9355
#range
range(gap_2007$lifeExp)
## [1] 39.613 82.603
#IQR
IQR(gap_2007$lifeExp)
## [1] 19.253
#standard deviation
sd(gap_2007$lifeExp)
## [1] 12.07302
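The five calls above can also be collected into a single table with dplyr's summarise(). This sketch again assumes that gap is gapminder::gapminder. It also checks that IQR() equals the third quartile minus the first quartile.

```r
library(gapminder)
library(dplyr)
gap_2007 <- gapminder %>% filter(year == 2007)

# All of the statistics above in one one-row tibble
life_stats <- gap_2007 %>%
  summarise(
    mean   = mean(lifeExp),
    median = median(lifeExp),
    min    = min(lifeExp),
    max    = max(lifeExp),
    iqr    = IQR(lifeExp),
    sd     = sd(lifeExp)
  )
life_stats

# IQR = Q3 - Q1, both computed with quantile()'s default method (type 7)
q <- quantile(gap_2007$lifeExp, probs = c(0.25, 0.75))
unname(q[2] - q[1])
```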

Summary

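The life expectancy histogram appears skewed to the left. The upper half of the countries lies between 71.9 and 82.6 years, a span of about 10.7 years. The lower half is spread over roughly 32.3 years, from 39.6 to 71.9 years. The mean is less robust than the median because every value contributes to it directly, including those in the long lower tail, so it is pulled in the direction of the skew. Consistent with this, the mean life expectancy (about 67.0 years) is roughly 5 years lower than the median (about 71.9 years).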
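One way to put a number on the gap between mean and median is Pearson's median skewness coefficient, 3(mean − median)/sd. The original analysis does not use this coefficient. The sketch below simply plugs in the lifeExp values reported above, so it needs no packages.

```r
# Summary values reported above for gap_2007$lifeExp
life_mean   <- 67.00742
life_median <- 71.9355
life_sd     <- 12.07302

# Pearson's median skewness: 3 * (mean - median) / sd
# A negative value means the mean sits below the median, as expected under left skew
pearson_skew <- 3 * (life_mean - life_median) / life_sd
round(pearson_skew, 2)  # -1.22
```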