Part 1: Pick a dataset and write a short description. Also make a summary table and describe it.

1a: Choice and description of the dataset

I chose the bigcity dataset from the boot package, which contains the populations (in thousands) of 49 U.S. cities in 1920 and 1930. For more information about this dataset, see its help page in R (?bigcity).

library(boot)
data("bigcity")
summary(bigcity)
##        u               x        
##  Min.   :  2.0   Min.   : 46.0  
##  1st Qu.: 43.0   1st Qu.: 58.0  
##  Median : 64.0   Median : 79.0  
##  Mean   :103.1   Mean   :127.8  
##  3rd Qu.:120.0   3rd Qu.:130.0  
##  Max.   :507.0   Max.   :634.0
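
The description above (49 cities, two population variables) can be checked against the data frame itself. The following is a minimal sketch, assuming the boot package is installed; the name dims is only an illustrative choice.

```r
library(boot)
data("bigcity")

# Number of rows (cities) and columns (variables)
dims <- dim(bigcity)
dims

# Column names: u is the 1920 population, x is the 1930 population
names(bigcity)

# First few rows of the raw data
head(bigcity)
```

Looking at the first rows alongside the summary helps confirm that both columns are in the same units (thousands of people).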

1b: Summary table and description

A summary table of the dataset is presented above. The dataset has 2 variables:

  • u: the 1920 city population (in thousands). It ranges from 2 to 507, with a median of 64 and an interquartile range of 77.
  • x: the 1930 city population (in thousands). It ranges from 46 to 634, with a median of 79 and an interquartile range of 72.

Each interquartile range is the 3rd quartile minus the 1st quartile from the summary table:

120-43
## [1] 77
130-58
## [1] 72
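
The same interquartile ranges can also be obtained without manual subtraction. This is a small sketch using base R's IQR() applied to each column; the name iqr_by_year is hypothetical and not part of the original analysis.

```r
library(boot)
data("bigcity")

# IQR() returns the 75th percentile minus the 25th percentile;
# sapply() applies it to each column (u = 1920, x = 1930)
iqr_by_year <- sapply(bigcity, IQR)
iqr_by_year
```

Computing the values from the data rather than from the printed summary avoids any rounding introduced by the summary() display.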

Part 2: Make a plot based on that data.

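The scatter plot below shows a positive relationship between the two years: cities with a larger population in 1920 also tend to have a larger population in 1930. The summary table points the same way, with both the mean and the median higher in 1930 than in 1920.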

plot(bigcity$u, bigcity$x,
     xlab = "1920 city population (in 1000's)",
     ylab = "1930 city population (in 1000's)")
title(main = "Trends in US city population: 1930 vs. 1920",
      sub = "Data Source: https://astrostatistics.psu.edu/su07/R/html/boot/html/bigcity.html")
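
To make the comparison between the two years easier to read, one option is to overlay the line y = x on the scatter plot: points above the line are cities whose population was larger in 1930 than in 1920. This is a sketch and not part of the original analysis; the name n_grew and the dashed line style are illustrative choices.

```r
library(boot)
data("bigcity")

# Scatter plot of 1930 population against 1920 population
plot(bigcity$u, bigcity$x,
     xlab = "1920 city population (in 1000's)",
     ylab = "1930 city population (in 1000's)")

# Dashed reference line y = x (no change between 1920 and 1930)
abline(a = 0, b = 1, lty = 2)

# Number of cities above the line, i.e. with a larger population in 1930
n_grew <- sum(bigcity$x > bigcity$u)
n_grew
```

Comparing n_grew with nrow(bigcity) gives a quick count of how many of the sampled cities grew over the decade.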