Load the tidyverse (which includes dplyr and readr) and read households.csv

library(tidyverse)
## ── Attaching packages ───────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.3.4     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0
## ── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(dplyr, warn.conflicts = FALSE)
dths <- read_csv("households.csv")
## Parsed with column specification:
## cols(
##   Household = col_integer(),
##   `Family Size` = col_integer(),
##   Location = col_integer(),
##   Ownership = col_integer(),
##   `First Income` = col_integer(),
##   `Second Income` = col_integer(),
##   `Monthly Payment` = col_integer(),
##   Utilities = col_integer(),
##   Debt = col_integer()
## )
  1. Indicate the type of data (categorical or continuous) for each of the variables included in the survey.

## Sort the names of the variables before indicating the type of each
sort(names(dths))
## [1] "Debt"            "Family Size"     "First Income"    "Household"      
## [5] "Location"        "Monthly Payment" "Ownership"       "Second Income"  
## [9] "Utilities"

In general, variables (and data) either represent measurements on some continuous scale, or they represent information about some categorical or discrete characteristics.

Continuous: “Family Size”, “First Income”, “Second Income”, “Monthly Payment”, “Utilities”, “Debt”

Categorical: “Household”, “Location”, “Ownership”
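
One rough way to sanity-check a classification like this is to count how many distinct values each column takes: a column with only a handful of distinct values is often a code for categories, while one with many distinct values is more likely a measurement. The sketch below is only an illustration; `count_distinct` is a hypothetical helper and `toy` is made-up data, not the survey.

```r
# Hypothetical helper (not part of the original analysis):
# number of distinct values in each column of a data frame.
count_distinct <- function(df) {
  vapply(df, function(x) length(unique(x)), integer(1))
}

# Made-up toy data, for illustration only.
toy <- data.frame(
  Location = c(1, 2, 2, 3, 1, 3),
  Debt     = c(500, 4100, 3950, 5600, 9000, 2900)
)

distinct_counts <- count_distinct(toy)
print(distinct_counts)
```

On the survey data the equivalent call would be `count_distinct(dths)`. The count is a hint rather than a rule, since an identifier such as a household number has many distinct values without being a measurement.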

  1. For each of the categorical variables in the survey, indicate whether you believe the variable is nominal or ordinal.

Nominal variables have levels that are labels or descriptions and cannot be ordered. Ordinal variables can be ranked in a logical order, but the intervals between their levels are not necessarily known.

Nominal: “Household”, “Location”, “Ownership”
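
In R, the nominal/ordinal distinction corresponds to unordered versus ordered factors. The sketch below uses made-up labels purely for illustration: the survey stores these variables as integer codes, and the labels `"rent"`/`"own"` and the `satisfaction` variable are assumptions, not part of the data.

```r
# Made-up labels, for illustration only; the survey's actual coding is not shown here.

# Nominal: levels are just labels, with no order.
ownership <- factor(c("rent", "own", "own", "rent"))

# Ordinal: levels are ranked, but the spacing between them is unknown.
satisfaction <- factor(c("low", "high", "medium", "low"),
                       levels = c("low", "medium", "high"),
                       ordered = TRUE)

print(is.ordered(ownership))     # FALSE
print(is.ordered(satisfaction))  # TRUE

# Order comparisons are meaningful only for ordered factors.
below_high <- satisfaction < "high"
print(below_high)
```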

  1. Create a histogram for each of the continuous variables. What does the histogram tell you about debt?

## Create a histogram for each continuous variable; the x-axis shows the variable and the y-axis shows the frequency
hist(dths$`Family Size`,xlab = "Family Size", ylab = "Frequency", main = "Histogram of Family Size")

hist(dths$`First Income`,xlab = "First Income", ylab = "Frequency", main = "Histogram of First Income")

hist(dths$`Second Income`,xlab = "Second Income", ylab = "Frequency", main = "Histogram of Second Income")

hist(dths$`Monthly Payment`,xlab = "Monthly Payment", ylab = "Frequency", main = "Histogram of Monthly Payment")

hist(dths$Utilities,xlab = "Utilities", ylab = "Frequency", main = "Histogram of Utilities")

hist(dths$Debt,xlab = "Debt", ylab = "Frequency", main = "Histogram of Debt")
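
The six calls above follow one pattern, so they could also be written as a loop that derives the axis label and title from the column name, which keeps the labels in sync with the data being plotted. This is a minimal sketch; `plot_histograms` is a hypothetical helper, not part of the original analysis.

```r
# Hypothetical helper: draw one base-R histogram per named column,
# using the column name for both the x-axis label and the title.
plot_histograms <- function(df, cols) {
  for (col in cols) {
    hist(df[[col]],
         xlab = col, ylab = "Frequency",
         main = paste("Histogram of", col))
  }
  invisible(cols)
}

# Usage on the survey data, if it has been loaded as `dths`.
if (exists("dths")) {
  plot_histograms(dths, c("Family Size", "First Income", "Second Income",
                          "Monthly Payment", "Utilities", "Debt"))
}
```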

The histogram of debt shows that debt ranges from about 0 to 10000. Households with a debt of 3000 to 4000 are the most common. Debt is a continuous variable, and its distribution is roughly symmetric.
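
A visual impression of symmetry can be cross-checked numerically by comparing the mean with the median: in a symmetric distribution the two are close, while a long right tail pulls the mean above the median. The sketch below uses made-up vectors only; `mean_median_gap` is a hypothetical helper.

```r
# Hypothetical helper: mean minus median as a rough symmetry check.
# Near zero suggests symmetry; a large positive value suggests right skew.
mean_median_gap <- function(x) {
  x <- x[!is.na(x)]
  mean(x) - median(x)
}

# Made-up vectors, for illustration only.
symmetric    <- c(1, 2, 3, 4, 5)
right_skewed <- c(1, 2, 3, 4, 50)

gap_symmetric <- mean_median_gap(symmetric)     # 3 - 3 = 0
gap_skewed    <- mean_median_gap(right_skewed)  # 12 - 3 = 9
print(c(gap_symmetric, gap_skewed))
```

On the survey data the equivalent call would be `mean_median_gap(dths$Debt)`; the result should be judged relative to the spread of the data.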

  1. Find the maximum and minimum debt levels for the households in this sample.

## Find the maximum and minimum debt levels for the households
max(dths$Debt)
## [1] 9104
min(dths$Debt)
## [1] 227
  1. Report the indebtedness levels at each of the 25th, 50th, and 75th percentiles.

## Find the indebtedness levels at the 25th, 50th, and 75th percentiles
quantile(dths$Debt, c(0.25,0.5,0.75))
##    25%    50%    75% 
## 2948.5 4267.5 5675.5
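
The minimum, the three quartiles, and the maximum can also be obtained in a single `quantile()` call by adding the probabilities 0 and 1. The sketch below is illustrative; `five_numbers` is a hypothetical helper and the input is a made-up vector.

```r
# Hypothetical helper: minimum, quartiles, and maximum in one named vector,
# using quantile()'s default interpolation method.
five_numbers <- function(x) {
  quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
}

# Made-up vector, for illustration only.
toy_values <- 1:9
toy_summary <- five_numbers(toy_values)
print(toy_summary)
```

On the survey data the equivalent call would be `five_numbers(dths$Debt)`.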
  1. Report and interpret the interquartile range for the indebtedness levels of households.

## Report the interquartile range for the indebtedness levels of households
IQR(dths$Debt)
## [1] 2727

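The interquartile range is the difference between the 75th and 25th percentiles. Here it is 2727, which means the middle 50% of households have debt levels spread over a range of 2727.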
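
The reported value can be checked against the quartiles from the `quantile()` output earlier in this report. The sketch below redoes that subtraction and then confirms, on a made-up vector, that `IQR()` matches the quartile difference when both use the default quantile method.

```r
# Quartiles as reported in the quantile() output for Debt.
q25 <- 2948.5
q75 <- 5675.5
iqr_from_quartiles <- q75 - q25
print(iqr_from_quartiles)  # 2727

# Made-up vector, for illustration only: IQR() agrees with the
# difference between the 75th and 25th percentiles.
toy_debt <- c(2, 4, 4, 5, 7, 9, 10, 12)
quartile_diff <- unname(diff(quantile(toy_debt, c(0.25, 0.75))))
iqr_matches <- isTRUE(all.equal(IQR(toy_debt), quartile_diff))
print(iqr_matches)
```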