A02b_G22358412

Load the tidyverse (which includes dplyr) and read households.csv.

library(tidyverse)

## ── Attaching packages ───────────────────────────── tidyverse 1.2.1 ──

## ✔ ggplot2 2.2.1     ✔ purrr   0.2.4
## ✔ tibble  1.3.4     ✔ dplyr   0.7.4
## ✔ tidyr   0.7.2     ✔ stringr 1.2.0
## ✔ readr   1.1.1     ✔ forcats 0.2.0

## ── Conflicts ──────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(dplyr, warn.conflicts = FALSE)
dths <- read_csv("households.csv")

## Parsed with column specification:
## cols(
##   Household = col_integer(),
##   `Family Size` = col_integer(),
##   Location = col_integer(),
##   Ownership = col_integer(),
##   `First Income` = col_integer(),
##   `Second Income` = col_integer(),
##   `Monthly Payment` = col_integer(),
##   Utilities = col_integer(),
##   Debt = col_integer()
## )

Indicate the type of data (categorical or continuous) for each of the variables included in the survey.

## Sort the names of the variables and indicate the type of data for each

sort(names(dths))

## [1] "Debt"            "Family Size"     "First Income"    "Household"      
## [5] "Location"        "Monthly Payment" "Ownership"       "Second Income"  
## [9] "Utilities"

In general, variables (and data) either represent measurements on some continuous scale, or they represent information about some categorical or discrete characteristics.

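Continuous: “Family Size”, “First Income”, “Second Income”, “Monthly Payment”, “Utilities”, “Debt”

Categorical: “Household”, “Location”, “Ownership”

Although read_csv parsed every column as an integer, the values in “Household”, “Location”, and “Ownership” act as identifiers or codes rather than measured quantities, so they are treated as categorical here.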

For each of the categorical variables in the survey, indicate whether you believe the variable is nominal or ordinal.

Nominal variables have levels that are labels or descriptions and cannot be ordered. Ordinal variables can be ordered or ranked in a logical sequence, but the intervals between their levels are not necessarily known.

Nominal: “Household”, “Location”, “Ownership”
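Because the categorical columns arrive as integers, R will treat them as numbers unless told otherwise. The following is a minimal sketch of marking such columns as factors. The data frame `toy` and all of its values are invented stand-ins for illustration, not rows from households.csv, and no level labels are assigned because the survey's coding scheme is not given here.

```r
# Toy stand-in for the survey data; every value below is invented for illustration.
toy <- data.frame(
  Household = 1:6,
  Location  = c(1L, 2L, 2L, 3L, 1L, 3L),
  Ownership = c(0L, 1L, 1L, 0L, 1L, 1L),
  Debt      = c(1200, 3400, 4100, 2900, 5600, 3800)
)

# Convert the integer-coded columns to factors so they are handled as categories.
categorical_cols <- c("Location", "Ownership")
toy[categorical_cols] <- lapply(toy[categorical_cols], factor)

# Inspect the result: Location and Ownership are now factors, Debt stays numeric.
str(toy)
table(toy$Location)
```

With the columns stored as factors, functions such as `table()` and `summary()` report counts per level instead of numeric summaries of the codes.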

Create a histogram for each of the continuous variables. What does the histogram tell you about debt?

## Create a histogram for each continuous variable, with the variable on the x-axis and frequency on the y-axis

hist(dths$`Family Size`,xlab = "Family Size", ylab = "Frequency", main = "Histogram of Family Size")

hist(dths$`First Income`,xlab = "First Income", ylab = "Frequency", main = "Histogram of First Income")

hist(dths$`Second Income`,xlab = "Second Income", ylab = "Frequency", main = "Histogram of Second Income")

hist(dths$`Monthly Payment`,xlab = "Monthly Payment", ylab = "Frequency", main = "Histogram of Monthly Payment")

hist(dths$Utilities, xlab = "Utilities", ylab = "Frequency", main = "Histogram of Utilities")

hist(dths$Debt, xlab = "Debt", ylab = "Frequency", main = "Histogram of Debt")

The histogram shows debt spanning roughly 0 to 10,000, with households owing between 3,000 and 4,000 the most common. Debt is a continuous variable, and its distribution is approximately symmetric.
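One rough numeric check on the symmetry claim is the quartile (Bowley) skewness, which compares how far the median sits from each quartile. The sketch below uses the three quartiles of Debt reported by `quantile()` later in this document. It is a supplementary check rather than part of the original analysis, and it only describes the middle half of the data.

```r
# Quartiles of Debt as reported by quantile() in this document.
q1 <- 2948.5
q2 <- 4267.5  # median
q3 <- 5675.5

# Quartile (Bowley) skewness: 0 means the median lies exactly midway
# between the quartiles; positive values indicate a longer upper side.
bowley <- (q3 + q1 - 2 * q2) / (q3 - q1)
round(bowley, 3)
```

A value close to zero is consistent with the roughly symmetric shape seen in the histogram.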

Find the maximum and minimum debt levels for the households in this sample.

## Find the maximum and minimum debt levels for the households

max(dths$Debt)

## [1] 9104

min(dths$Debt)

## [1] 227

Report the indebtedness levels at each of the 25th, 50th, and 75th percentiles.

## Find the indebtedness levels at the 25th, 50th, and 75th percentiles

quantile(dths$Debt, c(0.25,0.5,0.75))

##    25%    50%    75% 
## 2948.5 4267.5 5675.5
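The percentiles above end in .5 even though Debt was parsed as an integer column. That is consistent with the default behaviour of `quantile()` (`type = 7`), which interpolates linearly between neighbouring sorted values when a percentile falls between two observations. A small sketch on an invented vector `x` (not survey data) shows the effect:

```r
# Invented integer-valued data, used only to illustrate interpolation.
x <- c(10, 20, 30, 40)

# Default method (type = 7) interpolates between adjacent sorted values.
q_default <- quantile(x, c(0.25, 0.5, 0.75))
q_default

# type = 1 returns actual observed values with no interpolation.
q_observed <- quantile(x, c(0.25, 0.5, 0.75), type = 1)
q_observed
```

With the default method, the median of `x` is 25, a value that does not occur in the data. `type = 1` always returns one of the observations.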

Report and interpret the interquartile range for the indebtedness levels of households.

## Report the interquartile range for the indebtedness levels of households

IQR(dths$Debt)

## [1] 2727

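The interquartile range is the difference between the 75th and 25th percentiles: 5675.5 - 2948.5 = 2727. The middle 50% of households therefore have debt levels that span a range of 2,727.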
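The interquartile range is also commonly used to flag unusually large or small values through the 1.5 × IQR rule of thumb. The sketch below applies that rule to the quartiles, minimum, and maximum reported in this document. The rule is a convention brought in for illustration, not something the original questions ask for.

```r
# Values of Debt reported earlier in this document.
q1 <- 2948.5
q3 <- 5675.5
debt_min <- 227
debt_max <- 9104

# Interquartile range: the width of the middle 50% of the data.
iqr <- q3 - q1

# 1.5 * IQR fences: values outside them are conventionally flagged as outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr
c(lower = lower_fence, upper = upper_fence)

# TRUE when both the smallest and largest debt fall inside the fences.
no_outliers <- debt_min >= lower_fence && debt_max <= upper_fence
no_outliers
```

The fences come out at -1142 and 9766, so both the minimum (227) and the maximum (9104) fall inside them and no household would be flagged as an outlier under this rule.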