a. Indicate the type of data (categorical or continuous) for each of the variables included in the survey

library(readr) #load readr, which provides read_csv() and cols()
households <- read_csv("households.csv", col_names = TRUE,
  col_types = cols(Household = col_integer(), `Family Size` = col_integer(),
    Location = col_integer(), Ownership = col_integer(),
    `First Income` = col_integer(), `Second Income` = col_integer(),
    `Monthly Payment` = col_integer(), Utilities = col_integer(),
    Debt = col_integer())) #read the csv file, parsing every column as an integer
households

## # A tibble: 500 x 9
##    Household `Family Size` Location Ownership `First Income`
##        <int>         <int>    <int>     <int>          <int>
##  1         1             2        2         1          58206
##  2         2             6        2         0          48273
##  3         3             3        4         0          37582
##  4         4             1        1         1          56610
##  5         5             3        3         0          37731
##  6         6             4        1         0          30434
##  7         7             1        1         1          47969
##  8         8             1        1         1          55487
##  9         9             3        2         1          59947
## 10        10             6        1         0          36970
## # ... with 490 more rows, and 4 more variables: `Second Income` <int>,
## #   `Monthly Payment` <int>, Utilities <int>, Debt <int>

Categorical Data: Household, Location, and Ownership. Continuous Data: Family Size, First Income, Second Income, Monthly Payment, Utilities, and Debt.
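
As a minimal sketch of how this classification could be recorded in R, the integer codes of the categorical variables can be converted to factors so that R does not treat them as quantities. The data frame `toy` and its values are made up for illustration and are not taken from households.csv.

```r
# Toy stand-in for the survey data; these values are hypothetical.
toy <- data.frame(
  Location  = c(2L, 2L, 4L, 1L, 3L, 1L),
  Ownership = c(1L, 0L, 0L, 1L, 0L, 0L),
  Debt      = c(3100L, 4500L, 2200L, 5900L, 4100L, 3700L)
)

# Integer codes that only label groups are stored as factors.
toy$Location  <- factor(toy$Location)
toy$Ownership <- factor(toy$Ownership, levels = c(0L, 1L))

# Report, for each column, whether it is now categorical or numeric.
var_type <- sapply(toy, function(x) if (is.factor(x)) "categorical" else "numeric")
print(var_type)
```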

b. For each of the categorical variables in the survey, indicate whether you believe the variable is nominal or ordinal.

Categorical Data:
1. Household: Ordinal variable. The values 1 to 500 label the individual households, one number per family, and the survey lists them in that order.
2. Location: Nominal variable. The values represent different locations, and there is no order among them.
3. Ownership: Nominal variable. The values 0 and 1 represent the ownership status, and there is no order between them.
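
The difference between nominal and ordinal data can be sketched with R factors. In the example below, `location` uses made-up codes, and `rating` is a hypothetical variable that is not part of the survey, included only to show what an ordered factor looks like.

```r
# Hypothetical location codes; a plain factor has levels but no ordering.
location <- factor(c(2, 2, 4, 1, 3, 1))
is.ordered(location)

# A hypothetical ordinal variable is declared with ordered = TRUE,
# listing the levels from lowest to highest.
rating <- factor(c("low", "high", "medium", "low"),
                 levels = c("low", "medium", "high"), ordered = TRUE)
is.ordered(rating)

# Comparisons such as < are meaningful only for ordered factors.
low_before_high <- rating[1] < rating[2]
print(low_before_high)
```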

c. Create a histogram of Debt. What does the histogram tell you about debt?

hist(households$Debt, xlab = "Debt", main = "Histogram of the Debt", breaks = 30, col = "pink") #output the histogram of Debt and set the x-axis label, title, number of bins, and color.

The histogram shows a roughly symmetric distribution. More than 50 families have debt above $3000, and most families have debt between $2000 and $6000.
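
One way to back up a visual reading of symmetry is to compare the mean with the median and to look at the bin counts directly. The sketch below assumes a made-up vector `debt` in place of households$Debt, so the numbers it produces are not survey results.

```r
# Made-up debt values standing in for households$Debt.
debt <- c(2100, 2900, 3400, 4000, 4300, 4600, 5200, 5700, 6500)

# In a roughly symmetric distribution the mean and the median are close.
debt_mean   <- mean(debt)
debt_median <- median(debt)
print(c(mean = debt_mean, median = debt_median))

# hist() with plot = FALSE returns the bin counts without drawing anything.
bins <- hist(debt, breaks = seq(2000, 7000, by = 1000), plot = FALSE)
bin_table <- data.frame(lower = head(bins$breaks, -1),
                        upper = bins$breaks[-1],
                        count = bins$counts)
print(bin_table)
```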

d. Find the maximum and minimum debt levels for the households in this sample.

sprintf("The maximum debt level for the households: %s", max(households$Debt)) #maximum debt level for the household

## [1] "The maximum debt level for the households: 9104"

sprintf("The minimum debt level for the households: %s", min(households$Debt)) #minimum debt level for the household

## [1] "The minimum debt level for the households: 227"
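
As an alternative sketch, `range()` returns the minimum and the maximum in a single call. The vector `debt` below is made up for illustration and does not come from the survey.

```r
# Made-up debt values standing in for households$Debt.
debt <- c(4300, 500, 8000, 2950, 5675)

# range() returns a vector of length two: the minimum, then the maximum.
debt_range <- range(debt)
sprintf("Debt ranges from %s to %s", debt_range[1], debt_range[2])
```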

e. Report the indebtedness levels at each of the 25th, 50th, and 75th percentiles.

sprintf("Indebtedness levels at 25th percentiles: %s", quantile(households$Debt,.25))

## [1] "Indebtedness levels at 25th percentiles: 2948.5"

sprintf("Indebtedness levels at 50th percentiles: %s", quantile(households$Debt,.50))

## [1] "Indebtedness levels at 50th percentiles: 4267.5"

sprintf("Indebtedness levels at 75th percentiles: %s", quantile(households$Debt,.75)) #each of the three calls inserts the computed percentile of Debt (25th, 50th, or 75th) into its sentence.

## [1] "Indebtedness levels at 75th percentiles: 5675.5"
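
The three percentiles can also be requested in one call by passing a vector of probabilities to `quantile()`. This is a minimal sketch on a made-up vector `debt`, using R's default interpolation method, so its values differ from the survey results above.

```r
# Made-up debt values standing in for households$Debt.
debt <- c(1000, 2000, 3000, 4000, 5000)

# probs takes all requested percentiles at once; the result is a named vector.
q <- quantile(debt, probs = c(0.25, 0.50, 0.75))
print(q)

# Build one labelled line per percentile from the names and values.
labelled <- paste0(names(q), ": ", q)
print(labelled)
```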

f. Report and interpret the interquartile range for the indebtedness levels of households.

sprintf("interquartile range for the indebtedness levels of households is: %s", IQR(households$Debt)) #combine the sentence "interquartile range for the indebtedness levels of households is:" and the calculation result of IQR.

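## [1] "interquartile range for the indebtedness levels of households is: 2727"

The interquartile range is the difference between the 75th and 25th percentiles (5675.5 - 2948.5 = 2727), so the debt levels of the middle 50% of households span a range of $2727.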
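
The relationship between `IQR()` and the quartiles can be checked with a short sketch. The vector `debt` is made up for illustration, so the resulting value is not the survey's interquartile range.

```r
# Made-up debt values standing in for households$Debt.
debt <- c(1000, 2000, 3000, 4000, 5000)

# The interquartile range is the 75th percentile minus the 25th percentile.
q <- quantile(debt, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

# IQR() should agree with the manual calculation.
same_result <- isTRUE(all.equal(iqr_manual, IQR(debt)))
print(c(manual = iqr_manual, builtin = IQR(debt)))
```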

Assignment 02 Part A

Jiasheng Li

January 30, 2018