library(readr) #load readr, which provides read_csv() and the col_*() helpers
households <- read_csv(
  "households.csv",
  col_names = TRUE,
  col_types = cols(
    Household = col_integer(), `Family Size` = col_integer(),
    Location = col_integer(), Ownership = col_integer(),
    `First Income` = col_integer(), `Second Income` = col_integer(),
    `Monthly Payment` = col_integer(), Utilities = col_integer(),
    Debt = col_integer()
  )
) #read the csv file and parse every column as an integer
households
## # A tibble: 500 x 9
## Household `Family Size` Location Ownership `First Income`
## <int> <int> <int> <int> <int>
## 1 1 2 2 1 58206
## 2 2 6 2 0 48273
## 3 3 3 4 0 37582
## 4 4 1 1 1 56610
## 5 5 3 3 0 37731
## 6 6 4 1 0 30434
## 7 7 1 1 1 47969
## 8 8 1 1 1 55487
## 9 9 3 2 1 59947
## 10 10 6 1 0 36970
## # ... with 490 more rows, and 4 more variables: `Second Income` <int>,
## # `Monthly Payment` <int>, Utilities <int>, Debt <int>
Categorical Data: Household, Location, and Ownership.
Numerical Data: Family Size (a discrete count), First Income, Second Income, Monthly Payment, Utilities, and Debt (treated as continuous).
Categorical Data:
1. Household: Ordinal variable. Each value from 1 to 500 identifies one household, and the numbers follow the order in which the households appear in the survey.
2. Location: Nominal variable. The values are codes for different locations and have no natural order.
3. Ownership: Nominal variable. The values 0 and 1 encode the ownership status and have no natural order.
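Because Location and Ownership are read in as integers, R will summarise them as numbers unless told otherwise. A minimal sketch of converting such codes to factors, assuming a small made-up data frame `demo` (a hypothetical stand-in, not the survey data):

```r
# Hypothetical toy data; the values are illustrative only
demo <- data.frame(
  Household = 1:4,
  Location  = c(2L, 2L, 4L, 1L),
  Ownership = c(1L, 0L, 0L, 1L)
)

# Treat the integer codes as unordered categories (nominal variables)
demo$Location  <- factor(demo$Location)
demo$Ownership <- factor(demo$Ownership, levels = c(0, 1))

str(demo)              # Location and Ownership are now factors
table(demo$Ownership)  # counts per category instead of numeric summaries
```

The same two `factor()` calls could be applied to `households$Location` and `households$Ownership` if categorical summaries are wanted.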
hist(households$Debt, xlab = "Debt", main = "Histogram of the Debt", breaks = 30, col = "pink") #plot the histogram of Debt and set the x-axis label, title, number of breaks, and color.
The histogram shows a roughly symmetric distribution. More than 50 families have a debt of more than $3000, and most families have a debt between $2000 and $6000.
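Statements read off a histogram can also be checked numerically. A minimal sketch, assuming a short made-up vector `debt` (hypothetical values, not taken from households.csv), of counting observations above a threshold and the share inside a range:

```r
# Hypothetical debt values for illustration; not taken from households.csv
debt <- c(1500, 2500, 3200, 4100, 4800, 5900, 6500, 7200)

# Comparisons return TRUE/FALSE, so sum() counts and mean() gives a share
n_above_3000 <- sum(debt > 3000)
share_2k_6k  <- mean(debt >= 2000 & debt <= 6000)

n_above_3000  # number of values above 3000
share_2k_6k   # proportion of values between 2000 and 6000
```

On the real data, `debt` would be replaced by `households$Debt`.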
sprintf("The maximum debt level for the households: %s", max(households$Debt)) #maximum debt level for the household
## [1] "The maximum debt level for the households: 9104"
sprintf("The minimum debt level for the households: %s", min(households$Debt)) #minimum debt level for the household
## [1] "The minimum debt level for the households: 227"
sprintf("Indebtedness levels at 25th percentiles: %s", quantile(households$Debt,.25))
## [1] "Indebtedness levels at 25th percentiles: 2948.5"
sprintf("Indebtedness levels at 50th percentiles: %s", quantile(households$Debt,.50))
## [1] "Indebtedness levels at 50th percentiles: 4267.5"
sprintf("Indebtedness levels at 75th percentiles: %s", quantile(households$Debt,.75)) #format the debt levels at the 25th, 50th, and 75th percentiles into labeled sentences
## [1] "Indebtedness levels at 75th percentiles: 5675.5"
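The three percentiles can also be obtained in a single call, since `quantile()` accepts a vector of probabilities. A minimal sketch, assuming a made-up vector `debt` (hypothetical values, not the survey data):

```r
# Hypothetical debt values for illustration; not taken from households.csv
debt <- c(1000, 2000, 3000, 4000, 5000)

# quantile() returns a named vector with one entry per requested probability
q <- quantile(debt, probs = c(0.25, 0.50, 0.75))
q
```

The result is named `25%`, `50%`, and `75%`, so individual values can be picked out with, for example, `q["75%"]`.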
sprintf("interquartile range for the indebtedness levels of households is: %s", IQR(households$Debt)) #format the interquartile range (IQR) of Debt into a labeled sentence
## [1] "interquartile range for the indebtedness levels of households is: 2727"
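The interquartile range for the indebtedness levels of households is $2727, which is the difference between the indebtedness level at the 75th percentile ($5675.5) and the indebtedness level at the 25th percentile ($2948.5). In other words, the middle 50% of households have debts that span a range of $2727.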
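The relationship between the quartiles and the IQR can be verified directly. A minimal sketch that first uses the quartile values printed in the output above, then checks on a made-up vector `debt` (hypothetical values, not the survey data) that `IQR()` agrees with the quantile difference:

```r
# Quartiles as printed in the output above
q1 <- 2948.5
q3 <- 5675.5
q3 - q1  # 2727

# Hypothetical vector to show that IQR() equals Q3 - Q1
debt <- c(1000, 2000, 3000, 4000, 5000)
iqr_manual <- unname(quantile(debt, 0.75) - quantile(debt, 0.25))
all.equal(iqr_manual, IQR(debt))
```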