For Assignment 3, Part A, I used the readr library to read the data. The dataset is a survey of 500 randomly selected households. Below is the head of the data.
## Household Family.Size Location Ownership First.Income Second.Income
## 1 1 2 2 1 58206 38503
## 2 2 6 2 0 48273 29197
## 3 3 3 4 0 37582 28164
## 4 4 1 1 1 56610 NA
## 5 5 3 3 0 37731 21454
## 6 6 4 1 0 30434 26007
## Monthly.Payment Utilities Debt
## 1 1585 252 5692
## 2 1314 216 4267
## 3 383 207 2903
## 4 1002 249 3896
## 5 743 217 3011
## 6 991 208 3718
- Indicate the type of data for each of the variables included in the survey.
## Household Family.Size Location Ownership
## "integer" "integer" "integer" "integer"
## First.Income Second.Income Monthly.Payment Utilities
## "integer" "integer" "integer" "integer"
## Debt
## "integer"
The type of each column in the data frame was obtained with the apply function. All nine columns are stored as integers.
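As a minimal sketch of that column-type check, the block below rebuilds only the six rows shown in the head of the data (not the full 500-household file) and uses `sapply`, a member of the apply family. The data frame name `households` is an assumption made for illustration.

```r
# Six rows copied from the head of the data shown above (not the full survey).
households <- data.frame(
  Household       = 1:6,
  Family.Size     = c(2L, 6L, 3L, 1L, 3L, 4L),
  Location        = c(2L, 2L, 4L, 1L, 3L, 1L),
  Ownership       = c(1L, 0L, 0L, 1L, 0L, 0L),
  First.Income    = c(58206L, 48273L, 37582L, 56610L, 37731L, 30434L),
  Second.Income   = c(38503L, 29197L, 28164L, NA, 21454L, 26007L),
  Monthly.Payment = c(1585L, 1314L, 383L, 1002L, 743L, 991L),
  Utilities       = c(252L, 216L, 207L, 249L, 217L, 208L),
  Debt            = c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)
)

# Apply class() to every column; the result is a named character vector.
col_types <- sapply(households, class)
print(col_types)
```

Note that this reports how R stores each column, which is separate from whether a variable is categorical or numerical in the statistical sense.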
- For each of the categorical variables in the survey, indicate whether the variable is nominal or ordinal. Explain your reasoning in each case.
The categorical variables in the data frame are Household, Location, and Ownership, and all three are nominal. A nominal variable is a categorical variable whose categories have no natural ordering. Household is an identification number for each family, so its values are labels rather than quantities. Location is a code for the area where each household lives, and the codes do not rank the areas. Ownership is a binary variable (1 = ownership, 0 = no ownership), which is also nominal.
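One way to make the nominal status explicit in R is to store these codes as unordered factors. The sketch below uses only the six rows from the head of the data. The labels "not owner" and "owner" are illustrative names chosen here for the 0/1 coding, and `households` is an assumed data frame name.

```r
# Categorical columns from the six rows shown in the head of the data.
households <- data.frame(
  Household = 1:6,
  Location  = c(2L, 2L, 4L, 1L, 3L, 1L),
  Ownership = c(1L, 0L, 0L, 1L, 0L, 0L)
)

# factor() creates an unordered (nominal) factor by default.
households$Location <- factor(households$Location)

# Illustrative labels for the 0/1 coding; the label text is an assumption.
households$Ownership <- factor(
  households$Ownership,
  levels = c(0, 1),
  labels = c("not owner", "owner")
)

# Neither factor is ordered, which matches the nominal classification.
location_is_ordered <- is.ordered(households$Location)
ownership_counts <- table(households$Ownership)
print(location_is_ordered)
print(ownership_counts)
```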
- Create a histogram for each of the numerical variables in this data set. Indicate whether each of these distributions is approximately symmetric or skewed. Which, if any, of these distributions are skewed to the right? Which, if any, are skewed to the left?
The histograms for Family Size, First Income, Second Income, and Monthly Payment are skewed to the right. The histograms for Utilities and Debt are approximately symmetric. None of the distributions is skewed to the left.
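A rough sketch of how such histograms could be drawn is given below, again using only the six rows from the head of the data, which are far too few to judge the shape of a distribution. The mean-minus-median value is added here as an informal hint (a positive value is consistent with right skew) and is not necessarily the method used for the answer above.

```r
# Numerical columns from the six rows shown in the head of the data.
households <- data.frame(
  Family.Size     = c(2L, 6L, 3L, 1L, 3L, 4L),
  First.Income    = c(58206L, 48273L, 37582L, 56610L, 37731L, 30434L),
  Second.Income   = c(38503L, 29197L, 28164L, NA, 21454L, 26007L),
  Monthly.Payment = c(1585L, 1314L, 383L, 1002L, 743L, 991L),
  Utilities       = c(252L, 216L, 207L, 249L, 217L, 208L),
  Debt            = c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)
)

# Draw one histogram per numerical variable (hist() ignores NA values).
for (v in names(households)) {
  hist(households[[v]], main = paste("Histogram of", v), xlab = v)
}

# Informal skew hint: mean minus median, with missing values removed.
skew_hint <- sapply(households, function(x) {
  mean(x, na.rm = TRUE) - median(x, na.rm = TRUE)
})
print(skew_hint)
```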
- Find the maximum and minimum debt levels for the households in this sample.
## [1] "Minimum debt levels"
## [1] 227
## [1] "Maximum debt levels"
## [1] 9104
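A minimal sketch of this step, using only the six Debt values visible in the head of the data, is shown below. On these six rows the result differs from the full-sample values of 227 and 9104 reported above. The vector name `debt` is an assumption.

```r
# Debt values from the six rows shown in the head of the data.
debt <- c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)

# range() returns the minimum and maximum in one call;
# na.rm = TRUE guards against missing values in the full data.
debt_range <- range(debt, na.rm = TRUE)
print(debt_range)  # 2903 5692 for these six rows only
```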
- Find the indebtedness levels at each of the 25th, 50th, and 75th percentiles.
## 25%
## 0.04899381
## 50%
## 0.06722489
## 75%
## 0.08677484
The indebtedness level of each household is its debt divided by the sum of its first and second incomes.
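The sketch below illustrates one possible version of this calculation on the six rows from the head of the data. It assumes indebtedness is debt divided by total income, which matches the magnitude of the percentiles reported above. It also assumes a missing second income counts as zero. How missing values were handled in the original analysis is not stated, so that choice is only an assumption.

```r
# Income and debt columns from the six rows shown in the head of the data.
first_income  <- c(58206L, 48273L, 37582L, 56610L, 37731L, 30434L)
second_income <- c(38503L, 29197L, 28164L, NA, 21454L, 26007L)
debt          <- c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)

# Assumption: a missing second income is treated as zero income.
total_income <- rowSums(cbind(first_income, second_income), na.rm = TRUE)

# Indebtedness as a share of total household income.
indebtedness <- debt / total_income

# 25th, 50th, and 75th percentiles of the six illustrative households.
quartiles <- quantile(indebtedness, probs = c(0.25, 0.50, 0.75))
print(quartiles)
```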
- Find and interpret the interquartile range for the indebtedness levels of these selected households.
## [1] 0.03778103
The interquartile range is the third quartile minus the first quartile (0.08677484 - 0.04899381), which can also be obtained with the IQR() function in R. It means that the indebtedness levels of the middle 50% of sampled households span a range of about 3.78 percentage points, from roughly 4.9% to 8.7% of family income.
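The arithmetic behind the interquartile range can be checked directly from the reported quartiles. The second half of the sketch confirms that `IQR()` matches the quartile difference, using a short vector `x` of made-up values that are purely illustrative and not taken from the survey.

```r
# Quartiles of indebtedness as reported in the output above.
q1 <- 0.04899381
q3 <- 0.08677484

# Interquartile range as the difference between the quartiles.
iqr_reported <- q3 - q1
print(iqr_reported)

# The same spread expressed in percentage points of family income.
iqr_percent <- round(100 * iqr_reported, 2)
print(iqr_percent)

# Hypothetical values, only to show that IQR() equals Q3 - Q1.
x <- c(0.03, 0.05, 0.06, 0.08, 0.10)
iqr_matches <- isTRUE(all.equal(
  IQR(x),
  unname(diff(quantile(x, probs = c(0.25, 0.75))))
))
print(iqr_matches)
```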
- Write a report that is less than 250 words that summarizes your analysis.
In the sample of 500 households, most families are small (1 to 5 members). Most first incomes are between $30,000 and $40,000, and most second incomes are between $20,000 and $30,000. On the expenditure side, most households have monthly payments of around $600 to $700, utilities of around $250, and debt of around $3,000 to $4,000.