Assignment3 PartA

For Assignment 3, Part A, I use the readr library to read the data. The dataset is a survey of 500 randomly selected households. Below are the first six rows of the data.

##   Household Family.Size Location Ownership First.Income Second.Income
## 1         1           2        2         1        58206         38503
## 2         2           6        2         0        48273         29197
## 3         3           3        4         0        37582         28164
## 4         4           1        1         1        56610            NA
## 5         5           3        3         0        37731         21454
## 6         6           4        1         0        30434         26007
##   Monthly.Payment Utilities Debt
## 1            1585       252 5692
## 2            1314       216 4267
## 3             383       207 2903
## 4            1002       249 3896
## 5             743       217 3011
## 6             991       208 3718

Indicate the type of data for each of the variables included in the survey.

##       Household     Family.Size        Location       Ownership 
##       "integer"       "integer"       "integer"       "integer" 
##    First.Income   Second.Income Monthly.Payment       Utilities 
##       "integer"       "integer"       "integer"       "integer" 
##            Debt 
##       "integer"

The storage type of each column in my data frame is obtained with the apply function. All of the columns are stored as integers, although some of them hold codes for categories rather than quantities.
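As a minimal sketch of this check, the block below rebuilds only the six rows printed above as a toy data frame (the name `households` is my own, not from the assignment) and asks for the class of each column. It uses `sapply`, a close relative of `apply` that works column by column on a data frame.

```r
# Toy data frame: only the six rows shown in the head output, not the full survey.
households <- data.frame(
  Household       = 1:6,
  Family.Size     = c(2L, 6L, 3L, 1L, 3L, 4L),
  Location        = c(2L, 2L, 4L, 1L, 3L, 1L),
  Ownership       = c(1L, 0L, 0L, 1L, 0L, 0L),
  First.Income    = c(58206L, 48273L, 37582L, 56610L, 37731L, 30434L),
  Second.Income   = c(38503L, 29197L, 28164L, NA, 21454L, 26007L),
  Monthly.Payment = c(1585L, 1314L, 383L, 1002L, 743L, 991L),
  Utilities       = c(252L, 216L, 207L, 249L, 217L, 208L),
  Debt            = c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)
)

# One class name per column, returned as a named character vector.
col_types <- sapply(households, class)
print(col_types)
```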

For each of the categorical variables in the survey, indicate whether the variable is nominal or ordinal. Explain your reasoning in each case.

The categorical variables in the data frame are Household, Location, and Ownership. All of them are nominal. A nominal variable is a categorical variable whose categories have no natural ordering. Household is an identification number for each family, so its values are labels rather than quantities. Location is a code for the area in which each household is located, and the codes do not rank the areas. Ownership is a binary variable (1 = owner, 0 = not an owner), so it is also nominal.
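One way to make the nominal reading explicit in R is to store the codes as unordered factors. The sketch below uses only the Location and Ownership values from the six rows printed earlier. The data frame name `coded` and the labels "not owner" and "owner" are my own choices for illustration.

```r
# Location and Ownership codes from the six rows shown in the head output.
coded <- data.frame(
  Location  = c(2L, 2L, 4L, 1L, 3L, 1L),
  Ownership = c(1L, 0L, 0L, 1L, 0L, 0L)
)

# factor() creates an unordered (nominal) factor by default.
coded$Location  <- factor(coded$Location)
coded$Ownership <- factor(coded$Ownership, levels = c(0, 1),
                          labels = c("not owner", "owner"))

# Counting per category is meaningful for nominal data; averaging the codes is not.
location_counts <- table(coded$Location)
print(location_counts)
print(table(coded$Ownership))
```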

Create a histogram for each of the numerical variables in this data set. Indicate whether each of these distributions is approximately symmetric or skewed. Which, if any, of these distributions are skewed to the right? Which, if any, are skewed to the left?

The histograms for the Family Size, First Income, Second Income, and Monthly Payment variables are skewed to the right. The histograms for the Utilities and Debt variables are approximately symmetric. None of the distributions is skewed to the left.
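The skewness calls above were made by looking at the histograms. As a rough numerical cross-check, one can compare the mean with the median: a mean above the median hints at a right skew. The sketch below is only that heuristic, not the method used in this report. The helper `skew_hint` is a hypothetical name, and the six Monthly Payment values from the printed rows are far too few to judge the shape of the full sample.

```r
# Monthly Payment values from the six rows shown in the head output.
monthly_payment <- c(1585, 1314, 383, 1002, 743, 991)

# Compute the histogram bins without drawing; plot(h) would draw it.
h <- hist(monthly_payment, plot = FALSE)
print(h$counts)

# Hypothetical helper: compare mean and median as a crude skewness hint.
skew_hint <- function(x) {
  d <- mean(x, na.rm = TRUE) - median(x, na.rm = TRUE)
  if (d > 0) "right" else if (d < 0) "left" else "symmetric"
}

print(skew_hint(monthly_payment))
```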

Find the maximum and minimum debt levels for the households in this sample.

## [1] "Minimun debt levels"

## [1] 227

## [1] "Maximun debt levels"

## [1] 9104
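A minimal sketch of how the two values can be obtained in one call, using `range()`. It runs on the six Debt values printed earlier, so its result differs from the full-sample minimum and maximum reported above.

```r
# Debt values from the six rows shown in the head output.
debt <- c(5692L, 4267L, 2903L, 3896L, 3011L, 3718L)

# range() returns c(minimum, maximum); na.rm = TRUE skips missing values.
debt_range <- range(debt, na.rm = TRUE)
print(debt_range)
```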

Find the indebtedness levels at each of the 25th, 50th, and 75th percentiles.

##        25% 
## 0.04899381

##        50% 
## 0.06722489

##        75% 
## 0.08677484

The indebtedness level of each family is its debt divided by the sum of its first and second incomes.
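The calculation can be sketched as follows on the six printed rows. One household has no second income (NA). This sketch assumes a missing second income counts as zero, which may not be how the figures above were produced. The vector names are my own.

```r
# Values from the six rows shown in the head output.
first_income  <- c(58206, 48273, 37582, 56610, 37731, 30434)
second_income <- c(38503, 29197, 28164, NA, 21454, 26007)
debt          <- c(5692, 4267, 2903, 3896, 3011, 3718)

# Assumption: a missing second income is treated as zero.
total_income <- rowSums(cbind(first_income, second_income), na.rm = TRUE)

# Indebtedness = debt as a fraction of total family income.
indebtedness <- debt / total_income

# 25th, 50th, and 75th percentiles.
pct <- quantile(indebtedness, probs = c(0.25, 0.50, 0.75))
print(pct)
```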

Find and interpret the interquartile range for the indebtedness levels of these selected households.

## [1] 0.03778103

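The interquartile range is the third quartile minus the first quartile (0.08677484 - 0.04899381 = 0.03778103); it can also be computed with the IQR() function in R. It means that the indebtedness levels of the middle 50% of the sample households span a range of about 3.78 percentage points, from about 4.9% to about 8.7% of family income.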
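The two routes to the interquartile range can be compared in a short sketch. It again uses only the six printed rows and assumes a missing second income counts as zero, so the number it prints is not the 0.03778103 reported above.

```r
# Values from the six rows shown in the head output.
first_income  <- c(58206, 48273, 37582, 56610, 37731, 30434)
second_income <- c(38503, 29197, 28164, NA, 21454, 26007)
debt          <- c(5692, 4267, 2903, 3896, 3011, 3718)

# Assumption: a missing second income is treated as zero.
indebtedness <- debt / rowSums(cbind(first_income, second_income), na.rm = TRUE)

# Route 1: third quartile minus first quartile.
q <- quantile(indebtedness, probs = c(0.25, 0.75))
iqr_manual <- unname(q[2] - q[1])

# Route 2: the built-in IQR() function.
iqr_builtin <- IQR(indebtedness)

print(c(manual = iqr_manual, builtin = iqr_builtin))
```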

Write a report that is less than 250 words that summarizes your analysis.

In the sample of 500 households, most families are small (1 to 5 members). Most first incomes are between $30,000 and $40,000, and most second incomes are between $20,000 and $30,000. On the expenditure side, most families have monthly payments of around $600 to $700, utilities of around $250, and debt of around $3,000 to $4,000.

Tracey K. Wanasukaphun

September 19, 2017