A03a_G45965733

Part A a

All variables are integers.

Household, Location and Ownership are nominal data; all other variables are numerical. The numbers assigned to Household serve only as labels, and each number assigned to Location is likewise a label that cannot be compared in size. For Ownership, 1 indicates Yes and 0 indicates No.

The numbers for the other variables can be compared because their order and magnitude matter.
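One way to make the distinction between label-like and quantitative variables explicit in R is to store the label-like columns as factors. The following is only a minimal sketch: the data frame `toy` and its values are made up for illustration (they are not the assignment data), and only the column names mirror the variables discussed here.

```r
# Toy data frame with made-up values (NOT the assignment data); only the
# column names mirror the variables discussed in this report.
toy <- data.frame(
  Household = 1:4,
  Location  = c(2, 1, 3, 2),
  Ownership = c(1, 0, 1, 1),
  Debt      = c(1200, 3400, 560, 7800)
)

# Label-like variables: convert to factors so R treats them as categories
# rather than as quantities.
label_cols <- c("Household", "Location", "Ownership")
toy[label_cols] <- lapply(toy[label_cols], factor)

# Inspect the resulting classes: factors for labels, numeric for quantities.
sapply(toy, class)
```

With factors in place, functions such as `summary()` and `table()` report counts per category for these columns instead of means and quartiles.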

library(settings)
hist(mydf$Household) # symmetric

hist(mydf$`Family Size`) # right-skewed

hist(mydf$Location) # symmetric

hist(mydf$Ownership) # left-skewed

hist(mydf$`First Income`) # right-skewed

hist(mydf$`Second Income`) # right-skewed

hist(mydf$`Monthly Payment`) # right-skewed

hist(mydf$Utilities) # symmetric

hist(mydf$Debt) # symmetric

summary(mydf$Debt) # min=227, max=9104

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     227    2948    4268    4319    5676    9104

quantile(mydf$Debt, c(.25,.5,.75))

##    25%    50%    75% 
## 2948.5 4267.5 5675.5

# 25th, 50th, 75th percentiles are 2948.5, 4267.5, 5675.5.

IQR(mydf$Debt) # the result is 2727.

## [1] 2727

boxplot(mydf$Debt)

# The interquartile range is the difference between the upper and lower quartiles. It measures how widely the middle half of the data is spread.
# It is a measure of variability: the middle 50% of the sample lies between 2948.5 and 5675.5.

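Based on the debt analysis, indebtedness is approximately normally distributed: the histogram is symmetric and the mean (4319) is close to the median (4268). Indebtedness has a wide range, from 227 to 9104. The interquartile range is 2727, so the 1.5 × IQR fences lie at about -1142 and 9766; both the minimum and the maximum fall inside these fences, so no observations are flagged as outliers by that rule.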
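The outlier question can be checked numerically with the common 1.5 × IQR rule. The sketch below uses only the quartiles and extremes reported in the output above; the variable names are illustrative. Note that `boxplot()` applies a closely related rule based on hinges, so its whisker limits can differ slightly from these fences.

```r
# Quartiles of Debt as reported by quantile() in the output above.
q1 <- 2948.5
q3 <- 5675.5
iqr <- q3 - q1            # 2727, matching IQR(mydf$Debt)

# 1.5 * IQR rule: points beyond these fences are flagged as outliers.
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Extremes of Debt as reported by summary().
debt_min <- 227
debt_max <- 9104

c(lower = lower_fence, upper = upper_fence)   # -1142 and 9766
debt_min < lower_fence    # FALSE: the minimum is inside the lower fence
debt_max > upper_fence    # FALSE: the maximum is inside the upper fence
```

Because both comparisons are `FALSE`, neither the smallest nor the largest debt value lies beyond the fences.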

Jiafeng Xu

September 21, 2017