Part A

a

All variables are integers.

b

Household, Location and Ownership are nominal data; all other variables are quantitative. The numbers assigned to Household serve only as labels, and each number assigned to Location is a label that cannot be compared in size. For Ownership, 1 indicates Yes and 0 indicates No.

The numbers for the other variables can be compared and used in arithmetic because their order and magnitude are meaningful.
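
Label-type variables such as Location and Ownership can be stored as factors so that R summarises them by counts rather than treating the codes as quantities. The following is a minimal sketch on a hypothetical toy data frame (the rows are invented for illustration and are not the survey data; only the column names mirror the ones used here).

```r
# Hypothetical toy rows; these values are made up and are NOT the real data.
toy <- data.frame(
  Household = 1:6,
  Location  = c(1, 2, 2, 3, 1, 2),
  Ownership = c(1, 0, 1, 1, 0, 1)
)

# Store label-type variables as factors so the codes are not treated as quantities.
toy$Location  <- factor(toy$Location)
toy$Ownership <- factor(toy$Ownership, levels = c(0, 1), labels = c("No", "Yes"))

# Counts per category are the natural summary for label variables.
location_counts  <- table(toy$Location)
ownership_counts <- table(toy$Ownership)
print(location_counts)
print(ownership_counts)
```

With factors in place, barplot(table(...)) is usually a more natural display for such variables than hist().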

c

library(settings)
hist(mydf$Household) # symmetric

hist(mydf$`Family Size`) # right-skewed

hist(mydf$Location) # symmetric

hist(mydf$Ownership) # left-skewed

hist(mydf$`First Income`) # right-skewed

hist(mydf$`Second Income`) # right-skewed

hist(mydf$`Monthly Payment`) # right-skewed

hist(mydf$Utilities) # symmetric

hist(mydf$Debt) # symmetric

d

summary(mydf$Debt) # min=227, max=9104
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     227    2948    4268    4319    5676    9104

e

quantile(mydf$Debt, c(.25,.5,.75)) 
##    25%    50%    75% 
## 2948.5 4267.5 5675.5
# 25th, 50th, 75th percentiles are 2948.5, 4267.5, 5675.5.
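
Percentiles such as 2948.5 can fall between observed values because quantile() interpolates linearly between neighbouring order statistics by default (type = 7). A small sketch on a toy vector, chosen only so the arithmetic is easy to follow by hand, may make this clearer; the vector is an assumption and not part of the data.

```r
# Toy vector (hypothetical), used only to illustrate the default interpolation.
x <- c(1, 2, 3, 4)

# Default type-7 quantiles: position h = (n - 1) * p + 1, then linear interpolation.
q_default <- quantile(x, c(.25, .5, .75))

# The same calculation done by hand for the 25th percentile.
n <- length(x)
h <- (n - 1) * 0.25 + 1                     # 1.75, between the 1st and 2nd values
q25_manual <- x[floor(h)] + (h - floor(h)) * (x[ceiling(h)] - x[floor(h)])

print(q_default)
print(q25_manual)
```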

f

IQR(mydf$Debt) # the result is 2727.
## [1] 2727
boxplot(mydf$Debt)

# The interquartile range is the difference between the upper and lower quartiles (5675.5 - 2948.5 = 2727). It measures how widely the middle half of the data is spread.
# It is a measure of variability: the middle 50% of the observations lie between 2948.5 and 5675.5.
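
The whiskers drawn by boxplot() follow the 1.5 x IQR rule, so the same rule can be used to check whether the extreme debts count as outliers. The sketch below uses only the values reported in the output above; the variable names are illustrative. Note that boxplot() works from hinges, which can differ slightly from the quantile() quartiles, so the fences here are an approximation of what the plot uses.

```r
# Values copied from the quantile() and summary() output above.
q1 <- 2948.5
q3 <- 5675.5
debt_min <- 227
debt_max <- 9104

iqr <- q3 - q1                      # interquartile range
lower_fence <- q1 - 1.5 * iqr       # points below this would be flagged
upper_fence <- q3 + 1.5 * iqr       # points above this would be flagged

# TRUE means at least one observation lies beyond that fence.
below <- debt_min < lower_fence
above <- debt_max > upper_fence

print(c(lower_fence = lower_fence, upper_fence = upper_fence))
print(c(below = below, above = above))
```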

g

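Based on the debt analysis, indebtedness is approximately normally distributed: the histogram is roughly symmetric and the mean (4319) is close to the median (4268). Indebtedness has a wide range, from 227 to 9104. The interquartile range is 2727, so the 1.5 x IQR fences are 2948.5 - 4090.5 = -1142 and 5675.5 + 4090.5 = 9766. Both the minimum and the maximum fall inside these fences, so this rule flags no outliers on either side, although the tails extend well beyond the middle 50% of the data.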