Part A
a
All variables are integers.
b
Household, Location and Ownership are nominal data; all other variables are numerical. The numbers assigned to Household and Location are labels that cannot be compared in quantitative size. For Ownership, 1 indicates Yes and 0 indicates No.
Numbers for the other variables can be compared because their order and magnitude matter.
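Variables whose numbers are only labels can be stored as factors so that R does not treat them as quantities. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are invented for illustration and are not the assignment's data); only the column names Location and Ownership come from the answer above.

```r
# Hypothetical toy data, NOT the assignment's data set
toy <- data.frame(
  Location  = c(1, 2, 3, 2),
  Ownership = c(1, 0, 1, 1)
)

# Location codes are labels only, so store them as a factor
toy$Location <- factor(toy$Location)

# Ownership: 1 indicates Yes, 0 indicates No
toy$Ownership <- factor(toy$Ownership, levels = c(0, 1), labels = c("No", "Yes"))

# Count the observations in each category
ownership_counts <- table(toy$Ownership)
print(ownership_counts)
```

With factors, `summary()` and `table()` report counts per category rather than means or quartiles of the label codes.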
c
library(settings)
hist(mydf$Household) # symmetric
hist(mydf$`Family Size`) # right-skewed
hist(mydf$Location) # symmetric
hist(mydf$Ownership) # left-skewed
hist(mydf$`First Income`) # right-skewed
hist(mydf$`Second Income`) # right-skewed
hist(mydf$`Monthly Payment`) # right-skewed
hist(mydf$Utilities) # symmetric
hist(mydf$Debt) # symmetric
d
summary(mydf$Debt) # min=227, max=9104
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 227 2948 4268 4319 5676 9104
e
quantile(mydf$Debt, c(.25,.5,.75))
## 25% 50% 75%
## 2948.5 4267.5 5675.5
# 25th, 50th, 75th percentiles are 2948.5, 4267.5, 5675.5.
f
IQR(mydf$Debt) # the result is 2727.
## [1] 2727
boxplot(mydf$Debt)
# The interquartile range is the difference between the upper and lower quartiles: 5675.5 - 2948.5 = 2727.
# It measures variability: the middle 50% of the observations lie between 2948.5 and 5675.5.
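One common way to relate the IQR to outliers is the 1.5 × IQR rule, which is also the default whisker rule used by `boxplot()`. The sketch below applies it to the quartiles, minimum and maximum reported above, entered as literals so that it runs without `mydf`; the variable names are chosen here for illustration.

```r
# Summary values for Debt as reported above
q1       <- 2948.5
q3       <- 5675.5
debt_min <- 227
debt_max <- 9104

# Interquartile range and the 1.5 * IQR fences
iqr_debt    <- q3 - q1              # 2727
lower_fence <- q1 - 1.5 * iqr_debt  # -1142
upper_fence <- q3 + 1.5 * iqr_debt  # 9766

# TRUE if the minimum or the maximum lies outside the fences
any_outlier <- debt_min < lower_fence || debt_max > upper_fence
print(c(lower = lower_fence, upper = upper_fence))
print(any_outlier)  # FALSE
```

Because both the minimum and the maximum lie inside the fences, no observation can fall outside them under this rule.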
g
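Based on the debt analysis, indebtedness appears approximately normally distributed: the histogram is symmetric and the mean (4319) is close to the median (4268). Indebtedness ranges widely, from 227 to 9104, with an interquartile range of 2727. By the 1.5 × IQR rule the outlier fences are 2948.5 - 1.5 × 2727 = -1142 and 5675.5 + 1.5 × 2727 = 9766. Both the minimum and the maximum fall inside these fences, so there are no outliers under that rule.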