I am using the HousePrices dataset, found at https://vincentarelbundock.github.io/Rdatasets/articles/data.html
data <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/HousePrices.csv",
                 stringsAsFactors = FALSE, header = TRUE)
# Remove the unnamed first column of the CSV, which R reads in as X
data$X<- NULL
# Copy data into a data frame named HousePrices
HousePrices <- data
head(HousePrices)
# One way to find the mean and median of selected columns in a data frame:
# call the mean() and median() functions with the variable inside
# the parentheses
# Overview of the dataset
summary(HousePrices)
price lotsize bedrooms
Min. : 25000 Min. : 1650 Min. :1.000
1st Qu.: 49125 1st Qu.: 3600 1st Qu.:2.000
Median : 62000 Median : 4600 Median :3.000
Mean : 68122 Mean : 5150 Mean :2.965
3rd Qu.: 82000 3rd Qu.: 6360 3rd Qu.:3.000
Max. :190000 Max. :16200 Max. :6.000
bathrooms stories driveway
Min. :1.000 Min. :1.000 Length:546
1st Qu.:1.000 1st Qu.:1.000 Class :character
Median :1.000 Median :2.000 Mode :character
Mean :1.286 Mean :1.808
3rd Qu.:2.000 3rd Qu.:2.000
Max. :4.000 Max. :4.000
recreation fullbase gasheat
Length:546 Length:546 Length:546
Class :character Class :character Class :character
Mode :character Mode :character Mode :character
aircon garage prefer
Length:546 Min. :0.0000 Length:546
Class :character 1st Qu.:0.0000 Class :character
Mode :character Median :0.0000 Mode :character
Mean :0.6923
3rd Qu.:1.0000
Max. :3.0000
# Mean of two attributes, price and stories
apply(HousePrices[c("price","stories")],MARGIN=2, FUN = mean)
price stories
68121.597070 1.807692
# Median of price and stories
# (same approach as for the mean, with FUN = median)
apply(HousePrices[c("price","stories")],MARGIN=2, FUN = median)
price stories
62000 2
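As a side note, apply() first converts the data frame to a matrix, so selecting a character column by accident (for example driveway) would turn every value into text. A minimal sketch of a column-wise alternative using sapply() follows. The toy data frame and its values are made up for illustration and are not part of the HousePrices data.

```r
# Toy data frame standing in for HousePrices (values are invented for illustration)
toy <- data.frame(
  price    = c(30000, 50000, 70000),
  stories  = c(1, 2, 2),
  driveway = c("no", "yes", "yes"),
  stringsAsFactors = FALSE
)

# sapply() treats the data frame as a list of columns, so each selected
# column keeps its own type and the result is a named numeric vector
col_means   <- sapply(toy[c("price", "stories")], mean)
col_medians <- sapply(toy[c("price", "stories")], median)

print(col_means)
print(col_medians)
```

The same two calls should work on HousePrices[c("price","stories")] without any other change.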
The new data frame is designed to hold only low-priced houses.
lowIncome_house <- subset(HousePrices, price < 40000 &
driveway == 'no' & bedrooms <= 2)[c("price","lotsize","bedrooms","stories","fullbase")]
lowIncome_house
colnames(lowIncome_house) <- c("House_Value","Lot_Space","Rooms","Floors","Basement")
lowIncome_house
apply(lowIncome_house[c("House_Value","Floors")],MARGIN=2, FUN = mean)
House_Value Floors
32657.14 1.00
apply(lowIncome_house[c("House_Value","Floors")],MARGIN=2, FUN = median)
House_Value Floors
32500 1
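In the full dataset, the mean and median of price and stories are roughly double those of the newly built subset: mean price 68121.60 versus 32657.14, median price 62000 versus 32500, mean stories 1.81 versus 1.00, and median stories 2 versus 1. This is expected, because lowIncome_house keeps only houses priced below 40000, which is below the median price of the loaded dataset (62000). The filter therefore both shrinks the dataset and pulls its statistics well below those of the full data.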
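To make the comparison concrete, the ratios can be checked directly. This is only a sketch: the numbers are copied by hand from the output shown earlier, and the vector names full_stats and subset_stats are made up for this example.

```r
# Statistics copied from the earlier output (full dataset vs. lowIncome_house)
full_stats <- c(mean_price = 68121.597070, median_price = 62000,
                mean_stories = 1.807692, median_stories = 2)
subset_stats <- c(mean_price = 32657.14, median_price = 32500,
                  mean_stories = 1.00, median_stories = 1)

# Ratio of each full-dataset statistic to the matching subset statistic
ratios <- round(full_stats / subset_stats, 2)
print(ratios)
```

All four ratios come out close to 2, which is what "roughly double" refers to.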
# Relabel selected Lot_Space values as 'Too Small', 'Small' or 'Standard'
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 1836] <- 'Too Small'
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 2910] <- 'Small'
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 3635] <- 'Standard'
# Relabel Basement values as 'Not Built' or 'Finished'
lowIncome_house$Basement[lowIncome_house$Basement == 'no'] <- 'Not Built'
lowIncome_house$Basement[lowIncome_house$Basement == 'yes'] <- 'Finished'
lowIncome_house
head(lowIncome_house,7)
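The recoding above matches exact lot sizes and overwrites the numeric Lot_Space column with text, so any lot size not listed keeps its number, stored as a string. One possible alternative, sketched here under assumptions, is to bin by range with cut() and keep the labels in a separate vector. The cut points 2000 and 3000 are hypothetical and chosen only so that the three lot sizes used above land in the same categories.

```r
# Lot sizes taken from the recoding step above
lot_space <- c(1836, 2910, 3635)

# Hypothetical cut points (an assumption, not from the original analysis):
# (0, 2000] -> Too Small, (2000, 3000] -> Small, above 3000 -> Standard
lot_label <- cut(
  lot_space,
  breaks = c(0, 2000, 3000, Inf),
  labels = c("Too Small", "Small", "Standard")
)

# The numeric sizes stay intact next to their labels
print(data.frame(lot_space, lot_label))
```

Because cut() returns a factor, the original numeric values remain available for later calculations.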
My GitHub username is joewarner89.
bonus <- read.csv("https://raw.githubusercontent.com/joewarner89/House-Prices-Assignment-2/main/HousePrices.csv",
                  stringsAsFactors = FALSE, header = TRUE, sep = ",")
head(bonus)