1. Use the summary function to gain an overview of the data set. Then display the mean and median for at least two attributes of your data.

I am using the HousePrices dataset found at https://vincentarelbundock.github.io/Rdatasets/articles/data.html

data <- read.csv("https://vincentarelbundock.github.io/Rdatasets/csv/AER/HousePrices.csv",
                 stringsAsFactors = FALSE, header = TRUE)

# Remove the unnamed row-index column, which read.csv() names X
data$X <- NULL
# Copy the data into a data frame named HousePrices
HousePrices <- data
head(HousePrices)
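
The `X` column removed above comes from an unnamed first column in the CSV that holds the row indices. The sketch below reproduces that behaviour offline. It assumes a made-up two-row data frame (`toy`) and a temporary file (`tmp`); neither is part of the HousePrices data.

```r
# Toy data frame with made-up values (not taken from HousePrices)
toy <- data.frame(price = c(42000, 38500), bedrooms = c(3, 2))

# write.csv() stores the row names in a first column with an empty header
tmp <- tempfile(fileext = ".csv")
write.csv(toy, tmp)

# read.csv() gives that empty header the name "X"
toy_read <- read.csv(tmp, stringsAsFactors = FALSE)
cols_before <- names(toy_read)
cols_before

# Dropping the index column leaves only the real attributes
toy_read$X <- NULL
names(toy_read)

unlink(tmp)
```

Setting the column to `NULL` is the same idiom used for `data$X` above.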

# Overview of the dataset with summary(). The mean and median of selected
# columns are computed afterwards by passing mean() and median() to apply().
summary(HousePrices)
     price           lotsize         bedrooms    
 Min.   : 25000   Min.   : 1650   Min.   :1.000  
 1st Qu.: 49125   1st Qu.: 3600   1st Qu.:2.000  
 Median : 62000   Median : 4600   Median :3.000  
 Mean   : 68122   Mean   : 5150   Mean   :2.965  
 3rd Qu.: 82000   3rd Qu.: 6360   3rd Qu.:3.000  
 Max.   :190000   Max.   :16200   Max.   :6.000  
   bathrooms        stories        driveway        
 Min.   :1.000   Min.   :1.000   Length:546        
 1st Qu.:1.000   1st Qu.:1.000   Class :character  
 Median :1.000   Median :2.000   Mode  :character  
 Mean   :1.286   Mean   :1.808                     
 3rd Qu.:2.000   3rd Qu.:2.000                     
 Max.   :4.000   Max.   :4.000                     
  recreation          fullbase           gasheat         
 Length:546         Length:546         Length:546        
 Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character  
                                                         
                                                         
                                                         
    aircon              garage          prefer         
 Length:546         Min.   :0.0000   Length:546        
 Class :character   1st Qu.:0.0000   Class :character  
 Mode  :character   Median :0.0000   Mode  :character  
                    Mean   :0.6923                     
                    3rd Qu.:1.0000                     
                    Max.   :3.0000                     
# Mean of two attributes: price and stories
apply(HousePrices[c("price","stories")],MARGIN=2, FUN = mean)
       price      stories 
68121.597070     1.807692 
# Median of price and stories
# (apply() is one way to compute a statistic for selected columns of a data frame)
apply(HousePrices[c("price","stories")],MARGIN=2, FUN = median)
  price stories 
  62000       2 
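
The two `apply()` calls above can also be combined into one step. The following is a minimal sketch of an alternative, not the method used in this assignment: it uses `sapply()` on a made-up data frame (`toy`, hypothetical values) and returns the mean and median of each column together.

```r
# Toy data frame with made-up values (not taken from HousePrices)
toy <- data.frame(price   = c(30000, 50000, 100000),
                  stories = c(1, 2, 2))

# sapply() calls the function once per column; because each call returns
# a named vector of length 2, the result is a matrix with one row per
# statistic and one column per attribute
stats <- sapply(toy[c("price", "stories")],
                function(col) c(mean = mean(col), median = median(col)))
stats
```

Unlike `apply()`, which first converts the data frame to a matrix, `sapply()` works on each column directly, so it is a reasonable choice when only numeric columns are selected.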

2. Create a new data frame with a subset of the columns AND rows. There are several ways to do this so feel free to try a couple if you want. Make sure to rename the new data set so it simply just doesn’t write it over.

The new data frame is designed to contain only low-priced houses.

lowIncome_house <- subset(HousePrices, price < 40000 & 
                             driveway == 'no' & bedrooms <= 2)[c("price","lotsize","bedrooms","stories","fullbase")] 
lowIncome_house
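
Since the task mentions that there are several ways to subset, here is a sketch of a second approach using bracket indexing, shown on a made-up data frame (`toy`, hypothetical values) with the same three conditions. The `select` argument of `subset()` is shown as well for comparison.

```r
# Toy data frame with made-up values (not taken from HousePrices)
toy <- data.frame(price    = c(35000, 39000, 60000, 38000),
                  driveway = c("no", "yes", "no", "no"),
                  bedrooms = c(2, 2, 3, 3),
                  stringsAsFactors = FALSE)

# Rows are chosen by a logical vector, columns by a character vector of names
keep_rows  <- toy$price < 40000 & toy$driveway == "no" & toy$bedrooms <= 2
toy_subset <- toy[keep_rows, c("price", "bedrooms")]
toy_subset

# subset() can pick rows and columns in one call through its select argument
toy_subset2 <- subset(toy,
                      price < 40000 & driveway == "no" & bedrooms <= 2,
                      select = c(price, bedrooms))
toy_subset2
```

Both results are stored under new names, so the original data frame is left unchanged.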

3. Create new column names for each column in the new data frame created in step 2.

colnames(lowIncome_house) <- c("House_Value","Lot_Space","Rooms","Floors","Basement")
lowIncome_house
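
Assigning a full vector to `colnames()` relies on the columns being in a known order. As a hedged alternative, the sketch below renames columns through a name-to-name lookup, so the result does not depend on position. The data frame `toy` and the lookup vector `new_names` are illustrative assumptions.

```r
# Toy one-row data frame with made-up values (not taken from HousePrices)
toy <- data.frame(stories = 1, price = 35000, lotsize = 2910)

# Lookup vector: old column name -> new column name
new_names <- c(price = "House_Value", lotsize = "Lot_Space", stories = "Floors")

# Index the lookup by the current names, so column order does not matter
names(toy) <- unname(new_names[names(toy)])
names(toy)
```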

4. Use the summary function to create an overview of your new data frame created in step 2. Then print the mean and median for the same two attributes. Please compare (i.e. tell me how the values changed and why).

apply(lowIncome_house[c("House_Value","Floors")],MARGIN=2, FUN = mean)
House_Value      Floors 
   32657.14        1.00 
apply(lowIncome_house[c("House_Value","Floors")],MARGIN=2, FUN = median)
House_Value      Floors 
      32500           1 

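In the full dataset, the mean and median of price (68,122 and 62,000) and of stories (1.81 and 2) are roughly double the corresponding values in the new data frame (32,657 and 32,500 for House_Value; 1 and 1 for Floors). This is expected: lowIncome_house keeps only houses priced below 40,000, with no driveway and at most two bedrooms. The cutoff of 40,000 is below the median price of the full dataset (62,000), so the subset is much smaller and its mean and median price must both fall below 40,000.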
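
Step 4 also asks for a `summary()` overview of the new data frame. The sketch below shows that call and one way to line up the statistics for comparison. It runs on made-up data frames (`full` and `low`, hypothetical values), so the numbers it prints are not the HousePrices results.

```r
# Toy "full" data and a low-price subset, with made-up values
full <- data.frame(price   = c(30000, 35000, 60000, 90000),
                   stories = c(1, 1, 2, 3))
low  <- subset(full, price < 40000)

# Overview of the subset
summary(low)

# Stack the statistics so the full data and the subset can be compared
comparison <- rbind(
  full_mean   = sapply(full, mean),
  low_mean    = sapply(low, mean),
  full_median = sapply(full, median),
  low_median  = sapply(low, median)
)
comparison
```

With the real data, the same pattern would use `HousePrices[c("price", "stories")]` and `lowIncome_house[c("House_Value", "Floors")]` in place of `full` and `low`.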

5. For at least 3 different/distinct values in a column please rename so that every value in that column is renamed. For example, change the letter “e” to “excellent”, the letter “a” to “average” and the word “bad” to “terrible”.

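# Recode three specific lot sizes as 'Too Small', 'Small', and 'Standard'.
# Assigning a text label to a matching row converts Lot_Space from numeric to character.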
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 1836] <- 'Too Small'
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 2910] <- 'Small'
lowIncome_house$Lot_Space[lowIncome_house$Lot_Space == 3635] <- 'Standard'

# Recode the basement values: 'no' becomes 'Not Built' and 'yes' becomes 'Finished'

lowIncome_house$Basement[lowIncome_house$Basement == 'no'] <- 'Not Built'
lowIncome_house$Basement[lowIncome_house$Basement == 'yes'] <- 'Finished'

lowIncome_house
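
The assignments above relabel three exact lot sizes, so any other lot size keeps its number. A sketch of one way to relabel every value is to bin the numbers with `cut()` and to map the text values through a lookup vector. The cut points (2000 and 3000), the extra lot size of 5000, and the names `toy`, `Lot_Label`, and `basement_labels` are assumptions chosen for illustration.

```r
# Toy data frame; 5000 is a made-up extra lot size
toy <- data.frame(Lot_Space = c(1836, 2910, 3635, 5000),
                  Basement  = c("no", "yes", "no", "no"),
                  stringsAsFactors = FALSE)

# cut() assigns each number to an interval: up to 2000, 2000-3000, above 3000
# (the break points are hypothetical)
toy$Lot_Label <- as.character(cut(toy$Lot_Space,
                                  breaks = c(-Inf, 2000, 3000, Inf),
                                  labels = c("Too Small", "Small", "Standard")))

# A named vector works as a lookup table for the text values
basement_labels <- c(no = "Not Built", yes = "Finished")
toy$Basement <- unname(basement_labels[toy$Basement])

toy
```

Writing the labels to a new column (`Lot_Label`) keeps the numeric lot sizes available for later calculations.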

6. Display enough rows to see examples of all of steps 1-5 above. This means use a function to show me enough row values that I can see the changes.

head(lowIncome_house,7)

7. BONUS – place the original .csv in a github file and have R read from the link. This should be your own github – not the file source. This will be a very useful skill as you progress in your data science education and career.

My GitHub username is joewarner89.

bonus <- read.csv("https://raw.githubusercontent.com/joewarner89/House-Prices-Assignment-2/main/HousePrices.csv",
                  stringsAsFactors = FALSE, header = TRUE, sep = ",")
head(bonus)