Homes <- read.csv('D:/DataSet/Homes.csv')
class(Homes)
## [1] "data.frame"
str(Homes)
## 'data.frame': 492 obs. of 8 variables:
## $ in_sf : int 0 0 0 0 0 0 0 0 0 0 ...
## $ beds : num 2 2 2 1 0 0 1 1 1 2 ...
## $ bath : num 1 2 2 1 1 1 1 1 1 1 ...
## $ price : int 999000 2750000 1350000 629000 439000 439000 475000 975000 975000 1895000 ...
## $ year_built : int 1960 2006 1900 1903 1930 1930 1920 1930 1930 1921 ...
## $ sqft : int 1000 1418 2150 500 500 500 500 900 900 1000 ...
## $ price_per_sqft: int 999 1939 628 1258 878 878 950 1083 1083 1895 ...
## $ elevation : int 10 0 9 9 10 10 10 10 12 12 ...
summary(Homes)
## in_sf beds bath price
## Min. :0.0000 Min. : 0.000 Min. : 1.000 Min. : 187518
## 1st Qu.:0.0000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 749000
## Median :1.0000 Median : 2.000 Median : 2.000 Median : 1145000
## Mean :0.5447 Mean : 2.155 Mean : 1.906 Mean : 2020696
## 3rd Qu.:1.0000 3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.: 1908750
## Max. :1.0000 Max. :10.000 Max. :10.000 Max. :27500000
## year_built sqft price_per_sqft elevation
## Min. :1880 Min. : 310.0 Min. : 270.0 Min. : 0.00
## 1st Qu.:1924 1st Qu.: 832.8 1st Qu.: 730.5 1st Qu.: 10.00
## Median :1960 Median :1312.0 Median : 960.0 Median : 18.50
## Mean :1959 Mean :1523.0 Mean :1195.6 Mean : 39.85
## 3rd Qu.:2001 3rd Qu.:1809.0 3rd Qu.:1419.0 3rd Qu.: 61.00
## Max. :2016 Max. :7800.0 Max. :4601.0 Max. :238.00
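The later calls pass `na.rm = TRUE`, so it may be worth checking how many values are actually missing before computing statistics. The sketch below uses a small made-up data frame, `toy_homes` (a hypothetical stand-in, not the real `Homes` data), to show one way to count missing values per column and what `sd()` does with and without `na.rm`.

```r
# Toy stand-in for Homes; the NA values are invented for illustration only.
toy_homes <- data.frame(
  price = c(999000, NA, 1350000),
  sqft  = c(1000, 1418, NA)
)

# Count missing values in each column.
na_counts <- colSums(is.na(toy_homes))
print(na_counts)

# sd() returns NA when missing values are present unless na.rm = TRUE.
print(sd(toy_homes$price))
print(sd(toy_homes$price, na.rm = TRUE))
```

Running `colSums(is.na(Homes))` on the real data would show whether `na.rm = TRUE` changes any of the results.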
#data documentation
The data describe 492 homes with eight variables: in_sf, beds, bath, price, year_built, sqft, price_per_sqft, and elevation.
Attribute information:
in_sf - whether the home is in San Francisco (1) or New York (0)
beds - number of bedrooms
bath - number of bathrooms
price - house price
year_built - year the home was built
sqft - floor area in square feet
price_per_sqft - price per square foot
elevation - elevation of the building
#goals/purpose
The data are used to determine whether a home is in San Francisco or New York. We compute the standard deviation of price and the variance of price per square foot, and use a box plot to look at price per square foot. Finally, because San Francisco is relatively hilly, the elevation of a home may be a good way to distinguish the two areas, so we compare mean square footage across elevations.
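One direct way to check the elevation idea is to compare mean elevation between the two cities. The following is a minimal sketch, not part of the original analysis: `toy_homes` is a hypothetical data frame with invented values, and it assumes that `in_sf` equal to 1 means San Francisco and 0 means New York.

```r
# Toy stand-in for Homes; the values are invented for illustration only.
toy_homes <- data.frame(
  in_sf     = c(0, 0, 0, 1, 1, 1),
  elevation = c(10, 0, 9, 45, 73, 60),
  sqft      = c(1000, 1418, 2150, 500, 900, 1300)
)

# Label the indicator so the output is readable (assumes 1 = San Francisco).
toy_homes$city <- factor(toy_homes$in_sf, levels = c(0, 1),
                         labels = c("New York", "San Francisco"))

# Mean elevation per city.
elev_by_city <- aggregate(elevation ~ city, data = toy_homes, FUN = mean)
print(elev_by_city)
```

With the real data, replacing `toy_homes` with `Homes` would give the per-city means, and `boxplot(elevation ~ in_sf, data = Homes)` would show the full distributions side by side.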
standard_deviation <- sd(Homes$price, na.rm= TRUE)
variation <- var(Homes$price_per_sqft, na.rm= TRUE)
summ <- sum(Homes$beds)
print(standard_deviation)
## [1] 2824055
print(variation)
## [1] 538412
print(summ)
## [1] 1060.5
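The variance and the standard deviation measure the same spread on different scales: `sd()` is the square root of `var()`. The short sketch below illustrates this on the first five `price_per_sqft` values shown in the `str()` output above; the variable names are chosen here for illustration.

```r
# First five price_per_sqft values from the str() output.
x <- c(999, 1939, 628, 1258, 878)

v <- var(x)   # sample variance (divides by n - 1)
s <- sd(x)    # sample standard deviation

# sd() is the square root of var(), so the two agree up to rounding.
print(all.equal(s, sqrt(v)))

# The same variance computed by hand.
manual_var <- sum((x - mean(x))^2) / (length(x) - 1)
print(all.equal(v, manual_var))
```

Applied to the result above, the variance of 538412 is in squared units; its square root, roughly 734, is on the same scale as `price_per_sqft` itself and is easier to compare with the mean of about 1196.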
plot(Homes$sqft, Homes$price_per_sqft, main="Scatter Plot of price_per_sqft vs sqft", xlab="sqft", ylab="price_per_sqft")
hist(Homes$beds, main="Histogram of beds", xlab="beds")
boxplot(Homes$price_per_sqft, main="Box Plot of price per sqft")
pie(table(Homes$bath), main="Pie Chart of bath counts")
result <- aggregate(Homes$sqft,by=list(Homes$elevation), mean)
result
## Group.1 x
## 1 0 1503.5556
## 2 1 1805.5000
## 3 2 1333.6667
## 4 3 1990.3000
## 5 4 1558.5556
## 6 5 1272.4000
## 7 6 854.8750
## 8 7 1063.5714
## 9 8 1116.1538
## 10 9 1949.6250
## 11 10 1386.9600
## 12 11 1075.1000
## 13 12 1591.9286
## 14 13 1312.5556
## 15 14 1107.2500
## 16 15 2484.6429
## 17 16 939.1667
## 18 17 935.6000
## 19 18 1097.4286
## 20 19 965.1429
## 21 20 1869.0000
## 22 21 2826.2000
## 23 22 1766.6667
## 24 23 1431.9000
## 25 24 1137.1818
## 26 25 938.0000
## 27 26 2378.3333
## 28 27 759.7500
## 29 29 1271.0000
## 30 30 1140.0000
## 31 31 998.0000
## 32 32 720.0000
## 33 33 1389.0000
## 34 34 1457.5000
## 35 35 1142.1667
## 36 36 1030.0000
## 37 38 2100.0000
## 38 39 1049.0000
## 39 41 1264.3333
## 40 42 1500.0000
## 41 43 1144.0000
## 42 44 1316.0000
## 43 46 988.0000
## 44 48 1462.0000
## 45 49 2708.0000
## 46 50 1106.0000
## 47 51 1317.3333
## 48 52 1790.7500
## 49 53 950.0000
## 50 54 1552.6000
## 51 55 1984.7500
## 52 56 1110.0000
## 53 57 1877.0000
## 54 58 1970.0000
## 55 59 2141.0000
## 56 60 1051.5000
## 57 61 1374.0000
## 58 62 3479.0000
## 59 63 1256.0000
## 60 64 1050.0000
## 61 65 1772.0000
## 62 66 1506.2500
## 63 67 4628.5000
## 64 68 1282.5000
## 65 69 1234.6667
## 66 70 832.3333
## 67 71 1145.0000
## 68 72 1012.5000
## 69 73 1782.6000
## 70 74 685.0000
## 71 75 2601.0000
## 72 76 2520.5000
## 73 77 1675.0000
## 74 79 1760.0000
## 75 80 667.0000
## 76 81 2345.0000
## 77 83 2575.0000
## 78 84 1084.3333
## 79 86 1415.0000
## 80 87 2347.0000
## 81 88 1915.0000
## 82 89 2458.0000
## 83 90 2157.0000
## 84 91 1006.6667
## 85 92 1350.0000
## 86 94 1625.0000
## 87 95 1513.0000
## 88 97 2159.5000
## 89 98 3406.5000
## 90 102 1806.5000
## 91 103 1515.0000
## 92 105 1520.0000
## 93 106 1543.7500
## 94 108 2528.0000
## 95 110 1450.0000
## 96 112 1635.0000
## 97 118 1310.0000
## 98 119 2330.0000
## 99 121 1284.0000
## 100 123 1254.0000
## 101 125 2168.0000
## 102 127 3258.0000
## 103 131 2050.0000
## 104 136 1595.0000
## 105 139 1925.6667
## 106 140 1752.0000
## 107 141 1752.0000
## 108 143 1785.5000
## 109 153 2001.0000
## 110 160 1905.0000
## 111 163 1483.5000
## 112 174 3729.0000
## 113 176 2769.0000
## 114 179 1626.0000
## 115 181 2376.0000
## 116 185 1524.0000
## 117 187 1868.0000
## 118 189 990.0000
## 119 216 1305.0000
## 120 227 1603.0000
## 121 238 4813.0000
barplot(result$x, names.arg=result$Group.1, xlab="elevation", ylab="mean sqft",
main="Mean sqft by elevation", border="black")
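With 121 distinct elevation values, many groups contain only one or two homes, so the bar heights are noisy. One option, sketched here as an assumption rather than as part of the original analysis, is to group elevation into bands with `cut()` before aggregating. The data frame `toy_homes` and the bin edges in `breaks` are hypothetical and chosen only for illustration.

```r
# Toy stand-in for Homes; values invented for illustration only.
toy_homes <- data.frame(
  elevation = c(0, 9, 10, 45, 60, 73, 120, 238),
  sqft      = c(1418, 2150, 1000, 500, 1300, 900, 1600, 4813)
)

# Hypothetical bin edges, in the same units as elevation.
breaks <- c(0, 25, 50, 100, 250)
toy_homes$elev_bin <- cut(toy_homes$elevation, breaks = breaks,
                          include.lowest = TRUE)

# Mean sqft per elevation band.
binned <- aggregate(sqft ~ elev_bin, data = toy_homes, FUN = mean)
print(binned)

# One bar per band instead of one bar per distinct elevation.
barplot(binned$sqft, names.arg = as.character(binned$elev_bin),
        xlab = "elevation band", ylab = "mean sqft",
        main = "Mean sqft by elevation band")
```

The choice of bin edges affects the picture, so it is worth trying a few alternatives on the real data before drawing conclusions.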