I chose dataset that was posted by Anthony Arroyo and that includes the information about the Cooperative Comparable Rental Income in Queens.
rent <- rbind(c("COMPARABLE RENTAL – 1 – Boro-Block-Lot", "COMPARABLE RENTAL – 1 – Neighborhood", "COMPARABLE RENTAL – 1 – Total Units", "COMPARABLE RENTAL – 1 – Year Built", "COMPARABLE RENTAL – 1 – Gross SqFt", "COMPARABLE RENTAL – 1 – Estimated Gross Income", "COMPARABLE RENTAL – 2 – Boro-Block-Lot", "COMPARABLE RENTAL – 2 – Neighborhood", "COMPARABLE RENTAL – 2 – Total Units", "COMPARABLE RENTAL – 2 – Year Built", "COMPARABLE RENTAL – 2 – Gross SqFt", "COMPARABLE RENTAL – 2 – Estimated Gross Income"),
c("4-08276-0658", "DOUGLASTON", 54, 1977, 27690, 390932, "4-03205-0024", "FOREST HILLS", 20, 1931, 24408, 372944),
c("4-03173-0013", "REGO PARK", 61, 1949, 70910, 1130670, "4-02135-0013", "FOREST HILLS", 135, 1960, 137700, 1889772),
c("4-02134-0013", "FOREST HILLS", 123, 1963, 135400, 1872338, "4-01279-0001", "JACKSON HEIGHTS", 96, 1950, 95000, 1035405),
c("4-01268-0001", "JACKSON HEIGHTS", 66, 1951, 57888, 860870, "4-05046-0006", "FLUSHING-NORTH", 45, 1968, 41398, 605749),
c("4-05046-0040", "FLUSHING-NORTH", 63, 1959, 58399, 899224, "4-03234-0051", "FOREST HILLS", 228, 1929, 174480, 3520549),
c("4-03155-0027", "REGO PARK", 286, 1959, 275197, 4668060, "4-03322-0014", "KEW GARDENS", 84, 1928, 83727, 1220740),
c("4-03329-0054", "KEW GARDENS", 54, 1924, 54300, 814069, "4-15555-0001", "FAR ROCKAWAY", 108, 1950, 87926, 1077972),
c("4-08276-0658", "DOUGLASTON", 54, 1977, 27690, 390932, "4-01268-0001", "JACKSON HEIGHTS", 66, 1951, 57888, 860870))
write.table(rent, file = "rent.csv", sep = ",", col.names=F, row.names=F)
The file will be downloaded from the Github repository to csv file using read.csv function.
rent <- read.csv('https://raw.githubusercontent.com/ex-pr/DATA607/Project-2/rent.csv', header=TRUE, sep=",", check.names=FALSE)
By checking the data downloaded, we have 12 column with 8 rows.
head(rent, n=3)
## COMPARABLE RENTAL – 1 – Boro-Block-Lot COMPARABLE RENTAL – 1 – Neighborhood
## 1 4-08276-0658 DOUGLASTON
## 2 4-03173-0013 REGO PARK
## 3 4-02134-0013 FOREST HILLS
## COMPARABLE RENTAL – 1 – Total Units COMPARABLE RENTAL – 1 – Year Built
## 1 54 1977
## 2 61 1949
## 3 123 1963
## COMPARABLE RENTAL – 1 – Gross SqFt
## 1 27690
## 2 70910
## 3 135400
## COMPARABLE RENTAL – 1 – Estimated Gross Income
## 1 390932
## 2 1130670
## 3 1872338
## COMPARABLE RENTAL – 2 – Boro-Block-Lot COMPARABLE RENTAL – 2 – Neighborhood
## 1 4-03205-0024 FOREST HILLS
## 2 4-02135-0013 FOREST HILLS
## 3 4-01279-0001 JACKSON HEIGHTS
## COMPARABLE RENTAL – 2 – Total Units COMPARABLE RENTAL – 2 – Year Built
## 1 20 1931
## 2 135 1960
## 3 96 1950
## COMPARABLE RENTAL – 2 – Gross SqFt
## 1 24408
## 2 137700
## 3 95000
## COMPARABLE RENTAL – 2 – Estimated Gross Income
## 1 372944
## 2 1889772
## 3 1035405
summary(rent)
## COMPARABLE RENTAL – 1 – Boro-Block-Lot COMPARABLE RENTAL – 1 – Neighborhood
## Length:8 Length:8
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## COMPARABLE RENTAL – 1 – Total Units COMPARABLE RENTAL – 1 – Year Built
## Min. : 54.00 Min. :1924
## 1st Qu.: 54.00 1st Qu.:1950
## Median : 62.00 Median :1959
## Mean : 95.12 Mean :1957
## 3rd Qu.: 80.25 3rd Qu.:1966
## Max. :286.00 Max. :1977
## COMPARABLE RENTAL – 1 – Gross SqFt
## Min. : 27690
## 1st Qu.: 47648
## Median : 58144
## Mean : 88434
## 3rd Qu.: 87033
## Max. :275197
## COMPARABLE RENTAL – 1 – Estimated Gross Income
## Min. : 390932
## 1st Qu.: 708285
## Median : 880047
## Mean :1378387
## 3rd Qu.:1316087
## Max. :4668060
## COMPARABLE RENTAL – 2 – Boro-Block-Lot COMPARABLE RENTAL – 2 – Neighborhood
## Length:8 Length:8
## Class :character Class :character
## Mode :character Mode :character
##
##
##
## COMPARABLE RENTAL – 2 – Total Units COMPARABLE RENTAL – 2 – Year Built
## Min. : 20.00 Min. :1928
## 1st Qu.: 60.75 1st Qu.:1930
## Median : 90.00 Median :1950
## Mean : 97.75 Mean :1946
## 3rd Qu.:114.75 3rd Qu.:1953
## Max. :228.00 Max. :1968
## COMPARABLE RENTAL – 2 – Gross SqFt
## Min. : 24408
## 1st Qu.: 53766
## Median : 85827
## Mean : 87816
## 3rd Qu.:105675
## Max. :174480
## COMPARABLE RENTAL – 2 – Estimated Gross Income
## Min. : 372944
## 1st Qu.: 797090
## Median :1056688
## Mean :1323000
## 3rd Qu.:1387998
## Max. :3520549
Tidy data should follow the rules: each variable is its own column, each observation is its own row, and each value is its own cell. At the current moment, we have “wide data” an can transform it to “long data” as it is better to work with.
We are going to work with data from csv file and use tidyverse package to transform it.
First of all, we will rename column to make it more readable.
rent <- plyr::rename(rent, c("COMPARABLE RENTAL – 1 – Boro-Block-Lot" = "Boro_Block_Lot", "COMPARABLE RENTAL – 1 – Neighborhood" = "Neighborhood", "COMPARABLE RENTAL – 1 – Total Units" = "Total_Units", "COMPARABLE RENTAL – 1 – Year Built" = "Year_Built", "COMPARABLE RENTAL – 1 – Gross SqFt" = "Gross_SqFt", "COMPARABLE RENTAL – 1 – Estimated Gross Income" = "Estimated_Gross_Income","COMPARABLE RENTAL – 2 – Boro-Block-Lot" = "Boro_Block_Lot", "COMPARABLE RENTAL – 2 – Neighborhood" = "Neighborhood", "COMPARABLE RENTAL – 2 – Total Units" = "Total_Units", "COMPARABLE RENTAL – 2 – Year Built" = "Year_Built", "COMPARABLE RENTAL – 2 – Gross SqFt" = "Gross_SqFt", "COMPARABLE RENTAL – 2 – Estimated Gross Income" = "Estimated_Gross_Income"))
## Warning: The plyr::rename operation has created duplicates for the following
## name(s): (`Boro_Block_Lot`, `Neighborhood`, `Total_Units`, `Year_Built`,
## `Gross_SqFt`, `Estimated_Gross_Income`)
Secondly, the table can be split in two by the columns “Boro_Block_Lot”, one table will be created (we will place one table under another) by using function bind_rows().
table1 <- rent[ , 1:6]
table2 <- rent[ , 7:12]
rent <- bind_rows(table1 , table2)
head(rent, n=3)
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 1 4-08276-0658 DOUGLASTON 54 1977 27690
## 2 4-03173-0013 REGO PARK 61 1949 70910
## 3 4-02134-0013 FOREST HILLS 123 1963 135400
## Estimated_Gross_Income
## 1 390932
## 2 1130670
## 3 1872338
The question for the analysis was to determine the net value of each unit.
Unfortunately, there is not enough information to calculate net value, no expenses are provided. But we can check which lot has the oldest house.
The oldest house is built in 1924 in Kew Gardens.
ggplot(rent,aes(x =Boro_Block_Lot,y = Year_Built)) +
geom_point(size=3, color='red') +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
labs(title="Year built", x= "Lot", y = "Year")
rent[rent$Year_Built == min(rent$Year_Built),]
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 7 4-03329-0054 KEW GARDENS 54 1924 54300
## Estimated_Gross_Income
## 7 814069
The house that will bring the most gross income to the owners.
The the bar plot below, Rego Park is a huge lot and has estimated gross income 4668060, more than any other lot in the table.
ggplot(rent, aes(x=`Boro_Block_Lot`, y=`Estimated_Gross_Income`)) +
geom_bar(stat = "identity")+
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
labs(title="Income from each lot", x= "Lot", y = "Estimated income")
rent[rent$`Estimated_Gross_Income` == max(rent$`Estimated_Gross_Income`),]
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 6 4-03155-0027 REGO PARK 286 1959 275197
## Estimated_Gross_Income
## 6 4668060
During the work, we have reviewed how to organize data in a way called “tidy data” using package tidyverse, how to transform wide format to long format.
By analyzing data, we found that the oldest house in Queens was built in 1924! It is almost a hundred years old.
The “richest” block at the same time is the biggest, Rego Park, brings owner almost 5 millions dollars every year.