About the dataset:
This dataset was taken from the following website:
DOF: Cooperative Comparable Rental Income – Queens – FY
2011/2012
The Department of Finance (DOF) is required by NY State law to value
condominiums or cooperatives as if they were residential rental
apartment buildings. DOF uses income information from rental properties
similar in physical features and location to the condominiums or
cooperatives. DOF applies this income data to the condominium or
cooperative and determine its value in the same way DOF values rental
apartment buildings. This is update annually.
What is the Net Value per Rental? Which is the most expensive and
which is the cheapest?
# Upload the libraries.
library(tidyr)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ dplyr 1.0.9
## ✔ tibble 3.1.8 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ✔ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(ggplot2)
# Import the data from github.
# Link is provided to the csv file below:
# https://github.com/enidroman/data_607_data_aquisition_and_management_project/blob/main/Anthony%20Arroyo%20Untidy%20Dataset%20Rental.csv
urlfile <- "https://raw.githubusercontent.com/enidroman/data_607_data_aquisition_and_management_project/main/Anthony%20Arroyo%20Untidy%20Dataset%20Rental.csv"
comp_rental <- read.csv(urlfile)
comp_rental
## COMPARABLE.RENTAL...1...Boro.Block.Lot COMPARABLE.RENTAL...1...Neighborhood
## 1 4-08276-0658 DOUGLASTON
## 2 4-03173-0013 REGO PARK
## 3 4-02134-0013 FOREST HILLS
## 4 4-01268-0001 JACKSON HEIGHTS
## 5 4-05046-0040 FLUSHING-NORTH
## 6 4-03155-0027 REGO PARK
## 7 4-03329-0054 KEW GARDENS
## 8 4-08276-0658 DOUGLASTON
## COMPARABLE.RENTAL...1...Total.Units COMPARABLE.RENTAL...1...Year.Built
## 1 54 1977
## 2 61 1949
## 3 123 1963
## 4 66 1951
## 5 63 1959
## 6 286 1959
## 7 54 1924
## 8 54 1977
## COMPARABLE.RENTAL...1...Gross.SqFt
## 1 27690
## 2 70910
## 3 135400
## 4 57888
## 5 58399
## 6 275197
## 7 54300
## 8 27690
## COMPARABLE.RENTAL...1...Estimated.Gross.Income
## 1 390932
## 2 1130670
## 3 1872338
## 4 860870
## 5 899224
## 6 4668060
## 7 814069
## 8 390932
## COMPARABLE.RENTAL...2...Boro.Block.Lot COMPARABLE.RENTAL...2...Neighborhood
## 1 4-03205-0024 FOREST HILLS
## 2 4-02135-0013 FOREST HILLS
## 3 4-01279-0001 JACKSON HEIGHTS
## 4 4-05046-0006 FLUSHING-NORTH
## 5 4-03234-0051 FOREST HILLS
## 6 4-03322-0014 KEW GARDENS
## 7 4-15555-0001 FAR ROCKAWAY
## 8 4-01268-0001 JACKSON HEIGHTS
## COMPARABLE.RENTAL...2...Total.Units COMPARABLE.RENTAL...2...Year.Built
## 1 20 1931
## 2 135 1960
## 3 96 1950
## 4 45 1968
## 5 228 1929
## 6 84 1928
## 7 108 1950
## 8 66 1951
## COMPARABLE.RENTAL...2...Gross.SqFt
## 1 24408
## 2 137700
## 3 95000
## 4 41398
## 5 174480
## 6 83727
## 7 87926
## 8 57888
## COMPARABLE.RENTAL...2...Estimated.Gross.Income
## 1 372944
## 2 1889772
## 3 1035405
## 4 605749
## 5 3520549
## 6 1220740
## 7 1077972
## 8 860870
DATA CLEANING AND TRANSFORMATION
In observing the dataset I see that:
1. The first 6 columns are Comparable Rental 1 and the 6 columns
after are Comparable Rental 2. Seem to have been 2 different datasets
joined together. In order to combine the columns I need to delete the
extra 6 columns and bring down the obersavations from the those columns
to the first 6 columns. To do that I had to create 2 dataframes, one
with the Comparable Rental 1 and the second with the Comparable Rental
2.
2. The columns need to be renamed. I then renamed each columns from
both data frames. Then I was able to rebind the 2 dataframes and combine
the first 6 columns and the second 6 columns.
3. The analysis that Anthony Arroyo wanted to make with this
dataframe is to determine the net value of each unit. Need to create and
add a column for Estimated Expense and Net Operating Income with
observations to perform the analysis.
4. Need to calculate, create, and add a column called Net Value of
Rental.
5. Need to check the classification for the observation in each
column to make sure that the class is correct. Otherwise need to convert
them to the correct class.
# Created a new dataframe witn Comparable Rental 1 only.
comp_rental1 <- select(comp_rental, COMPARABLE.RENTAL...1...Boro.Block.Lot, COMPARABLE.RENTAL...1...Neighborhood, COMPARABLE.RENTAL...1...Total.Units, COMPARABLE.RENTAL...1...Year.Built, COMPARABLE.RENTAL...1...Gross.SqFt, COMPARABLE.RENTAL...1...Estimated.Gross.Income)
comp_rental1
## COMPARABLE.RENTAL...1...Boro.Block.Lot COMPARABLE.RENTAL...1...Neighborhood
## 1 4-08276-0658 DOUGLASTON
## 2 4-03173-0013 REGO PARK
## 3 4-02134-0013 FOREST HILLS
## 4 4-01268-0001 JACKSON HEIGHTS
## 5 4-05046-0040 FLUSHING-NORTH
## 6 4-03155-0027 REGO PARK
## 7 4-03329-0054 KEW GARDENS
## 8 4-08276-0658 DOUGLASTON
## COMPARABLE.RENTAL...1...Total.Units COMPARABLE.RENTAL...1...Year.Built
## 1 54 1977
## 2 61 1949
## 3 123 1963
## 4 66 1951
## 5 63 1959
## 6 286 1959
## 7 54 1924
## 8 54 1977
## COMPARABLE.RENTAL...1...Gross.SqFt
## 1 27690
## 2 70910
## 3 135400
## 4 57888
## 5 58399
## 6 275197
## 7 54300
## 8 27690
## COMPARABLE.RENTAL...1...Estimated.Gross.Income
## 1 390932
## 2 1130670
## 3 1872338
## 4 860870
## 5 899224
## 6 4668060
## 7 814069
## 8 390932
# Renamed the columns in comp_rental1.
rename_comp_rental1 <- comp_rental1 %>%
rename(Boro_Block_Lot = COMPARABLE.RENTAL...1...Boro.Block.Lot,
Neighborhood = COMPARABLE.RENTAL...1...Neighborhood,
Total_Units = COMPARABLE.RENTAL...1...Total.Units,
Year_Built = COMPARABLE.RENTAL...1...Year.Built,
Gross_Sq_Ft = COMPARABLE.RENTAL...1...Gross.SqFt,
Gross_Income = COMPARABLE.RENTAL...1...Estimated.Gross.Income)
rename_comp_rental1
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_Sq_Ft
## 1 4-08276-0658 DOUGLASTON 54 1977 27690
## 2 4-03173-0013 REGO PARK 61 1949 70910
## 3 4-02134-0013 FOREST HILLS 123 1963 135400
## 4 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## 5 4-05046-0040 FLUSHING-NORTH 63 1959 58399
## 6 4-03155-0027 REGO PARK 286 1959 275197
## 7 4-03329-0054 KEW GARDENS 54 1924 54300
## 8 4-08276-0658 DOUGLASTON 54 1977 27690
## Gross_Income
## 1 390932
## 2 1130670
## 3 1872338
## 4 860870
## 5 899224
## 6 4668060
## 7 814069
## 8 390932
# Created a new dataframe witn Comparable Rental 2 only.
comp_rental2 <- select(comp_rental, COMPARABLE.RENTAL...2...Boro.Block.Lot, COMPARABLE.RENTAL...2...Neighborhood, COMPARABLE.RENTAL...2...Total.Units, COMPARABLE.RENTAL...2...Year.Built, COMPARABLE.RENTAL...2...Gross.SqFt, COMPARABLE.RENTAL...2...Estimated.Gross.Income)
comp_rental2
## COMPARABLE.RENTAL...2...Boro.Block.Lot COMPARABLE.RENTAL...2...Neighborhood
## 1 4-03205-0024 FOREST HILLS
## 2 4-02135-0013 FOREST HILLS
## 3 4-01279-0001 JACKSON HEIGHTS
## 4 4-05046-0006 FLUSHING-NORTH
## 5 4-03234-0051 FOREST HILLS
## 6 4-03322-0014 KEW GARDENS
## 7 4-15555-0001 FAR ROCKAWAY
## 8 4-01268-0001 JACKSON HEIGHTS
## COMPARABLE.RENTAL...2...Total.Units COMPARABLE.RENTAL...2...Year.Built
## 1 20 1931
## 2 135 1960
## 3 96 1950
## 4 45 1968
## 5 228 1929
## 6 84 1928
## 7 108 1950
## 8 66 1951
## COMPARABLE.RENTAL...2...Gross.SqFt
## 1 24408
## 2 137700
## 3 95000
## 4 41398
## 5 174480
## 6 83727
## 7 87926
## 8 57888
## COMPARABLE.RENTAL...2...Estimated.Gross.Income
## 1 372944
## 2 1889772
## 3 1035405
## 4 605749
## 5 3520549
## 6 1220740
## 7 1077972
## 8 860870
# Renamed the columns in comp_rental2.
rename_comp_rental2 <- comp_rental2 %>%
rename(Boro_Block_Lot = COMPARABLE.RENTAL...2...Boro.Block.Lot,
Neighborhood = COMPARABLE.RENTAL...2...Neighborhood,
Total_Units = COMPARABLE.RENTAL...2...Total.Units,
Year_Built = COMPARABLE.RENTAL...2...Year.Built,
Gross_Sq_Ft = COMPARABLE.RENTAL...2...Gross.SqFt,
Gross_Income = COMPARABLE.RENTAL...2...Estimated.Gross.Income)
rename_comp_rental2
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_Sq_Ft
## 1 4-03205-0024 FOREST HILLS 20 1931 24408
## 2 4-02135-0013 FOREST HILLS 135 1960 137700
## 3 4-01279-0001 JACKSON HEIGHTS 96 1950 95000
## 4 4-05046-0006 FLUSHING-NORTH 45 1968 41398
## 5 4-03234-0051 FOREST HILLS 228 1929 174480
## 6 4-03322-0014 KEW GARDENS 84 1928 83727
## 7 4-15555-0001 FAR ROCKAWAY 108 1950 87926
## 8 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## Gross_Income
## 1 372944
## 2 1889772
## 3 1035405
## 4 605749
## 5 3520549
## 6 1220740
## 7 1077972
## 8 860870
# Merged both dataframes to make it into 1 dataframe again.
new_comp_rental <- rbind(rename_comp_rental1, rename_comp_rental2)
new_comp_rental
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_Sq_Ft
## 1 4-08276-0658 DOUGLASTON 54 1977 27690
## 2 4-03173-0013 REGO PARK 61 1949 70910
## 3 4-02134-0013 FOREST HILLS 123 1963 135400
## 4 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## 5 4-05046-0040 FLUSHING-NORTH 63 1959 58399
## 6 4-03155-0027 REGO PARK 286 1959 275197
## 7 4-03329-0054 KEW GARDENS 54 1924 54300
## 8 4-08276-0658 DOUGLASTON 54 1977 27690
## 9 4-03205-0024 FOREST HILLS 20 1931 24408
## 10 4-02135-0013 FOREST HILLS 135 1960 137700
## 11 4-01279-0001 JACKSON HEIGHTS 96 1950 95000
## 12 4-05046-0006 FLUSHING-NORTH 45 1968 41398
## 13 4-03234-0051 FOREST HILLS 228 1929 174480
## 14 4-03322-0014 KEW GARDENS 84 1928 83727
## 15 4-15555-0001 FAR ROCKAWAY 108 1950 87926
## 16 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## Gross_Income
## 1 390932
## 2 1130670
## 3 1872338
## 4 860870
## 5 899224
## 6 4668060
## 7 814069
## 8 390932
## 9 372944
## 10 1889772
## 11 1035405
## 12 605749
## 13 3520549
## 14 1220740
## 15 1077972
## 16 860870
ANALYSIS
In order to do the analysis I had to search on the origional
dataset, DOF: Cooperative Comparable Rental Income – Queens – FY
2011/2012, for Estimated Expense and Net Operating Income for each Boro
Block Lot and create and add a colomn called Estimated Expense and Net
Operating Income with the observations to the dataframe.
Please Note: Gross Income - Estimated Expense = Net Operating
Income.
I only had to add the column for Net Operating Income (which was
already on the origional dataset) but I wanted you to show how Net
Operating Income was calculated.
# Create and add columns with observation called Estimated Expense and Net Operating Income to be able to perform my analysis.
Estimated_Expense <- c(149526, 452085, 796014, 374654, 352611, 2367993, 357727, 149526, 165174, 1026891, 553582, 266036, 1461326, 647210, 484086, 374654)
Net_Operating_Income <- c(241406, 678585, 1076324, 486216, 546613, 2300067, 456342, 241406, 207770, 862881, 481823, 339713, 2059223, 573530, 593886, 486216)
new_comp_rental <- cbind(new_comp_rental, Estimated_Expense, Net_Operating_Income)
new_comp_rental
## Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_Sq_Ft
## 1 4-08276-0658 DOUGLASTON 54 1977 27690
## 2 4-03173-0013 REGO PARK 61 1949 70910
## 3 4-02134-0013 FOREST HILLS 123 1963 135400
## 4 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## 5 4-05046-0040 FLUSHING-NORTH 63 1959 58399
## 6 4-03155-0027 REGO PARK 286 1959 275197
## 7 4-03329-0054 KEW GARDENS 54 1924 54300
## 8 4-08276-0658 DOUGLASTON 54 1977 27690
## 9 4-03205-0024 FOREST HILLS 20 1931 24408
## 10 4-02135-0013 FOREST HILLS 135 1960 137700
## 11 4-01279-0001 JACKSON HEIGHTS 96 1950 95000
## 12 4-05046-0006 FLUSHING-NORTH 45 1968 41398
## 13 4-03234-0051 FOREST HILLS 228 1929 174480
## 14 4-03322-0014 KEW GARDENS 84 1928 83727
## 15 4-15555-0001 FAR ROCKAWAY 108 1950 87926
## 16 4-01268-0001 JACKSON HEIGHTS 66 1951 57888
## Gross_Income Estimated_Expense Net_Operating_Income
## 1 390932 149526 241406
## 2 1130670 452085 678585
## 3 1872338 796014 1076324
## 4 860870 374654 486216
## 5 899224 352611 546613
## 6 4668060 2367993 2300067
## 7 814069 357727 456342
## 8 390932 149526 241406
## 9 372944 165174 207770
## 10 1889772 1026891 862881
## 11 1035405 553582 481823
## 12 605749 266036 339713
## 13 3520549 1461326 2059223
## 14 1220740 647210 573530
## 15 1077972 484086 593886
## 16 860870 374654 486216
Thought if I convert the Net Operating Income from double to
interger the graph would fix but did not succeed. Line graph did not
work at all for me no lines and doing the conversion from double to
interger did not work either.
# Convert Net Operating Income and Net Value Per Rental to integer from double.
numbers <- analysis
numbers$Net_Operating_Income <- as.integer(numbers$Net_Operating_Income) # First column is a double.
numbers$Net_Value_Per_Rental <- as.integer(numbers$Net_Value_Per_Rental) # Second column is a double.
sapply(numbers, class)
## Boro_Block_Lot Neighborhood Total_Units
## "character" "character" "integer"
## Year_Built Gross_Sq_Ft Gross_Income
## "integer" "integer" "integer"
## Estimated_Expense Net_Operating_Income Net_Value_Per_Rental
## "numeric" "integer" "integer"
Please note for some reason my bar graph and scatter plot is only
plotting 14 observations instead of 16.
Not sure if the calculation are correct the Net Value for each Unit
by dividing Net Operating Income and Total Units .
As per the below graphs Bar Block Lot in Jackson Heights is has more
expensive Net Value Per Rent at close to 15000. The lowest Net Value Per
Rent is Bar Block Lot in Jackson Heights also at close to 5000. Looking
at the numbers from the above dataframe Bar Block Lot in Rego Park has
the most expensive Net Value Per Rent at 11124.344 and the lowest is
Douglaston at 4470.481.
# Bar gaph showing Net Value per Boro Block Lot by Neighborhood.
#ggplot(numbers, aes(x = Net_Value_Per_Rental, y = Boro_Block_Lot, colour = Neighborhood)) +
#geom_line()
ggplot(numbers, aes(x = Net_Value_Per_Rental, y = Boro_Block_Lot, fill = Neighborhood)) +
geom_col()

ggplot(numbers, aes(x = Net_Value_Per_Rental, y = Boro_Block_Lot, colour = Neighborhood)) +
geom_point(size = 5) +
scale_shape_manual(values = c(1,2,3,4,5,6,7)) +
scale_colour_hue(l = 45)
