1.Create a .CSV file (or optionally, a MySQL database!) that includes one of the “wide” datasets identified in the Week 6 Discussion items. You’re encouraged to use a “wide” structure similar to how the information appears above, sothat you can practice tidying and transformations as described below.

I chose dataset that was posted by Anthony Arroyo and that includes the information about the Cooperative Comparable Rental Income in Queens.

rent <- rbind(c("COMPARABLE RENTAL – 1 – Boro-Block-Lot",   "COMPARABLE RENTAL – 1 – Neighborhood", "COMPARABLE RENTAL – 1 – Total Units",  "COMPARABLE RENTAL – 1 – Year Built",   "COMPARABLE RENTAL – 1 – Gross SqFt",   "COMPARABLE RENTAL – 1 – Estimated Gross Income",   "COMPARABLE RENTAL – 2 – Boro-Block-Lot",   "COMPARABLE RENTAL – 2 – Neighborhood", "COMPARABLE RENTAL – 2 – Total Units",  "COMPARABLE RENTAL – 2 – Year Built",   "COMPARABLE RENTAL – 2 – Gross SqFt",   "COMPARABLE RENTAL – 2 – Estimated Gross Income"),
             c("4-08276-0658",  "DOUGLASTON",   54, 1977,   27690,  390932, "4-03205-0024", "FOREST HILLS", 20, 1931,   24408,  372944),
             c("4-03173-0013",  "REGO PARK",    61, 1949,   70910,  1130670,    "4-02135-0013", "FOREST HILLS", 135,    1960,   137700, 1889772),
             c("4-02134-0013",  "FOREST HILLS", 123,    1963,   135400, 1872338,    "4-01279-0001", "JACKSON HEIGHTS",  96, 1950,   95000,  1035405),
             c("4-01268-0001",  "JACKSON HEIGHTS",  66, 1951,   57888,  860870, "4-05046-0006", "FLUSHING-NORTH",   45, 1968,   41398,  605749),
             c("4-05046-0040",  "FLUSHING-NORTH",   63, 1959,   58399,  899224, "4-03234-0051", "FOREST HILLS", 228,    1929,   174480, 3520549),
             c("4-03155-0027",  "REGO PARK",    286,    1959,   275197, 4668060,    "4-03322-0014", "KEW GARDENS",  84, 1928,   83727,  1220740),
             c("4-03329-0054",  "KEW GARDENS",  54, 1924,   54300,  814069, "4-15555-0001", "FAR ROCKAWAY", 108,    1950,   87926,  1077972),
             c("4-08276-0658",  "DOUGLASTON",   54, 1977,   27690,  390932, "4-01268-0001", "JACKSON HEIGHTS",  66, 1951,   57888,  860870))

write.table(rent, file = "rent.csv", sep = ",", col.names=F, row.names=F)

2. Read the information from your .CSV file into R, and use tidyr and dplyr as needed to tidy and transform your data.

2.1 Load a csv file from Github

The file will be downloaded from the Github repository to csv file using read.csv function.

rent <- read.csv('https://raw.githubusercontent.com/ex-pr/DATA607/Project-2/rent.csv', header=TRUE, sep=",", check.names=FALSE)

By checking the data downloaded, we have 12 column with 8 rows.

head(rent, n=3)
##   COMPARABLE RENTAL – 1 – Boro-Block-Lot COMPARABLE RENTAL – 1 – Neighborhood
## 1                           4-08276-0658                           DOUGLASTON
## 2                           4-03173-0013                            REGO PARK
## 3                           4-02134-0013                         FOREST HILLS
##   COMPARABLE RENTAL – 1 – Total Units COMPARABLE RENTAL – 1 – Year Built
## 1                                  54                               1977
## 2                                  61                               1949
## 3                                 123                               1963
##   COMPARABLE RENTAL – 1 – Gross SqFt
## 1                              27690
## 2                              70910
## 3                             135400
##   COMPARABLE RENTAL – 1 – Estimated Gross Income
## 1                                         390932
## 2                                        1130670
## 3                                        1872338
##   COMPARABLE RENTAL – 2 – Boro-Block-Lot COMPARABLE RENTAL – 2 – Neighborhood
## 1                           4-03205-0024                         FOREST HILLS
## 2                           4-02135-0013                         FOREST HILLS
## 3                           4-01279-0001                      JACKSON HEIGHTS
##   COMPARABLE RENTAL – 2 – Total Units COMPARABLE RENTAL – 2 – Year Built
## 1                                  20                               1931
## 2                                 135                               1960
## 3                                  96                               1950
##   COMPARABLE RENTAL – 2 – Gross SqFt
## 1                              24408
## 2                             137700
## 3                              95000
##   COMPARABLE RENTAL – 2 – Estimated Gross Income
## 1                                         372944
## 2                                        1889772
## 3                                        1035405
summary(rent)
##  COMPARABLE RENTAL – 1 – Boro-Block-Lot COMPARABLE RENTAL – 1 – Neighborhood
##  Length:8                               Length:8                            
##  Class :character                       Class :character                    
##  Mode  :character                       Mode  :character                    
##                                                                             
##                                                                             
##                                                                             
##  COMPARABLE RENTAL – 1 – Total Units COMPARABLE RENTAL – 1 – Year Built
##  Min.   : 54.00                      Min.   :1924                      
##  1st Qu.: 54.00                      1st Qu.:1950                      
##  Median : 62.00                      Median :1959                      
##  Mean   : 95.12                      Mean   :1957                      
##  3rd Qu.: 80.25                      3rd Qu.:1966                      
##  Max.   :286.00                      Max.   :1977                      
##  COMPARABLE RENTAL – 1 – Gross SqFt
##  Min.   : 27690                    
##  1st Qu.: 47648                    
##  Median : 58144                    
##  Mean   : 88434                    
##  3rd Qu.: 87033                    
##  Max.   :275197                    
##  COMPARABLE RENTAL – 1 – Estimated Gross Income
##  Min.   : 390932                               
##  1st Qu.: 708285                               
##  Median : 880047                               
##  Mean   :1378387                               
##  3rd Qu.:1316087                               
##  Max.   :4668060                               
##  COMPARABLE RENTAL – 2 – Boro-Block-Lot COMPARABLE RENTAL – 2 – Neighborhood
##  Length:8                               Length:8                            
##  Class :character                       Class :character                    
##  Mode  :character                       Mode  :character                    
##                                                                             
##                                                                             
##                                                                             
##  COMPARABLE RENTAL – 2 – Total Units COMPARABLE RENTAL – 2 – Year Built
##  Min.   : 20.00                      Min.   :1928                      
##  1st Qu.: 60.75                      1st Qu.:1930                      
##  Median : 90.00                      Median :1950                      
##  Mean   : 97.75                      Mean   :1946                      
##  3rd Qu.:114.75                      3rd Qu.:1953                      
##  Max.   :228.00                      Max.   :1968                      
##  COMPARABLE RENTAL – 2 – Gross SqFt
##  Min.   : 24408                    
##  1st Qu.: 53766                    
##  Median : 85827                    
##  Mean   : 87816                    
##  3rd Qu.:105675                    
##  Max.   :174480                    
##  COMPARABLE RENTAL – 2 – Estimated Gross Income
##  Min.   : 372944                               
##  1st Qu.: 797090                               
##  Median :1056688                               
##  Mean   :1323000                               
##  3rd Qu.:1387998                               
##  Max.   :3520549

2.2 Transform data

Tidy data should follow the rules: each variable is its own column, each observation is its own row, and each value is its own cell. At the current moment, we have “wide data” an can transform it to “long data” as it is better to work with.
We are going to work with data from csv file and use tidyverse package to transform it.

First of all, we will rename column to make it more readable.

rent <- plyr::rename(rent, c("COMPARABLE RENTAL – 1 – Boro-Block-Lot" = "Boro_Block_Lot", "COMPARABLE RENTAL – 1 – Neighborhood" = "Neighborhood", "COMPARABLE RENTAL – 1 – Total Units" = "Total_Units", "COMPARABLE RENTAL – 1 – Year Built" = "Year_Built", "COMPARABLE RENTAL – 1 – Gross SqFt" = "Gross_SqFt", "COMPARABLE RENTAL – 1 – Estimated Gross Income" = "Estimated_Gross_Income","COMPARABLE RENTAL – 2 – Boro-Block-Lot" = "Boro_Block_Lot",    "COMPARABLE RENTAL – 2 – Neighborhood" = "Neighborhood",    "COMPARABLE RENTAL – 2 – Total Units" = "Total_Units",  "COMPARABLE RENTAL – 2 – Year Built" = "Year_Built",    "COMPARABLE RENTAL – 2 – Gross SqFt" = "Gross_SqFt",    "COMPARABLE RENTAL – 2 – Estimated Gross Income" = "Estimated_Gross_Income"))
## Warning: The plyr::rename operation has created duplicates for the following
## name(s): (`Boro_Block_Lot`, `Neighborhood`, `Total_Units`, `Year_Built`,
## `Gross_SqFt`, `Estimated_Gross_Income`)

Secondly, the table can be split in two by the columns “Boro_Block_Lot”, one table will be created (we will place one table under another) by using function bind_rows().

table1 <- rent[ , 1:6] 
table2 <- rent[ , 7:12] 
rent <- bind_rows(table1 , table2)
head(rent, n=3)
##   Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 1   4-08276-0658   DOUGLASTON          54       1977      27690
## 2   4-03173-0013    REGO PARK          61       1949      70910
## 3   4-02134-0013 FOREST HILLS         123       1963     135400
##   Estimated_Gross_Income
## 1                 390932
## 2                1130670
## 3                1872338

3. Perform the analysis requested in the discussion item.

The question for the analysis was to determine the net value of each unit.
Unfortunately, there is not enough information to calculate net value, no expenses are provided. But we can check which lot has the oldest house.
The oldest house is built in 1924 in Kew Gardens.

ggplot(rent,aes(x =Boro_Block_Lot,y = Year_Built)) + 
    geom_point(size=3, color='red') + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
    labs(title="Year built", x= "Lot", y = "Year") 

rent[rent$Year_Built == min(rent$Year_Built),]
##   Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 7   4-03329-0054  KEW GARDENS          54       1924      54300
##   Estimated_Gross_Income
## 7                 814069

The house that will bring the most gross income to the owners.
The the bar plot below, Rego Park is a huge lot and has estimated gross income 4668060, more than any other lot in the table.

ggplot(rent, aes(x=`Boro_Block_Lot`, y=`Estimated_Gross_Income`)) + 
  geom_bar(stat = "identity")+
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1), plot.title = element_text(hjust = 0.5)) +
  labs(title="Income from each lot", x= "Lot", y = "Estimated income") 

rent[rent$`Estimated_Gross_Income` == max(rent$`Estimated_Gross_Income`),]
##   Boro_Block_Lot Neighborhood Total_Units Year_Built Gross_SqFt
## 6   4-03155-0027    REGO PARK         286       1959     275197
##   Estimated_Gross_Income
## 6                4668060

4. Conclusion.

During the work, we have reviewed how to organize data in a way called “tidy data” using package tidyverse, how to transform wide format to long format.
By analyzing data, we found that the oldest house in Queens was built in 1924! It is almost a hundred years old.
The “richest” block at the same time is the biggest, Rego Park, brings owner almost 5 millions dollars every year.