Introduction

For this tidyverse assignment, I chose to use the following libraries:

  • dplyr
  • ggplot2
  • magrittr
  • forcats
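
The code in this document does not show the packages being attached. A minimal sketch of that step, assuming all four are already installed (for example with install.packages):

```r
# Attach the four packages used in this assignment.
# Assumes they have already been installed.
library(dplyr)     # select, group_by, summarise
library(ggplot2)   # plotting
library(magrittr)  # the %>% pipe
library(forcats)   # factor helpers such as fct_count
```

dplyr re-exports the %>% pipe, so attaching magrittr separately is only needed for its other operators.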

I did not do extensive data filtering for this assignment, but rather chose to demonstrate the capabilities of the tidyverse libraries.

Data Pull

For my dataset, I chose a Kaggle dataset of average gross rent across the US. The dataset is fairly long and has several fields. Not all are useful, so in the TidyVerse Libraries section I use dplyr to select the fields I need.

rent.data <- read.csv("https://raw.githubusercontent.com/dapolloxp/spring2019tidyverse/master/kaggle_gross_rent.csv")

TidyVerse Libraries

In this section, I use some of the tidyverse libraries: the pipe operator from magrittr and the verbs in dplyr, which make it easy to filter and select fields. I pick fields with select from dplyr and aggregate the information with group_by and summarise.

I also count how often each zip code occurs by piping the Zip_Code column into fct_count from forcats.

sub.rent.data <- rent.data %>% select(State_Name, State_ab, County, City, Place, Type, Primary, Zip_Code, Lat, Lon, Mean, Median, Stdev, Samples)

sub.rent.data$Zip_Code <- as.character(sub.rent.data$Zip_Code) 

head(sub.rent.data)
##   State_Name State_ab          County         City            Place Type
## 1    Alabama       AL Chambers County       Wadley           Abanda  CDP
## 2    Alabama       AL  Winston County      Addison          Addison Town
## 3    Alabama       AL Marshall County  Albertville Albertville city City
## 4    Alabama       AL  Pickens County   Aliceville  Aliceville city City
## 5    Alabama       AL   Etowah County Walnut Grove          Altoona Town
## 6    Alabama       AL  Calhoun County     Anniston    Anniston city City
##   Primary Zip_Code      Lat       Lon Mean Median Stdev Samples
## 1   place    36276 33.09163 -85.52703  972    968    51      12
## 2   place    35540 34.20268 -87.17800  519    460   275      64
## 3   place    35950 34.26313 -86.21066  625    585   234    2560
## 4   place    35442 33.12369 -88.15936  546    438   354     574
## 5   place    35990 34.03920 -86.30570  350    303   185     114
## 6   place    36207 33.67344 -85.81092  600    599   274    3901
#datatable(sub.rent.data)
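
The select and type-conversion steps above can also be written as a single pipeline. A minimal sketch on a toy data frame, where toy.rent, its values, and the Extra column are invented for illustration and are not part of the Kaggle file:

```r
library(dplyr)

# Toy data with invented values; Extra stands in for a field we do not need
toy.rent <- data.frame(
  State_Name = c("Alabama", "Alaska"),
  Zip_Code = c(36276L, 99501L),
  Mean = c(972, 1200),
  Extra = c("a", "b"),
  stringsAsFactors = FALSE
)

# Keep three columns, then convert Zip_Code to character in the same pipeline
toy.sub <- toy.rent %>%
  select(State_Name, Zip_Code, Mean) %>%
  mutate(Zip_Code = as.character(Zip_Code))
```

Using mutate keeps the conversion inside the pipe instead of a separate `$<-` assignment; both approaches give the same result.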

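# Left-pad 3- and 4-digit zip codes with zeros.
# Zip_Code was read as an integer column, so leading zeros were lost.
pad.zips <- function(zips)
{
  # seq_along is safe for an empty vector, unlike 1:length(zips)
  for (i in seq_along(zips))
  {
    if (nchar(zips[i]) == 4)
    {
      zips[i] <- paste0("0", zips[i])
    }
    else if (nchar(zips[i]) == 3)
    {
      zips[i] <- paste0("00", zips[i])
    }
  }
  return(zips)
}
# Not applied here, so the counts below still show unpadded codes such as 731.
#sub.rent.data$Zip_Code <- pad.zips(sub.rent.data$Zip_Code)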
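
The same padding can also be done without a loop. A minimal sketch using base R's sprintf, assuming every code is a purely numeric five-digit US zip code; toy.zips is an invented vector, not data from the file:

```r
# Invented example values; "731" and "2134" stand in for codes
# that lost their leading zeros when read as integers
toy.zips <- c("36276", "731", "2134")

# %05d formats an integer zero-padded to a width of five digits
padded <- sprintf("%05d", as.integer(toy.zips))
# padded is now c("36276", "00731", "02134")
```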

#

sub.rent.data$Zip_Code <- as.factor(sub.rent.data$Zip_Code) 

sub.rent.data$Zip_Code %>% fct_count(sort = TRUE)
## # A tibble: 17,006 x 2
##    f         n
##    <fct> <int>
##  1 78584    22
##  2 94606    20
##  3 36605    18
##  4 731      18
##  5 35215    17
##  6 35630    17
##  7 87532    17
##  8 11203    16
##  9 35020    16
## 10 36117    16
## # ... with 16,996 more rows
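
To see what fct_count returns on something small, here is a minimal sketch on an invented factor (toy.zips is hypothetical). It returns a tibble with the level in column f and the count in column n, and sort = TRUE puts the most frequent level first:

```r
library(forcats)

# Invented zip codes: "35442" appears three times, "35990" twice, "36276" once
toy.zips <- factor(c("35442", "35442", "36276", "35990", "35442", "35990"))

# One row per level, ordered from most to least frequent
toy.counts <- fct_count(toy.zips, sort = TRUE)
```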
avg.rent.byzip <- sub.rent.data %>% group_by(Zip_Code) %>% summarise(AvgRent=mean(Mean))

avg.rent.bystate <- sub.rent.data %>% group_by(State_Name) %>% summarise(AvgRent=mean(Mean))
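
The group_by and summarise pattern used above collapses each group to one row. A minimal sketch on a toy data frame with invented values, so the result is easy to check by hand:

```r
library(dplyr)

# Toy data with invented values, not taken from the Kaggle file
toy.rent <- data.frame(
  State_Name = c("Alabama", "Alabama", "Alaska"),
  Mean = c(500, 700, 1200),
  stringsAsFactors = FALSE
)

# One row per state; AvgRent is the mean of that state's Mean values
toy.avg <- toy.rent %>%
  group_by(State_Name) %>%
  summarise(AvgRent = mean(Mean))
```

Here Alabama averages to 600 and Alaska to 1200. Note that this is an unweighted mean of the per-place means; it does not account for the Samples column.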

Graphing

In this section, I use ggplot2, one of the most useful tidyverse libraries, to create basic scatterplots of average rent by zip code and by state.

ggplot(avg.rent.byzip, aes(x=Zip_Code, y=AvgRent)) + geom_point()

ggplot(avg.rent.bystate, aes(x=State_Name, y=AvgRent)) + geom_point() + theme(axis.text.x = element_text(angle = 90)) + xlab("State Name") + ylab("Average Rent")
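
A by-state scatterplot is often easier to read when the states are ordered by rent rather than alphabetically. A minimal sketch of one way to do that, using base R's reorder inside aes and coord_flip to put the state names on the vertical axis; toy.bystate and its values are invented for illustration:

```r
library(ggplot2)

# Toy data with invented values, shaped like avg.rent.bystate
toy.bystate <- data.frame(
  State_Name = c("Alabama", "Alaska", "Arizona"),
  AvgRent = c(650, 1100, 900),
  stringsAsFactors = FALSE
)

# reorder() sorts the state factor by AvgRent;
# coord_flip() swaps the axes so the labels read horizontally
p <- ggplot(toy.bystate, aes(x = reorder(State_Name, AvgRent), y = AvgRent)) +
  geom_point() +
  coord_flip() +
  xlab("State Name") +
  ylab("Average Rent")
```

Printing p draws the plot. The columns are referenced by bare name inside aes, since ggplot looks them up in the data frame passed as the first argument.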


Addition by Debabrata Kabiraj

Original

Show the original data format

head(sub.rent.data,10)
##    State_Name State_ab           County         City            Place Type
## 1     Alabama       AL  Chambers County       Wadley           Abanda  CDP
## 2     Alabama       AL   Winston County      Addison          Addison Town
## 3     Alabama       AL  Marshall County  Albertville Albertville city City
## 4     Alabama       AL   Pickens County   Aliceville  Aliceville city City
## 5     Alabama       AL    Etowah County Walnut Grove          Altoona Town
## 6     Alabama       AL   Calhoun County     Anniston    Anniston city City
## 7     Alabama       AL Limestone County      Ardmore          Ardmore Town
## 8     Alabama       AL      Dale County       Ariton           Ariton Town
## 9     Alabama       AL      Clay County      Ashland          Ashland Town
## 10    Alabama       AL  Escambia County       Atmore      Atmore city City
##    Primary Zip_Code      Lat       Lon Mean Median Stdev Samples
## 1    place    36276 33.09163 -85.52703  972    968    51      12
## 2    place    35540 34.20268 -87.17800  519    460   275      64
## 3    place    35950 34.26313 -86.21066  625    585   234    2560
## 4    place    35442 33.12369 -88.15936  546    438   354     574
## 5    place    35990 34.03920 -86.30570  350    303   185     114
## 6    place    36207 33.67344 -85.81092  600    599   274    3901
## 7    place    35739 34.98784 -86.82902  581    557   283     215
## 8    place    36311 31.59777 -85.71306  581    539   357      45
## 9    place    36251 33.26989 -85.83371  433    411   237     476
## 10   place    36504 31.12794 -87.45764  556    554   227    1109
sample(head(sub.rent.data,20),10)
##    Median Primary State_Name Zip_Code      Lat Samples               Place
## 1     968   place    Alabama    36276 33.09163      12              Abanda
## 2     460   place    Alabama    35540 34.20268      64             Addison
## 3     585   place    Alabama    35950 34.26313    2560    Albertville city
## 4     438   place    Alabama    35442 33.12369     574     Aliceville city
## 5     303   place    Alabama    35990 34.03920     114             Altoona
## 6     599   place    Alabama    36207 33.67344    3901       Anniston city
## 7     557   place    Alabama    35739 34.98784     215             Ardmore
## 8     539   place    Alabama    36311 31.59777      45              Ariton
## 9     411   place    Alabama    36251 33.26989     476             Ashland
## 10    554   place    Alabama    36504 31.12794    1109         Atmore city
## 11    628   place    Alabama    36003 32.43256      71        Autaugaville
## 12    705   place    Alabama    36467 31.30208      40              Babbie
## 13    445   place    Alabama    35903 34.02305      24            Ballplay
## 14    631   place    Alabama    36509 30.40759     290 Bayou La Batre city
## 15    418   place    Alabama    35563 33.93541      15           Beaverton
## 16    328   place    Alabama    36925 32.46366      46             Bellamy
## 17    244   place    Alabama    35546 33.66670     166               Berry
## 18    722   place    Alabama    35203 33.52744   44449     Birmingham city
## 19    498   place    Alabama    35031 34.07573     296        Blountsville
## 20    845   place    Alabama    35957 34.19902      44           Boaz city
##    State_ab Stdev           City
## 1        AL    51         Wadley
## 2        AL   275        Addison
## 3        AL   234    Albertville
## 4        AL   354     Aliceville
## 5        AL   185   Walnut Grove
## 6        AL   274       Anniston
## 7        AL   283        Ardmore
## 8        AL   357         Ariton
## 9        AL   237        Ashland
## 10       AL   227         Atmore
## 11       AL   235   Autaugaville
## 12       AL   155            Opp
## 13       AL   126    Hokes Bluff
## 14       AL   296 Bayou La Batre
## 15       AL   154           Guin
## 16       AL   263           York
## 17       AL   223          Berry
## 18       AL   352     Birmingham
## 19       AL   272   Blountsville
## 20       AL    80           Boaz
#DT::datatable(sub.rent.data, options = list(pageLength=5))
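
Note that sample applied to a data frame draws columns, not rows, which is why the output above shows 10 shuffled columns and all 20 rows. If random rows are wanted instead, a minimal sketch with base R indexing looks like this; toy and the seed are invented for illustration:

```r
set.seed(42)  # arbitrary seed so the draw is repeatable

# Toy data with invented values: 20 rows, 2 columns
toy <- data.frame(id = 1:20, rent = seq(300, 1250, by = 50))

# Sample 10 row indices, then subset the rows; all columns are kept
row.sample <- toy[sample(nrow(toy), 10), ]
```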

ggplot(sub.rent.data, aes(x=State_Name, y=Median, color=Type)) +
    geom_point(alpha = 0.2) +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

Nest

The nest function from the tidyr package groups the rents by state, county and city. The result is a tibble with one row per group, with the remaining fields stored in a list-column named data.

sub.rent.data.nested <- sub.rent.data %>% 
                          tidyr::nest(-c(State_Name, State_ab, County, City)) 
head(sub.rent.data.nested,10)
## # A tibble: 10 x 5
##    State_Name State_ab County           City         data             
##    <fct>      <fct>    <fct>            <fct>        <list>           
##  1 Alabama    AL       Chambers County  Wadley       <tibble [1 x 10]>
##  2 Alabama    AL       Winston County   Addison      <tibble [1 x 10]>
##  3 Alabama    AL       Marshall County  Albertville  <tibble [1 x 10]>
##  4 Alabama    AL       Pickens County   Aliceville   <tibble [2 x 10]>
##  5 Alabama    AL       Etowah County    Walnut Grove <tibble [2 x 10]>
##  6 Alabama    AL       Calhoun County   Anniston     <tibble [2 x 10]>
##  7 Alabama    AL       Limestone County Ardmore      <tibble [1 x 10]>
##  8 Alabama    AL       Dale County      Ariton       <tibble [1 x 10]>
##  9 Alabama    AL       Clay County      Ashland      <tibble [2 x 10]>
## 10 Alabama    AL       Escambia County  Atmore       <tibble [1 x 10]>
DT::datatable(sub.rent.data.nested, options = list(pageLength=5))
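
To make the list-column easier to picture, here is a minimal sketch of nest on a toy data frame. It assumes tidyr 1.0.0 or later, where the columns to nest are named explicitly as data = c(...); the `-c(...)` form used above is the older interface. All names and values in toy are invented:

```r
library(tidyr)

# Toy data with invented values: two places in one city, one in another
toy <- data.frame(
  State_Name = c("Alabama", "Alabama", "Alabama"),
  City = c("Springfield", "Springfield", "Riverton"),
  Place = c("Springfield city", "Springfield East", "Riverton town"),
  Mean = c(546, 500, 972),
  stringsAsFactors = FALSE
)

# Nest Place and Mean; State_Name and City remain as grouping columns
toy.nested <- nest(toy, data = c(Place, Mean))

# Each element of the list-column is a small tibble; count its rows
rows.per.city <- sapply(toy.nested$data, nrow)
```

toy.nested has one row per state and city pair, and rows.per.city reports how many places were folded into each.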

Unnest

The unnest function from tidyr expands the list-column again, returning the rents to their original one-row-per-place form alongside state, county and city.

sub.rent.data.unnested <- sub.rent.data.nested %>%
                            tidyr::unnest(data)
head(sub.rent.data.unnested,10)
## # A tibble: 10 x 14
##    State_Name State_ab County City  Place Type  Primary Zip_Code   Lat
##    <fct>      <fct>    <fct>  <fct> <fct> <fct> <fct>   <fct>    <dbl>
##  1 Alabama    AL       Chamb~ Wadl~ Aban~ CDP   place   36276     33.1
##  2 Alabama    AL       Winst~ Addi~ Addi~ Town  place   35540     34.2
##  3 Alabama    AL       Marsh~ Albe~ Albe~ City  place   35950     34.3
##  4 Alabama    AL       Picke~ Alic~ Alic~ City  place   35442     33.1
##  5 Alabama    AL       Picke~ Alic~ Memp~ Town  place   35442     33.1
##  6 Alabama    AL       Etowa~ Waln~ Alto~ Town  place   35990     34.0
##  7 Alabama    AL       Etowa~ Waln~ Waln~ Town  place   35990     34.1
##  8 Alabama    AL       Calho~ Anni~ Anni~ City  place   36207     33.7
##  9 Alabama    AL       Calho~ Anni~ Saks  CDP   place   36206     33.7
## 10 Alabama    AL       Limes~ Ardm~ Ardm~ Town  place   35739     35.0
## # ... with 5 more variables: Lon <dbl>, Mean <int>, Median <int>,
## #   Stdev <int>, Samples <int>
DT::datatable(sub.rent.data.unnested, options = list(pageLength=5))
## Warning in instance$preRenderHook(instance): It seems your data is too
## big for client-side DataTables. You may consider server-side processing:
## https://rstudio.github.io/DT/server.html
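
A quick way to convince yourself that unnest reverses nest is a round trip on a small example. A minimal sketch, assuming tidyr 1.0.0 or later (for the data = and cols = arguments); toy and its values are invented:

```r
library(tidyr)

# Toy data with invented values
toy <- data.frame(
  State_Name = c("Alabama", "Alabama", "Alabama"),
  City = c("Springfield", "Springfield", "Riverton"),
  Place = c("Springfield city", "Springfield East", "Riverton town"),
  Mean = c(546, 500, 972),
  stringsAsFactors = FALSE
)

# Nest, then immediately unnest the list-column
toy.nested <- nest(toy, data = c(Place, Mean))
toy.unnested <- unnest(toy.nested, cols = data)

# Compare the round-tripped values with the original
same.rows <- nrow(toy.unnested) == nrow(toy)
same.means <- all(toy.unnested$Mean == toy$Mean)
```

The result is a tibble rather than a plain data frame, so the comparison is on dimensions and values rather than with identical.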