Tutorial 2


For this project I will create a data frame by reading a csv into R and then seperating it into several csv files by creating for-loop and ifelse functions. To begin I will need to load the following libraries and, because this project is for my friend Shayne, I set the working directory as shown below (Note: The end of the directory in green text):

library(dplyr)
library(tidyr)
setwd("/Users/mitchell/R/Shayne")

Then, I create a data frame (or df) by reading in the csv file titled “H.Park.csv”, giving it the name “H.Park_Zones.B”. (Note: H.Park represents Highland Park (I think), and I use B at the end of titles such as these to represent “Before” I make my alterations.)

H.Park_Zones.B<- read.csv("H.Park.csv", stringsAsFactors = F)

Now, because I am interested in separating and making csv files based on the way in which the land is used/zoned (as depicted in the original csv), I first make a set of values using the “unique”" command…

b<- unique(H.Park_Zones.B[["Land_Use"]]) 

>>> creating the set of values (in the form of strings) below…

## [1] "Low Medium II Residential"   "Neighborhood Commercial"    
## [3] "General Commercial"          "Public Facilities"          
## [5] "Open Space"                  "Low Moderate II Residential"
## [7] "Medium Residential"          "Low Residential"

The result of this command creates a set of values that is essentially a list of every zone type displayed under the column title “Land_Use” in my df. Now, with this set of values I can create a for-loop that completes my agenda for this portion of the project. (Note: The goal of this project is to create a df from a csv file to then separate it into several csv files- based on a series of variables that I will describe shortly.)

for(i in 1:length(b)){
  H.Park_Zones.A <- subset(H.Park_Zones.B, Land_Use == b[i])
  #  print(head(H.Park_Zones.A))
  write.csv(H.Park_Zones.A, file = paste(b[i], ".csv"), row.names = F)
}

Now, there are a few interesting things going on in this loop that you should take note of. First, take note of the for-loop prior to the arguments (e.g. for(i in 1:length(b))). The “i” represents index and “length” is used so that every variable listed in “b” is used in this loop. If I wanted only to use a certain number of these variables I could have entered in this number instead. That is, there are exactly 8 variables in “b”, and if I only wanted to use the first 5 variables I would have entered “1:5” instead of “1:length”. Or I could have used 2 - 5 by entering “2:5”. However, because I want to use all variables regardless of there being 8 or 80 I used “length”. Got it?! Following this, I want to create a series of dfs based on the variables as values, and do this with this argument:

–> H.Park_Zones.A <- subset(H.Park_Zones.B, Land_Use == b[i]) ** First you can see that I am using “A” to represent “After”, and that a subset* for every variable under the column heading “Land_Use” becomes “H.Park_Zones.A”. However, they olny receive this title after being turned into a csv by this argument here –> write.csv(H.Park_Zones.A, file = paste(b[i], “.csv”), row.names = F) . This is a fun segment of the argument too because every csv is given a title that is based on its variable via the use of “paste”. Regarding the segment of the loop

–> “#print(head(H.Park_Zones.A))”, it can be used to depict brief examples of the csv files just created in the R Studio Console. I just chose to exlude it so as to avoid cluttering my conlsole.

Now, because I am going to do this for variables in other columns and want to honor DRY, I think this might be a time worthy of a small function. Essentially, I want to make a function that consists of the two steps I just took. Namely…

1.)

b<- unique(H.Park_Zones.B[["Land_Use"]]) 

& 2.)

for(i in 1:length(b)){
  H.Park_Zones.A <- subset(H.Park_Zones.B, Land_Use == b[i])
  #  print(head(H.Park_Zones.A))
  write.csv(H.Park_Zones.A, file = paste(b[i], ".csv"), row.names = F)
}

and turn it into this…

make_csv <- function(df, col){
  a <- unique(df[[col]])
  for(i in 1:length(a)){
    newdf <- subset(df, get(col) == a[i])
    #print(head(newdf))
    write.csv(newdf, file = paste(a[i], ".csv", sep=""), row.names = F)
  }
}

Now, with this function, all that’s needed from me is a simple copy/paste of whatever column I am interested in. Furthermore, if I am interested in breaking other csv files into separate csv files I can read them in and copy/paste them in too. To demonstrate, while still using the same df, I changed my sights from the column “Land_Use” to “Filed_Ellis_Act”. Also important to know is this column consists of only two variables: namely “Yes” and “No”. Now when I rnu this function…

make_csv(H.Park_Zones.B, "Filed_Ellis_Act")

Once the function has run the csv files are created and stored appropriately, as you can see below. (Note: As you can see directly below the working directory titled “Shayne” you will find the two csv files titled No and Yes respectively. Also, directly below the html of this R Markdown you will see the 8 csv files titled after the variables under the original column we ran titled “Land_Use”.)


This function is already quite useful; however, I can think of one addition that will make it even more so in particular situations. For instance, if there is an excessive amount of variables in a column, I might want to avoid making a plethora of csv files with small amounts of data. So, with this notion in mind, I have a column titled “LOC_Year” and it has a vast array of variables in the form of years and I want to divy up this df according to this column (Note: LOC_Year means Year of Last Owner Change). Now, to avoid the plethora, I have chosen to make bins for the years to fall within. Specifically, my friend asked for the following…


Unfortunately, there are no years entered in this column that are before 1950 within this column; so I will make a new column of strings to house these years as bins. I start with years prior to 1970, then between 1970 and 1990, then between 1990 and 2000, then 2000 to 2010, and then a bin for all years after 2010. To do this I group the operations mutate and ifelse and do so with the following method…

H.Park_Year.Sold<- mutate(H.Park_Zones.B, Year_Group = ifelse(LOC_Year <1970
            ,"Before 1970",ifelse(LOC_Year <1990
            ,"1970 - 1990", ifelse(LOC_Year <2000,"1990 - 2000"
            , ifelse (LOC_Year <2010,"2000 - 2010", "After 2010")))))

This creates a df (“H.Park_Year.Sold”) with a new column of bins for my years titled “Year_Group”. With this, I now have a df and a column title that I can enter in my function…

make_csv(H.Park_Year.Sold, "Year_Group")

And that’s that! I can now easily finish and email all the csv files Shayne asked for. In short…skedooosh!

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.