READ THIS

This is a bonus WPA that you can use as revision on controlling your working directory. It is entirely voluntary and I suggest you complete it after completing WPA8 or in your own time outside class. Your working directory is the folder that R (and Rstudio) uses by default whenever you direct it to load a file from, or save a file to, your machine.

I previously showed you two methods for controlling your working directory: using R projects (available in Rstudio), or using setwd() to set the filepath to your desired working directory. This WPA has 3 sections. The first goes through how to use setwd(). The second goes through how to create an R project, and how to create a new script in an existing R project. These two sections/methods are independent of each other. You can do whichever method you prefer, or you can try out both methods to see which is easier for you.

Section 3 will then go through some examples for saving and loading data.

I’ve put up two versions of this WPA, one with worked answers and instructions and one without. You can use whichever one you prefer. If you want to test your ability I’d suggest that you use the version without answers first. If you are struggling with the concept or practicalities of setting your working directory, I’d suggest you use the worked answer version and ask me lots of questions.

Create some new folders

Before doing anything in R the first thing I want you to do is to create a new folder, called Week10. It doesn’t matter where you create this folder (although I’d recommend using a similar location to the rest of your coursework), but you do need to know where this folder is. In the Week10 folder create two subfolders called WDrevision and otherfolder. We will use the WDrevision folder as our working directory for both methods.

Here are my subfolders. The filepath to these is /Users/aluckman/Documents/R_Spring2017/Week10

Section 1: setwd()

  1. Open Rstudio and create a new R script called wpa_setwd.R.

  2. Check the current working directory.

getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"

My working directory is already set to the correct location. Yours (hopefully) won’t be.

  1. Use setwd() to change the working directory to otherfolder. Check the working directory again.
#Replace the filepath below with the filepath to wherever you have created the "otherfolder".
setwd("/Users/aluckman/Documents/R_Spring2017/Week10/otherfolder")
getwd()

If you run this correctly, getwd() should return the filepath of otherfolder.

  1. Use setwd() to change the working directory to WDrevision. Check the working directory again.
#Again, replace this filepath with the location of your "WDrevision" folder
setwd("/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision")
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"

You can see that the working directory returned matches the one I gave it.
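A common stumbling block is that setwd() stops with an error when the folder you name does not exist (for example, because of a typo in the filepath). The following is a minimal sketch of a more defensive pattern, not part of the original exercise. It assumes a stand-in WDrevision folder created inside R's temporary directory so that it runs on any machine; replace `target` with your own filepath.

```r
# Hypothetical stand-in for your WDrevision folder, created in a temp location
target <- file.path(tempdir(), "WDrevision")
dir.create(target, showWarnings = FALSE)

if (dir.exists(target)) {
  # setwd() returns the previous working directory (invisibly),
  # so we can store it and go back later
  old_wd <- setwd(target)
  wd_name <- basename(getwd())
  print(wd_name)
  setwd(old_wd)
} else {
  message("Folder not found: ", target)
}

# Pointing setwd() at a folder that does not exist produces an error
bad_path <- file.path(tempdir(), "no_such_folder")
bad_path_failed <- inherits(try(setwd(bad_path), silent = TRUE), "try-error")
print(bad_path_failed)
```

If setwd() gives you an error in your own script, checking the path with dir.exists() first is usually the quickest way to find out whether the filepath is the problem.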

  1. Use the following code to create some fake data. Then save this data to the folder WDrevision. Call the saved file dataswd.Rdata
data1 <- array(1:10, dim = c(2, 5))
save(data1, file = "dataswd.Rdata")
  1. Did it save in the correct location? (Look in the folder.)
Mine has. This is my current WDrevision folder. Note that the file in my folder is named dataswd because I forgot to add .Rdata when I first saved it, and I haven’t saved the wpa_setwd.R script file yet.
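If you would rather check from inside R than look in the folder, file.exists() and list.files() can do the same job. This is a minimal sketch that works in R's temporary directory (an assumption made so it does not touch your real files); in your own script you would skip the setwd() lines and run the checks directly.

```r
# Work in a temporary folder so this sketch does not touch your real files
old_wd <- setwd(tempdir())

data1 <- array(1:10, dim = c(2, 5))
save(data1, file = "dataswd.Rdata")

# TRUE if the file exists in the current working directory
saved_ok <- file.exists("dataswd.Rdata")
print(saved_ok)

# List every .Rdata file in the current working directory
rdata_files <- list.files(pattern = "\\.Rdata$")
print(rdata_files)

# Return to the original working directory
setwd(old_wd)
```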

  1. Save the script and close Rstudio.

Section 2: R projects

Create a new R project

  1. Open Rstudio and create a new R project using the menus (i.e. File > New Project). Make the location of the R project the existing working directory/folder WDrevision.
After selecting New Project, select Existing Directory. If you had premade the Week10 folder but not the WDrevision folder, you could select New Directory instead.

You can browse to the location of WDrevision

If you look in the WDrevision folder you should see a file called WDrevision.Rproj. This is your R project.

Yours should look something like this, with both the WDrevision.Rproj and dataswd files now in this location (if you completed Section 1).

  1. Create a new R script called wpa_create.R.

  2. Check the current working directory. If it isn’t WDrevision you have made a mistake when creating the project.

getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"

Mine is correct

  1. Use the following code to create some fake data. Then save this data to the folder WDrevision. Call the saved file datarp.Rdata
data2<- array(0, dim=c(2,5))
save(data2, file= "datarp.Rdata")
  1. Did it save in the correct location (Look in the folder).
You should have something like this in your WDrevision folder.

  1. Save the script and close Rstudio.

Creating a new Script in an Existing R project.

R projects are great for organising your analysis, particularly when you have a lot of script files for the same project. This is because you can have multiple R scripts in a single project. All you need to do is ensure that you open your desired R project first, before creating any new script files. For instance, you could have the WPA script for each week all saved as part of the same R project.

  1. Open up the R project you just created WDrevision.Rproj.

You can either open Rstudio first and use the menus to open a project (File > Open Project), or just open the WDrevision.Rproj file in the Finder.

  1. Create a new R script called wpa_exist.R.

  2. Check the current working directory. If it isn’t WDrevision you aren’t in the R project.

getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"

Mine is correct. Sometimes you might, inadvertently, have multiple windows of Rstudio open. Make sure you have the correct Project window open. You can tell because it will have the working directory listed at the top of the window.

This is what my window currently looks like. You can see the filepath at the top of the window. If you are in a different Rstudio window/session there will be a different filepath (if in another Project) or just Rstudio at the top.
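Besides looking at the top of the window, you can also check from the console whether the working directory looks like your project folder. The following is a minimal sketch, assuming a stand-in WDrevision folder (with an empty WDrevision.Rproj file) built in R's temporary directory so that it runs anywhere. In a real project session you would only need the basename() and list.files() lines.

```r
# Build a hypothetical stand-in project folder inside a temporary location
proj_dir <- file.path(tempdir(), "WDrevision")
dir.create(proj_dir, showWarnings = FALSE)
file.create(file.path(proj_dir, "WDrevision.Rproj"))

old_wd <- setwd(proj_dir)

# Name of the folder that is currently the working directory
wd_name <- basename(getwd())
# Any .Rproj files sitting in the working directory
rproj_files <- list.files(pattern = "\\.Rproj$")

print(wd_name)
print(rproj_files)

# Return to the original working directory
setwd(old_wd)
```

If list.files() finds no .Rproj file, you are probably in a different Rstudio window or have not opened the project.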

  1. Use the following code to create some fake data. Then save this data to the folder WDrevision. Call the saved file datarp2.Rdata
data3<- array(1, dim=c(2,5))
save(data3, file= "datarp2.Rdata")
  1. Did it save in the correct location (Look in the folder).
My current WDrevision Folder

  1. Save the script and close Rstudio.

Section 3: Saving and Loading data

This section assumes you have completed at least one of Sections 1 and 2. I will use data1 and dataswd.Rdata in the questions and examples, but you can use data2 and datarp.Rdata if you did not complete Section 1.

  1. Open one of the three script files and make sure that the working directory is WDrevision (use whichever method you prefer).

Saving data.

  1. Load data1.
load("dataswd.Rdata")
  1. Create a new dataset, called data4, by subtracting 10 from each score in data1.
data4<- data1-10
  1. Use the following code to save data4.
save(data4, file= "data/data4.Rdata")
  1. The code should produce an error. Why? How can you fix it?

The problem is the file argument. By specifying data/data4.Rdata you are telling R to create a file called data4.Rdata and save it in the subfolder data of our working directory folder WDrevision. However, WDrevision doesn’t have a subfolder called data, so R can’t complete the operation, as the location you have specified doesn’t exist. The easiest way to fix this is to remove the data subfolder part of the filepath. Now it will save in our working directory folder WDrevision with all our other data files.

save(data4, file= "data4.Rdata")

If you actually wanted to save it in a subfolder called data, you could also solve this issue by just creating the subfolder yourself, outside R.
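You can also create the subfolder from within R using dir.create(). The following is a minimal sketch, not part of the original exercise; it uses R's temporary directory as a stand-in working directory (an assumption, so the example does not touch your real folders) and recreates data4 with made-up values.

```r
# Use a temporary folder as a stand-in working directory
old_wd <- setwd(tempdir())

# Stand-in for data4 from the exercise
data4 <- array(1:10, dim = c(2, 5)) - 10

# Create the subfolder only if it is not already there
if (!dir.exists("data")) {
  dir.create("data")
}

# Now the save works, because the data subfolder exists
save(data4, file = "data/data4.Rdata")
saved_in_subfolder <- file.exists("data/data4.Rdata")
print(saved_in_subfolder)

# Return to the original working directory
setwd(old_wd)
```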

  1. Let’s say that instead you wanted to save data4 in the folder otherfolder. You could do this by changing the working directory. However, this is inefficient, as you would need to change it back afterwards to keep working. Putting ../ at the start of a filepath goes to the parent directory. Try to use this to save data4 in otherfolder.
save(data4, file= "../otherfolder/data4.Rdata")

The above code works because ../ moves the location to the parent directory of the current working directory. So in this case it moves us to the directory Week10. From here we can specify the path from Week10 to our desired location otherfolder, just like we specify filepaths relative to the working directory. You can also try using multiple ../ to move up more than one level.
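If you are unsure where ../ will take you, normalizePath() expands a relative path into the full path it points to. The following is a minimal sketch that rebuilds the Week10 folder layout inside R's temporary directory (an assumption, so it runs on any machine) and then saves into otherfolder without changing the working directory.

```r
# Rebuild the folder layout from this WPA in a temporary location
week10 <- file.path(tempdir(), "Week10")
dir.create(file.path(week10, "WDrevision"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(week10, "otherfolder"), recursive = TRUE, showWarnings = FALSE)

old_wd <- setwd(file.path(week10, "WDrevision"))

# Where does "../" point? normalizePath() expands it to a full path
parent_name <- basename(normalizePath(".."))
print(parent_name)

# Stand-in for data4 from the exercise
data4 <- array(1:10, dim = c(2, 5)) - 10
save(data4, file = "../otherfolder/data4.Rdata")

# The working directory has not moved
wd_after <- basename(getwd())
print(wd_after)

setwd(old_wd)

# The file ended up in otherfolder
saved_in_other <- file.exists(file.path(week10, "otherfolder", "data4.Rdata"))
print(saved_in_other)
```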

  1. Check your working directory is still correct.
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"

As expected, this doesn’t change our working directory, just like specifying data/data4... in question 4 (the save() call that produced the error) doesn’t change our working directory to the (non-existent) data subfolder.

Online data

When loading data directly from the internet, our working directory doesn’t really matter. Instead of giving the filepath and name, we provide the URL for the location of the data. For example we use the following code to load the fake data from wpa6.

wpa6.df <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
                      header = TRUE,         
                      sep = "\t") 

Text Files- common mistakes

When loading .Rdata files, we are directly loading a filetype created by R, so we don’t need to provide any information about the data format. When we load .txt files using read.table (or other non-R file types) we need to provide some additional details about the format of the data. This section deals with 2 common mistakes I’ve seen students make.

  1. Use the following code to load another version of the fake data from wpa6. We’ll call it wpa6.df2
wpa6.df2 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
                      header = FALSE,         
                      sep = "\t") 
  1. What is different between wpa6.df and wpa6.df2? Use head, str and/or names to compare the two. What mistake have we made in wpa6.df2?
head(wpa6.df)
##   sex age      major haircolor  iq     country logic siblings multitasking
## 1   m  22  economics    blonde 106 switzerland  7.43        1           69
## 2   m  20 psychology     brown 107     germany  9.72        1           49
## 3   f  23  chemistry     brown 109     austria 12.88        3           50
## 4   m  27 psychology    blonde 114 switzerland  7.92        3           50
## 5   m  18 psychology     black 113 switzerland  7.95        2           61
## 6   m  25  chemistry    blonde 108 switzerland 12.19        2           37
##   partners marijuana risk
## 1        6         1    0
## 2        4         1    1
## 3       11         1    1
## 4       14         1    0
## 5        0         1    1
## 6       14         1    0
head(wpa6.df2)
##    V1  V2         V3        V4  V5          V6    V7       V8           V9
## 1 sex age      major haircolor  iq     country logic siblings multitasking
## 2   m  22  economics    blonde 106 switzerland  7.43        1           69
## 3   m  20 psychology     brown 107     germany  9.72        1           49
## 4   f  23  chemistry     brown 109     austria 12.88        3           50
## 5   m  27 psychology    blonde 114 switzerland  7.92        3           50
## 6   m  18 psychology     black 113 switzerland  7.95        2           61
##        V10       V11  V12
## 1 partners marijuana risk
## 2        6         1    0
## 3        4         1    1
## 4       11         1    1
## 5       14         1    0
## 6        0         1    1
names(wpa6.df)
##  [1] "sex"          "age"          "major"        "haircolor"   
##  [5] "iq"           "country"      "logic"        "siblings"    
##  [9] "multitasking" "partners"     "marijuana"    "risk"
names(wpa6.df2)
##  [1] "V1"  "V2"  "V3"  "V4"  "V5"  "V6"  "V7"  "V8"  "V9"  "V10" "V11"
## [12] "V12"

Hopefully, by using head you can see that the first row in wpa6.df2 isn’t data; it is the column/variable names. This is even clearer if we use names. The header argument controls whether the first row of a text file is treated as data/observations or as column names. If you don’t specify header, read.table sets it to TRUE only when the first row has one fewer field than the other rows, and otherwise to FALSE (see help(read.table)).
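The effect of header is easy to reproduce on a tiny file. The following is a minimal sketch using a made-up two-column, tab-separated file written to a temporary location (the file and its contents are assumptions for illustration, not the wpa6 data).

```r
# Write a tiny tab-separated file with a header row (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("sex\tage", "m\t22", "f\t23"), tmp)

with_header <- read.table(tmp, header = TRUE, sep = "\t")
no_header   <- read.table(tmp, header = FALSE, sep = "\t")

# Column names: taken from the file vs. default V1, V2
names_with <- names(with_header)   # "sex" "age"
names_without <- names(no_header)  # "V1" "V2"

# The header line is counted as an extra row of data when header = FALSE
rows_with <- nrow(with_header)     # 2
rows_without <- nrow(no_header)    # 3

# A side effect: the text "age" in the first row stops the age
# column from being read as numbers
age_numeric_with <- is.numeric(with_header$age)   # TRUE
age_numeric_without <- is.numeric(no_header$V2)   # FALSE
```

The last two lines show why this mistake matters in practice: with header = FALSE, numeric columns are no longer numeric, so functions like mean() will not work on them.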

  1. Use the following code to load another version of the fake data from wpa6. We’ll call it wpa6.df3
wpa6.df3 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
                      header = TRUE,         
                      sep = ",") 
  1. What is different between wpa6.df and wpa6.df3? Use head, str and/or names to compare the two. What mistake have we made in wpa6.df3?
str(wpa6.df)
## 'data.frame':    100 obs. of  12 variables:
##  $ sex         : Factor w/ 2 levels "f","m": 2 2 1 2 2 2 1 2 1 2 ...
##  $ age         : int  22 20 23 27 18 25 20 27 26 21 ...
##  $ major       : Factor w/ 4 levels "biology","chemistry",..: 3 4 2 4 4 2 4 4 4 1 ...
##  $ haircolor   : Factor w/ 4 levels "black","blonde",..: 2 3 3 2 1 2 2 1 1 3 ...
##  $ iq          : int  106 107 109 114 113 108 100 119 104 108 ...
##  $ country     : Factor w/ 4 levels "austria","germany",..: 4 2 1 4 4 4 4 2 2 4 ...
##  $ logic       : num  7.43 9.72 12.88 7.92 7.95 ...
##  $ siblings    : int  1 1 3 3 2 2 1 3 2 2 ...
##  $ multitasking: int  69 49 50 50 61 37 58 58 44 49 ...
##  $ partners    : int  6 4 11 14 0 14 4 14 11 2 ...
##  $ marijuana   : int  1 1 1 1 1 1 1 1 0 0 ...
##  $ risk        : int  0 1 1 0 1 0 1 0 0 0 ...
str(wpa6.df3)
## 'data.frame':    100 obs. of  1 variable:
##  $ sex.age.major.haircolor.iq.country.logic.siblings.multitasking.partners.marijuana.risk: Factor w/ 100 levels "f\t19\tpsychology\tbrown\t116\tgermany\t8.72\t2\t28\t2\t0\t0",..: 63 55 25 98 51 91 4 97 49 57 ...

From the output we can see that wpa6.df3 only has 1 column/variable, instead of 12. This is because we have told read.table that wpa6.txt uses commas to separate the columns, when it actually uses tabs. Since there are no commas in the text file, R assumes that there is only a single column of data. This is why the sep argument is important, as it specifies how the data is arranged in the text file.
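The same thing can be seen on a tiny file. The following is a minimal sketch using a made-up three-column, tab-separated file written to a temporary location (the file and its contents are assumptions for illustration, not the wpa6 data). It also uses count.fields(), which reports how many fields R finds on each line for a given separator, and can be a quick way to check sep before loading.

```r
# Write a tiny tab-separated file (made-up data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("sex\tage\tiq", "m\t22\t106", "f\t23\t109"), tmp)

right_sep <- read.table(tmp, header = TRUE, sep = "\t")
wrong_sep <- read.table(tmp, header = TRUE, sep = ",")

# Number of columns with the correct and incorrect separator
cols_right <- ncol(right_sep)   # 3
cols_wrong <- ncol(wrong_sep)   # 1

# Fields found on each line of the file for each separator
fields_tab <- count.fields(tmp, sep = "\t")    # 3 3 3
fields_comma <- count.fields(tmp, sep = ",")   # 1 1 1
```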

Done

You don’t need to submit this WPA, but you should check your answers.