This is a bonus WPA that you can use as revision on controlling your working directory. It is entirely voluntary, and I suggest you complete it after WPA8 or in your own time outside class. Your working directory is the folder that R (and Rstudio) searches by default whenever you load a file from, or save a file to, your machine.
I previously showed you two methods for controlling your working directory: using R projects (available in Rstudio), or using setwd() to set the filepath to your desired working directory. This WPA has 3 sections. The first goes through how to use setwd(). The second goes through how to create an R project, and how to create a new script in an existing R project. These two sections/methods are independent of each other. You can do whichever method you prefer, or you can try out both methods to see which is easier for you.
Section 3 will then go through some examples for saving and loading data.
I’ve put up two versions of this WPA, one with worked answers and instructions and one without. You can use whichever one you prefer. If you want to test your ability I’d suggest that you use the version without answers first. If you are struggling with the concept or practicalities of setting your working directory, I’d suggest you use the worked answer version and ask me lots of questions.
Before doing anything in R the first thing I want you to do is to create a new folder, called Week10. It doesn’t matter where you create this folder (although I’d recommend using a similar location to the rest of your coursework), but you do need to know where this folder is. In the Week10 folder create two subfolders called WDrevision and otherfolder. We will use the WDrevision folder as our working directory for both methods.
Here are my subfolders. The filepath to the folder that contains them is /Users/aluckman/Documents/R_Spring2017/Week10
Open Rstudio and create a new R script called wpa_setwd.R.
Check the current working directory.
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"
My working directory is already set to the correct location. Yours (hopefully) won’t be.
Use setwd so that the working directory is otherfolder. Check the working directory again.
# Replace the filepath below with the filepath to wherever you have created the "otherfolder".
setwd("/Users/aluckman/Documents/R_Spring2017/Week10/otherfolder")
getwd()
If you run this correctly, it should return the filepath of your otherfolder.
Use setwd so that the working directory is WDrevision. Check the working directory again.
# Again, replace this filepath with the location of your "WDrevision" folder.
setwd("/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision")
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"
You can see that the working directory returned matches the one I set.
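If you switch directories with setwd() often, it can help to know that setwd() invisibly returns the directory you were in before the change. The sketch below is only an illustration: it uses R's temporary folder (tempdir()) as a stand-in for otherfolder, and the object names start_dir and previous_dir are my own choices.

```r
# Remember the starting directory so we can confirm we return to it.
start_dir <- getwd()

# setwd() invisibly returns the directory that was current before the change.
# tempdir() is used here only as a stand-in for "otherfolder".
previous_dir <- setwd(tempdir())

print(getwd())        # now the temporary folder
print(previous_dir)   # the folder we started in

# Switch back to where we started.
setwd(previous_dir)
print(getwd())
```

Storing the old directory this way means you do not need to retype the full filepath to get back.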
Create the data below and save it in WDrevision. Call the saved file dataswd.Rdata
data1 <- array(1:10, dim = c(2, 5))
save(data1, file= "dataswd")
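Mine has. This is my current WDrevision folder. Note that I forgot to add .Rdata when saving dataswd, so my file is called dataswd rather than dataswd.Rdata, and I haven’t saved the data_swd.R script file yet. If you make the same mistake, save the data again with file = "dataswd.Rdata" so that R can find a file with that name later.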
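You can also check whether a file has been saved from within R, rather than by opening the folder. This is a minimal sketch, assuming a throwaway folder inside tempdir() in place of WDrevision (demo_dir is a hypothetical name used only for this example):

```r
# A throwaway folder standing in for WDrevision (for illustration only).
demo_dir <- file.path(tempdir(), "WDrevision_demo")
dir.create(demo_dir, showWarnings = FALSE)

data1 <- array(1:10, dim = c(2, 5))

# Saving without the extension produces a file literally called "dataswd".
save(data1, file = file.path(demo_dir, "dataswd"))

# list.files() shows what is in a folder; file.exists() checks one filepath.
print(list.files(demo_dir))
has_plain <- file.exists(file.path(demo_dir, "dataswd"))
has_rdata <- file.exists(file.path(demo_dir, "dataswd.Rdata"))
print(has_plain)   # TRUE
print(has_rdata)   # FALSE
```

In your own script you would normally call list.files() with no arguments, which lists the contents of the current working directory.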
Create a new R project in WDrevision.
After selecting New Project, select Existing Directory. If you had premade the Week10 folder but not the WDrevision folder, you could select New Directory instead.
You can browse to the location of WDrevision
If you look in the WDrevision folder you should see a file called WDrevision.Rproj. This is your R project.
Yours should look something like this, with both the WDrevision.Rproj and dataswd files now in this location (if you completed Section 1).
Create a new R script called wpa_create.R.
Check the current working directory. If it isn’t WDrevision you have made a mistake when creating the project.
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"
Mine is correct
Create the data below and save it in WDrevision. Call the saved file datarp.Rdata
data2 <- array(0, dim = c(2, 5))
save(data2, file= "datarp.Rdata")
You should have something like this in your WDrevision folder.
R projects are great for organising your analysis, particularly when you have a lot of script files for the same project. This is because you can have multiple R scripts in a single project. All you need to do is ensure that you open your desired R project first, before creating any new script files. For instance, you could have the WPA script for each week all saved as part of the same R project.
Open the existing R project WDrevision.Rproj.
You can either open Rstudio first and use the menus to open a project (File > Open Project), or just open the WDrevision.Rproj file in the Finder.
Create a new R script called wpa_exist.R.
Check the current working directory. If it isn’t WDrevision you aren’t in the R project.
getwd()
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"
Mine is correct. Sometimes you might, inadvertently, have multiple windows of Rstudio open. Make sure you have the correct Project window open. You can tell because it will have the working directory listed at the top of the window.
This is what my window currently looks like. You can see the filepath at the top of the window. If you are in a different Rstudio window/session there will be a different filepath (if in another Project) or just Rstudio at the top.
Create the data below and save it in WDrevision. Call the saved file datarp2.Rdata
data3 <- array(1, dim = c(2, 5))
save(data3, file= "datarp2.Rdata")
My current ‘WDrevision’ Folder
This section assumes you have completed at least one of Sections 1 and 2. I will use data1 and dataswd.Rdata in the questions and examples, but you can use data2 and datarp.Rdata if you did not complete Section 1.
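Set your working directory to WDrevision (use whichever method you prefer).
Load data1.
load("dataswd.Rdata")
Create a new object, data4, by subtracting 10 from each score in data1.
data4 <- data1 - 10
Try to save data4 using the code below, then work out why it does not work.
save(data4, file = "data/data4.Rdata")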
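One thing worth knowing about load() is that the object keeps the name it had when it was saved, whatever the file is called. load() also invisibly returns the names of the objects it created. A small sketch, using a temporary file (demo_file, a hypothetical name) instead of dataswd.Rdata so that it runs anywhere:

```r
# Save an object to a temporary .Rdata file, then remove it from the workspace.
demo_file <- tempfile(fileext = ".Rdata")
data1 <- array(1:10, dim = c(2, 5))
save(data1, file = demo_file)
rm(data1)

# load() restores the object under its original name and
# invisibly returns a character vector of the names it loaded.
loaded_names <- load(demo_file)
print(loaded_names)
print(exists("data1"))
```

This is why loading a .Rdata file does not need an assignment with <- in front of it.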
The problem is the file argument. By specifying data/data4.Rdata you are telling R to create a file called data4.Rdata and save it in the subfolder data of our working directory folder WDrevision. However, WDrevision doesn’t have a subfolder called data, so R can’t complete the operation, as the location you have specified doesn’t exist. The easiest way to fix this is to remove the data subfolder part of the filepath. Now it will save in our working directory folder WDrevision with all our other data files.
save(data4, file= "data4.Rdata")
If you actually wanted to save it in a subfolder called data, you could also solve this issue by just creating the subfolder yourself, outside R.
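The subfolder can also be created from within R using dir.create(). The following is a hedged sketch rather than part of the original exercise: it works inside a throwaway folder in tempdir() (demo_dir is a hypothetical name) so that it does not touch your real WDrevision folder.

```r
# Work in a throwaway folder standing in for WDrevision.
demo_dir <- file.path(tempdir(), "WDrevision_subfolder_demo")
dir.create(demo_dir, showWarnings = FALSE)
old_dir <- setwd(demo_dir)

data4 <- array(1:10, dim = c(2, 5)) - 10

# Create the "data" subfolder only if it is not already there.
if (!dir.exists("data")) {
  dir.create("data")
}

# The subfolder now exists, so the same save() call succeeds.
save(data4, file = "data/data4.Rdata")
saved_ok <- file.exists("data/data4.Rdata")
print(saved_ok)

# Return to the original working directory.
setwd(old_dir)
```

Checking with dir.exists() first avoids the warning that dir.create() gives when the folder is already there.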
Now save data4 in the folder otherfolder. You could do this by changing the working directory. However, this is inefficient, as you would need to change it back afterwards to keep working. Putting ../ at the start of a filepath goes to the parent directory. Try to use this to save data4 in otherfolder.
save(data4, file = "../otherfolder/data4.Rdata")
The above code works because ../ moves the location to the parent directory of the current working directory. So in this case it moves us to the directory Week10. From here we can specify the path from Week10 to our desired location otherfolder, just like we specify filepaths relative to the working directory. You can also try using multiple ../ to move up more than one level.
Check whether this has changed the working directory.
getwd()
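To see how ../ resolves without touching your real folders, here is a minimal sketch that rebuilds the same folder layout inside tempdir(). The folder name Week10_demo and the object names are assumptions made for this example only.

```r
# Recreate the Week10 layout in a temporary location.
week10 <- file.path(tempdir(), "Week10_demo")
dir.create(file.path(week10, "WDrevision"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(week10, "otherfolder"), recursive = TRUE, showWarnings = FALSE)

# Make the demo WDrevision folder the working directory.
old_dir <- setwd(file.path(week10, "WDrevision"))

data4 <- array(1:10, dim = c(2, 5)) - 10

# "../" steps up to Week10_demo, then the path continues down into otherfolder.
save(data4, file = "../otherfolder/data4.Rdata")

# normalizePath() shows the full filepath that the relative path points to.
print(normalizePath("../otherfolder/data4.Rdata"))

in_other <- file.exists(file.path(week10, "otherfolder", "data4.Rdata"))
in_wd <- file.exists("data4.Rdata")
print(in_other)
print(in_wd)

# Return to the original working directory.
setwd(old_dir)
```

The file ends up in otherfolder and not in the working directory, which is the point of the relative path.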
## [1] "/Users/aluckman/Documents/R_Spring2017/Week10/WDrevision"
As expected this doesn’t change our working directory, just like specifying data/data4... in question 4 doesn’t change our working directory to the (non-existent) data subfolder.
When loading data directly from the internet, our working directory doesn’t really matter. Instead of giving the filepath and name, we provide the URL for the location of the data. For example we use the following code to load the fake data from wpa6.
wpa6.df <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
header = TRUE,
sep = "\t")
When loading .Rdata files, we are directly loading a filetype created by R, so we don’t need to provide any information about the data format. When we load .txt files using read.table (or other non-R file types) we need to provide some additional details about the format of the data. This section deals with 2 common mistakes I’ve seen students make.
Run the code below to create wpa6.df2.
wpa6.df2 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
header = FALSE,
sep = "\t")
Compare wpa6.df and wpa6.df2. Use head, str and/or names to compare the two. What mistake have we made in wpa6.df2?
head(wpa6.df)
## sex age major haircolor iq country logic siblings multitasking
## 1 m 22 economics blonde 106 switzerland 7.43 1 69
## 2 m 20 psychology brown 107 germany 9.72 1 49
## 3 f 23 chemistry brown 109 austria 12.88 3 50
## 4 m 27 psychology blonde 114 switzerland 7.92 3 50
## 5 m 18 psychology black 113 switzerland 7.95 2 61
## 6 m 25 chemistry blonde 108 switzerland 12.19 2 37
## partners marijuana risk
## 1 6 1 0
## 2 4 1 1
## 3 11 1 1
## 4 14 1 0
## 5 0 1 1
## 6 14 1 0
head(wpa6.df2)
## V1 V2 V3 V4 V5 V6 V7 V8 V9
## 1 sex age major haircolor iq country logic siblings multitasking
## 2 m 22 economics blonde 106 switzerland 7.43 1 69
## 3 m 20 psychology brown 107 germany 9.72 1 49
## 4 f 23 chemistry brown 109 austria 12.88 3 50
## 5 m 27 psychology blonde 114 switzerland 7.92 3 50
## 6 m 18 psychology black 113 switzerland 7.95 2 61
## V10 V11 V12
## 1 partners marijuana risk
## 2 6 1 0
## 3 4 1 1
## 4 11 1 1
## 5 14 1 0
## 6 0 1 1
names(wpa6.df)
## [1] "sex" "age" "major" "haircolor"
## [5] "iq" "country" "logic" "siblings"
## [9] "multitasking" "partners" "marijuana" "risk"
names(wpa6.df2)
## [1] "V1" "V2" "V3" "V4" "V5" "V6" "V7" "V8" "V9" "V10" "V11"
## [12] "V12"
Hopefully, by using head you can see that the first row in wpa6.df2 isn’t data; it is the column/variable names. This is even clearer if we use names. The header argument controls whether the first row of a text file is treated as data/observations or as column names. If you don’t specify header, read.table decides from the file itself: it uses TRUE only when the first row has one fewer field than the remaining rows, and FALSE otherwise (see help(read.table)). It is safer to set header explicitly.
Run the code below to create wpa6.df3.
wpa6.df3 <- read.table(file = "http://nathanieldphillips.com/wp-content/uploads/2016/11/wpa6.txt",
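The effect of header can be reproduced on a tiny file that does not need an internet connection. This is a sketch under my own assumptions: the three-column file written below is made up for illustration and is not the wpa6 data.

```r
# Write a small tab-separated file whose first row holds the column names.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("sex\tage\tiq",
             "m\t22\t106",
             "f\t23\t109"),
           demo_file)

with_header <- read.table(demo_file, header = TRUE, sep = "\t")
no_header <- read.table(demo_file, header = FALSE, sep = "\t")

# With header = TRUE the first row becomes the column names.
print(names(with_header))
print(nrow(with_header))

# With header = FALSE the names are V1, V2, ... and the names row is counted as data.
print(names(no_header))
print(nrow(no_header))

# A side effect: the age column is no longer numeric, because it contains the text "age".
print(is.numeric(with_header$age))
print(is.numeric(no_header$V2))
```

The loss of numeric columns is often the first symptom you notice, for example when mean() stops working on a column that should contain numbers.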
header = TRUE,
sep = ",")
Compare wpa6.df and wpa6.df3. Use head, str and/or names to compare the two. What mistake have we made in wpa6.df3?
str(wpa6.df)
## 'data.frame': 100 obs. of 12 variables:
## $ sex : Factor w/ 2 levels "f","m": 2 2 1 2 2 2 1 2 1 2 ...
## $ age : int 22 20 23 27 18 25 20 27 26 21 ...
## $ major : Factor w/ 4 levels "biology","chemistry",..: 3 4 2 4 4 2 4 4 4 1 ...
## $ haircolor : Factor w/ 4 levels "black","blonde",..: 2 3 3 2 1 2 2 1 1 3 ...
## $ iq : int 106 107 109 114 113 108 100 119 104 108 ...
## $ country : Factor w/ 4 levels "austria","germany",..: 4 2 1 4 4 4 4 2 2 4 ...
## $ logic : num 7.43 9.72 12.88 7.92 7.95 ...
## $ siblings : int 1 1 3 3 2 2 1 3 2 2 ...
## $ multitasking: int 69 49 50 50 61 37 58 58 44 49 ...
## $ partners : int 6 4 11 14 0 14 4 14 11 2 ...
## $ marijuana : int 1 1 1 1 1 1 1 1 0 0 ...
## $ risk : int 0 1 1 0 1 0 1 0 0 0 ...
str(wpa6.df3)
## 'data.frame': 100 obs. of 1 variable:
## $ sex.age.major.haircolor.iq.country.logic.siblings.multitasking.partners.marijuana.risk: Factor w/ 100 levels "f\t19\tpsychology\tbrown\t116\tgermany\t8.72\t2\t28\t2\t0\t0",..: 63 55 25 98 51 91 4 97 49 57 ...
From the output we can see that wpa6.df3 only has 1 column/variable, instead of 12. This is because we have told read.table that wpa6.txt uses commas to separate the columns, when it actually uses tabs. Since there are no commas in the text file, R assumes that there is only a single column of data. This is why the sep argument is important: it tells read.table which character separates the columns in the text file.
You don’t need to submit this WPA, but you should check your answers.
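The same mistake can be reproduced on a small local file, and count.fields() offers a quick way to check a separator before reading the data. As before, this is only a sketch: the three-column file is invented for the example and is not the wpa6 data.

```r
# Write a small tab-separated file.
demo_file <- tempfile(fileext = ".txt")
writeLines(c("sex\tage\tiq",
             "m\t22\t106",
             "f\t23\t109"),
           demo_file)

right_sep <- read.table(demo_file, header = TRUE, sep = "\t")
wrong_sep <- read.table(demo_file, header = TRUE, sep = ",")

# The correct separator gives 3 columns; the wrong one gives a single column.
print(ncol(right_sep))
print(ncol(wrong_sep))
print(names(wrong_sep))

# count.fields() reports how many fields each line has for a given separator.
fields_tab <- count.fields(demo_file, sep = "\t")
fields_comma <- count.fields(demo_file, sep = ",")
print(fields_tab)
print(fields_comma)
```

If count.fields() returns 1 for every line, the separator you tried is almost certainly not the one the file uses.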