RStudio 101

First things first

Download R
Download RStudio
Stop using RGUI

Navigating RStudio

RStudio is the extremely helpful user interface built by the boffins at Posit (the company formerly called RStudio). The desktop version is open source, free to download and free to use. You can organise the windows any way you please, but this is how it comes and so it is how I have left it!

Top left

This is the window where you open scripts or markdowns, or any file you want to save for later and come back to. It allows you to edit code and save it, test it, change it and play with it without having to scroll miles back through your console when you decide that actually, that bit was the answer you wanted after all.

We will go into more detail about Scripts and Markdowns later on.

Top Right

This is your ‘environment’. Here you can see the data and other objects you have read into, or created in, your session. Clicking the dropdown arrow next to a data frame is the equivalent of ‘str(dataframe)’ and is handy if you know you’ll need to keep referring back to a list of column names and you don’t want to scroll back up through your console every time.
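
If you would rather type than click, the same information is available from the console. This is a minimal sketch; the data frame `pets` is made up purely for illustration.

```r
# A small made-up data frame, just so there is something in the environment
pets <- data.frame(name = c("Rex", "Tom"), legs = c(4, 4))

ls()          # names of the objects in your session (what the Environment pane lists)
str(pets)     # structure of one object: column names, types and first few values
names(pets)   # just the column names
```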

Bottom Left

Speaking of consoles… So far with Dani and Python you will have only ever seen interactive code, where the output is produced underneath the code block as soon as you run it. With RStudio, you can run code and the output will appear in the console. It is a record of your work so far and your answers, which you will probably want to copy into a notes document if you are planning to discuss the results in, for example, an essay.

Bottom Right

This is the not-so-codey bit of RStudio - you can do a lot of clicking here to find things you need: help sheets, file paths, and it’s also where any plots you produce will pop up. You can either hard code a save into your work, or you can ‘export’ as you go along from here.
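
As a rough sketch of what "hard coding a save" can look like, the snippet below writes a plot straight to a PDF file using base R. The file name is hypothetical and tempdir() is used only so the example does not drop files into your project; in real work you would point it at your own output folder.

```r
# Hypothetical output file in a temporary folder
plot_file <- file.path(tempdir(), "example_plot.pdf")

pdf(plot_file, width = 7, height = 5)        # open a PDF file as the plotting device
plot(1:10, (1:10)^2, main = "Example plot")  # the plot is drawn into the file, not the Plots pane
invisible(dev.off())                         # close the device so the file is actually written
```

Forgetting dev.off() is the usual reason a saved plot turns out empty or unreadable.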

Installing and using new packages

# install.packages("tidyverse")
# You only need to do this once per computer - it downloads the package and saves it to your package library on disk, so in future you can just use library()

# library("tidyverse")
# This loads the installed package from disk into your current session - you will have to do this once in every new session.
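
If you are not sure whether a package is already installed, you can ask R before installing it again. This is a small sketch using base R only; the install line is left commented out on purpose so that nothing is downloaded unless you choose to.

```r
# The package used above; swap in whichever one you need
pkg <- "tidyverse"

if (!requireNamespace(pkg, quietly = TRUE)) {
  # install.packages(pkg)   # uncomment to actually download and install it
  message(pkg, " is not installed yet")
}

# The folder(s) on disk where your installed packages live
.libPaths()
```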

That’s great. But what if I’ve never heard of the package before?

?tidyverse_packages

## No documentation for 'tidyverse_packages' in specified packages and libraries:
## you could try '??tidyverse_packages'

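A single ‘?’ searches the packages you currently have loaded and brings up the help page for an exact name. It came back empty above because tidyverse had not been loaded with library() in this session. Putting ‘??’ in front of anything runs a fuzzy search across the help pages of every package installed on your computer, loaded or not (it does not search the internet) - useful if you know what you’re looking for but you’ve forgotten half of its name!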

??tidyv
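
The same searches work for functions that ship with R, which makes them easy to try out. The sketch below uses aggregate() from the stats package (loaded by default) as the thing being looked up; help.search() is the long form of ‘??’.

```r
# Exact name known: ? opens the help page
?aggregate

# Only part of the name known: apropos() lists loaded objects whose names match
matches <- apropos("aggr")
matches

# help.search() searches installed help pages; package = narrows where it looks
help.search("aggregate", package = "stats")
```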

Where to find more help and how to make the most of it

StackOverflow

There is nothing these people don’t know - you just have to figure out how to apply their answer to your problem. Most good quality questions and answers will create some sample data within the code - copy it and run the code line by line to see if it does to the dummy data what you want to do to yours!

An example

Here’s your ‘real’ data

mydata <- read.csv("~/Google Drive/PhD/Data/energy_data/hypothetical_energy.csv")
mydata <- mydata[,3:5] #I'm just removing some columns we don't need here 

head(mydata)

##   address_postcodesec smart_meter rt0000
## 1               AL1 1           G   0.00
## 2               AL1 1           G   5.00
## 3               AL1 1           G 152.00
## 4               AL1 1           G 204.33
## 5               AL1 2           E 280.33
## 6               AL1 2           E 356.33

And here’s your real problem:

Let’s say you need an overall total of one column (rt0000) for each value of ‘address_postcodesec’, so for AL1 1 you would expect a total of 419.33.

You’ve googled “how do I get an overall total when I group a column in R” and this is what you’ve found. You think it does what you want it to.

# Here is the dummy data they have provided as an example
x <- data.frame(
  Category = factor(c("First", "First", "First", "Second",
                      "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# And here is the answer that somebody else has provided

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)

##   Category  x
## 1    First 30
## 2   Second  5
## 3    Third 34

Where do you start?

See if you can rewrite the code for your own data with actual pen and paper, and we’ll test some answers.

The biggest part of fixing a problem in code is knowing how to apply a general answer to a specific problem…once you’ve figured out the exact combination of words to google to get you there in the first place.

Yes!! It is frustrating! But the more you do, the more you learn and the faster you get at fixing mistakes (You still make lots of mistakes! That’s fine too.)
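
Once you have had a go on paper, here is one possible way to adapt the aggregate() answer to the energy data. It is a sketch, not the only answer: because the CSV file is not available here, `mydata` is rebuilt by hand from the six rows that head() printed above, and the object name `totals` is my own.

```r
# Stand-in for the real file: only the six rows shown by head(mydata)
mydata <- data.frame(
  address_postcodesec = c("AL1 1", "AL1 1", "AL1 1", "AL1 1", "AL1 2", "AL1 2"),
  smart_meter         = c("G", "G", "G", "G", "E", "E"),
  rt0000              = c(0, 5, 152, 204.33, 280.33, 356.33)
)

# Same pattern as the StackOverflow answer:
#   x$Frequency -> mydata$rt0000               (the column to add up)
#   x$Category  -> mydata$address_postcodesec  (the column to group by)
totals <- aggregate(mydata$rt0000,
                    by = list(address_postcodesec = mydata$address_postcodesec),
                    FUN = sum)
totals
```

With only these six rows, AL1 1 comes to 361.33 and AL1 2 to 636.66. The 419.33 quoted earlier presumably includes rows from the full file that head() does not show.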

Best Practice (Subjective!)

Make your own life easy!!

I prefer to write a notes script and a final script; the final one includes only the code I eventually run to generate results - there is a lot of trial and error and I am messy!
Keep a ‘glossary’ script; if you find yourself googling the same thing over and over, make a note of it.
# Make good comments - even if they’re brief. There’s nothing worse than having no idea what a piece of code did to those results.
Change your colour theme in RStudio preferences. The white BURNS.

Save things in a sensible place!

A raw data folder and an output data folder and sensible naming conventions make my life worth living. It doesn’t matter what convention you choose, just stick with it.
Setting up a Dropbox or Google Drive will pay dividends, because you will forget to save the latest version of something to your USB.
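
A raw/output folder layout can be set up from R itself. The folder and file names below are only one hypothetical convention, and tempdir() stands in for wherever your project really lives.

```r
# Hypothetical project location and folder names - pick your own and stick with them
project_dir <- file.path(tempdir(), "my_project")
raw_dir     <- file.path(project_dir, "data_raw")
out_dir     <- file.path(project_dir, "data_output")

# Create the folders (no complaint if they already exist)
dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# Made-up results, saved with the date at the front of the file name
results  <- data.frame(group = c("a", "b"), total = c(1, 2))
out_file <- file.path(out_dir, paste0(format(Sys.Date(), "%Y-%m-%d"), "_totals.csv"))
write.csv(results, out_file, row.names = FALSE)
```

Building paths with file.path() rather than typing slashes by hand keeps the same script working on Windows, Mac and Linux.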

Console/Scripts/Markdown vs giving up and going back to Python

You cannot save the console and repeat your work! Reproducibility is so important, especially when you work with open data - people will always want to try and prove you wrong.

So, write a script - or two, a notes and a final - and save them. Run and tweak lines over and over again until you get the result you want, without having to copy and paste it every time. Work on it like you would a Word document, and when you’re finally happy….

Turn it into a Markdown. Markdowns should look very familiar, a bit like Jupyter notebooks? No? YES! They’re the same idea, don’t be afraid of them. Instead of clicking “cell type: Code” or “cell type: Markdown”, you just type away and when you want to insert code you just…

# Stick your code in here

Voilà!
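
To see what the raw file behind a Markdown looks like, the sketch below writes a tiny one from R. The title and file name are made up, and the three backticks that open and close a code chunk are built with strrep() only so they display cleanly here; in RStudio you would simply type them (or use the Insert Chunk button).

```r
# Three backticks mark the start and end of a code chunk
fence <- strrep("`", 3)

# The lines of a minimal R Markdown file
rmd_lines <- c(
  "---",
  "title: \"My first Markdown\"",
  "output: html_document",
  "---",
  "",
  "Ordinary text goes here, just like a Markdown cell in Jupyter.",
  "",
  paste0(fence, "{r}"),          # chunk opens: everything below is R code
  "# Stick your code in here",
  "summary(cars)",               # cars is a dataset that ships with R
  fence                          # chunk closes
)

# Hypothetical file name, saved to a temporary folder
rmd_file <- file.path(tempdir(), "first_markdown.Rmd")
writeLines(rmd_lines, rmd_file)

# To turn it into a document (needs the rmarkdown package installed):
# rmarkdown::render(rmd_file)
```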