Welcome to R 101

Topics

First things first

  1. Download R
  2. Download RStudio
  3. Stop using RGUI

Installing and using new packages

# install.packages("tidyverse") 
# You only need to do this once per computer - it downloads the package and installs it into your R library on disk. After that you can just use library().

# library("tidyverse") 
# This loads the installed package from your library into the current R session - you will have to do this once in every new session.
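If you are not sure whether a package is already installed, a small check can save a pointless re-install. This is only a sketch: `is_installed()` is a made-up helper name, not something that ships with R, and the tidyverse lines are left commented out as above.

```r
# Hypothetical helper (not part of base R): TRUE if the package is installed.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this should be TRUE on any installation.
is_installed("stats")

# Install only when the package is missing, then load it for this session.
# if (!is_installed("tidyverse")) install.packages("tidyverse")
# library("tidyverse")
```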

That’s great. But what if I’ve never heard of the package before?

?tidyverse_packages
## No documentation for 'tidyverse_packages' in specified packages and libraries:
## you could try '??tidyverse_packages'

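A single ‘?’ looks for an exact name in the packages you currently have loaded and brings up its help page (which is why the lookup above fails: tidyverse has not been loaded yet). Using ‘??’ in front of anything runs a fuzzy search over the documentation of every package you have installed, not just the loaded ones - useful if you know what you’re looking for but you’ve forgotten half of its name!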

??tidyv
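The same two searches can also be written as ordinary function calls, which is handy inside a script. A minimal sketch using `aggregate` from the stats package as the search term (the choice of term is just for illustration):

```r
# ?aggregate is shorthand for help("aggregate"): an exact-name lookup.
# ??aggregate is shorthand for help.search("aggregate"): a fuzzy search
# of the documentation for installed packages.
hits <- help.search("aggregate", package = "stats")
class(hits)

# apropos() lists objects on the current search path whose names match a pattern.
found <- apropos("^aggreg")
found
```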

Where to find more help and how to make the most of it

StackOverflow

There is nothing these people don’t know - you just have to figure out how to apply their answer to your problem. Most good quality questions and answers will create some sample data within the code - copy it and run the code line by line to see whether it does to the dummy data what you want to do to yours!

An example

Here’s your ‘real’ data

mydata <- read.csv("~/Google Drive/PhD/Data/energy_data/hypothetical_energy.csv")
mydata <- mydata[,3:5] #I'm just removing some columns we don't need here 

head(mydata)
##   address_postcodesec smart_meter rt0000
## 1               AL1 1           G   0.00
## 2               AL1 1           G   5.00
## 3               AL1 1           G 152.00
## 4               AL1 1           G 204.33
## 5               AL1 2           E 280.33
## 6               AL1 2           E 356.33

And here’s your real problem:

Let’s say you need the total of one column (rt0000) for each value of ‘address_postcodesec’, so for AL1 1 you would expect a total of 419.33.

You’ve googled “how do I get an overall total when I group a column in R” and this is what you’ve found. You think it does what you want it to.

# Here is the dummy data they have provided as an example
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))
# And here is the answer that somebody else has provided

aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
##   Category  x
## 1    First 30
## 2   Second  5
## 3    Third 34

Where do you start?

See if you can re-write the code with actual pen and paper and we’ll test some answers

The biggest part of fixing a problem in code is knowing how to apply a general answer to a specific problem…once you’ve figured out the exact combination of words to google to get you there in the first place.
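One possible translation, once you have had a go yourself, is sketched below. It swaps `x$Frequency` for `mydata$rt0000` and `x$Category` for `mydata$address_postcodesec`. To keep it runnable without the CSV file, `mydata` is rebuilt by hand from the six rows printed by head(), so the totals describe those six rows only, not the whole file. The column name `total_rt0000` is just an illustrative choice.

```r
# Stand-in for the real file: only the six rows shown by head(mydata).
mydata <- data.frame(
  address_postcodesec = c("AL1 1", "AL1 1", "AL1 1", "AL1 1", "AL1 2", "AL1 2"),
  smart_meter         = c("G", "G", "G", "G", "E", "E"),
  rt0000              = c(0, 5, 152, 204.33, 280.33, 356.33)
)

# Same pattern as the StackOverflow answer, with our own columns swapped in.
totals <- aggregate(mydata$rt0000,
                    by  = list(address_postcodesec = mydata$address_postcodesec),
                    FUN = sum)

# aggregate() calls the summed column "x"; give it a clearer (made-up) name.
names(totals)[2] <- "total_rt0000"
totals
```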

Yes!! It is frustrating! But the more you do, the more you learn and the faster you get at fixing mistakes (You still make lots of mistakes! That’s fine too.)

Best Practice (Subjective!)

Make your own life easy!!

  • I prefer to write a notes script and a final script in which I include only the code I eventually run to generate results - there is a lot of trial and error and I am messy!
  • Keep a ‘glossary’ script; if you find yourself googling the same thing over and over, make a note of it.
  • # Make good comments - even if they’re brief, there’s nothing worse than having no idea what a piece of code did to those results.
  • Change your colour theme in RStudio preferences. The white BURNS.

Save things in a sensible place!

  • A raw data folder and an output data folder and sensible naming conventions make my life worth living. It doesn’t matter what convention you choose, just stick with it.
  • Setting up a Dropbox or Google Drive will pay dividends because you will forget to save the latest version of something to your USB.
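As a rough sketch of the folder idea, the snippet below creates a raw data folder and an output folder and saves a dated file into the output one. Every name here (`my_project`, `data_raw`, `data_output`, the file name) is made up for illustration, and it writes to a temporary directory so it will not touch your real files.

```r
# Hypothetical project layout, created under a temporary directory.
project_dir <- file.path(tempdir(), "my_project")
raw_dir     <- file.path(project_dir, "data_raw")     # untouched originals
out_dir     <- file.path(project_dir, "data_output")  # anything your code produces

dir.create(raw_dir, recursive = TRUE, showWarnings = FALSE)
dir.create(out_dir, recursive = TRUE, showWarnings = FALSE)

# One possible naming convention: date first, then a short description.
out_file <- file.path(out_dir,
                      paste0(format(Sys.Date(), "%Y-%m-%d"), "_example_totals.csv"))

# Tiny placeholder result, just so there is something to save.
results <- data.frame(group = c("a", "b"), total = c(1, 2))
write.csv(results, out_file, row.names = FALSE)
```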

Console/Scripts/Markdown vs giving up and going back to Python

You cannot save the console and repeat your work! Reproducibility is so important, especially when you work with open data - people will always want to try and prove you wrong.

So, write a script - or two, a notes script and a final one. Save them, then run and tweak lines over and over again until you get the result you want, without having to copy and paste everything each time. Work on it like you would a Word document and when you’re finally happy….

Turn it into an R Markdown document. These should look very familiar, a bit like Jupyter notebooks? No? YES! They’re the same idea, don’t be afraid of them. Instead of clicking “cell type: Code” or “cell type: Markdown”, you just type away and when you want to insert code you just…

# Stick your code in here

Voilà!
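If you want to see what such a file looks like on disk, here is a minimal sketch that writes a tiny .Rmd file from R. The file name, title and chunk contents are invented for illustration, and the chunk delimiter (three backticks) is built with `strrep()` purely to keep the example easy to read.

```r
# Three backticks mark the start and end of a code chunk in R Markdown.
fence <- strrep("`", 3)

rmd_lines <- c(
  "---",
  "title: \"My analysis\"",
  "output: html_document",
  "---",
  "",
  "Ordinary text goes here, just like a Markdown cell.",
  "",
  paste0(fence, "{r}"),          # opening line of an R chunk
  "# Stick your code in here",
  "summary(cars)",               # cars is a dataset that ships with R
  fence                          # closing line of the chunk
)

rmd_file <- file.path(tempdir(), "example.Rmd")  # hypothetical file name
writeLines(rmd_lines, rmd_file)

# To turn it into a document (assumes the rmarkdown package is installed):
# rmarkdown::render(rmd_file)
```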

Hints and Cheatsheets

RStudio put together cheatsheets to help you! Find them here

Print them. Laminate them. USE THEM.

Get to grips with the data wrangling packages - the %>% pipe operator (it comes from magrittr and is made available when you load dplyr) will save you an enormous amount of time and make your code much more readable.
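As a rough illustration of the pipe, here is the StackOverflow dummy data from earlier summed by group with dplyr instead of aggregate(). It assumes dplyr is installed (it comes with the tidyverse), and the column name `Total` is just a choice made for this sketch.

```r
library(dplyr)  # assumes dplyr (part of the tidyverse) is installed

# The dummy data from the StackOverflow example.
x <- data.frame(
  Category  = factor(c("First", "First", "First", "Second",
                       "Third", "Third", "Second")),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Read the pipe as "then": take x, then group it, then sum each group.
totals <- x %>%
  group_by(Category) %>%
  summarise(Total = sum(Frequency))

totals
```

Each step starts on its own line, so the code reads top to bottom in the order things happen rather than inside out.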

And finally

When you get bored, learn how to code emoji poos into your presentations.

## 💩