Welcome to R 101
# install.packages("tidyverse")
# Once you have done this on your computer it is there - you can just use library in future - this installs the package files onto your hard drive
# library("tidyverse")
# This is basically a command that instructs R to load the installed package into the current session - you will have to do this once each session.
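If it helps to see the "install once, load every session" idea as code, here is a minimal sketch. The helper name `use_package` is made up for illustration (it is not part of any package), and the example call uses "stats" only because it ships with R, so nothing gets downloaded when you run it.

```r
# Hypothetical helper: install a package only if it is missing, then load it.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # one-off: writes the package files to disk
  }
  library(pkg, character.only = TRUE)  # every session: attaches the package
  invisible(pkg %in% .packages())      # TRUE if the package is now attached
}

# "stats" comes with R, so this call never triggers an install.
loaded <- use_package("stats")
print(loaded)
```

Swapping "stats" for "tidyverse" should behave the same way, with the install step only running the first time.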
That’s great. But what if I’ve never heard of the package before?
?tidyverse_packages
## No documentation for 'tidyverse_packages' in specified packages and libraries:
## you could try '??tidyverse_packages'
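A single ‘?’ searches the help pages of the packages you have loaded and brings up the matching page; if the package isn’t loaded, you get a message like the one above. Using ‘??’ in front of anything runs a partial-match search across the documentation of every package installed on your computer, loaded or not (it does not search the internet) - useful if you know what you’re looking for but you’ve forgotten half of its name!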
??tidyv
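The same searches can be run as ordinary function calls, which is handy inside a script. This is only a sketch: it searches for "aggregate" rather than a tidyverse function, purely because the stats package is always installed, and the variable names are my own.

```r
# apropos() lists objects on the search path whose names contain the pattern
matches <- apropos("aggreg")
print(matches)

# help.search() is the function behind `??`; restricting it to one package
# keeps the search quick
hits <- help.search("aggregate", package = "stats")
print(nrow(hits$matches))  # number of help-page entries that matched
```

Typing `?aggregate` at the console then opens the help page itself.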
There is nothing these people don’t know - you just have to figure out how to apply their answer to your problem. Most good-quality questions and answers create some sample data within the code. Copy it and run the code line by line to see if it does to the dummy data what you want to do to yours!
Here’s your ‘real’ data
mydata <- read.csv("~/Google Drive/PhD/Data/energy_data/hypothetical_energy.csv")
mydata <- mydata[,3:5] #I'm just removing some columns we don't need here
head(mydata)
##   address_postcodesec smart_meter rt0000
## 1               AL1 1           G   0.00
## 2               AL1 1           G   5.00
## 3               AL1 1           G 152.00
## 4               AL1 1           G 204.33
## 5               AL1 2           E 280.33
## 6               AL1 2           E 356.33
And here’s your real problem:
Let’s say you need the total of one column (rt0000) for each value of ‘address_postcodesec’, so for AL1 1 you would expect a total of 419.33.
You’ve googled “how do I get an overall total when I group a column in R” and this is what you’ve found. You think it does what you want it to.
# Here is the dummy data they have provided as an example
x <- data.frame(Category = factor(c("First", "First", "First", "Second",
                                    "Third", "Third", "Second")),
                Frequency = c(10, 15, 5, 2, 14, 20, 3))
# And here is the answer that somebody else has provided
aggregate(x$Frequency, by=list(Category=x$Category), FUN=sum)
##   Category  x
## 1    First 30
## 2   Second  5
## 3    Third 34
Where do you start?
See if you can re-write the code with actual pen and paper and we’ll test some answers
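Once you’ve had a go on paper, here is one possible answer to compare against - a sketch, not the only way to do it. It assumes a stand-in data frame (`mydata_demo`, a name made up here) holding just the six rows printed by head(mydata), so its totals only cover those rows, not the whole file. The column name `rt0000_total` is also my own choice.

```r
# Stand-in for mydata: only the six rows shown by head(mydata)
mydata_demo <- data.frame(
  address_postcodesec = c("AL1 1", "AL1 1", "AL1 1", "AL1 1", "AL1 2", "AL1 2"),
  smart_meter         = c("G", "G", "G", "G", "E", "E"),
  rt0000              = c(0, 5, 152, 204.33, 280.33, 356.33)
)

# Same shape as the answer you found: swap x$Frequency for the column you
# want to total and x$Category for the column you want to group by
totals <- aggregate(mydata_demo$rt0000,
                    by = list(address_postcodesec = mydata_demo$address_postcodesec),
                    FUN = sum)

# aggregate() calls the summed column "x", so give it a clearer name
names(totals)[2] <- "rt0000_total"
print(totals)
```

The only real changes from the dummy example are the data frame name and the two column names; everything else is copied straight from the answer.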
The biggest part of fixing a problem in code is knowing how to apply a general answer to a specific problem…once you’ve figured out the exact combination of words to google to get you there in the first place.
Yes!! It is frustrating! But the more you do, the more you learn and the faster you get at fixing mistakes (You still make lots of mistakes! That’s fine too.)
You cannot save the console and repeat your work! Reproducibility is so important, especially when you work with open data - People will always want to try and prove you wrong.
So, write a script - or two: a notes version and a final version. Save them, then run and tweak lines over and over until you get the result you want, without having to copy and paste every time. Work on it like you would a Word document, and when you’re finally happy…
Turn it into an R Markdown document. R Markdown should look very familiar, a bit like Jupyter notebooks? No? YES! They’re the same idea, so don’t be afraid of them. Instead of clicking “cell type: Code” or “cell type: Markdown”, you just type away, and when you want to insert code you open a chunk with ```{r}, close it with ```, and in between you just…
# Stick your code in here
Voilà!
RStudio put together cheatsheets to help you! Find them here
Print them. Laminate them. USE THEM.
Get to grips with the data wrangling packages - the %>% pipe operator in dplyr (it comes from magrittr and is loaded with dplyr) will save you an enormous amount of time and make your code much more readable.
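To give a feel for how the pipe reads, here is the earlier dummy-data total rewritten with dplyr. Treat it as a sketch: it assumes dplyr is installed, and the names `totals` and `Total` are my own.

```r
library(dplyr)

# The same dummy data as the aggregate() example
x <- data.frame(
  Category  = c("First", "First", "First", "Second", "Third", "Third", "Second"),
  Frequency = c(10, 15, 5, 2, 14, 20, 3)
)

# Read %>% as "and then": take x, and then group it, and then total each group
totals <- x %>%
  group_by(Category) %>%
  summarise(Total = sum(Frequency))

print(totals)
```

Each step sits on its own line, so you can run the pipeline one line at a time and see what changes, which is the same habit as testing someone else’s answer on dummy data.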
When you get bored, learn how to code emoji poos into your presentations.
## 💩