Continuing from where you left off in lecture 1. Go to the Teams site, download the dataset called rainfall.csv, save it in your project’s data folder, and do the following.
Create a new script, call it Lecture 2 and save it in your Scripts folder.
Import and redo
Import the rainfall data and call it d.rain
Repeat what you did yesterday.
Code
remove(list=ls()) #clear all from memoryrequire(data.table)require(ggplot2)dat <-fread("Data/wheat_student_data.csv")d.rain <-fread("Data/rainfall.csv")dat[, N_tot := N_plant + N_tdress + N_spray]dat <- dat[!is.na(yld),]dat[, yld_max :=max(yld), by = .(year,local,rep)]dat[, yld_norm := yld/yld_max*100 ]
Now look at the data by simply typing the name of the dataset (d.rain) into the console and hit enter. Now do the same with dat. What is the difference?
Coming back to the rainfall dataset. You will see that the dataset contains the year and local for the growing season months (Apr to Sep). Calculate the total rainfall during the growing season and call it rain_tot (note that I keep the variable names as short as possible)
Now we get to the fun part! Lets plot! Let’s start by plotting the rainfall data.
Rainfall plots
Using the dat dataset, we can now visualise the rainfall data to understand it better. Lets start by plotting the total growing season rainfall for all locations as a density plot. For help with getting started, look at the ggplot cheat sheet. Call your plot p1.
Code
p1 <-ggplot() +geom_density(data= d.rain,aes(x = rain_tot)) #its better to specify the data in geom_density and not ggplot() if you want plot the data of multiple datasets on the same plot.p1
This plot looks OK, but we can do much better - the background should be white, the horizontal axis-title should be expanded, the vertical axis does not start at 0, and the horizontal axis should begin and end at the min and max rainfall.
Let’s start by adding a theme to remove the grey background. I use theme_bw() for all of my plots since it both prints well and works great with a projector. Note the base_size = 12 with the function, the function would work without this, but then it would revert to all of the defaults. However, changing the size of the text is typically necessary.
Code
p1.1<-#name of the next version of the plotp1 +theme_bw(base_size =11)p1.1#plots version p1.1 of the plot
OK, now we should do something about the axis titles and maybe we can add a title to the plot while we are at it. This can be done with the labs function.
Code
p1.2<-#name of the next version of the plotp1.1+labs(x ="Total growing season rainfall (mm)",y ="Density", title ="Growing season rainfall (2016-2019)")p1.2#plots version p1.1 of the plot
OK, now we can fix the axes. This step is tricky, especially if we also want to specify the break as the axis labels. Here we specify 10 breaks on the x-axis with breaks = scales::pretty_breaks(n = 10), expand = c(0, 0) specifies that the plot should start and stop with the data i.e. there is no “padding” around the plots, and limits = c(min(d.rain$rain),max(d.rain$rain)) specifies that the plot should start and stop at the min and max rainfall values. I do the same thing on the y-axis but with fewer breaks Note that I have to manually specify the y-axis’s upper limit by looking at it; in this case, it is limits = c(0,0.006).
Code
p1.3<-#name of the next version of the plotp1.2+scale_x_continuous(breaks = scales::pretty_breaks(n =10),expand =c(0, 0),limits =c(min(d.rain$rain),max(d.rain$rain))) +scale_y_continuous(breaks = scales::pretty_breaks(n =5),expand =c(0, 0),limits =c(0,0.008))p1.3
Now, what if we wanted so see the rainfall distribution for each location individually? For that purpose we use the facit_wrap function. In this instance, we only want to facet by local; thus, the first argument is blank as indicated by the . with the local following after the ~. Note that I add the facet to p1.2 and not p1.3 since it includes the limits specified according to the combined plot; thus, the facets will not plot correctly. Try adding the facets to p1.3 to see what happens.
Code
p1.4<-p1.2+facet_wrap(.~local)p1.4
Now lets take it even further by adding a vertical dotted line which shows the average rainfall for each location during these four years. For this, you will also have to draw on your learnings from last week.
What does the setting all.x = T mean when using the merge function. Hint: Type ?merge into the console.
Assignment 2
OK, now its your turn.
Plot the density of actual wheat yields by location. Stretch goal: remove the labels of the vertical axis.
Now plot the relationship between yield and total nitrogen application by location using a scatter plot (point) with the year indicated by the point’s colour.
Helpful links
To be a good coder, you also have to be a good “Googler” and user of what other people have done. These are excellent links: