Some important commands are listed below. The . indicate blanks where you would fill in missing info.
?... # Loads help files for a specific command
?getwd # Loads help files for "getwd"
??getwd # Searches for the word "getwd" in all the help files
setwd("C:\mypath...") # Sets the working directory to whatever path you specify
getwd() # Shows the current working directory
dir() # List the files in the working directory
ls() # List the variables in your R workspace
# "#" Indicates a comment. The text . is ignored
install.packages(".") # Gets user-created packages from online repositories
library(.) # Loads packages that you've installed
Start R studio (“Start”>“RStudio”)
First, put the name of your class directory into a variable “mydir”.
mydir = "C:/myRdir" # This puts the text for your directory in the variable mydir
# --> The text is whatever directory your GEOG576 HW is in.
mydir # I always type the name of the variable after creating it to see if R did what I expect.
Now, go to your directory using setwd(), and list the contents:
setwd(mydir)
dir()
Click “File>New File > RScript” Put the commands from the two boxes above into the Rscript window, highlight the text and click >Run. You can also run the lines by clicking on the first line, then click “>Run” repeatedly to run each line.
Here I have published some data in a google sheet to the web using File -> Publish to the web, csv option. Doing this gives you a url that can be referenced in R to read in the data.
poway.url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vT4Ov3nGPpsPjPz1ZgbkOipHGArx4V4g_LTZl-Gdv4G-3NwY-ZE07s2g8TEMQOWU2peawDaeVS3z6Br/pub?output=csv"
poway.precip.daily = read.csv(url(poway.url))
See how the functions read.csv() and url() work. Type:
?read.csv
?url
*** Copy and paste this url into your brower: https://docs.google.com/spreadsheets/d/e/2PACX-1vT4Ov3nGPpsPjPz1ZgbkOipHGArx4V4g_LTZl-Gdv4G-3NwY-ZE07s2g8TEMQOWU2peawDaeVS3z6Br/pub?output=csv
and save the csv file into a directory on your machine. You could call this file “poway_precip_daily.csv”.
Now, load the csv. Make sure you used setwd to get into the directory where you put the file:
#poway.precip.daily = read.csv("poway_precip_daily.csv")
Now, look at the variable that you just loaded:
poway.precip.daily
But this prints out a lot of data–too much! To just see the first, or last, 5 lines of x:
head(poway.precip.daily)
## STATION DATE LATITUDE LONGITUDE ELEVATION NAME
## 1 USC00047111 1893-01-01 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 2 USC00047111 1893-01-02 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 3 USC00047111 1893-01-03 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 4 USC00047111 1893-01-04 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 5 USC00047111 1893-01-05 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 6 USC00047111 1893-01-06 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## PRCP.0.1mm PRCP.mm PRCP_ATTRIBUTES
## 1 0 0 P,,6,
## 2 0 0 P,,6,
## 3 0 0 P,,6,
## 4 0 0 P,,6,
## 5 0 0 P,,6,
## 6 0 0 P,,6,
tail(poway.precip.daily)
## STATION DATE LATITUDE LONGITUDE ELEVATION NAME
## 28436 USC00047111 7/28/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 28437 USC00047111 7/29/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 28438 USC00047111 7/30/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 28439 USC00047111 7/31/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 28440 USC00047111 9/6/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## 28441 USC00047111 9/7/2022 33.01953 -117.031 197.5 POWAY VALLEY, CA US
## PRCP.0.1mm PRCP.mm PRCP_ATTRIBUTES
## 28436 0 0 ,,7,1600
## 28437 0 0 ,,7,1600
## 28438 0 0 ,,7,1600
## 28439 0 0 ,,7,1600
## 28440 0 0 ,,H,1600
## 28441 0 0 ,,H,1600
What do each of the following commands do?:
names(poway.precip.daily)
poway.precip.daily[,1]
poway.precip.daily[,2]
poway.precip.daily[1:10,]
poway.precip.daily[1:10,"PRCP.mm"]
Date variables can be created using “as.Date”, which creates something called a Date date-time object.
For more information about dates and date-time objects, review “A Quick introduction to R”, and see https://www.stat.berkeley.edu/~s133/dates.html.
Check out the help for as.Date:
?as.Date
What is the class of your DATE object?
class(poway.precip.daily$DATE)
## [1] "character"
Characters won’t plot right…Dates have to be a “Date” class.
poway.precip.daily$Date = as.Date(poway.precip.daily$DATE,format="%m/%d/%Y")
# I often use "Date" in lower case to indicate it's my own formatted Date...
head(poway.precip.daily$Date)
## [1] NA NA NA NA NA NA
What class is the Date column?
Now look at subsets of the data by year and by precipitation threshold:
poway.precip.daily[poway.precip.daily$Date >= as.Date("1990-01-01"),]
poway.precip.daily[poway.precip.daily$PRCP.mm>50,]
Check out the plot command in help using ?plot
plot(poway.precip.daily$Date,poway.precip.daily$PRCP.mm,type="l") # "l" means plot it as a "line"
Now change the axis labels:
plot(poway.precip.daily$Date,poway.precip.daily$PRCP.mm,type="l",xlab="",ylab="Daily precipitation at poway, mm")
There’s a big gap in the middle. You can restrict the range of the
x-axis using the “xlim” option:
plot(poway.precip.daily$Date,poway.precip.daily$PRCP.mm,type="l",xlab="",ylab="Daily precipitation at poway, mm",xlim=as.Date(c("1955-01-01","2022-12-31")))
Or, you can create a new data.frame with just the data from ~1955-2022.
How can you find out the first date with data after the 1910-1950s gap?
Hint: use View…
Now, use that date to create a new data.frame with just the data from 195x-2022. Note: you have to change the string “19xx-xx-xx” to the date you find:
poway.precip.daily.sub = poway.precip.daily[poway.precip.daily$Date >= as.Date("19xx-xx-xx")]
Now plot the smaller data.frame, and make sure it removed the earlier
data.
I’m not showing the code for how to do this…Hint: you don’t need the
xlab option anymore:
daily.sorted = sort(poway.precip.daily$PRCP.mm,decreasing=TRUE)
daily.sorted[1:20]
## [1] 86.9 82.3 79.5 78.0 77.5 77.0 73.7 71.1 63.0 62.0 61.0 59.7 59.2 58.9 58.4
## [16] 58.2 57.7 56.4 56.4 56.1
To calculate total precipitation for a water year, we need to have a column that defines the water year for each day, and sum the precipitation values for those labels (yikes!).
Note: I found the solution below at http://stackoverflow.com/questions/22073881/hydrological-year-time-series after googling “R hydroTSM october monthly2annual”)
# First, generate a vector of "breaks", starting on Oct 1 and going by year.
# 1962 is the first year of data at Poway.
breaks <- seq(as.Date("1956-10-01"), length=(2023-1956), by="year")
# Length is the number of years you want the breaks for.
years.breaks = as.numeric(format(breaks,"%Y"))
labels.water.year = years.breaks[2:length(breaks)] # Why are we leaving off the first year in the water years label?
poway.precip.daily.sub$WaterYear <- cut(poway.precip.daily.sub$Date, breaks,labels=labels.water.year)
Check to make sure the hydroYears were assigned correctly by looking at the first 600 lines:
poway.precip.daily.sub[1:400,] # See the first 400 days of daily data
x.annual.wy = aggregate(poway.precip.daily.sub$PRCP.mm,by=list(poway.precip.daily.sub$WaterYear),FUN=sum)
names(x.annual.wy) = c("WaterYear","Rainfall.mm")
# x.daily$Group.1 is a "factor" class, which is annoying and doesn't plot right.
# check to see the results make sense
head(x.annual.wy)
## WaterYear Rainfall.mm
## 1 1957 297.9
## 2 1958 82.8
## 3 1959 128.6
## 4 1960 261.2
## 5 1961 125.1
## 6 1962 312.6
Do these values seem reasonable? How can you know?
What class is WaterYear?
Convert it to a numerical value. Note that you have to do “as.character”
first to turn a factor class into a character class…
x.annual.wy$WaterYear = as.numeric(as.character(x.annual.wy$WaterYear))
x.annual.wy.count = aggregate(poway.precip.daily.sub$PRCP.mm,by=list(poway.precip.daily.sub$WaterYear),FUN=length)
names(x.annual.wy.count) = c("WaterYear","Count.dates")
x.annual.wy.count$WaterYear = as.numeric(as.character(x.annual.wy.count$WaterYear))
head(x.annual.wy.count)
## WaterYear Count.dates
## 1 1957 365
## 2 1958 214
## 3 1959 365
## 4 1960 364
## 5 1961 365
## 6 1962 365
How many years have less than 365 days of data? Hint: create a new data.frame from x.annual.wy.count where Count.dates is >= 365…
How could you remove the WaterYears with less than 365 days of data?
plot(x.annual.wy$WaterYear,x.annual.wy$Rainfall.mm,type="l",ylab="Annual precip, mm",xlab="Water year")
Usually, rainfall is plotted as a bar chart, not a line, since the rainfall happens over a year, not at a point.
# Note: I found this code by googling "R barplot of annual precipitation"
b.plot <- barplot(height = x.annual.wy$Rainfall.mm, names.arg = x.annual.wy$WaterYear,
xlab="Year", ylab="Precipitation (mm)")
…and plot WaterYears with all 365 days as a blue dot:
x.annual.wy.sub = x.annual.wy[x.annual.wy.count$Count.dates>=365,]
plot(x.annual.wy$WaterYear,x.annual.wy$Rainfall.mm,type="l",ylab="Annual precip, mm",xlab="Water year")
points(x.annual.wy.sub$WaterYear,x.annual.wy.sub$Rainfall.mm,col="blue")
# And add a legend
legend("topleft",c("All WY",">365 days of data"),col=c("black","blue"),pch=c(NA,1),lty=c(1,NA))
Do WaterYears with <365 days of data make a big difference? In which years?
Make a plot showing the # of days with data (y-axis) vs the WaterYear (x-axis)
I won’t show the code for this…see part b above. Hint: what function should you use instead of the sum?
## Group.1 x
## 1 1957 32.8
## 2 1958 19.6
## 3 1959 24.1
## 4 1960 27.9
## 5 1961 24.9
## 6 1962 58.2
why is this plot funny? How do you fix it? I’m not showing the
code…hint: what class is WaterYear?
Add a trend line:
# Linear model
lm.daily = lm(Dailymax.rainfall.mm ~ WaterYear, x.dailymax.wy)
summary(lm.daily)
##
## Call:
## lm(formula = Dailymax.rainfall.mm ~ WaterYear, data = x.dailymax.wy)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.30 -11.72 -2.60 10.45 38.79
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -205.6330 202.3697 -1.016 0.313
## WaterYear 0.1237 0.1017 1.216 0.228
##
## Residual standard error: 15.74 on 64 degrees of freedom
## Multiple R-squared: 0.0226, Adjusted R-squared: 0.007326
## F-statistic: 1.48 on 1 and 64 DF, p-value: 0.2283
Add a trendline to your plot:
plot(x.dailymax.wy$WaterYear,x.dailymax.wy$Dailymax.rainfall.mm,type="l",ylab="Maximum daily precip, inches",xlab="Water year")
abline(lm.daily)
Plot the residuals on a normal probability plot:
lm.stdres = rstandard(lm.daily)
qqnorm(lm.stdres, ylab="Standardized Residuals", xlab="Normal Scores", main="Maximum daily rainfall: Annual maximum series")
qqline(lm.stdres)
install.packages("Kendall")
library(Kendall)
poway.mk.test = MannKendall(x.dailymax.wy$Dailymax.rainfall.mm)
summary(poway.mk.test)
## Score = 227 , Var(Score) = 32639
## denominator = 2138.992
## tau = 0.106, 2-sided pvalue =0.21095