These exercises accompany the Subsetting, Sorting and Dates tutorial.
1. Import the so2_data.csv file using the read.csv() function and save it to a variable called so2.data. Create a histogram plot of the SO2 values.
2. Use brackets ([]) to find the value in the 15th row of the 4th column of the chicago_air dataset (the dataset we used during the lecture).
3. Using the subset() function, create a subset of the chicago_air data by selecting only ozone values greater than .065 ppm. Assign this subset to a new variable called high.ozone.
4. Create a variable called my.date which contains a vector of two values, ‘October 7 2015’ and ‘November 2 2015’. Convert the values in my.date to a date class in R and save it as a variable called new.date. Convert the dates in new.date to a different date format: ‘mmddyyyy’. Save this to a new variable called new.date.format.
5. Import the dates_values.csv file and save it to a variable called my.date2. Use str() to see how the dates were imported. Convert the numbers in the date column into dates in R and save as new.date2. Then convert them to the format ‘mm-dd-yyyy’ and save this as new.date.format2.
6. Sort the chicago_air data in descending order by ozone concentration and save the sorted data to a new variable called chi_air_desc. Using the write.csv() function, save this sorted data to a csv file on your thumb drive named “OzoneSorted.csv”.

so2.data <- read.csv("E:/RIntro/datasets/so2_data.csv", header = TRUE, stringsAsFactors = FALSE)
# The missing-value code in the SO2 data (-999) makes the histogram look a little strange.
# We will learn later how to remove these missing data.
hist(so2.data$SO2)
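As a preview of why the -999 code distorts the plot, here is a minimal sketch that recodes the sentinel as NA before plotting. The vector `so2` and its values are hypothetical stand-ins, not the contents of so2_data.csv, and this is only one possible way to handle the code.

```r
# Hypothetical SO2 readings; -999 stands in for a missing measurement
so2 <- c(1.2, 0.8, -999, 2.5, 1.1, -999, 0.4)

# Recode the sentinel value as NA so R treats it as missing
so2_clean <- so2
so2_clean[so2_clean == -999] <- NA

# hist() ignores NA values, so the bins now cover only real measurements
hist(so2_clean, main = "SO2 without the -999 code", xlab = "SO2")
```

With the sentinel left in place, the x-axis has to stretch down to -999, which squeezes all real readings into a single bar.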
library(region5air)
data(chicago_air)
chicago_air[15,4]
## [1] 0.66
head(chicago_air)
##         date ozone temp solar month weekday
## 1 2013-01-01 0.032   17  0.65     1       3
## 2 2013-01-02 0.020   15  0.61     1       4
## 3 2013-01-03 0.021   28  0.17     1       5
## 4 2013-01-04 0.028   18  0.62     1       6
## 5 2013-01-05 0.025   26  0.48     1       7
## 6 2013-01-06 0.026   36  0.47     1       1
# Bracket method: where ozone is NA the condition is also NA, so the result contains a row of NAs for each missing ozone value.
high.ozone <- chicago_air[(chicago_air$ozone > .065),]
# The subset() function drops rows where the condition is NA, so only rows with ozone above .065 remain.
high.ozone <- subset(chicago_air, ozone > .065)
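The difference between the two methods is easier to see on a small example. The sketch below uses a hypothetical four-row data frame called `toy` (not part of region5air) with one missing ozone value, and also shows `which()` as one way to make bracket indexing behave like `subset()`.

```r
# Hypothetical toy data frame with one missing ozone value
toy <- data.frame(day = 1:4, ozone = c(0.070, NA, 0.050, 0.081))

# Logical indexing: the NA in the condition yields a row filled with NA
by_bracket <- toy[toy$ozone > 0.065, ]

# subset() treats NA in the condition as FALSE and drops that row
by_subset <- subset(toy, ozone > 0.065)

# which() keeps only the TRUE positions, so brackets can match subset()
by_which <- toy[which(toy$ozone > 0.065), ]

nrow(by_bracket)  # 3 rows: two matches plus one all-NA row
nrow(by_subset)   # 2 rows
nrow(by_which)    # 2 rows
```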
my.date <- c('October 7 2015','November 2 2015')
new.date <- as.Date(my.date,format="%B %d %Y")
new.date.format <- format(new.date, format='%m%d%Y')
new.date.format
## [1] "10072015" "11022015"
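One detail worth noting is that `format()` returns character strings rather than Date objects, so the reformatted values can no longer be used for date arithmetic. The short sketch below illustrates this with a single date; it starts from an ISO-style string, which `as.Date()` parses without a format argument, because `%B` matches month names in the current locale and may behave differently on non-English systems.

```r
# Parse an ISO-formatted string, which needs no format argument
d <- as.Date("2015-10-07")

class(d)                    # "Date"
format(d, "%m%d%Y")         # "10072015": month, day, four-digit year
format(d, "%d/%m/%y")       # "07/10/15": day first, two-digit year
class(format(d, "%m%d%Y"))  # "character", no longer a Date
```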
my.date2 <- read.csv('E:/RIntro/datasets/dates_values.csv', header = TRUE)
str(my.date2) #Notice these were imported as Excel integer dates
## 'data.frame': 10 obs. of 3 variables:
## $ Date : int 42393 42394 42395 42396 42397 42398 42399 42400 42401 42402
## $ Value : int 1 2 3 4 5 6 7 8 9 10
## $ Value2: int 5 8 -99 3 4 -999 6 1 3 4
new.date2 <- as.Date(my.date2[,1], origin = '1899-12-30') #Use the Excel origin date to convert these serial numbers into R dates
new.date.format2 <- format(new.date2,format="%m-%d-%Y") #Change the format from the default R date format ("yyyy-mm-dd") to "mm-dd-yyyy"
new.date.format2
## [1] "01-24-2016" "01-25-2016" "01-26-2016" "01-27-2016" "01-28-2016"
## [6] "01-29-2016" "01-30-2016" "01-31-2016" "02-01-2016" "02-02-2016"
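The origin argument simply tells `as.Date()` which day counts as zero. As a small worked check, the sketch below converts the first serial number from the Date column by hand and then reverses the calculation. The variable names `serial`, `excel_origin` and `converted` are introduced here for illustration only.

```r
# First serial number from the Date column shown in the str() output
serial <- 42393

# The day that as.Date() is told to count from
excel_origin <- as.Date("1899-12-30")
converted <- as.Date(serial, origin = excel_origin)
converted

# The same arithmetic by hand: a Date plus a number of days
excel_origin + serial

# Going back: days elapsed between the date and the origin
as.numeric(converted - excel_origin)
```

Both forward calculations give 2016-01-24, which matches the first value of new.date.format2.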
library(region5air)
data(chicago_air)
chi_air_desc <- chicago_air[order(-chicago_air$ozone),]
head(chi_air_desc)
##           date ozone temp solar month weekday
## 134 2013-05-14 0.081   74  1.40     5       3
## 252 2013-09-09 0.078   83  1.11     9       2
## 171 2013-06-20 0.074   80  1.35     6       5
## 139 2013-05-19 0.069   73  1.21     5       1
## 140 2013-05-20 0.069   81  1.38     5       2
## 121 2013-05-01 0.068   80  1.36     5       4
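Negating the column is one way to sort in descending order; `order()` also accepts a `decreasing` argument. The sketch below compares the two on a hypothetical data frame called `toy` and shows where a missing value ends up.

```r
# Hypothetical toy data with a missing ozone value
toy <- data.frame(day = 1:4, ozone = c(0.050, NA, 0.081, 0.070))

# Negating the column sorts numeric values from high to low
desc_minus <- toy[order(-toy$ozone), ]

# decreasing = TRUE gives the same ordering and also works for character columns
desc_arg <- toy[order(toy$ozone, decreasing = TRUE), ]

# order() puts NA last by default (na.last = TRUE) in both cases
desc_minus$day  # 3 4 1 2
desc_arg$day    # 3 4 1 2
```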
write.csv(chi_air_desc,file="E:/RIntro/datasets/OzoneSorted.csv")
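By default `write.csv()` also writes the row names as an unlabeled first column, which after sorting are the original row positions (134, 252, ...). If that column is not wanted, `row.names = FALSE` omits it. The sketch below uses a hypothetical two-row data frame and a temporary file in place of the thumb drive path.

```r
# Hypothetical toy data standing in for chi_air_desc
toy <- data.frame(ozone = c(0.081, 0.078), temp = c(74, 83))
rownames(toy) <- c("134", "252")

# Write to a temporary file instead of a thumb drive path
out_file <- tempfile(fileext = ".csv")
write.csv(toy, file = out_file, row.names = FALSE)

# Read the file back to confirm that only the data columns were saved
check <- read.csv(out_file)
names(check)  # "ozone" "temp"
```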