During this lab, we are going to create some maps for Houston, visualizing certain emission data.

Download all the files from Canvas and set working directory properly.

Note: After finishing this R markdown file, knit it to html and submit the html file to the Canvas. Please type your code in the designated areas and write your answers below the questions.

Try to think hard and utilize the available resources (readme, google, etc.), before asking your TA for help.

#I. If and For loop If and For loops are very important flow control methods in programming language. Now let’s practice implementing them.

##I.a If statement Below is a very simple example of “If”.

{r <- 1
if (r == 1){
  print("r equals 1")
} else {
  print("r does not equal 1")
}}
## [1] "r equals 1"

If there are more than one conditions, we can use logical operaters to connect them. For instance, && means AND, || means or. Look at below statements. Run them and compare the results with what you thought.

TRUE&&TRUE
## [1] TRUE
TRUE&&TRUE&&FALSE
## [1] FALSE
TRUE&&TRUE||FALSE
## [1] TRUE
TRUE&&(TRUE||FALSE)
## [1] TRUE

There are several errors in below code. Correct and run it.

#correct the codes
num<-3
if (num>=3 || num==4){
  num = num + 1
  print("Bingo!")
} else {
  num = num -1
  print ("Oops!") 
}
## [1] "Bingo!"

Write an if statement: If an integer variable ‘a’ is odd, change its value to 3 times ‘a’ plus 1, otherwise change its value to half of its current value. Hint: look up %% operator

a=5
#write your codes in below blank
if (a%%3){
  a=(3*a)+1
} else {
  a=a/2
}
#
a
## [1] 16

#I.b For loop

Now let’s move to the For loop.

First let’s take a look at an easy way to generate an integer sequence.

This line generates an integer sequence from 1 to 10.

1:10
##  [1]  1  2  3  4  5  6  7  8  9 10

The below code generates a sequence from 5 to -5 and stores the sequence in variable “s”

s<-5:-5
s
##  [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5

The colon “:” is commonly used to create the numerical sequence in R. However, different from that in Matlab, where the step of sequence can be specified, in R, the step of “:” is fixed to 1. But, through very simple transformation, you can still use “:” to generate sequence of any step.

Below, please create the sequence: 0, 0.15, 0.30, 0,45, …, 5.85, 6

#write you code in below blank
seq(from=0, to=6, by=0.15)
##  [1] 0.00 0.15 0.30 0.45 0.60 0.75 0.90 1.05 1.20 1.35 1.50 1.65 1.80 1.95 2.10
## [16] 2.25 2.40 2.55 2.70 2.85 3.00 3.15 3.30 3.45 3.60 3.75 3.90 4.05 4.20 4.35
## [31] 4.50 4.65 4.80 4.95 5.10 5.25 5.40 5.55 5.70 5.85 6.00

Please, run the below example of the for loop.

#create a vection containing names of my favorite fruits:)
fruits=c("banana","strawberry","kiwi","watermelon") 
#print out the names of fruits
for (i in fruits){
  print(i)
}
## [1] "banana"
## [1] "strawberry"
## [1] "kiwi"
## [1] "watermelon"

Please write your answer in the space below each question: 1. Whats the difference between the above for loop and the below line?

The above for loop places fruits as a column vector while the bottom line recalls the characters set to equal fruits as a sequence

fruits
## [1] "banana"     "strawberry" "kiwi"       "watermelon"
  1. See below another example of for loop and briefly describe what it’s doing.

The below loop is creating a collumn vector and squaring the numbers in the sequence 1:5

#print out the names of fruits
for (i in 1:5){
  print(i^2)
}
## [1] 1
## [1] 4
## [1] 9
## [1] 16
## [1] 25

Now, please write a for loop that prints out the decimal equivalent of the reciprocals 1/2, 1/4, 1/6, … , 1/18, 1/20.

#write your code in below blank
for (i in seq(from=2, to=20, by=2))
  print(1/i)
## [1] 0.5
## [1] 0.25
## [1] 0.1666667
## [1] 0.125
## [1] 0.1
## [1] 0.08333333
## [1] 0.07142857
## [1] 0.0625
## [1] 0.05555556
## [1] 0.05

HVData

Background (By Saurabh Maheshwari):

a.The data is related to the gas emissions in the houston ship channel by ship vessels between 11/11/13 - 11/15/13

b.The sensors installed in the channel capture the emissions for 10 gases (Co2, Ch3, etc.) in 15 minute intervals

c.We aim at mapping the nh3 emissions on 11/11/13 for cruising vessels using leaflet library

Data examination

Load the library leaflet.

library(leaflet)

Don’t forget to set the working directory to the one where the dataset is located. You can do so by going to Session > Set Working Directory > Choose Directory or using Ctrl + Shift + H (Windows :P)

Read data from “HVData.csv” and store them in variable “HVData”. During last lab, we used “import dataset” tool to get the data into R. That is not professional! Now we use command lines to achieve a similar function. For instance, we can take use of the “read.csv()” function.

#Complete below line
HVData<-read.csv("HVData.csv")

Use a function to obtain the total number of rows for dataframe “HVData”. Now you are not provided with the function. Instead, you should be able to search for it.

Try to google it now :)

#Complete below line
HVData_length<-nrow(HVData)
HVData_length
## [1] 10000

Examine the data set and answer,

3.Are there any pollutants missing some measurements? List all these pollutants.

There are some pollutants values missing for the CO2,CH4,and n2O

#write your code in below blank


``

## Data processing

Now for the coloumn of pollutant "ch4 measurement, we want to replace all the "NA"s with 0. Let's use a classic method first. In this exercise, we are going to take use of both for loop and if statement. The algorithm is designed as,

a. For an index variable i from 1 to the length of the column "ch4", use iteration:

b. If the i(th) element of column "ch4" is NA, then equal that element to 0.

Hint: The length of each column is equal to the number of rows of the whole dataframe. 

Hint: You can use function is.na(x) to check if x is NA.
## Error: attempt to use zero-length variable name
v1=NA
v2=c(1,NA,2)
is.na(v1)
is.na(v2)

Try to understand another example.

v3=c(1,2,3)
length_v3=length(v3)
for (i in 1:length_v3){
  if(v3[i]==2){
    v3[i]=0
  }
}

Now it’s your turn. (for the coloumn of pollutant “ch4 measurement, we want to replace all the”NA"s with 0)

(If you made some mistakes resulting in errors in HVData, just re-run the above codes from “Data examination”)

#Replace the ???? with correct codes
for(i in 1:length(HVData$ch4)){
  if(is.na(HVData$ch4[i])){
     HVData$ch4[i]=0 
  }
}

Do the same transformation for pollutant “co2”, but, you will use a different method, by taking use of which() function.

which() function is very useful in R. Now let’s look at some examples.

#firsr construct a vector with elements 1,4,6,11,4,7,8,3.
vector1=c(1,4,6,11,4,7,8,3)
#filter out the elements whose value is equal to 11, and output their indexes.
which(vector1==11)
## [1] 4
#filter out the elements whose value is equal to 4, and output their indexes.
which(vector1==4)
## [1] 2 5
#filter out the elements whose value is equal to 4, and output their values.
vector1[which(vector1==4)]
## [1] 4 4
#filter out the elements whose value is equal to 4, and change them to 444,
vector1[which(vector1==4)]=444
vector1
## [1]   1 444   6  11 444   7   8   3

Now, for “co2” column, change all the “NA” to 0.

Hint: The code takes only one line.

(If you made some mistakes resulting in errors in HVData, just re-run the above codes from “Data examination”)

#Replace the ???? with correct code
HVData$co2[which(is.na(HVData$co2))]=0

Data visualization

Now let’s start visualization for this data set.

First, we will create a leaflet map object.

map_object = leaflet()

You won’t see a map yet because it is still empty. So we will use addProviderTiles() function to add the background tile for it and make it a base map for our data.

map_base = addProviderTiles(map_object, providers$OpenStreetMap)

When in doubt regarding the arguments, look for help using “?”, For example:

?addProviderTiles
## starting httpd help server ... done

Then, we add some circle markers to the map, using addCircleMarkers() function, with red color and radius 1.

map_1 = addCircleMarkers(map_base, 
                         lng = HVData$Longitude,
                         lat = HVData$Latitude,
                         color = "red", 
                         radius = 1)
map_1
  1. Try to understand the above code and answer: What information is being shown on this map1?

It is the longitudes and latitudes given to us in the Data set, denoted by a red circle

The map1 we’ve just created, is a static map. Actually it is quite easy to create interactive map using leaflet.

For instance, we can add to the map:

Labels - text appearing when you hover upon the markers

Popup - text appearing when you click upon the markers

Now let’s do this! Let us add labels (nh3 emission values) and popups (Vessel Ids).

Hint: Labels and Popups should always be passed as character data types and you can use as.character() function to make the conversion.

#Replace the ???? with correct code
map_1 = addCircleMarkers(map_base, 
                         lng = HVData$Longitude, 
                         lat = HVData$Latitude,
                         color = "red",
                         radius = 1, 
                         label = as.character(HVData$nh3),
                         popup = as.character(HVData$VesselID)
                         )
map_1

Try to interect with the map to see if the Labels and Popups have been successfully added.

In the next step, we will change the color of the marks as per nh3 emission values. The vessels having nh3 emissions higher than 5 will be plotted black, otherwise red.

Hint: You will need to use “ifelse()” function for color selection. See below example. For more information, try to use “?ifelse” or google.

v=c(1,2,3,4,5,6)
ifelse(v>3,">3","<=3")
## [1] "<=3" "<=3" "<=3" ">3"  ">3"  ">3"

Finish the below part. The label and popup should be kept the same as previous.

#Replace the ???? with correct code
map_1 = addCircleMarkers(map_base, 
                         lng =HVData$Longitude, 
                         lat =HVData$Latitude,
                         color =ifelse(HVData$nh3>5,"black","red"), 
                         radius = 1, 
                         label =as.character(HVData$nh3),
                         popup =as.character(HVData$VesselID))

map_1

Change the color w.r.t nh3 emission mean values, i.e., if the emission is higher than mean, use “black”, otherwise “red”

#write your code in below blank
map_2 =  addCircleMarkers(map_base, 
                         lng =HVData$Longitude, 
                         lat =HVData$Latitude,
                         color =ifelse(HVData$nh3>mean(HVData$nh3),"black","red"), 
                         radius = 1, 
                         label =as.character(HVData$nh3),
                         popup =as.character(HVData$VesselID))

#
map_2

Change the label to “PositionTme”" and popup values to “landFlag”"

#write your code in below blank
map_3 =  addCircleMarkers(map_base, 
                         lng =HVData$Longitude, 
                         lat =HVData$Latitude,
                         color =ifelse(HVData$nh3>mean(HVData$nh3),"black","red"), 
                         radius = 1, 
                         label =as.character(HVData$PositionTme),
                         popup =as.character(HVData$LandFlag))

#
map_3

To add legend we can use the addLegend function().

map_1_legend = addLegend(map_1,
                         position = "bottomright",
                         colors = c("red", "black"),
                         labels = c("Below average","Above average"), 
                         title = "nh3 emissions")
map_1_legend

The HVData contains measurements of multiple days. But what if we only want to look at the data of a specific day, say “11/11/2011”? So we need to know how to filter data.

Let’s first look at the column “PositionTme”.

#print out the first two elements of PositionTme
HVData$PositionTme[1:2]
## [1] 11/12/2013 11:47:36 11/12/2013 18:45:12
## 8131 Levels: 11/11/2013 0:00:01 11/11/2013 0:00:02 ... 11/15/2013 9:47:35

We can see that the time is expressed in format “month/day/year hour:minute:second”. Since what we need is just date (“month/day/year”), we need to extract only the date part from the whole expression.

as.Date() function is commonly used in converting date format. Read and understand the below code, then try to add a new column “Date” to the “HVData”. The column “Date” should contain only the date part of the column “PositionTme”.

#extract date part from 
as.Date("2011/05/31","%Y/%m/%d")
## [1] "2011-05-31"
as.Date("03-14-1988 12:23:11","%m-%d-%Y")
## [1] "1988-03-14"

Hint: Pay attention to the connection sign between year, month and day.

#Add a new column "Date" to the HVData dataframe which contains the date for each measurement.  
#write your code in below blank
HVData$Date= HVData$PositionTme

HVData$Date<- as.Date("HVData$Positiontme","%Y-%m-%d")

Check if the column has been created and filled with correct content.

Now you are going to create a map for co2 emissions.

You are asked to filter the data with Date = “11/13/2013”, LandFlag as “Water”.

Hint: During filtering the dataframe, since you have two conditions, “Date and”Landflag" you will need to use a single “&” to connect the two conditions.

#Get only the measurements for "Water" landflag at "11/13/2013" 
#Replace the ???? with correct code
HVData_Water1113<-HVData[which(HVData$Data & HVData$water),] 

Use the new dataframe “HVData_Water1113” you’ve just constructed, to create the map for co2 emissions (with Date = “11/13/11”, LandFlag as “Water”). Use the color w.r.t mean values (if the emission is higher than mean, use “black”, otherwise “red”). Labelsare co2 emission values) and popups are Vessel Ids.

Don’t forget to add legend to the map.

#write your code in below blank
map_co2 = addCircleMarkers(map_base, 
                           lng = HVData$Longitude, 
                           lat = HVData$Latitude,
                           color=ifelse(v<435062,"black","red"),
                           radius=1,
                           label = as.character(HVData$co2),
                           popup = as.character(HVData$VesselID)
                           )
#write your code in below blank
map_co2_legend = addLegend(map_co2,
                           position = "bottomright",
                           colors = c("red","black"),
                           labels = c("Below average","Above average"),
                           title = "co2 emmision")

Run the below code to show the map.

map_co2_legend