A. What is a basic workflow for reproducible research?
The basic workflow for reproducible research is data gathering, data analysis, and results presentation.
B. What are five practical tips for making research reproducible?
Five practical tips for making research reproducible, according to “Reproducible Research with R and RStudio” (2nd edition), are:
1.) Document everything.
2.) Everything is a (text) file. Use the simplest file format; it is also the most versatile.
3.) All files should be human readable.
4.) Explicitly tie your files together.
5.) Have a plan to organize, store, and make your files available.
C. Give an example of how you might implement each tip.
1.) Document everything. Readers should be able to understand how the researcher gathered the data, analyzed it, and presented the results. It is also important to document session info in R so that future researchers can see which packages were used, which makes the work even more reproducible.
2.) Everything is a (text) file. Use the simplest file format; it is also the most versatile. Storing research as plain text makes it “future-proof”. Programs like Word and Excel change regularly through updates, whereas a simple text file lets future researchers open the data in whatever program is most suitable at the time.
3.) All files should be human readable. Research files should be written on the assumption that someone who has not worked on the project will need to understand them at some future time. This can be done with code comments that communicate the intention or purpose of the code.
4.) Explicitly tie your files together. If all the data is in text file format, then the project is a set of individual files with relationships to one another. To make the study more reproducible, it is important to make those links explicit, for example by tying the PDF analysis and charts to the data file that the presentation is drawn from.
5.) Have a plan to organize, store, and make your files available. Files need to be organized so that independent researchers can understand how everything fits together. One tip is to split the research across several files and limit the content of any one file. Placing all the data, statistical models, and results with figures and tables into one document could make the document difficult to analyze. Modular, linked files are preferred.
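Tips 1, 2, and 4 above could be sketched in R roughly as follows. This is only an illustration under assumptions: the directory and file names (air_project, airquality.csv, session_info.txt) are hypothetical, and the built-in airquality data set stands in for real project data.

```r
# Hypothetical project layout; all names below are illustrative only.
project_dir <- file.path(tempdir(), "air_project")
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Tip 2: store the data as plain text (CSV) rather than a binary format.
data_file <- file.path(project_dir, "data", "airquality.csv")
write.csv(airquality, data_file, row.names = FALSE)

# Tip 4: the analysis names its input file explicitly, so the link is visible.
air <- read.csv(data_file)

# Tip 1: record the R version and loaded packages next to the results.
session_file <- file.path(project_dir, "session_info.txt")
writeLines(capture.output(sessionInfo()), session_file)
```

Keeping the data, the analysis script, and the session record as separate text files also fits tip 5, since each file then has one job.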
D. Which one of these do you think will be the most difficult?
air_hist.R - Unit 2 Live Session Homework
Daily air quality measurements in New York from May to September 1973
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
# Take only the Temp column
Temperature <- airquality$Temp
hist(Temperature)

# Histogram with added parameters
hist(Temperature,
main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit",
xlim=c(50,100),
col="darkmagenta",
freq=FALSE
)
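As a rough check on what freq=FALSE does: the y-axis switches from counts to densities, so the bar areas should sum to 1. The sketch below is an illustration rather than part of the assignment; it calls hist() with plot = FALSE to get the bin values without drawing anything.

```r
# Compute the histogram bins without plotting.
h <- hist(airquality$Temp, plot = FALSE)

# Area of each bar = density * bin width; the areas should total 1.
bar_areas <- h$density * diff(h$breaks)
total_area <- sum(bar_areas)
```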

# TODO: Assignment 2, Q2A: Complete the following code to yield a scatterplot with x as Month and y as Temp
# You're going to customize your plot slightly. Use the help function to assist you if needed.
# Make the x label "Month" and the y label "Temperature"
# Finally, make the title of the plot "Temperature by Month"
plot(x=airquality$Month, y=airquality$Temp,
main = "Temperature by Month",
xlab = "Month", ylab = "Temperature")
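Because Month takes only five distinct values, the scatterplot stacks its points into five vertical columns. One possible companion view, offered here only as a sketch and not as the assignment's required answer, is a boxplot of Temp grouped by Month using the same labels.

```r
# Boxplot of temperature for each month; boxplot() also returns the
# per-group statistics invisibly, captured here in b.
b <- boxplot(Temp ~ Month, data = airquality,
             main = "Temperature by Month",
             xlab = "Month", ylab = "Temperature")

# b$names holds the group labels, b$n the number of observations per group.
```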

# TODO: Assignment 2, Q2B: Build a scatter plot with x as Temperature and y as Ozone
# Complete the following code:
# Make the x label "Temperature" and the y label "Ozone",
# Make the title of the plot "Temperature vs Ozone"
plot(x=airquality$Temp, y=airquality$Ozone, main = "Temperature vs Ozone", xlab = "Temperature", ylab = "Ozone")
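The str() output above shows NA values in Ozone, and plot() leaves out any point with a missing coordinate. A minimal sketch for counting how many days actually appear in the Temperature vs Ozone plot (the variable names are illustrative):

```r
# TRUE for rows where both Temp and Ozone are present.
complete <- complete.cases(airquality[, c("Temp", "Ozone")])

n_plotted <- sum(complete)   # days that appear in the scatterplot
n_missing <- sum(!complete)  # days dropped because a value is NA
```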

summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
plot(cars)

Including Plots
summary(pressure)
## temperature pressure
## Min. : 0 Min. : 0.0002
## 1st Qu.: 90 1st Qu.: 0.1800
## Median :180 Median : 8.8000
## Mean :180 Mean :124.3367
## 3rd Qu.:270 3rd Qu.:126.5000
## Max. :360 Max. :806.0000
You can also embed plots, for example:
plot(x = pressure$pressure, y = pressure$temperature, main = "Temperature vs Pressure", xlab = "Pressure", ylab = "Temperature" )

Now flip it!
plot(x = pressure$temperature, y = pressure$pressure, main = "Pressure vs Temperature", xlab = "Temperature", ylab = "Pressure" )
