1. Using the source function
Create an R script file named problem1.R and save it in your project1 folder. Type print("Hello World") and source this file.
Printing “Hello World”
print("Hello World")
## [1] "Hello World"
Sourcing the file
source("/Users/dr.auwal/Desktop/Personals/UC Berkeley/Classes/Fall, 2019/PB HLTH 251D Applied Epidemiology Using R/Project/Project 1/problem1.R")
## [1] "Hello World"
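The absolute path above only resolves on one machine. As a minimal sketch of the same step that runs anywhere, the example below assumes a temporary file (the `script` variable is hypothetical) standing in for problem1.R, and captures what `source` prints.

```r
# Write a one-line script to a temporary file (a stand-in for problem1.R).
script <- tempfile(fileext = ".R")
writeLines('print("Hello World")', script)

# source() evaluates the file; capture.output() collects what it prints.
out <- capture.output(source(script))
out
```

In a project, a path relative to the working directory, such as source("problem1.R"), would play the same role.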
2. Read an ASCII data set
The Evans data set (evans.txt) is here: https://github.com/taragonmd/data. Alternatively, here is the raw Evans data set: https://raw.githubusercontent.com/taragonmd/data/master/evans.txt. Demonstrate reading the Evans data file (evans.txt) to create a data frame, and use the str function to explore the structure of the data set. The data dictionary is in Appendix C of the PHDS book. Show the R code chunk and results below.
Reading the Evans data file
evans <- read.table("https://raw.githubusercontent.com/taragonmd/data/master/evans.txt", header = TRUE, sep = "")
Creating a data frame
# read.table() already returns a data frame; data.frame() makes the step explicit
data.frame.evans <- data.frame(evans)
Using the str function
str(data.frame.evans)
## 'data.frame': 609 obs. of 12 variables:
## $ id : int 21 31 51 71 74 91 111 131 141 191 ...
## $ chd: int 0 0 1 0 0 0 1 0 0 0 ...
## $ cat: int 0 0 1 1 0 0 0 0 0 0 ...
## $ age: int 56 43 56 64 49 46 52 63 42 55 ...
## $ chl: int 270 159 201 179 243 252 179 217 176 250 ...
## $ smk: int 0 1 1 1 1 1 1 0 1 0 ...
## $ ecg: int 0 0 1 0 0 0 1 0 0 1 ...
## $ dbp: int 80 74 112 100 82 88 80 92 76 114 ...
## $ sbp: int 138 128 164 200 145 142 128 135 114 182 ...
## $ hpt: int 0 0 1 1 0 0 0 0 0 1 ...
## $ ch : int 0 0 1 1 0 0 0 0 0 0 ...
## $ cc : int 0 0 201 179 0 0 0 0 0 0 ...
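To check the reading step without a network connection, here is a small sketch that parses three rows typed inline. The values are copied from the first four columns of the str output above; the names `txt` and `small` are made up for the illustration.

```r
# Three rows in the same whitespace-delimited layout as evans.txt
# (first four columns only, values taken from the str() output above).
txt <- "id chd cat age
21 0 0 56
31 0 0 43
51 1 1 56"

# sep = "" (the default) splits on any run of whitespace.
small <- read.table(text = txt, header = TRUE, sep = "")

is.data.frame(small)  # read.table() returns a data frame directly
str(small)
```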
3. Discretizing a continuous variable into a categorical variable
Use the cut function to discretize age into the following age categories and make a table of counts and a table of proportions.
30-39, 40-49, 50-59, 60-69, >70
Be sure to pay attention to age interval transitions.
Making a table of counts
evans$agecat <- cut(evans$age, breaks = c(30, 40, 50, 60, 70, 100), right = FALSE)
evans.table <- table(evans$agecat)
evans.table
##
## [30,40) [40,50) [50,60) [60,70) [70,100)
## 0 247 203 115 44
Making a table of proportions
sweep(evans.table, 1, sum(evans.table), "/")
##
## [30,40) [40,50) [50,60) [60,70) [70,100)
## 0.00000000 0.40558292 0.33333333 0.18883415 0.07224959
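The interval transitions can be checked on a few values that sit exactly on the boundaries. The sketch below uses hypothetical ages, assumes the last category is meant to include age 70 (labelled "70+" here, which is an assumption), and uses prop.table as an alternative to sweep.

```r
# Hypothetical ages chosen to sit on the interval boundaries.
ages <- c(39, 40, 49, 50, 59, 60, 69, 70, 76)

# right = FALSE gives left-closed intervals [30,40), [40,50), ...
# so 40 falls in "40-49", not "30-39". The "70+" label is an assumption.
agecat <- cut(ages,
              breaks = c(30, 40, 50, 60, 70, Inf),
              right  = FALSE,
              labels = c("30-39", "40-49", "50-59", "60-69", "70+"))

counts <- table(agecat)       # table of counts
props  <- prop.table(counts)  # counts divided by their total
counts
props
```

For a one-way table, prop.table(counts) gives the same proportions as dividing each count by the total with sweep.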
4. Working with dates and times
President Donald Trump was elected on “November 8, 2016”. Convert this character string into an R date object. Show how to use R to display (a) the Julian date; (b) the day of the week, and (c) the week of the year.
(a) the Julian date
trump <- as.Date("November 8, 2016", format = "%B %d, %Y")
trump
## [1] "2016-11-08"
julian(trump)
## [1] 17113
## attr(,"origin")
## [1] "1970-01-01"
(b) the day of the week
weekdays(trump)
## [1] "Tuesday"
(c) the week of the year
format(trump, format='%U')
## [1] "45"
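The same three quantities can also be obtained with format codes alone. This is a small sketch that starts from the ISO form of the date so that it does not depend on the locale's month names; the variable `d` is introduced only for the illustration.

```r
d <- as.Date("2016-11-08")  # ISO format needs no format string

as.numeric(julian(d))  # days since the origin 1970-01-01
format(d, "%j")        # day of the year (001-366)
format(d, "%u")        # weekday as a number, Monday = 1
format(d, "%U")        # week of the year, weeks starting on Sunday
```

Note that weekdays() and "%B" depend on the current locale, whereas the numeric codes above do not.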
5. Simple two-way analysis
Create a simple 2x2 table of smoking (smk) and coronary heart disease (chd). Use the fisher.test on this 2x2 table and describe your findings.
Creating a simple 2x2 table of smoking (smk) and coronary heart disease (chd)
evans2by2 <- xtabs(~ smk + chd, data = evans)
evans2by2
## chd
## smk 0 1
## 0 205 17
## 1 333 54
Using the fisher.test on the 2x2 table
fisher.test(evans2by2)
##
## Fisher's Exact Test for Count Data
##
## data: evans2by2
## p-value = 0.02512
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
## 1.079813 3.697097
## sample estimates:
## odds ratio
## 1.953491
Describing findings:
Smoking (smk) is associated with higher odds of coronary heart disease (chd): the estimated odds of chd among smokers are about 1.95 times the odds among non-smokers (95% confidence interval 1.08 to 3.70). The confidence interval excludes 1 and the p-value (0.025) is below 0.05, so we reject the null hypothesis that the odds ratio equals 1 at the 5% significance level. This shows an association in these data; the test by itself does not establish that smoking causes chd.
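The odds ratio reported by fisher.test is a conditional maximum likelihood estimate, so it differs slightly from the familiar cross-product ratio. A short sketch, rebuilding the 2x2 table from the counts printed above (the names `tab`, `or_cross`, and `or_cmle` are made up here):

```r
# Rebuild the smk x chd table from the printed counts (filled column by column).
tab <- matrix(c(205, 333, 17, 54), nrow = 2,
              dimnames = list(smk = c("0", "1"), chd = c("0", "1")))

# Cross-product (sample) odds ratio: (54 * 205) / (333 * 17)
or_cross <- (tab["1", "1"] * tab["0", "0"]) / (tab["1", "0"] * tab["0", "1"])

# Conditional MLE as reported by fisher.test()
or_cmle <- unname(fisher.test(tab)$estimate)

c(cross_product = or_cross, conditional_mle = or_cmle)
```

The two estimates agree to about two decimal places for this table.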
6. Write your own function
Now, write a function to calculate the risk ratio of your 2x2 table above. The exposure is smoking status and the outcome is coronary heart disease.
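# a = exposed cases, b = unexposed cases, c = exposed non-cases, d = unexposed non-cases
epitable <- function(a, b, c, d) {
  N1 <- a + c    # total exposed (smokers)
  N0 <- b + d    # total unexposed (non-smokers)
  R1 <- a / N1   # risk of chd among the exposed
  R0 <- b / N0   # risk of chd among the unexposed
  RR <- R1 / R0  # risk ratio
  RR
}
epitable(54, 17, 333, 205)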
## [1] 1.822161
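Typing the four cell counts by hand makes it easy to swap arguments. As one possible variant, the sketch below takes the 2x2 table itself; the function name `risk_ratio` is hypothetical, and it assumes rows are exposure and columns are outcome, both coded "0" and "1" as in the table from section 5.

```r
# Hypothetical helper: risk ratio from a 2x2 table with
# rows = exposure ("0", "1") and columns = outcome ("0", "1").
risk_ratio <- function(tab) {
  risk <- tab[, "1"] / rowSums(tab)  # risk of the outcome in each exposure group
  unname(risk["1"] / risk["0"])      # exposed risk divided by unexposed risk
}

# Same counts as the smk x chd table above.
tab <- matrix(c(205, 333, 17, 54), nrow = 2,
              dimnames = list(smk = c("0", "1"), chd = c("0", "1")))
rr <- risk_ratio(tab)
rr
```

With the evans data loaded, risk_ratio(evans2by2) should give the same value, since xtabs keeps the same dimnames.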
7. Create a 3-D array with xtabs
Now, use the xtabs function to create a 3-D array object of chd, hpt, and smk. Now use the addmargins function on this object.
Using the xtabs function to create a 3-D array object of chd, hpt, and smk
evans3darray <- xtabs(~ chd + hpt + smk, data = evans)
evans3darray
## , , smk = 0
##
## hpt
## chd 0 1
## 0 122 83
## 1 6 11
##
## , , smk = 1
##
## hpt
## chd 0 1
## 0 204 129
## 1 22 32
Using the addmargins function on the object
addmargins(evans3darray)
## , , smk = 0
##
## hpt
## chd 0 1 Sum
## 0 122 83 205
## 1 6 11 17
## Sum 128 94 222
##
## , , smk = 1
##
## hpt
## chd 0 1 Sum
## 0 204 129 333
## 1 22 32 54
## Sum 226 161 387
##
## , , smk = Sum
##
## hpt
## chd 0 1 Sum
## 0 326 212 538
## 1 28 43 71
## Sum 354 255 609
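The "Sum" entries above are marginal totals, and a chosen margin can also be extracted directly. The sketch below rebuilds the array from the printed counts (the names `arr` and `smk_chd` are made up) and collapses over hpt with margin.table, which recovers the smk by chd table from section 5.

```r
# Rebuild the chd x hpt x smk array from the printed counts.
# array() fills the first index (chd) fastest, then hpt, then smk.
arr <- array(c(122, 6, 83, 11,     # smk = 0
               204, 22, 129, 32),  # smk = 1
             dim = c(2, 2, 2),
             dimnames = list(chd = c("0", "1"),
                             hpt = c("0", "1"),
                             smk = c("0", "1")))

# Sum over hpt, keeping smk (dimension 3) as rows and chd (dimension 1) as columns.
smk_chd <- margin.table(arr, c(3, 1))
smk_chd
```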
8. Create a PNG graph and save file
From the Evans data create a histogram of age (age). Label with a title and axis labels. Output to a PNG file using the png function. Hint is provided.
png(filename = "myplot.png")
hist(evans$age, breaks = 16, col = "brown", xlab = "age", main = "Histogram of age of Participants")
dev.off()
## quartz_off_screen
## 2
9. Display PNG file in your Rmarkdown document
Using Rmarkdown syntax, display the PNG file you created above. Hint: use the include_graphics function from the knitr package.
library(knitr)
include_graphics('myplot.png')

10. Using regular expressions
Here are the California counties: https://raw.githubusercontent.com/taragonmd/data/master/calcounty.txt. Read in the data using the scan function. Hint provided below. Remove the “California” entry. Use regular expressions to identify and display the County names that start with "San " and end with "o".
Reading in data using the scan function
cac <- scan("https://raw.githubusercontent.com/taragonmd/data/master/calcounty.txt", what = "")
Removing the “California” entry
cac <- cac[cac != "California"]
Using regular expressions to identify and display the County names that start with "San " and end with "o".
grep("^San .*o$", cac, value = TRUE)
## [1] "San Benito" "San Bernardino" "San Diego" "San Francisco"
## [5] "San Luis Obispo" "San Mateo"
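Whether the pattern includes the space after "San" makes no difference for the county list, but it does in general. A small sketch on a hypothetical vector ("Sancho" is a made-up entry, not a county) shows the difference between the two patterns:

```r
# Hypothetical names; "Sancho" is made up to probe the pattern.
x <- c("San Diego", "San Joaquin", "Sancho", "Santa Clara", "San Mateo")

# Without the space: "San", then one or more of any character, then a final "o".
loose <- grep("^San.+o$", x, value = TRUE)

# With the space: the name must start with the word "San" followed by a space.
strict <- grep("^San .*o$", x, value = TRUE)

loose   # "San Diego" "Sancho" "San Mateo"
strict  # "San Diego" "San Mateo"
```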