PH251D Fall 2019 - Project 1

1. Using the `source` function

Create an R script file named `problem1.R` and save it in your project1 folder. Type `print("Hello World")` in the file, then `source` it.

Printing “Hello World”

print("Hello World")

## [1] "Hello World"

Sourcing the file

source("/Users/dr.auwal/Desktop/Personals/UC Berkeley/Classes/Fall, 2019/PB HLTH 251D Applied Epidemiology Using R/Project/Project 1/problem1.R")

## [1] "Hello World"
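
The absolute path above only resolves on the author's machine. As a minimal sketch, assuming a throwaway script written to R's temporary directory (the `script_path` name is illustrative and not part of the assignment), the same write-then-source round trip can be reproduced anywhere:

```r
# Write a one-line script to the session's temporary directory.
script_path <- file.path(tempdir(), "problem1.R")
writeLines('print("Hello World")', script_path)

# source() evaluates the expressions in the file in order.
source(script_path)
```

If the working directory is already the project folder, a relative path such as `source("problem1.R")` is enough.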

2. Read an ASCII data set

The Evans data set (`evans.txt`) is here: https://github.com/taragonmd/data. Alternatively, here is the raw Evans data set: https://raw.githubusercontent.com/taragonmd/data/master/evans.txt. Demonstrate reading the Evans data file (`evans.txt`) to create a data frame, and use the `str` function to explore the structure of the data set. The data dictionary is in Appendix C of the PHDS book. Show the R code chunk and results below.

Reading the Evans data file

evans <- read.table("https://raw.githubusercontent.com/taragonmd/data/master/evans.txt", header = TRUE, sep = "")

Creating a data frame (`read.table` already returns a data frame, so this call simply makes a copy)

data.frame.evans <- data.frame(evans)

Using the `str` function

str(data.frame.evans)

## 'data.frame':    609 obs. of  12 variables:
##  $ id : int  21 31 51 71 74 91 111 131 141 191 ...
##  $ chd: int  0 0 1 0 0 0 1 0 0 0 ...
##  $ cat: int  0 0 1 1 0 0 0 0 0 0 ...
##  $ age: int  56 43 56 64 49 46 52 63 42 55 ...
##  $ chl: int  270 159 201 179 243 252 179 217 176 250 ...
##  $ smk: int  0 1 1 1 1 1 1 0 1 0 ...
##  $ ecg: int  0 0 1 0 0 0 1 0 0 1 ...
##  $ dbp: int  80 74 112 100 82 88 80 92 76 114 ...
##  $ sbp: int  138 128 164 200 145 142 128 135 114 182 ...
##  $ hpt: int  0 0 1 1 0 0 0 0 0 1 ...
##  $ ch : int  0 0 1 1 0 0 0 0 0 0 ...
##  $ cc : int  0 0 201 179 0 0 0 0 0 0 ...
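
To try the same call without a network connection, `read.table` can parse text directly through its `text` argument. The sketch below retypes only the first three records visible in the `str` output above; the object names `sample_text` and `evans_sample` are illustrative.

```r
# First three records, retyped from the str() output shown above.
sample_text <- "id chd cat age chl smk ecg dbp sbp hpt ch cc
21 0 0 56 270 0 0 80 138 0 0 0
31 0 0 43 159 1 0 74 128 0 0 0
51 1 1 56 201 1 1 112 164 1 1 201"

evans_sample <- read.table(text = sample_text, header = TRUE)

is.data.frame(evans_sample)  # read.table returns a data frame
dim(evans_sample)            # rows, columns
str(evans_sample)
```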

3. Discretizing a continuous variable into a categorical variable

Use the `cut` function to discretize age into the following age categories and make a table of counts and a table of proportions.

30-39, 40-49, 50-59, 60-69, $>$70

Be sure to pay attention to age interval transitions.

Making a table of counts

evans$agecat <- cut(evans$age, breaks = c(30, 40, 50, 60, 70, 100), right = FALSE)
evans.table <- table(evans$agecat)
evans.table

## 
##  [30,40)  [40,50)  [50,60)  [60,70) [70,100) 
##        0      247      203      115       44
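
The interval transitions can be checked on a handful of boundary ages. This is a sketch with made-up ages; the `labels` values and the open-ended `Inf` upper break are illustrative choices, not what the code above uses. With `right = FALSE` each interval includes its lower bound and excludes its upper bound, so an age of exactly 40 falls in the 40-49 group.

```r
# Made-up ages sitting on or next to the break points.
ages <- c(39, 40, 49, 50, 69, 70, 76)

agecat <- cut(ages,
              breaks = c(30, 40, 50, 60, 70, Inf),
              right = FALSE,  # intervals are closed on the left: [30,40), [40,50), ...
              labels = c("30-39", "40-49", "50-59", "60-69", "70+"))

data.frame(age = ages, agecat = agecat)
table(agecat)
```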

Making a table of proportions

sweep(evans.table, 1, sum(evans.table), "/")

## 
##    [30,40)    [40,50)    [50,60)    [60,70)   [70,100) 
## 0.00000000 0.40558292 0.33333333 0.18883415 0.07224959
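
Base R's `prop.table` gives the same proportions in one call. A short sketch, rebuilding the table by hand from the counts printed above (the names `age_counts` and `age_props` are illustrative):

```r
# Counts copied from the table of counts above.
age_counts <- as.table(c("[30,40)" = 0, "[40,50)" = 247, "[50,60)" = 203,
                         "[60,70)" = 115, "[70,100)" = 44))

# With no margin given, prop.table divides every cell by the grand total.
age_props <- prop.table(age_counts)
round(age_props, 3)
sum(age_props)
```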

4. Working with dates and times

President Donald Trump was elected on “November 8, 2016”. Convert this character string into an R date object. Show how to use R to display (a) the Julian date; (b) the day of the week, and (c) the week of the year.

(a) the Julian date

trump <- as.Date("November 8, 2016", format = "%B %d, %Y")
trump

## [1] "2016-11-08"

julian(trump)

## [1] 17113
## attr(,"origin")
## [1] "1970-01-01"

(b) the day of the week

weekdays(trump)

## [1] "Tuesday"

(c) the week of the year

format(trump, format='%U')

## [1] "45"
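
The `%B` code is matched against month names in the current locale, so parsing "November 8, 2016" can return `NA` on a non-English system. A sketch of a locale-independent variant, starting from the ISO form of the same date and using numeric format codes (the variable name `election` is illustrative):

```r
# ISO 8601 dates parse without a format string and do not depend on the locale.
election <- as.Date("2016-11-08")

as.numeric(election)                              # days since 1970-01-01
julian(election, origin = as.Date("2016-01-01"))  # days since 1 January 2016
format(election, "%j")                            # day of the year
format(election, "%u")                            # weekday as a number, Monday = 1
format(election, "%U")                            # week of the year, weeks starting on Sunday
```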

5. Simple two-way analysis

Create a simple 2x2 table of smoking (`smk`) and coronary heart disease (`chd`). Use the `fisher.test` on this 2x2 table and describe your findings.

Creating a simple 2x2 table of smoking (`smk`) and coronary heart disease (`chd`)

evans2by2 <- xtabs(~ smk + chd, data = evans)
evans2by2

##    chd
## smk   0   1
##   0 205  17
##   1 333  54

Using the `fisher.test` on the 2x2 table

fisher.test(evans2by2)

## 
##  Fisher's Exact Test for Count Data
## 
## data:  evans2by2
## p-value = 0.02512
## alternative hypothesis: true odds ratio is not equal to 1
## 95 percent confidence interval:
##  1.079813 3.697097
## sample estimates:
## odds ratio 
##   1.953491

Describing findings:

Smoking (`smk`) is associated with higher odds of coronary heart disease (`chd`) in these data. The estimated odds ratio is about 1.95 (95% confidence interval 1.08 to 3.70), meaning the odds of CHD among smokers are roughly twice those among non-smokers. The p-value (0.025) is below 0.05, so we reject the null hypothesis that the true odds ratio equals 1.
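
The numbers quoted in the findings can be pulled out of the test object rather than read off the printout. The sketch below rebuilds the 2x2 table by hand from the counts shown above (`smk_chd` and `ft` are illustrative names). It also computes the cross-product odds ratio, which differs slightly from the `fisher.test` estimate because that function reports a conditional maximum likelihood estimate.

```r
# Rows are smoking status, columns are CHD status; counts copied from the table above.
smk_chd <- matrix(c(205, 333, 17, 54), nrow = 2,
                  dimnames = list(smk = c("0", "1"), chd = c("0", "1")))

ft <- fisher.test(smk_chd)
ft$p.value   # two-sided p-value
ft$estimate  # conditional MLE of the odds ratio
ft$conf.int  # 95% confidence interval for the odds ratio

# Sample (cross-product) odds ratio for comparison.
(smk_chd["0", "0"] * smk_chd["1", "1"]) / (smk_chd["0", "1"] * smk_chd["1", "0"])
```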

6. Write your own function

Now, write a function to calculate the risk ratio of your 2x2 table above. The exposure is smoking status and the outcome is coronary heart disease.

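epitable <- function(a, b, c, d) {
  # a, c: exposed with and without the outcome
  # b, d: unexposed with and without the outcome
  N1 <- a + c    # total exposed
  N0 <- b + d    # total unexposed
  R1 <- a / N1   # risk among the exposed
  R0 <- b / N0   # risk among the unexposed
  RR <- R1 / R0  # risk ratio
  RR
}
# Smokers: 54 with CHD, 333 without; non-smokers: 17 with CHD, 205 without
epitable(54, 17, 333, 205)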

## [1] 1.822161
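
Passing four cell counts by position makes it easy to swap them by accident. One possible alternative, sketched here with a hypothetical helper `risk_ratio` that is not part of the original answer, takes the 2x2 table itself and selects cells by dimension name. The table is rebuilt by hand from the counts shown earlier so the example runs on its own.

```r
# Rows are exposure (smk), columns are outcome (chd); counts copied from the 2x2 table.
smk_chd <- matrix(c(205, 333, 17, 54), nrow = 2,
                  dimnames = list(smk = c("0", "1"), chd = c("0", "1")))

# Hypothetical helper: risk of the outcome in each exposure row, then their ratio.
risk_ratio <- function(tab, exposed = "1", unexposed = "0", case = "1") {
  risks <- prop.table(tab, margin = 1)[, case]  # row-wise proportion with the outcome
  unname(risks[exposed] / risks[unexposed])
}

risk_ratio(smk_chd)
```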

7.

Now, use the `xtabs` function to create a 3-D array object of `chd`, `hpt`, and `smk`. Now use the `addmargins` function on this object.

Using the `xtabs` function to create a 3-D array object of `chd`, `hpt`, and `smk`

evans3darray <- xtabs(~ chd + hpt + smk, data = evans)
evans3darray

## , , smk = 0
## 
##    hpt
## chd   0   1
##   0 122  83
##   1   6  11
## 
## , , smk = 1
## 
##    hpt
## chd   0   1
##   0 204 129
##   1  22  32

Using the `addmargins` function on the object

addmargins(evans3darray)

## , , smk = 0
## 
##      hpt
## chd     0   1 Sum
##   0   122  83 205
##   1     6  11  17
##   Sum 128  94 222
## 
## , , smk = 1
## 
##      hpt
## chd     0   1 Sum
##   0   204 129 333
##   1    22  32  54
##   Sum 226 161 387
## 
## , , smk = Sum
## 
##      hpt
## chd     0   1 Sum
##   0   326 212 538
##   1    28  43  71
##   Sum 354 255 609
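
A 3-D table can also be collapsed over one dimension with `margin.table`. The sketch below rebuilds the array by hand from the counts printed above (the names `counts` and `chd_by_smk` are illustrative) and sums over `hpt`, which recovers the smoking by CHD counts used in the two-way analysis.

```r
# Cell counts copied from the xtabs output above; the first dimension varies fastest.
counts <- as.table(array(c(122, 6, 83, 11, 204, 22, 129, 32),
                         dim = c(2, 2, 2),
                         dimnames = list(chd = c("0", "1"),
                                         hpt = c("0", "1"),
                                         smk = c("0", "1"))))

# Keep dimensions 1 (chd) and 3 (smk); sum over dimension 2 (hpt).
chd_by_smk <- margin.table(counts, margin = c(1, 3))
chd_by_smk
addmargins(chd_by_smk)
```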

8. Create a PNG graph and save file

From the Evans data create a histogram of age (`age`). Label with a title and axis labels. Output to a PNG file using the `png` function. Hint is provided.

png(filename = "myplot.png")
hist(evans$age, breaks = 16, col = "brown", xlab = "age", main = "Histogram of age of Participants")
dev.off()

## quartz_off_screen 
##                 2
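
The `quartz_off_screen` lines are the return value of `dev.off()`, which reports the device that becomes active after the PNG device closes. A sketch of the same open-plot-close pattern with explicit dimensions, assuming simulated ages and a temporary output file (`ages` and `out_file` are illustrative and not the Evans data):

```r
set.seed(251)
ages <- sample(40:76, 200, replace = TRUE)  # made-up ages, not the Evans data

out_file <- file.path(tempdir(), "age_hist.png")

# Open the PNG device, draw, then close the device so the file is written.
png(filename = out_file, width = 800, height = 600, res = 120)
hist(ages, col = "brown", xlab = "Age", ylab = "Number of participants",
     main = "Histogram of age (simulated)")
invisible(dev.off())  # invisible() suppresses the printed device report

file.exists(out_file)
```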

9. Display PNG file in your Rmarkdown document

Using Rmarkdown syntax, display the PNG file you created above. Hint: use the `include_graphics` function from the `knitr` package.

library(knitr)
include_graphics('myplot.png')

10. Using regular expressions

Here are the California counties: https://raw.githubusercontent.com/taragonmd/data/master/calcounty.txt Read in data using the `scan` function. Hint provided below. Remove the “California” entry. Use regular expressions to identify and display the County names that start with `"San "` and end with `"o"`.

Reading in data using the `scan` function

cac <- scan("https://raw.githubusercontent.com/taragonmd/data/master/calcounty.txt", what = "")

Removing the “California” entry

cac <- cac[cac!="California"]

Using regular expressions to identify and display the County names that start with `"San "` and end with `"o"`.

grep("^San .+o$", cac, value = TRUE)

## [1] "San Benito"      "San Bernardino"  "San Diego"       "San Francisco"  
## [5] "San Luis Obispo" "San Mateo"
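
The task asks for names that start with `"San "`, including the space. A pattern that omits the space after `San`, such as `^San.+o$`, also accepts names that merely begin with those three letters. The sketch below shows the difference on a small hand-typed vector; `names_demo` is illustrative and its last entry is made up to trigger the mismatch.

```r
names_demo <- c("Sacramento", "San Benito", "San Joaquin", "San Mateo",
                "Santa Cruz", "Santa Exampleo")  # last entry is made up

# Without the space, "Santa Exampleo" slips through.
grep("^San.+o$", names_demo, value = TRUE)

# With the space, only names whose first word is exactly "San" match.
grep("^San .+o$", names_demo, value = TRUE)

# The same filter without regular expressions.
names_demo[startsWith(names_demo, "San ") & endsWith(names_demo, "o")]
```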

Auwal Abubakar

11/22/2019
