Project 1

Use Rmarkdown to demonstrate the skills below. Be brief and clear. Provide explanations if necessary to ensure clarity.

1. Run the program file (filename1.R) using the ‘source’ command;

source("/Users/Michelle/Desktop/Berkeley MPH/Fall 2016/R for Epi's/R Program Files/Project1Import.R", echo=TRUE)

## 
## > DuoName <- c("Batman", "Star Wars", "Sesame Street", 
## +     "Mario Bros", "Bonnie and Clyde", "Simon and Garfunkel", 
## +     "Star Trek", "X-Files")
## 
## > Hero <- c("Batman", "Han Solo", "Bert", "Mario ", 
## +     "Bonnie Parker", "Paul Simon", "James Tiberius Kirk", "Mulder")
## 
## > Sidekick <- c("Robin", "Chewbacca", "Ernie", "Luigi", 
## +     "Clyde Barrow", "Art Garfunkel", "Spock", "Scully")
## 
## > DuoTable <- data.frame(DuoName, Hero, Sidekick)
## 
## > DuoTable
##               DuoName                Hero      Sidekick
## 1              Batman              Batman         Robin
## 2           Star Wars            Han Solo     Chewbacca
## 3       Sesame Street                Bert         Ernie
## 4          Mario Bros              Mario          Luigi
## 5    Bonnie and Clyde       Bonnie Parker  Clyde Barrow
## 6 Simon and Garfunkel          Paul Simon Art Garfunkel
## 7           Star Trek James Tiberius Kirk         Spock
## 8             X-Files              Mulder        Scully

2. Demonstrate reading an ASCII data file (filename2.dat) to create a ‘data frame’;

source("/Users/Michelle/Desktop/Berkeley MPH/Fall 2016/R for Epi's/R Program Files/ASCI_1", echo=TRUE)

## 
## > Name <- c("Eileen", "Michelle", "Tom", "Nicki")
## 
## > Role <- c("Mother", "Daughter", "Son", "Daughter")
## 
## > Occupation <- c("Engineer", "Clerk", "Ambulance Driver", 
## +     "Medical Assistant")
## 
## > AgeYears <- c(60, 30, 23, 21)
## 
## > AgeMonths <- c(AgeYears * 12)
## 
## > Family <- data.frame(Name, Role, Occupation, AgeYears, 
## +     AgeMonths)
## 
## > Family
##       Name     Role        Occupation AgeYears AgeMonths
## 1   Eileen   Mother          Engineer       60       720
## 2 Michelle Daughter             Clerk       30       360
## 3      Tom      Son  Ambulance Driver       23       276
## 4    Nicki Daughter Medical Assistant       21       252
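The step above builds the data frame by sourcing a script. As a complement, here is a minimal sketch of reading a whitespace-delimited ASCII file with read.table. The file path and its contents are hypothetical (written to a temporary file so the example is self-contained); only the names and ages are taken from the table above.

```r
# Hypothetical ASCII file: written to a temp path so the sketch runs anywhere
ascii_file <- tempfile(fileext = ".dat")
writeLines(c("Name AgeYears",
             "Eileen 60",
             "Michelle 30",
             "Tom 23",
             "Nicki 21"),
           ascii_file)

# header = TRUE takes the column names from the first line of the file
family <- read.table(ascii_file, header = TRUE, stringsAsFactors = FALSE)

# Derive a new column, as in the sourced script
family$AgeMonths <- family$AgeYears * 12
str(family)
```

read.table splits on any whitespace by default, so a value containing a space (such as "Ambulance Driver") would need quoting or a different sep argument.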

3. Demonstrate simple data manipulation (e.g., variable transformation, recoding, etc.);

Recode babies’ birth weights (bwt, in grams) into categories

BirthWeight <- read.table("http://www.medepi.net/data/birthwt9.txt", header = TRUE)
BirthWeight$bwt[1:100]

##   [1] 3860 3870 2440 3400 3480 2820 2940 2400 3080 3600 3600 3930 3380 3440
##  [15] 3000 3420 3300 3340 3860 3120 3010 3770 3380 3370 3200 3400 2840 3940
##  [29] 2700 3500 3580 3360 3130 3090 3340 2920 3300 4300 2940 3520 3500 3500
##  [43] 3040 2280 3800 2960 3460 4640 2870 3980 3840 2860 1780 3380 4060 3100
##  [57] 2730 3580 3200 2830 3980 3960 3560 3340 3580 3900 3700 3260 3960 4000
##  [71] 3300 3700 2800 3240 4400 3100 3050 3360 3700 3580 3180 3560 3000 3200
##  [85] 3060 3470 3310 4060 3740 3140 2995 3020 3180 2980 3440 4080 2710 3250
##  [99] 3950 3420

WeightLabels <- c('<2500g', '2500-2999g', '3000-3499g', '3500-3999g', '>=4000g')
# right = FALSE makes each interval closed on the left: [0, 2500), [2500, 3000), ...
WeightCats <- cut(BirthWeight$bwt, breaks = c(0, 2500, 3000, 3500, 4000, 7000), right = FALSE, labels = WeightLabels)
table(WeightCats)

## WeightCats
##     <2500g 2500-2999g 3000-3499g 3500-3999g    >=4000g 
##         11         46        163        156         51
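To see how cut treats values that sit exactly on a break, here is a small sketch using made-up weights at the boundaries. The weights vector and the label text are illustrative assumptions; the breaks and right = FALSE match the call above.

```r
# Hypothetical weights chosen to sit on or next to the category boundaries
weights <- c(2499, 2500, 2999, 3000, 4000)

# Labels written to match half-open intervals [lower, upper)
boundary_labels <- c("<2500g", "2500-2999g", "3000-3499g",
                     "3500-3999g", ">=4000g")

# right = FALSE: a value equal to a break falls in the interval that starts there
cats <- cut(weights,
            breaks = c(0, 2500, 3000, 3500, 4000, 7000),
            right = FALSE,
            labels = boundary_labels)

data.frame(weights, cats)
```

With right = FALSE, 2500 lands in the 2500-2999g group and 4000 in the top group, so the labels should describe the lower bound as inclusive.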

4. Demonstrate the use of calendar and Julian dates;

Age.Hug is each person’s age in years on the date I last hugged them, computed as the number of days between the two dates divided by 365.

Name <- c("Eileen", "Michelle", "Tom", "Nicki")
Role <- c("Mother", "Daughter", "Son", "Daughter")
Occupation <- c("Engineer", "Clerk", "Ambulance Driver", "Medical Assistant")
BirthDate <- c("01/14/1956", "04/06/1986", "04/09/1993", "01/05/1995")
DateLastHugged <- c("11/13/2016", "11/30/2016", "11/13/2016", "11/13/2016")
Family <- data.frame(Name, Role, Occupation, BirthDate, DateLastHugged)
Family

##       Name     Role        Occupation  BirthDate DateLastHugged
## 1   Eileen   Mother          Engineer 01/14/1956     11/13/2016
## 2 Michelle Daughter             Clerk 04/06/1986     11/30/2016
## 3      Tom      Son  Ambulance Driver 04/09/1993     11/13/2016
## 4    Nicki Daughter Medical Assistant 01/05/1995     11/13/2016

# Convert the text dates to Date objects
BDate.standard <- as.Date(BirthDate, format = "%m/%d/%Y")
FamilyStandard <- cbind(Family, BDate.standard)
Hug.standard <- as.Date(DateLastHugged, format = "%m/%d/%Y")
# Julian day numbers: days since 1970-01-01
Bdate.Num <- as.numeric(BDate.standard)
Hug.Num <- as.numeric(Hug.standard)
# Age in years, approximating a year as 365 days
Age.Hug <- round((Hug.Num - Bdate.Num) / 365, digits = 2)
Age.Hug <- paste(Age.Hug, "years")
FamilyStandard <- cbind(FamilyStandard, Hug.standard, Age.Hug)
FamilyStandard

##       Name     Role        Occupation  BirthDate DateLastHugged
## 1   Eileen   Mother          Engineer 01/14/1956     11/13/2016
## 2 Michelle Daughter             Clerk 04/06/1986     11/30/2016
## 3      Tom      Son  Ambulance Driver 04/09/1993     11/13/2016
## 4    Nicki Daughter Medical Assistant 01/05/1995     11/13/2016
##   BDate.standard Hug.standard     Age.Hug
## 1     1956-01-14   2016-11-13 60.87 years
## 2     1986-04-06   2016-11-30 30.67 years
## 3     1993-04-09   2016-11-13 23.61 years
## 4     1995-01-05   2016-11-13 21.87 years
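The same calculation can be sketched with julian() and difftime(), which make the Julian day count and the date difference explicit. The two dates are copied from the second row of the table; the variable names and the 365.25 divisor are assumptions for illustration, not part of the original analysis.

```r
# Dates copied from the table above (Michelle's row)
birth <- as.Date("04/06/1986", format = "%m/%d/%Y")
hug   <- as.Date("11/30/2016", format = "%m/%d/%Y")

# julian() gives days since the origin, 1970-01-01 by default;
# as.numeric() on a Date returns the same count
julian_birth <- julian(birth)
same_count <- as.numeric(julian_birth) == as.numeric(birth)

# Subtracting dates with difftime() keeps the units explicit
days_between <- as.numeric(difftime(hug, birth, units = "days"))

# Dividing by 365.25 roughly accounts for leap years (an approximation)
age_years <- round(days_between / 365.25, 2)
age_years
```

Dividing by 365, as in the code above, slightly overstates age because it ignores leap days; 365.25 is a common approximation, not an exact calendar age.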

5. Conduct a simple analysis using existing functions (from R, colleagues, etc.);

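OddsRatioFun <- function(x) {
  # x is a 2x2 table: cells A and B in row 1, C and D in row 2
  A <- x[1, 1]
  B <- x[1, 2]
  C <- x[2, 1]
  D <- x[2, 2]
  # cross-product odds ratio
  crossprod.OR <- (A * D) / (B * C)
  list(data = x, odds.ratio = crossprod.OR)
}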

WNV <- read.csv ("http://www.medepi.net/data/wnv2004fin.txt")
WNVDeathTab = xtabs(~Death + Sex, data=WNV)
OddsRatioFun(WNVDeathTab)

## $data
##      Sex
## Death   F   M
##   No  261 423
##   Yes   9  18
## 
## $odds.ratio
## [1] 1.234043
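A point estimate is easier to interpret with an interval around it. Here is a minimal sketch of a Wald 95% confidence interval for the odds ratio, using the cell counts printed above. The matrix is retyped by hand rather than read from the data file, and the variable names are illustrative assumptions.

```r
# Cell counts retyped from the printed table (rows: Death, columns: Sex)
tab <- matrix(c(261, 9, 423, 18), nrow = 2,
              dimnames = list(Death = c("No", "Yes"), Sex = c("F", "M")))

# Cross-product odds ratio, same formula as OddsRatioFun
or <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Wald standard error of log(OR): sqrt of the sum of reciprocal cell counts
se_log_or <- sqrt(sum(1 / tab))

# 95% interval computed on the log scale, then back-transformed
ci <- exp(log(or) + c(-1, 1) * qnorm(0.975) * se_log_or)

list(odds.ratio = or, conf.int = ci)
```

With only 9 and 18 deaths in the two groups, the Wald interval is an approximation; fisher.test(tab) is one alternative, though it reports a conditional estimate that can differ slightly from the cross-product value.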

6. Conduct a simple analysis demonstrating simple programming (e.g., a ‘for’ loop);

I use Excel a lot, and when I need to write the same formula for many cells, I generate the text with Excel's CONCATENATE function. Here is a way to do the same thing in R.

for (i in 1:10) {
  print (paste("=If (A", i, "= '', B", i, "= '', B", i, "= A", i, ")", sep=""))
}

## [1] "=If (A1= '', B1= '', B1= A1)"
## [1] "=If (A2= '', B2= '', B2= A2)"
## [1] "=If (A3= '', B3= '', B3= A3)"
## [1] "=If (A4= '', B4= '', B4= A4)"
## [1] "=If (A5= '', B5= '', B5= A5)"
## [1] "=If (A6= '', B6= '', B6= A6)"
## [1] "=If (A7= '', B7= '', B7= A7)"
## [1] "=If (A8= '', B8= '', B8= A8)"
## [1] "=If (A9= '', B9= '', B9= A9)"
## [1] "=If (A10= '', B10= '', B10= A10)"
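Because paste is vectorized, the same ten strings can also be built without a loop. This is a sketch of that alternative; the names rows, formulas and formulas_sprintf are assumptions for illustration.

```r
rows <- 1:10

# paste0() recycles the fixed pieces across every value of rows
formulas <- paste0("=If (A", rows, "= '', B", rows, "= '', B", rows,
                   "= A", rows, ")")

# sprintf() is an equivalent template-style spelling
formulas_sprintf <- sprintf("=If (A%d= '', B%d= '', B%d= A%d)",
                            rows, rows, rows, rows)

# One formula per line, without the [1] index that print() adds
writeLines(formulas)
```

The loop version is the better demonstration of for, while the vectorized form returns all strings in one character vector that can be written straight to a file.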

7. Conduct a simple analysis demonstrating an original function created by student;

# Function for the kappa statistic from a 2x2 agreement table
KappaFun <- function(a) {
  A <- a[1, 1]
  B <- a[1, 2]
  C <- a[2, 1]
  D <- a[2, 2]
  N <- A + B + C + D

  # expected count of positive agreement by chance
  PC <- ((A + B) / N) * ((A + C) / N) * N
  # expected count of negative agreement by chance
  NC <- ((C + D) / N) * ((B + D) / N) * N
  # Cc: proportion of agreement expected by chance
  Cc <- (PC + NC) / N
  # O: observed proportion of agreement
  O <- (A + D) / N
  # kappa: agreement beyond chance as a share of the maximum possible
  Kappa <- (O - Cc) / (1 - Cc)
  Kappa
}

#write in 2x2 table
m<-c(12,6)
n<-c(3,3)
p<-data.frame(m,n)

#apply function
KappaFun(p)

## [1] 0.1428571
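The same statistic can be written with row and column totals, which also works for tables larger than 2x2. This is a sketch under that assumption; kappa_from_table is a hypothetical name, and the example table repeats the counts used above.

```r
# Kappa from any square agreement table (hypothetical helper)
kappa_from_table <- function(tab) {
  tab <- as.matrix(tab)
  n <- sum(tab)
  # observed agreement: share of counts on the diagonal
  observed <- sum(diag(tab)) / n
  # chance agreement: sum of products of matching row and column totals
  expected <- sum(rowSums(tab) * colSums(tab)) / n^2
  (observed - expected) / (1 - expected)
}

# Same counts as above: first column 12 and 6, second column 3 and 3
ratings <- matrix(c(12, 6, 3, 3), nrow = 2)
kappa_from_table(ratings)
```

For this table the observed agreement is 15/24 and the chance agreement is 0.5625, which gives the same kappa as the cell-by-cell version.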

8. Create a simple graph with title, axes labels and legend, and output to file;

Year <- c(2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014)
AsotinC <- c(1270.1, 945.9, 543.5, 812.5, 471.7, 1296, 979.7, 1863.8, 1538.9, 2840, 2090.1, 2283.5)
AdamsC <- c(649.4, 379.3, 1190.5, 685.4, 550.1, 1036.6, 1711.5, 849, 1220.8, 1408.7, 1036, 844.3)
CountyCompare <- data.frame(Year, AdamsC, AsotinC)

setwd("~/")
getwd()

## [1] "/Users/Michelle"

png(filename="PlotSave.png")

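# ylim spans both counties so the Adams County line is not clipped at the bottom
plot(CountyCompare$Year, CountyCompare$AsotinC, type="l", lwd=2, col=1, ylim=range(CountyCompare$AsotinC, CountyCompare$AdamsC), xlab="Year", ylab="Rate per 100,000", main="Teen Chlamydia Rates per 100,000")
lines(CountyCompare$Year, CountyCompare$AdamsC, lwd=2, col=2)
legend(2003, 2500, c("Asotin County", "Adams County"), lwd=2, col=1:2)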

dev.off()

## quartz_off_screen 
##                 2

library(png)
OpenSaves <- readPNG("/Users/Michelle/PlotSave.png")
grid::grid.raster(OpenSaves)

9. Demonstrate the use of regular expressions;

I love Scrabble, so I wanted to search a list of all three-letter words and pick out those containing a certain letter, in this case “Q”.

Scrabs <- read.csv ("/Users/Michelle/Desktop/Berkeley MPH/Fall 2016/R for Epi's/ScrabbleList.csv", header=TRUE)
VectScrab <- as.vector(Scrabs$Word)
VectScrab[1:100] #view first 100 words

##   [1] "AAH" "AAL" "AAS" "ABA" "ABO" "ABS" "ABY" "ACE" "ACT" "ADD" "ADO"
##  [12] "ADS" "ADZ" "AFF" "AFT" "AGA" "AGE" "AGO" "AGS" "AHA" "AHI" "AHS"
##  [23] "AID" "AIL" "AIM" "AIN" "AIR" "AIS" "AIT" "ALA" "ALB" "ALE" "ALL"
##  [34] "ALP" "ALS" "ALT" "AMA" "AMI" "AMP" "AMU" "ANA" "AND" "ANE" "ANI"
##  [45] "ANT" "ANY" "APE" "APO" "APP" "APT" "ARB" "ARC" "ARE" "ARF" "ARK"
##  [56] "ARM" "ARS" "ART" "ASH" "ASK" "ASP" "ASS" "ATE" "ATT" "AUK" "AVA"
##  [67] "AVE" "AVO" "AWA" "AWE" "AWL" "AWN" "AXE" "AYE" "AYS" "AZO" "BAA"
##  [78] "BAD" "BAG" "BAH" "BAL" "BAM" "BAN" "BAP" "BAR" "BAS" "BAT" "BAY"
##  [89] "BED" "BEE" "BEG" "BEL" "BEN" "BES" "BET" "BEY" "BIB" "BID" "BIG"
## [100] "BIN"

grep("Q", VectScrab)

## [1] 719 720 721 834

VectScrab[grep("Q", VectScrab)]

## [1] "QAT" "QIS" "QUA" "SUQ"
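The pattern "Q" matches the letter anywhere in a word. Anchors and character classes narrow the search further, as in this sketch. The words vector is a small hypothetical subset typed in by hand from the output above, not the full Scrabble list.

```r
# Hypothetical subset of the word list, taken from the output shown above
words <- c("AAH", "BIN", "QAT", "QIS", "QUA", "SUQ")

# ^ anchors the match to the start of the word
starts_with_q <- grep("^Q", words, value = TRUE)

# $ anchors the match to the end of the word
ends_with_q <- grep("Q$", words, value = TRUE)

# Q followed by anything except U, or Q as the last letter
q_without_u <- grep("Q([^U]|$)", words, value = TRUE)

list(starts_with_q = starts_with_q,
     ends_with_q = ends_with_q,
     q_without_u = q_without_u)
```

Using value = TRUE returns the matching words directly, which avoids the separate indexing step.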

10. Demonstrate the use of the ‘sink’ function to generate an output file

first.names <- c("Michael", "Justin", "Anibal", "Miguel")
last.names <- c("Fulmer", "Verlander", "Sanchez", "Cabrera") 
HeightIn <- c(75, 77, 72, 76)
Weight <- c(210, 225, 205, 240)
DOB <- c("03/15/1993", "02/20/1983", "02/27/1984", "04/18/1983")
Tigers <- data.frame(first.names, last.names, HeightIn, Weight, DOB)
Tigers

##   first.names last.names HeightIn Weight        DOB
## 1     Michael     Fulmer       75    210 03/15/1993
## 2      Justin  Verlander       77    225 02/20/1983
## 3      Anibal    Sanchez       72    205 02/27/1984
## 4      Miguel    Cabrera       76    240 04/18/1983

sink ("/Users/Michelle/Desktop/Berkeley MPH/Fall 2016/R for Epi's/Tigers3.log")
cat("Ball players", fill = TRUE)

## Ball players

show(Tigers)

##   first.names last.names HeightIn Weight        DOB
## 1     Michael     Fulmer       75    210 03/15/1993
## 2      Justin  Verlander       77    225 02/20/1983
## 3      Anibal    Sanchez       72    205 02/27/1984
## 4      Miguel    Cabrera       76    240 04/18/1983

sink()
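To confirm what sink actually wrote, the log can be read back with readLines. Here is a minimal sketch that uses a temporary file in place of the path above; the players data frame and the file name are assumptions for illustration.

```r
# Small hypothetical data frame standing in for Tigers
players <- data.frame(last.name = c("Fulmer", "Verlander"),
                      HeightIn = c(75, 77))

# Temporary log file so the sketch does not depend on a local folder
log_file <- tempfile(fileext = ".log")

sink(log_file)                      # start diverting console output to the file
cat("Ball players", fill = TRUE)
print(players)
sink()                              # stop diverting; output returns to the console

# Read the file back to check its contents
logged <- readLines(log_file)
logged
```

Every sink(file) call needs a matching sink(), otherwise later output keeps going to the file instead of the console.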

Michelle Leishman

11/30/2016
