PH 251D Final Project: W. J. Vincent

1. Run a program (filename) using the 'source' command;

source("filename1.R", echo = TRUE)

## 
## > Status <- rep(c("Dead", "Survived"), 4)
## 
## > Treatment <- rep(c("Tolbutamide", "Placebo"), each = 2, 
## +     times = 2)
## 
## > Agegrp <- rep(c("<55", "55+"), c(4, 4))
## 
## > Freq <- c(8, 98, 5, 115, 22, 76, 16, 69)
## 
## > dat <- data.frame(Status, Treatment, Agegrp, Freq)
## 
## > dat
##     Status   Treatment Agegrp Freq
## 1     Dead Tolbutamide    <55    8
## 2 Survived Tolbutamide    <55   98
## 3     Dead     Placebo    <55    5
## 4 Survived     Placebo    <55  115
## 5     Dead Tolbutamide    55+   22
## 6 Survived Tolbutamide    55+   76
## 7     Dead     Placebo    55+   16
## 8 Survived     Placebo    55+   69
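Because `filename1.R` itself is not included here, the following is a minimal, self-contained sketch that assumes a temporary file stands in for it: it writes the same commands to disk, runs them with `source(echo = TRUE)`, and then uses `xtabs()` to collapse the long-format counts into a Treatment by Status by Agegrp table. `Treatment` is built with `each = 2, times = 2` so that each treatment has one Dead row and one Survived row within each age group. (Note that in `rep(x, c(2, 2), 2)` the third positional argument is `length.out`, which overrides `times`.)

```r
# Write the commands to a temporary script (a stand-in for filename1.R)
script <- tempfile(fileext = ".R")
writeLines(c(
  'Status <- rep(c("Dead", "Survived"), 4)',
  'Treatment <- rep(c("Tolbutamide", "Placebo"), each = 2, times = 2)',
  'Agegrp <- rep(c("<55", "55+"), c(4, 4))',
  'Freq <- c(8, 98, 5, 115, 22, 76, 16, 69)',
  'dat <- data.frame(Status, Treatment, Agegrp, Freq)'
), script)

# Run the script, echoing each command as it is evaluated;
# by default source() evaluates in the global environment
source(script, echo = TRUE)

# Collapse the long-format counts into a 3-way contingency table
tab <- xtabs(Freq ~ Treatment + Status + Agegrp, data = dat)
tab
```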

2. Demonstrate reading an ASCII data file (filename2.dat) to create a 'data frame';

oswego <- read.csv("http://medepi.net/data/oswego/oswego2.csv", header = TRUE)
anthrax <- read.csv("http://www.medepi.net/data/anthrax_labs.txt", header = TRUE)
head(oswego)

##   ID Age Sex   MealDate MealTime Ill OnsetDate TimeOnset Baked.Ham Spinach
## 1  2  52   F 04/18/1940 20:00:00   Y 4/19/1940  00:30:00         Y       Y
## 2  3  65   M 04/18/1940 18:30:00   Y 4/19/1940  00:30:00         Y       Y
## 3  4  59   F 04/18/1940 18:30:00   Y 4/19/1940  00:30:00         Y       Y
## 4  6  63   F 04/18/1940 19:30:00   Y 4/18/1940  22:30:00         Y       Y
## 5  7  70   M 04/18/1940 19:30:00   Y 4/18/1940  22:30:00         Y       Y
## 6  8  40   F 04/18/1940 19:30:00   Y 4/19/1940  02:00:00         N       N
##   Mashed.Potatoes Cabbage.Salad Jello Rolls Brown.Bread Milk Coffee Water
## 1               Y             N     N     Y           N    N      Y     N
## 2               Y             Y     N     N           N    N      Y     N
## 3               N             N     N     N           N    N      Y     N
## 4               N             Y     Y     N           N    N      N     Y
## 5               Y             N     Y     Y           Y    N      Y     Y
## 6               N             N     N     N           N    N      N     N
##   Cakes Vanilla.Ice.Cream Chocolate.Ice.Cream Fruit.Salad
## 1     N                 Y                   N           N
## 2     N                 Y                   Y           N
## 3     Y                 Y                   Y           N
## 4     N                 Y                   N           N
## 5     N                 Y                   N           N
## 6     N                 Y                   Y           N

head(anthrax)

##   labid caseid   lab.date      site sample    test   result
## 1   101      1 10/19/2001     Blood  Serum     IgG Positive
## 2   102      2 10/12/2001      Skin Biopsy     IHC Positive
## 3   103      2 10/12/2001     Blood  Serum     IgG Positive
## 4   104      3 10/18/2001     Blood  Serum     IgG Positive
## 5   105      4 10/15/2001   Pleural Biopsy     IHC Positive
## 6   106      4 10/15/2001     Blood  Serum     IgG Positive
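The two files above are read from the web, and those URLs may not always be reachable. A self-contained sketch of reading a local whitespace-delimited ASCII file follows; it assumes a temporary file standing in for `filename2.dat`, filled with the first three records shown by `head(anthrax)`. `read.csv()` is simply `read.table()` with `sep = ","` and `header = TRUE` preset.

```r
# Write a few records (from head(anthrax) above) to a temporary,
# whitespace-delimited ASCII file standing in for filename2.dat
dat_file <- tempfile(fileext = ".dat")
writeLines(c(
  "labid caseid lab.date site sample test result",
  "101 1 10/19/2001 Blood Serum IgG Positive",
  "102 2 10/12/2001 Skin Biopsy IHC Positive",
  "103 2 10/12/2001 Blood Serum IgG Positive"
), dat_file)

# read.table() splits fields on whitespace by default;
# header = TRUE takes the column names from the first line
labs <- read.table(dat_file, header = TRUE, stringsAsFactors = FALSE)
str(labs)

# Convert the date column from text to R's Date class
labs$lab.date <- as.Date(labs$lab.date, format = "%m/%d/%Y")
```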

3. Demonstrate simple data manipulation (e.g., variable transformation, recoding, etc.);

oswego$Coffee <- as.character(oswego$Coffee)
oswego$Coffee[oswego$Coffee == "N"] <- "Abstainer"
oswego$Coffee[oswego$Coffee == "Y"] <- "Caffeinator"
oswego$Coffee

##  [1] "Caffeinator" "Caffeinator" "Caffeinator" "Abstainer"   "Caffeinator"
##  [6] "Abstainer"   "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator"
## [11] "Abstainer"   "Abstainer"   "Caffeinator" "Abstainer"   "Abstainer"  
## [16] "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator" "Caffeinator"
## [21] "Abstainer"   "Abstainer"   "Caffeinator" "Caffeinator" "Abstainer"  
## [26] "Caffeinator" "Abstainer"   "Caffeinator" "Caffeinator" "Abstainer"  
## [31] "Abstainer"   "Caffeinator" "Abstainer"   "Caffeinator" "Abstainer"  
## [36] "Caffeinator" "Abstainer"   "Abstainer"   "Caffeinator" "Abstainer"  
## [41] "Abstainer"   "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator"
## [46] "Caffeinator" "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator"
## [51] "Abstainer"   "Caffeinator" "Abstainer"   "Abstainer"   "Caffeinator"
## [56] "Caffeinator" "Abstainer"   "Caffeinator" "Caffeinator" "Caffeinator"
## [61] "Caffeinator" "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator"
## [66] "Abstainer"   "Abstainer"   "Abstainer"   "Caffeinator" "Abstainer"  
## [71] "Abstainer"   "Caffeinator" "Caffeinator" "Abstainer"   "Abstainer"

oswego$Coffee[oswego$Coffee == "Abstainer"] <- 0
oswego$Coffee[oswego$Coffee == "Caffeinator"] <- 1
oswego$Coffee

##  [1] "1" "1" "1" "0" "1" "0" "0" "0" "0" "1" "0" "0" "1" "0" "0" "0" "0"
## [18] "0" "1" "1" "0" "0" "1" "1" "0" "1" "0" "1" "1" "0" "0" "1" "0" "1"
## [35] "0" "1" "0" "0" "1" "0" "0" "0" "0" "0" "1" "1" "0" "0" "0" "1" "0"
## [52] "1" "0" "0" "1" "1" "0" "1" "1" "1" "1" "0" "0" "0" "1" "0" "0" "0"
## [69] "1" "0" "0" "1" "1" "0" "0"
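As the quoted output shows, assigning `0` and `1` into a character column leaves the values as the strings `"0"` and `"1"`, not numbers. A sketch of one alternative follows: it recodes the original `Y`/`N` values directly into a labelled factor and into a numeric 0/1 indicator. For illustration it uses only the first six `Coffee` responses from `head(oswego)`; applied to the full data, the same line would be `ifelse(oswego$Coffee == "Y", 1L, 0L)` on the original `Y`/`N` column.

```r
# First six Coffee responses from head(oswego) above
coffee <- c("Y", "Y", "Y", "N", "Y", "N")

# Labelled factor: levels fix the order, labels give readable names
coffee_f <- factor(coffee, levels = c("N", "Y"),
                   labels = c("Abstainer", "Caffeinator"))

# Numeric 0/1 indicator (1 = drinks coffee), usable in arithmetic and models
coffee_01 <- ifelse(coffee == "Y", 1L, 0L)

table(coffee_f)
mean(coffee_01)  # proportion of coffee drinkers
```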

4. Demonstrate the use of calendar and Julian dates;

In a study of posttraumatic stress, I am interested in calculating the number of days since September 11, 2001.

calendar <- c("2/2/2002", "5/8/2002", "9/2/2002", "4/3/2002", "8/29/2002", "10/31/2001")
calendar

## [1] "2/2/2002"   "5/8/2002"   "9/2/2002"   "4/3/2002"   "8/29/2002" 
## [6] "10/31/2001"

jdate <- as.Date(calendar, format = "%m/%d/%Y")
jdate

## [1] "2002-02-02" "2002-05-08" "2002-09-02" "2002-04-03" "2002-08-29"
## [6] "2001-10-31"

sept11 <- as.Date("2001-09-11")
posttrauma <- (jdate - sept11)
posttrauma

## Time differences in days
## [1] 144 239 356 204 352  50
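R's `Date` class already stores a Julian-style day count: the number of days since 1970-01-01. The sketch below, reusing the dates above, shows three ways to expose such counts: the raw internal number, `julian()` with a custom origin (here September 11, 2001), and the day-of-year convention via the `%j` format code.

```r
calendar <- c("2/2/2002", "5/8/2002", "9/2/2002", "4/3/2002", "8/29/2002", "10/31/2001")
jdate <- as.Date(calendar, format = "%m/%d/%Y")
sept11 <- as.Date("2001-09-11")

# Internally a Date is the number of days since 1970-01-01
as.numeric(sept11)

# julian() returns days since a chosen origin (stored as an "origin" attribute)
days_since <- julian(jdate, origin = sept11)
days_since

# Day of the year (001-366), another common "Julian date" convention
doy <- format(jdate, "%j")
doy

# The same differences expressed in other units
difftime(jdate, sept11, units = "weeks")
```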

5. Conduct a simple analysis using existing functions (from R, colleagues, etc.);

group1 <- c(8, 11, 6, 15, 8, 6, 2, 12, 19, 9, 10, 7)
group1

##  [1]  8 11  6 15  8  6  2 12 19  9 10  7

group2 <- c(19, 10, 22, 25, 9, 15, 18, 16, 27)
group2

## [1] 19 10 22 25  9 15 18 16 27

t.test(group1, group2)

## 
##  Welch Two Sample t-test
## 
## data:  group1 and group2
## t = -3.486, df = 13.98, p-value = 0.003644
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -13.686  -3.258
## sample estimates:
## mean of x mean of y 
##     9.417    17.889
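`t.test()` defaults to `var.equal = FALSE`, which is why the output reports a Welch test with fractional degrees of freedom. As a check on what the function computes, here is a sketch that derives the Welch statistic, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand from the same two groups.

```r
group1 <- c(8, 11, 6, 15, 8, 6, 2, 12, 19, 9, 10, 7)
group2 <- c(19, 10, 22, 25, 9, 15, 18, 16, 27)

# Squared standard error of each group mean (unpooled variances)
se2_1 <- var(group1) / length(group1)
se2_2 <- var(group2) / length(group2)

# Welch t statistic: difference in means over the unpooled standard error
t_stat <- (mean(group1) - mean(group2)) / sqrt(se2_1 + se2_2)

# Welch-Satterthwaite approximation to the degrees of freedom
df_welch <- (se2_1 + se2_2)^2 /
  (se2_1^2 / (length(group1) - 1) + se2_2^2 / (length(group2) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df_welch)
c(t = t_stat, df = df_welch, p = p_value)
```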

6. Conduct a simple analysis demonstrating simple programming (e.g., a 'for' loop);

I have a t-value, and I would like to bootstrap it: resample each group with replacement 10,000 times, recompute the t-statistic each time, and keep the 10,000 resulting t-values to calculate a bootstrapped confidence interval later.

group1 <- c(8, 11, 6, 15, 8, 6, 2, 12, 19, 9, 10, 7)
group2 <- c(19, 10, 22, 25, 9, 15, 18, 16, 27)

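S <- 10000                # number of bootstrap resamples
t.values <- numeric(S)    # preallocate storage for the t-statistics
for (i in seq_len(S)) {
    # Resample each group with replacement, keeping the original group sizes
    g1 <- sample(group1, size = length(group1), replace = TRUE)
    g2 <- sample(group2, size = length(group2), replace = TRUE)
    t.values[i] <- t.test(g1, g2)$statistic
}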
head(t.values)

## [1] -5.360 -4.302 -4.675 -1.926 -3.911 -1.299
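The loop above produces the bootstrap distribution; turning it into an interval is the later step mentioned in the text. A minimal sketch, assuming the simple percentile method applied to the t-statistics themselves: it takes the 2.5th and 97.5th percentiles of the resampled values. It also shows `replicate()` as a compact equivalent of the `for` loop and fixes an arbitrary seed (251, chosen only for illustration) so the result is reproducible. Other bootstrap intervals (e.g., BCa via the `boot` package's `boot.ci()`) are alternatives.

```r
group1 <- c(8, 11, 6, 15, 8, 6, 2, 12, 19, 9, 10, 7)
group2 <- c(19, 10, 22, 25, 9, 15, 18, 16, 27)

set.seed(251)  # arbitrary seed so the resamples are reproducible
S <- 10000

# replicate() wraps the same resample-and-test step as the for loop;
# sample(x, replace = TRUE) draws length(x) values by default
t.values <- replicate(S, {
  g1 <- sample(group1, replace = TRUE)
  g2 <- sample(group2, replace = TRUE)
  t.test(g1, g2)$statistic
})

# Percentile interval: the 2.5th and 97.5th percentiles of the t-values
ci <- quantile(t.values, probs = c(0.025, 0.975))
ci
```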

7. Conduct a simple analysis demonstrating an original function created by the student.

I want a function to convert Celsius to Fahrenheit for when I am watching the weather and practicing R while traveling abroad some day.

tempconv <- function(x) {
    temperature <- x * 9/5 + 32
    return(temperature)
}

tempconv(0)

## [1] 32

tempconv(37.7778)

## [1] 100
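A natural companion is the inverse conversion. The sketch below adds a hypothetical `ftoc()` (Fahrenheit to Celsius) alongside a compact version of `tempconv()`. Because both are built from vectorised arithmetic, they convert a whole forecast in one call; `ftoc(100)` gives 37.77778, which is where the 37.7778 used above comes from.

```r
# Celsius -> Fahrenheit (same formula as tempconv above)
tempconv <- function(x) {
  x * 9/5 + 32
}

# Hypothetical inverse: Fahrenheit -> Celsius
ftoc <- function(f) {
  (f - 32) * 5/9
}

# Both functions are vectorised, so several readings convert at once
tempconv(c(-40, 0, 25, 100))
ftoc(c(-40, 32, 212))
ftoc(100)
```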

8. Create a simple graph with title, axis labels, and legend;

I am interested in plotting HIV Testing Latency (i.e., length of time to HIV testing) as a function of Spiritual Beliefs among African American MSM.

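setwd("/Users/wvincent/Desktop")
mp <- read.csv("mp.csv", header = TRUE)

# Fit the model first so the legend can report its actual estimates
b <- lm(mp$hivlong ~ mp$spirbel)
ci <- confint(b)[2, ]  # 95% confidence interval for the slope

# Formula y ~ x: latency on the y-axis, spiritual beliefs on the x-axis
plot(mp$hivlong ~ mp$spirbel, main = "HIV Testing Latency vs Spiritual Beliefs Among AA MSM", 
    xlab = "Spiritual Beliefs", ylab = "HIV Testing Latency")
abline(b)
legend(5, 23, c(sprintf("Adj. R^2 = %.2f", summary(b)$adj.r.squared), 
    sprintf("b = %.2f", coef(b)[2]), sprintf("SE = %.2f", coef(summary(b))[2, 2]), 
    sprintf("CI.95 (%.2f, %.2f)", ci[1], ci[2])))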

[Figure: HIV Testing Latency vs Spiritual Beliefs Among AA MSM, scatterplot with fitted regression line and legend]

summary(b)

## 
## Call:
## lm(formula = mp$hivlong ~ mp$spirbel)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -6.17  -3.25  -1.57   1.68  19.33 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    8.806      2.199    4.00  0.00012 ***
## mp$spirbel    -0.188      0.113   -1.67  0.09892 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.28 on 98 degrees of freedom
## Multiple R-squared:  0.0275, Adjusted R-squared:  0.0176 
## F-statistic: 2.78 on 1 and 98 DF,  p-value: 0.0989
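Because `mp.csv` is not distributed with this report, here is a runnable sketch of the same plot-plus-legend pattern using R's built-in `mtcars` data instead. The helper `fit_labels()` is hypothetical: it formats the legend entries directly from a fitted `lm` object, so the legend cannot drift out of sync with `summary()`. Fitting with `data =` also gives clean coefficient names (`wt` rather than `mtcars$wt`), and the `"topright"` keyword places the legend without hard-coded coordinates.

```r
# Hypothetical helper: legend labels for one term of a linear model
fit_labels <- function(fit, term = 2) {
  est <- coef(summary(fit))          # estimates, SEs, t values, p values
  ci <- confint(fit, level = 0.95)   # t-based 95% confidence intervals
  c(sprintf("Adj. R^2 = %.2f", summary(fit)$adj.r.squared),
    sprintf("b = %.2f", est[term, "Estimate"]),
    sprintf("SE = %.2f", est[term, "Std. Error"]),
    sprintf("CI.95 (%.2f, %.2f)", ci[term, 1], ci[term, 2]))
}

fit <- lm(mpg ~ wt, data = mtcars)
plot(mpg ~ wt, data = mtcars, main = "Fuel Economy vs Weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
legend("topright", legend = fit_labels(fit), bty = "n")
```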

9. Demonstrate the use of a regular expression;

I am only interested in cases of infection with one class of exposure to HIV: infection from birth (i.e., “pediatric cases” not resulting from blood transfusions) that might still exist.

exposure <- c("IDU", "RUAI", "RUVI", "BTRAN", "PED", "MEDJOB", "ASSAULT", "IDU", 
    "RUAI", "RUVI", "BTRAN", "PED", "MEDJOB", "ASSAULT", "IDU", "RUAI", "RUVI", 
    "BTRAN", "PED", "MEDJOB", "ASSAULT", "IDU", "RUAI", "RUVI", "BTRAN", "PED", 
    "MEDJOB", "ASSAULT")
grep("^P", exposure)

## [1]  5 12 19 26
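The pattern `"^P"` matches any code that merely starts with P, which works here only because PED is the sole such code. The sketch below anchors the pattern at both ends (`"^PED$"`) so it matches the whole code exactly, and shows `value = TRUE` and `grepl()` as common companions. It rebuilds `exposure` with `rep()`, since the literal vector above is the same seven codes repeated four times.

```r
# The exposure vector above: seven codes repeated four times
exposure <- rep(c("IDU", "RUAI", "RUVI", "BTRAN", "PED", "MEDJOB", "ASSAULT"),
                times = 4)

# "^PED$" matches only the complete code "PED"
idx <- grep("^PED$", exposure)
idx

# value = TRUE returns the matching strings instead of their positions
grep("^PED$", exposure, value = TRUE)

# grepl() returns a logical vector, convenient for subsetting or counting
is_ped <- grepl("^PED$", exposure)
sum(is_ped)
```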

10. Demonstrate the use of the sink function to generate an output file;

sink("Sink Demo.txt")
print("This problem is 'sinking' down to the desktop.")

## [1] "This problem is 'sinking' down to the desktop."

calendar <- c("2/2/2002", "5/8/2002", "9/2/2002", "4/3/2002", "8/29/2002", "10/31/2001")
calendar

## [1] "2/2/2002"   "5/8/2002"   "9/2/2002"   "4/3/2002"   "8/29/2002" 
## [6] "10/31/2001"

jdate <- as.Date(calendar, format = "%m/%d/%Y")
jdate

## [1] "2002-02-02" "2002-05-08" "2002-09-02" "2002-04-03" "2002-08-29"
## [6] "2001-10-31"

sept11 <- as.Date("2001-09-11")
posttrauma <- (jdate - sept11)
posttrauma

## Time differences in days
## [1] 144 239 356 204 352  50

sink()
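If an error occurs between `sink("...")` and `sink()`, console output stays diverted to the file. A minimal sketch of a safer pattern wraps the diversion in a function and uses `on.exit()` to restore the console however the function exits. The function name `write_report()` is hypothetical, and a temporary file stands in for `Sink Demo.txt`.

```r
out_file <- tempfile(fileext = ".txt")  # stand-in for "Sink Demo.txt"

write_report <- function(path) {
  sink(path)
  on.exit(sink(), add = TRUE)  # restore console output even if an error occurs
  print("This problem is 'sinking' down to the desktop.")
  print(as.Date("2002-02-02") - as.Date("2001-09-11"))
}
write_report(out_file)

# Read the file back to confirm what was written
report <- readLines(out_file)
report
```

`sink(path, split = TRUE)` additionally echoes the output to the console, and `capture.output(..., file = path)` is a one-call alternative for short snippets.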