Project 1 Instructions

  1. Run the program file (filename1.r) using the ‘source’ command

  2. Demonstrate reading an ASCII data file (filename2.dat) to create a ‘data frame’.

  3. Demonstrate simple data manipulation (e.g., variable transformation, recoding, etc.)

  4. Demonstrate the use of calendar and Julian dates.

  5. Conduct a simple analysis using existing functions (from R, colleagues, etc.)

  6. Conduct a simple analysis demonstrating simple programming (e.g., a ‘for’ loop)

  7. Conduct a simple analysis demonstrating an original function created by student

  8. Create a simple graph with title, axes labels and legend, and output to file

  9. Demonstrate the use of regular expressions

  10. Demonstrate the use of the ‘sink’ function to generate an output file

1. Running the program file

First, I set my working directory to a folder I named “Project 1”. Then I use the “source” command to run a program file (colorSquare.R) that we obtained from Professor Samuel’s guest lecture.

setwd("/Users/tiffamee/Desktop/FALL 2017/CLASSES/PH 251D/Project 1")

source('colorSquare.R')

2. Demonstrate reading an ASCII data file to create a ‘data frame’

One of the ASCII data files provided in this course is “aids.txt”. I read the file in order to create a “data frame”.

dataframe <- read.table('~/Desktop/FALL 2017/CLASSES/PH 251D/Project 1/aids.txt', sep = '', header = TRUE, na.strings = '.') 

dataframe           #Displaying the dataframe
##     cases year
## 1      NA 1980
## 2      NA 1981
## 3      NA 1982
## 4      NA 1983
## 5    4445 1984
## 6    8249 1985
## 7   12932 1986
## 8   21070 1987
## 9   31001 1988
## 10  33722 1989
## 11  41595 1990
## 12  43672 1991
## 13  45472 1992
## 14 103691 1993
## 15  78279 1994
## 16  71547 1995
## 17  66885 1996
## 18  58492 1997
## 19  46521 1998
## 20  45104 1999
## 21  40758 2000
## 22  41868 2001
## 23  42745 2002
## 24  44232 2003
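
The effect of the na.strings argument can be checked without the course file. This is only a sketch: the four lines of text below are a hypothetical excerpt standing in for aids.txt, and the names raw_text and demo are my own.

```r
# Hypothetical four-line excerpt standing in for aids.txt; "." marks a missing count
raw_text <- "cases year
. 1983
4445 1984
8249 1985"

# text = reads from the string instead of a file; "." is converted to NA
demo <- read.table(text = raw_text, header = TRUE, na.strings = ".")
str(demo)                      # inspect column types and values

# Keep only the years that have a reported count
demo[!is.na(demo$cases), ]
```

Because the "." entries become NA rather than text, the cases column stays numeric and can be used directly in calculations.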

3. Demonstrate simple data manipulation

In Chapter 1, we were given a set of code that incorrectly converts inches to centimeters, as follows:

inches <- 1:12
centimeters <- inches/2.54
cbind(inches, centimeters)

To demonstrate simple data manipulation, I correct the R code and make a table:

Conversion Table
inches centimeters
1 2.54
2 5.08
3 7.62
4 10.16
5 12.70
6 15.24
7 17.78
8 20.32
9 22.86
10 25.40
11 27.94
12 30.48
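
A minimal sketch of the corrected code, assuming the only error was dividing by 2.54 instead of multiplying (one inch is 2.54 centimeters). The name conversion is my own.

```r
# One inch is 2.54 cm, so multiply (the original code divided)
inches <- 1:12
centimeters <- inches * 2.54

# Bind the two vectors into a two-column conversion table
conversion <- cbind(inches, centimeters)
conversion
```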

4. Demonstrate the use of calendar and Julian dates

There are calendar dates (e.g., 1/17/1995) and there are Julian dates. In R, a Julian date is stored as the number of days since January 1, 1970.

Here, I use the as.Date function.

#My two brothers' birthdays and my own, in a character vector
bdays <- c("1/17/1995", "4/2/1997", "10/30/2003")
bdays
## [1] "1/17/1995"  "4/2/1997"   "10/30/2003"
#I convert the bdays to an object of class Date, which R stores internally as a Julian date (a number of days)
bdays.julian <- as.Date(bdays, format = "%m/%d/%Y")
## Warning in strptime(x, format, tz = "GMT"): unknown timezone 'zone/tz/
## 2017c.1.0/zoneinfo/America/Los_Angeles'
bdays.julian
## [1] "1995-01-17" "1997-04-02" "2003-10-30"
#I can display the Julian dates
as.numeric(bdays.julian)
## [1]  9147  9953 12355
#I can create a data frame 
bd <- data.frame(Birthday = bdays, Standard = bdays.julian, 
                 Julian = as.numeric(bdays.julian))
bd
##     Birthday   Standard Julian
## 1  1/17/1995 1995-01-17   9147
## 2   4/2/1997 1997-04-02   9953
## 3 10/30/2003 2003-10-30  12355

R also allows me to calculate each person’s age in days by subtracting the birthdays from today’s date.

date.today <- Sys.Date()
date.today
## [1] "2017-11-27"
age <- (date.today - bdays.julian)
age
## Time differences in days
## [1] 8350 7544 5142
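
The ages in days can be turned into approximate ages in years. This is a sketch under two assumptions: I fix the reference date at the run date shown above so the result is reproducible, and I divide by 365.25 as an approximation for leap years. The names ref_date, age_days and age_years are my own.

```r
# Fixed reference date (the run date shown above) so the result is reproducible
ref_date <- as.Date("2017-11-27")
bdays.julian <- as.Date(c("1/17/1995", "4/2/1997", "10/30/2003"),
                        format = "%m/%d/%Y")

# Subtracting Date objects gives a difftime in days; as.numeric drops the units
age_days <- as.numeric(ref_date - bdays.julian)

# Approximate completed years, using 365.25 days per year
age_years <- floor(age_days / 365.25)
age_years
```

Dividing by 365.25 can be off by one within a day or so of a birthday, so an exact age would need a comparison of month and day instead.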

5. Conduct a simple analysis using existing functions

An existing function I’ll use is the “is.na” function, which generates a logical vector showing which positions contain NAs and which do not.

For example, when a survey question is optional, people may give an answer or choose not to answer.

I want to count the NAs (missing values).

possible_answers <- c("Facebook", "Twitter", "Instagram", "Word of Mouth", NA)
results <- sample(possible_answers, 20, replace = TRUE); results
##  [1] "Word of Mouth" NA              "Word of Mouth" "Word of Mouth"
##  [5] "Instagram"     "Instagram"     "Facebook"      "Facebook"     
##  [9] "Twitter"       "Instagram"     "Facebook"      NA             
## [13] "Twitter"       NA              "Twitter"       "Word of Mouth"
## [17] "Facebook"      "Word of Mouth" "Instagram"     NA
sum(is.na(results)) #number of NAs 
## [1] 4
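
Beyond the total count, other existing functions can show where the NAs are and how the answers are distributed. A sketch, with an arbitrary seed that I add only to make the draw reproducible:

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible
possible_answers <- c("Facebook", "Twitter", "Instagram", "Word of Mouth", NA)
results <- sample(possible_answers, 20, replace = TRUE)

# Counts per answer, with NA shown as its own category when present
answer_counts <- table(results, useNA = "ifany")
answer_counts

which(is.na(results))   # positions of the missing answers
mean(is.na(results))    # proportion of respondents who skipped the question
```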

6. Conduct a simple analysis demonstrating simple programming

For a simple analysis, I will write a “for” loop to see the results of picking “truth” or “dare” among 5 friends.

truth_or_dare <- c("truth", "dare")

index <- 1:5 # create vector of sequential integers
picked <- list() #empty list to collect results

for (i in index){
  picking <- sample(truth_or_dare, 1)
  picked <- append(picked, picking)
}

picked
## [[1]]
## [1] "dare"
## 
## [[2]]
## [1] "dare"
## 
## [[3]]
## [1] "dare"
## 
## [[4]]
## [1] "dare"
## 
## [[5]]
## [1] "truth"
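
The same loop can collect its results in a character vector instead of a list, which prints on one line and is easier to tabulate. A sketch of that variant; n_friends is a name I introduce for illustration.

```r
truth_or_dare <- c("truth", "dare")
n_friends <- 5

# Preallocate a character vector with one slot per friend
picked <- character(n_friends)
for (i in seq_len(n_friends)) {
  picked[i] <- sample(truth_or_dare, 1)   # fill slot i with one random pick
}
picked

# Count how many friends picked each option
table(picked)

# The same kind of draw without a loop
sample(truth_or_dare, n_friends, replace = TRUE)
```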

7. Conduct a simple analysis demonstrating an original function created by student

My original function counts down how many days remain from today until a target date. The target date is entered as a character string in MM/DD/YYYY format.

how_long_until <- function(desired_date){
  today <- Sys.Date()
  desired_date.julian <- as.Date(desired_date, format = "%m/%d/%Y")
  countdown <- (desired_date.julian - today)
  countdown
}

how_long_until("1/17/2018")
## Time difference of 51 days
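
Because the function depends on Sys.Date, its result changes every day. A sketch of a variant that can be checked against the output above: the name how_long_until2, the from argument and the error message are hypothetical additions of mine.

```r
# Variant with a configurable start date and a check on the input format.
# `from` and the error message are hypothetical additions for illustration.
how_long_until2 <- function(desired_date, from = Sys.Date()) {
  target <- as.Date(desired_date, format = "%m/%d/%Y")
  if (any(is.na(target))) {
    stop("desired_date must look like MM/DD/YYYY, e.g. \"1/17/2018\"")
  }
  as.numeric(target - from)   # number of days as a plain number
}

# Using the run date shown above reproduces the 51-day countdown
how_long_until2("1/17/2018", from = as.Date("2017-11-27"))
```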

8. Create a simple graph with title, axes labels, and legend, and output to file

Here, I create a simple graph:

x <- 1:10
y <- 11:20

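plot(x, y, main = "Not Creative Enough To Come Up With A Title", xlab = "X-axis", ylab = "Y-axis")

legend(x = 8, y = 16, legend = c("Red", "Blue"), fill = c(2, 4), title = "Legend")

Then, I output it to a file. The capture.output function only captures printed text, and plot returns NULL, so capturing a stored plot would write “NULL” rather than the graph. Instead, I open a graphics device with png, redraw the graph, and close the device with dev.off:

png('~/Desktop/FALL 2017/CLASSES/PH 251D/Project 1/Output.png') # open a PNG graphics device (file name chosen for illustration)
plot(x, y, main = "Not Creative Enough To Come Up With A Title", xlab = "X-axis", ylab = "Y-axis")
legend(x = 8, y = 16, legend = c("Red", "Blue"), fill = c(2, 4), title = "Legend")
dev.off() # close the device so the file is written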
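
In general, a graph is saved by opening a file-based graphics device, drawing, and closing the device. The sketch below also shows a legend whose colors match what is drawn. The second series, the colors, the title and the temporary file are my own illustrative assumptions, and I use the pdf device only because it needs no extra system libraries.

```r
x <- 1:10
out_file <- tempfile(fileext = ".pdf")   # hypothetical output location

pdf(out_file)                            # open a file-based graphics device
plot(x, x + 10, col = "red", pch = 19, ylim = c(0, 25),
     main = "Two Series", xlab = "X-axis", ylab = "Y-axis")
points(x, x + 2, col = "blue", pch = 19) # add a second series
legend("topleft", legend = c("Red", "Blue"),
       fill = c("red", "blue"), title = "Legend")
dev.off()                                # close the device so the file is written

file.exists(out_file)                    # confirm the file was created
```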

9. Demonstrate the use of regular expressions

As 4.11 in the textbook states, “a regular expression is a special text string for describing a search pattern which can be used for searching text strings, indexing data objects, and replacing object elements.”

Here is everyone in PH251D (I pulled the names from BCourses) in a CSV file.

PH251D_names <- read.csv('~/Desktop/FALL 2017/CLASSES/PH 251D/Project 1/ph251d_names.csv')
just_names <- PH251D_names[,1]
just_names
##  [1] Emily Borgelt   Victoria Chu    Daniel Collin   Gerardo Cruz   
##  [5] Xing Gao        Michael Huynh   Amanda Keller   Jin Kweon      
##  [9] Juyeon Lee      Matthew Lee     Tomas Leon      Tiffany Ma     
## [13] Laura Magana    Whitney Mgbara  Lindsey Moore   IEMAAN RANA    
## [17] Emily Schneider Tsai-Chu Yeh    Xianglin Zhao                  
## 20 Levels:  Amanda Keller Daniel Collin Emily Borgelt ... Xing Gao

Using the regular expression function “grep”, I want to see which vowel appears in the most names in our class.

a = length(grep("a", just_names, value = TRUE)); a
## [1] 13
e = length(grep("e", just_names, value = TRUE)); e
## [1] 13
i = length(grep("i", just_names, value = TRUE)); i
## [1] 12
o = length(grep("o", just_names, value = TRUE)); o
## [1] 10
u = length(grep("u", just_names, value = TRUE)); u
## [1] 6

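We see that “a” and “e” each appear in 13 names, more than any other vowel. Because grep is case-sensitive by default, these counts cover lowercase vowels only, so a name typed in capitals (such as IEMAAN RANA) is not counted.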
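
The count can be made case-insensitive, and it can also count every occurrence of a vowel rather than the number of names containing it. A sketch using three hypothetical names (not the class roster); demo_names, names_with and occurrences are my own names.

```r
# Hypothetical names for illustration, not the class roster
demo_names <- c("Ana Lopez", "ERIC WU", "Uma Ode")
vowels <- c("a", "e", "i", "o", "u")

# Number of names containing each vowel, ignoring case
names_with <- sapply(vowels, function(v) {
  sum(grepl(v, demo_names, ignore.case = TRUE))
})
names_with

# Total occurrences of each vowel across all names, ignoring case
occurrences <- sapply(vowels, function(v) {
  hits <- gregexpr(v, demo_names, ignore.case = TRUE)
  sum(sapply(hits, function(h) sum(h > 0)))   # gregexpr returns -1 for no match
})
occurrences
```

The two measures answer different questions: “Ana Lopez” counts once toward the names containing “a” but contributes two occurrences of the letter.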

10. Demonstrate the use of the “sink” function to generate an output file

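For example, I draw the numbers 1 to 5 in random order using the “sample” function (called without a size argument, sample returns all five numbers shuffled; sample(1:5, 1) would pick a single number). I collect that result using the “sink” function.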

my_number <- sample(1:5)
sink('~/Desktop/FALL 2017/CLASSES/PH 251D/Project 1/Output.R') # open connection
cat('Results from picking a number from 1-5', fill = TRUE)
## Results from picking a number from 1-5
show(my_number)
## [1] 1 2 5 4 3
sink() # close connection
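
To confirm what sink wrote, the file can be read back with readLines. A sketch that writes to a temporary file instead of the project folder and draws a single number with size = 1; out_file and one_number are names I introduce for illustration.

```r
out_file <- tempfile(fileext = ".txt")   # hypothetical output location
one_number <- sample(1:5, 1)             # size = 1 draws a single number

sink(out_file)                           # divert console output to the file
cat("Result from picking one number from 1-5", fill = TRUE)
print(one_number)
sink()                                   # restore output to the console

written <- readLines(out_file)           # read back what was written
written
```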