WPA #3: Chapter 6 – Matrices and Dataframes

A drug study

You’re a lab assistant for a multi-billion-dollar drug company called Novartirosche. The company has just developed a new cognitive-performance-enhancing drug called drug.x that it expects will revolutionize the industry. To test the performance of the drug, the company recruited 1,000 participants to perform one of two cognitive tasks after having taken either drug.x or a placebo sugar pill. Participants assigned to the ‘wordsearch’ task have to find 100 words in a jumbled list as fast as possible. Participants assigned to the ‘animals’ task have to name 20 different animals as quickly as possible. For each task, a lab assistant recorded how long it took each participant, in seconds, to complete their assigned task. The results are stored in a tab-delimited text file at https://dl.dropboxusercontent.com/u/7618380/drug.txt

To get access to the dataset, load the data into R as a new dataframe object by running the following code.

drug <- read.table("https://dl.dropboxusercontent.com/u/7618380/drug.txt")

Ok, let’s make sure it looks ok. Print the first few rows of the dataframe into the console using numerical indexing (that is, brackets!).
Hmmm, let’s double-check the first few rows with a function. Print the first few rows of the dataframe into the console using a function (if you use your head you should be able to remember the name of the function…).
Someone left me a note saying that there could be problems in rows 50 through 60. Print rows 50 through 60 into the console and make sure they look ok.
Better take a quick look at the whole dataset. View the entire dataframe in a new window using View().
Print summary statistics of each column using summary().
I’m superstitious about the digits of pi… What were the data from the participant with id 314?
What are the names of the columns of the dataframe? Use the appropriate function.
One of the column names has some unnecessary numbers. Fix the name.
A colleague requested just the response time data. She also wants it in minutes instead of seconds. Create a new vector object called time.m that only contains the time data in minutes.
Let’s check some of the effects of sex. Create two separate dataframes for male and female participants. Call them drug.male and drug.female.
Let’s look just at the females: What percent of the female participants were given the placebo? What was the mean response time of females?
Now let’s focus on the males: What percent of the male participants were given the placebo? What was the mean response time of males?
Let’s look at some of the age-related data. Create a new dataframe called drug.oldest containing only the data from the oldest participant(s), and a new dataframe called drug.youngest containing only the data from the youngest participant(s). To do this, use basic indexing in addition to the max() and min() functions.
Show me a table of the age data. In other words, how many participants were there of each age?
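
If you are stuck on the mechanics, here is a minimal sketch of the indexing tools on a small made-up dataframe. Everything in it (the object toy, its column names and its values) is invented for illustration and is not the drug dataset, whose column names may differ.

```r
# Hypothetical toy data; names and values are made up for illustration
toy <- data.frame(id = 1:6,
                  group = c("a", "b", "a", "b", "a", "b"),
                  score = c(120, 90, 150, 60, 30, 180),
                  stringsAsFactors = FALSE)

toy[1:3, ]          # rows 1 to 3, all columns (numerical indexing)
toy[toy$id == 4, ]  # the row(s) where id equals 4
names(toy)          # column names

# Rename one column by position
names(toy)[3] <- "seconds"

# Convert a column to other units and store the result as a vector
toy.minutes <- toy$seconds / 60

# Keep only the rows that pass a logical test
toy.a <- toy[toy$group == "a", ]
mean(toy.a$seconds)

# Percent of rows that pass a test: mean of a logical vector times 100
mean(toy$group == "a") * 100

# Rows holding the largest value of a column
toy.top <- toy[toy$seconds == max(toy$seconds), ]

# Counts of each distinct value in a column
table(toy$group)
```

The comma inside the brackets matters: toy[1:3, ] selects rows, while toy[1:3] would select columns.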

In the next question, you need to change specific values of a vector based on some criteria. We learned how to do this in Chapter 5. If you forgot how, here’s an example:

a <- c(1, 1, 1, 1, 2, 2, 2, 2)
a == 2

## [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

a[a == 2] <- 10
a

## [1]  1  1  1  1 10 10 10 10
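
The same pattern should also work on a single column of a dataframe. Here is a minimal sketch using a made-up dataframe called df (a hypothetical name and made-up values, not part of the drug data):

```r
# Hypothetical dataframe, made up for illustration
df <- data.frame(x = c(3, 7, 12, 5, 20))

# Logical test on the column, then assignment to the matching elements only
df$x[df$x > 10] <- 10
df$x
## [1]  3  7 10  5 10
```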

Oh no. Some of the patients may have been too young to participate in a medical study…Please um…help them grow up. Change the age of anyone less than 18 to 18.
Ok. Now that you’ve ‘fixed’ the dataframe, show me a new table of the age data. No one should be less than 18 now.
Hey, I just found some new data from the study. The participants’ heights and weights are stored in a separate table. Run the following code to store the data as a new dataframe called heightweight. Look at the first few rows of the dataframe to see how it looks.

heightweight <- read.table("https://dl.dropboxusercontent.com/u/7618380/moredata.txt", sep = "\t")

Now, combine the two dataframes drug and heightweight into a single dataframe using either data.frame() or cbind().
Ok, now let’s start analyzing the response time data. Of course, we want to see if drug.x improved people’s performance… Calculate the mean response time separately for the placebo and drug.x pills (ignoring the type of task).
Did drug.x help performance by reducing response times relative to the placebo? You can answer with words.
Just for fun, let’s see which task was harder. Calculate the mean response time for each task (ignoring the drug).
Based on what you found, which task was harder?
Ok, now calculate the mean response time for each combination of pill and task. That is, calculate the mean response time for the placebo in the animals task, drug.x in the animals task, the placebo in the wordsearch task, and drug.x in the wordsearch task.
Based on what you know now, did drug.x help performance by reducing response times?
I think there may have been a problem with the study design. Use the table() function to create a classification table showing how many people were assigned to each combination of pill and task. To do this, just put the drug and task columns as two separate arguments to table().
What went wrong with this study? Why did the apparent performance of drug.x versus the placebo change between the comparison that ignored the task and the comparison within each task?
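
For the last set of questions, here is a minimal sketch of combining dataframes, calculating means within groups, and cross-classifying two columns. The objects left, right and both, and all of their column names and values, are made up for illustration; they are not the drug or heightweight data.

```r
# Hypothetical toy data; names and values are made up for illustration
left <- data.frame(id = 1:6,
                   cond = c("x", "x", "x", "y", "y", "y"),
                   stringsAsFactors = FALSE)
right <- data.frame(block = c("p", "p", "q", "p", "q", "q"),
                    rt = c(10, 20, 60, 30, 50, 70),
                    stringsAsFactors = FALSE)

# Combine two dataframes column-wise. They must have the same number of
# rows, and this assumes the rows are in the same order in both.
both <- cbind(left, right)

# Mean of one column within one level of another column
mean(both$rt[both$cond == "x"])

# Mean within a combination of two conditions: join the tests with &
mean(both$rt[both$cond == "x" & both$block == "p"])

# Cross-classification counts: one argument per grouping column
counts <- table(both$cond, both$block)
counts
```

Note that cbind() matches rows purely by position, so it is only safe when both tables list the same cases in the same order.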

Basel Spring 2016