Practice 2 is due by 10:30pm, Wednesday 9/14. Late submissions will receive zero credit.

If you use this .Rmd template file for this practice, make sure to submit two documents: 1) Psc210a_PR02_yourInitials.Rmd; 2) Psc210a_PR02_yourInitials.html (the .html file should be generated from your .Rmd file). Before you submit your work, replace ‘yourInitials’ in each file name with your own initials.

 

  1. Write R code to set your working directory to ‘Psy210a_F2022’. Make sure to include the appropriate path, and remember to write relevant comments for the R code.
##set working directory 
 ##make sure to update your working directory 
#setwd('~/Desktop/Psy210a_F2022')
#I use RStudio Cloud, where setwd() is not needed because the data are imported through the project; the call is kept as a comment since it would be needed on a desktop installation
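#A more portable sketch (the path below is an assumption; edit it to match your own machine):
proj_dir <- '~/Desktop/Psy210a_F2022'     #assumed location of the course folder
if (dir.exists(proj_dir)) setwd(proj_dir) #only change directory if the folder actually exists
getwd()                                   #confirm the current working directory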

 

The data in ‘rtime.csv’ have two variables: ‘group’ (control or experiment) and ‘rt’ (reaction time).

  1. [total 10 points] Use the R function ‘read.csv()’ to read the data in ‘rtime.csv’ into R and store it in the object ‘pr02’.
##read data into R
pr02 <- read.csv('rtime.csv') #read the data file and store it in object pr02
  1. [5 points] As an example, in the following R code chunk, the function subset() is used to select observations with group=‘control’ only and store them in the object ‘cont’. Then an object ‘cont.des’ is generated to hold and show the descriptive stats.
##select observations with group='control' only

cont <- subset(pr02, group=='control')

cont.des <- c(n=nrow(cont), mean=mean(cont$rt), sd=sd(cont$rt),
              min=min(cont$rt), max=max(cont$rt), cv=sd(cont$rt)/mean(cont$rt))

##show descriptive, round to 2 digits
round(cont.des, 2)
##      n   mean     sd    min    max     cv 
##  50.00 129.00  17.13  83.00 164.22   0.13
##Question 1####
pr02 <- read.csv('rtime.csv') #read the data and store it in object pr02
#?head #look up the help page for head(), the function used below to preview data
#head(pr02$rt) #preview rt, the continuous reaction-time variable
#head(pr02$group) #preview group, the categorical grouping variable
cont <- subset(pr02, group=='control') #store observations with group=='control' in object cont
cont.des <- c(n=nrow(cont), mean=mean(cont$rt), sd=sd(cont$rt),
              min=min(cont$rt), max=max(cont$rt), cv=sd(cont$rt)/mean(cont$rt)) #combine the descriptive stats of the control group into the named vector cont.des
round(cont.des, 2)  #round descriptive stats to 2 decimal places 
##      n   mean     sd    min    max     cv 
##  50.00 129.00  17.13  83.00 164.22   0.13

 

Now select observations with group=‘experiment’ only and store the observations in object ‘exp’. Then generate an object ‘exp.des’ to hold and show the descriptive stats.

##R codes for 1 a)
exp <- subset(pr02, group=='experiment')
#head(exp) #preview exp group data
exp.des <- c(n=nrow(exp), mean=mean(exp$rt), sd=sd(exp$rt),
             min=min(exp$rt), max=max(exp$rt), cv=sd(exp$rt)/mean(exp$rt)) #combine the descriptive stats of the experiment group into the named vector exp.des
exp.des #show the unrounded descriptive stats
##            n         mean           sd          min          max           cv 
##  50.00000000 117.03913944  18.34955421  78.82446256 161.50000000   0.15678135
round(exp.des, 2) #round descriptive stats to 2 decimal places
##      n   mean     sd    min    max     cv 
##  50.00 117.04  18.35  78.82 161.50   0.16

 

  1. Study the following R code, which uses the function rbind() to combine the descriptive stats for the two groups into one summary table.
#this is for your own study; you don't need to write any code here
#rbind(control=cont.des, experiment=exp.des)

rbind(Control=cont.des, Experiment=exp.des) #combine descriptive stats for control and experimental, exact values
##             n      mean        sd       min       max         cv
## Control    50 128.99571 17.130613 83.000000 164.22162 0.13279987
## Experiment 50 117.03914 18.349554 78.824463 161.50000 0.15678135
round(rbind(Control=cont.des, Experiment=exp.des),2) #combine descriptive stats and round to 2 decimal places
##             n   mean    sd   min    max   cv
## Control    50 129.00 17.13 83.00 164.22 0.13
## Experiment 50 117.04 18.35 78.82 161.50 0.16
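For reference, the same per-group summary can be produced in a single step with split() and sapply(); this is only an optional sketch, not something the practice asks for:

##optional sketch: compute the same descriptive stats for both groups at once
des <- function(x) c(n=length(x), mean=mean(x), sd=sd(x),
                     min=min(x), max=max(x), cv=sd(x)/mean(x)) #helper returning the same six stats
round(t(sapply(split(pr02$rt, pr02$group), des)), 2)           #apply it to each group and round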

 

  1. [5 points] Present side-by-side boxplots of ‘rt’ by ‘group’ (i.e. one boxplot for group=‘control’, one for group=‘experiment’). Describe what the boxplots reveal.
## R codes for 1 c) 

boxplot(rt ~ group, data=pr02,
        horizontal = FALSE,
        main='Boxplot of reaction time distribution by group',
        ylab='Reaction time (ms)',
        xlab='Group',
        frame.plot=TRUE)

#Together with the descriptive statistics (rounded to 3 decimal places), the boxplots show that reaction times in the experiment group were faster on average (experiment mean=117.039 ms, control mean=128.996 ms); note that the thick line inside each box marks the group median, not the mean.
#Reaction times were also lower overall in the experiment group (minimum rt: experiment=78.824, control=83; maximum rt: experiment=161.5, control=164.222), and the range of the control group (max-min=81.222) was slightly smaller than that of the experiment group (max-min=82.676), as shown by the whiskers.
#The sd of the control group was slightly smaller than that of the experiment group (control sd=17.131, experiment sd=18.350), meaning a typical rt score deviates from its group mean by about 17.1 ms in the control group and about 18.3 ms in the experiment group.
#The CV of the experiment group (0.16) was larger than that of the control group (0.13), so the relative dispersion of rt around the mean was slightly smaller in the control group, consistent with the narrower box (smaller IQR) on the control side of the plot.
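#Optional sketch (not required): a quick check of the group medians marked by the boxplot centre lines
tapply(pr02$rt, pr02$group, median)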

 

  1. (5 points) Propose a research context where testing the differences in variation is of great interest.

Answer: Testing for differences in variance could be important when comparing the heart-rate variability of a control group vs. an experimental group receiving different interventions (e.g. yoga for the experimental group vs. sedentary education for the control group), or when comparing the precision of at-home medical devices (e.g. does machine A or machine B give less variable blood-glucose readings?).
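If one actually wanted to test whether the two groups in this practice differ in variance, the built-in var.test() (an F test for equality of variances) could be applied; this is only an illustrative sketch, not part of the assignment:

##optional sketch: F test comparing the rt variances of the two groups
var.test(rt ~ group, data=pr02)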

 

  1. (total 25 points) Suppose we want to study the errors found in the performance of a simple task. We ask a large number of participants to report the number of people seen entering a major department store in one morning. Some participants will miss some people, and some will count others twice, so we don’t expect everyone to get the same counts. Suppose that the counts follow a normal distribution with a mean of 975 reported shoppers and a standard deviation of 15.

For the following questions, think about the functions related to normal distribution: dnorm(), pnorm(), qnorm(), and rnorm().

 

  1. [5 points] Present a graph showing the normal distribution curve with mean=975 and sd=15, with an appropriate title.
##R codes for 3 a)

##Question 3####
###Q3A####
####Method 1 with histogram and randomly generated data####
set.seed(19)
Qthree.sample <- rnorm(n=100, mean = 975, sd=15)
hist(Qthree.sample, prob=TRUE,
     main='Q3: The distribution of observations of people entering a department store',
     xlab='Number of people observed', ylab='Density')
curve(dnorm(x,mean = 975, sd = 15), 
      from=min(Qthree.sample), to=max(Qthree.sample),
      col='purple', add=TRUE)
text(x=1010, y=.018,
     labels=paste('sample mean=', round(mean(Qthree.sample),0),
                  '\nsample sd= ', round(sd(Qthree.sample),0)))

####Method 2 with just a curve####
#without add=TRUE, curve() draws a new standalone plot instead of adding to the histogram above
curve(dnorm(x, mean = 975, sd = 15),
      from=940, to=1020,
      col='purple',
      xlab='Number of people observed',
      ylab='Density',
      main='Q3: The distribution of observations of people entering a department store')
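#Optional sketch: add reference lines at the mean and at one sd on either side of it
abline(v=975, lty=2, col='gray')         #mean
abline(v=c(960, 990), lty=3, col='gray') #mean - 1 sd and mean + 1 sd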

 

  1. [2.5 points] What percentage of the counts will be between 963 and 989?
##R codes for 3 b), 
#you may include the answer to the question in the R code chunk as comment   

###Q3B####
#the probability of x<=989 in a normal dist#
cp989 <- pnorm(q=989, mean=975, sd=15)
#The probability of x<=963 in a normal dist
cp963 <- pnorm(q=963, mean=975, sd=15)

paste('The probability of x<=989 is', round(cp989,3))
## [1] "The probability of x<=989 is 0.825"
#[1] "The probability of x<=989 is 0.825"#

paste('The probability of x<=963 is', round(cp963,3))
## [1] "The probability of x<=963 is 0.212"
#[1] "The probability of x<=963 is 0.212"#

paste('The probability of x between 963 and 989 is', round(cp989-cp963, 3))
## [1] "The probability of x between 963 and 989 is 0.613"
##[1] "The probability of x between 963 and 989 is 0.613"

 

  1. [2.5 points] What percentage of the counts will be above 989?
##R codes for 3 c), 
#you may include the answer to the question in the R code chunk as comment   

###Q3C####
####Method 1: approximate the probability of x above 989 using the sample maximum####
#Calculate and round the sample maximum
Max.value <- max(Qthree.sample)
round(Max.value,0)
## [1] 1018
cpMax <- round(pnorm(q=1018, mean=975, sd=15),3)
#cpMax is 0.998, i.e. the probability of x<=1018 is about 0.998

#Approximate probability of x between 989 and the sample maximum
round(cpMax-cp989,3)
## [1] 0.173
#This gives 0.173, slightly below the exact answer because it ignores the tail beyond 1018
paste('The exact probability of x above 989 (Method 2 below) is 0.175')
## [1] "The exact probability of x above 989 (Method 2 below) is 0.175"
####Method 2: exact probability of x above 989, rounded to 3 decimal places (the method from class)####
round(pnorm(q=989, mean=975, sd=15, lower.tail = FALSE), 3)
## [1] 0.175
paste('The probability of x above 989 is 0.175, i.e. about 17.5% of the counts')
## [1] "The probability of x above 989 is 0.175, i.e. about 17.5% of the counts"

 

  1. [2.5 points] What two values of the counts would encompass the middle 50% of the results?
##R codes for 3 d), 
#you may include the answer to the question in the R code chunk as comment   
###Q3d####
#The two values that encompass the middle 50% of the results are the 25th and 75th percentiles (the first and third quartiles)
#The quartile at 25% (25th percentile)
qnorm(p=.25, mean=975, sd=15)
## [1] 964.88265
#[1] 964.8827

#The quartile at 75% (75th percentile)
qnorm(p=.75, mean=975, sd=15)
## [1] 985.11735
#[1] 985.1173

paste('The 25th percentile is 964.8827 and the 75th percentile is 985.1173; the counts between these two values encompass the middle 50% of the results')
## [1] "The 25th percentile is 964.8827 and the 75th percentile is 985.1173; the counts between these two values encompass the middle 50% of the results"

 

  1. [2.5 points] The top 20% of the counts would be greater than_________.
##R codes for 3 e), 
#you may include the answer to the question in the R code chunk as comment   

qnorm(p=.8,mean=975,sd=15)
## [1] 987.62432
#[1] 987.6243
paste('Since the 80th percentile is 987.6243, the top 20 percent of the counts would be greater than this value')
## [1] "Since the 80th percentile is 987.6243, the top 20 percent of the counts would be greater than this value"

 

  1. [10 points] Generate a sample with 100 observations (of the counts) from this normal distribution (that is, the normal distribution with mean 975 and standard deviation 15) and store it in object ‘my.sample’. Then report the mean and sd of this sample, and present a histogram of ‘my.sample’, together with the normal curve and the kernel density curve. Is the sample distribution approximately normal?
##R codes for 3 f), 
#you may include the answer to the question in the R code chunk as comment   

###Q3F####
set.seed(38)
my.sample <- rnorm(n=100, mean=975, sd=15) 
hist(my.sample, prob=TRUE,
     main='Q3f: The distribution of observations of people entering a department store',
     xlab='Number of people observed', ylab='Density')
#Normal curve
curve(dnorm(x,mean = 975, sd = 15), 
      from=min(my.sample), to=max(my.sample),
      col='red', add=TRUE)
text(x=1010, y=.018,
     labels=paste('sample mean=', round(mean(my.sample),0),'\nsample sd= ', round(sd(my.sample),0)))
#Kernel dist curve
lines(density(my.sample), col='blue')

#Yes, the sample distribution is approximately normal, as shown by the roughly bell-shaped histogram and the close agreement between the kernel density curve and the normal curve around the mean
#I also ran a Q-Q plot below to double-check; the clustering of most points around the reference line suggests the distribution is approximately normal

#QQPLOT
#qqnorm(my.sample) #qqnorm() draws a normal Q-Q plot; it is disabled here so the chunk shows only the histogram

##qqline() adds a reference line to the Q-Q plot
#qqline(my.sample, col='purple')
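#Optional sketch (not required): a Shapiro-Wilk test gives a numeric check of normality
shapiro.test(my.sample)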

 

  1. Now click the ‘Knit’ button and select ‘Knit to HTML’; an .html file will be generated in your working folder. You should submit both the .Rmd file and the .html file for this practice.