Load packages that you will likely use.

library(statsr)

## Loading required package: BayesFactor

## Loading required package: coda

## Loading required package: Matrix

## ************
## Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
## 
## Type BFManual() to open the manual.
## ************

#library(plyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(rmarkdown)
library(devtools)

## Loading required package: usethis

library(broom)
library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(shiny)
library("readr")

Load Data

Data <- read.csv("Data.csv")

Learning about the data set

names(Data)

##  [1] "P.."                  "Age"                  "Gender"              
##  [4] "Major"                "Minor"                "Pre.Med"             
##  [7] "Experimental.Control" "PANAS.T.1"            "Validity.1"          
## [10] "Validity.2"           "Validity.3"           "Validity.4"          
## [13] "PANAS.T.2"            "Language"             "Delivery"            
## [16] "Note"

str(Data)

## 'data.frame':    95 obs. of  16 variables:
##  $ P..                 : Factor w/ 95 levels "1","10","100",..: 1 13 24 35 46 57 68 79 89 2 ...
##  $ Age                 : int  21 18 20 19 18 18 18 18 18 18 ...
##  $ Gender              : Factor w/ 2 levels "F","M": 2 2 1 2 1 2 1 1 2 1 ...
##  $ Major               : Factor w/ 38 levels "Accounting","Accouting & Business",..: 6 4 4 37 12 31 14 17 38 6 ...
##  $ Minor               : Factor w/ 20 levels "Accounting","Business",..: 9 11 11 11 11 11 18 11 11 11 ...
##  $ Pre.Med             : Factor w/ 3 levels "N","N Ck/Maj",..: 1 1 3 1 1 1 1 1 1 3 ...
##  $ Experimental.Control: Factor w/ 7 levels "Control/Face",..: 3 7 5 2 3 4 7 1 3 5 ...
##  $ PANAS.T.1           : Factor w/ 76 levels "P 12   N 11",..: 70 45 50 24 68 39 27 4 11 59 ...
##  $ Validity.1          : int  5 1 4 4 3 2 4 5 5 4 ...
##  $ Validity.2          : int  2 1 1 2 2 2 2 2 2 2 ...
##  $ Validity.3          : int  1 2 2 2 2 2 2 1 2 2 ...
##  $ Validity.4          : int  1 2 2 2 1 2 1 2 1 2 ...
##  $ PANAS.T.2           : Factor w/ 77 levels "P 13   N 16",..: 69 38 46 41 37 44 28 2 21 64 ...
##  $ Language            : Factor w/ 2 levels "Jargon","Plain": 2 1 1 2 2 1 1 2 2 1 ...
##  $ Delivery            : Factor w/ 3 levels "Face","Phone",..: 3 3 2 2 3 1 3 1 3 2 ...
##  $ Note                : int  0 0 0 0 0 0 0 0 0 0 ...

Data$Language %>% unique()

## [1] Plain  Jargon
## Levels: Jargon Plain

Data$Delivery %>% unique()

## [1] Written Phone   Face   
## Levels: Face Phone Written

Data$Validity.1 %>% unique()

## [1] 5 1 4 3 2

Data$Validity.2 %>% unique()

## [1] 2 1

Note that I added two grouping variables: Language (Plain vs Jargon) and Delivery (Phone vs Face vs Written). This makes your data set more flexible: you can explore the effect of Language separately from the effect of Delivery, as well as how the two combined might affect your outcome measures.

Also, of the two variables you mentioned in the email, one seems to be categorical, am I right? Validate: (yes=1/no=2)? I will store it as a character variable instead of a numeric one. But we need to make sure my interpretation of the coding is correct; we may need to switch the labels around.

# Recode Validity.2 as labels. NOTE: this maps 1 -> "no" and 2 -> "yes";
# swap the labels if the coding turns out to be the other way around.
# Any other value (including NA) becomes NA.
Validity2_mod <- ifelse(Data$Validity.2 == 1, "no",
                 ifelse(Data$Validity.2 == 2, "yes", NA_character_))


## add the vector Validity2_mod to the data set (a variable you can use for grouping purposes!)

Data_2 <- data.frame(Data, Validity2_mod)
# check out Data_2 to see the new column.

# store the labels as character; this works whether or not data.frame() turned the column into a factor
Data_2$Validity2_mod <- as.character(Data_2$Validity2_mod)

# You can treat Validity2_mod like we treated the variable "promotion decision" in class 4. The RMD from that class will be helpful in that sense.
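A quick cross-tabulation is one way to confirm that a recoding like the one above did what was intended. The sketch below uses a small made-up vector (`toy_codes` is hypothetical, not the study data) and assumes the same 1 = "no", 2 = "yes" mapping, which itself still needs to be confirmed.

```r
# Hypothetical toy codes standing in for Data$Validity.2 (not the real data)
toy_codes <- c(1, 2, 2, NA, 1, 2)

# Assumed mapping: 1 -> "no", 2 -> "yes"; anything else stays NA
toy_labels <- ifelse(toy_codes == 1, "no",
                     ifelse(toy_codes == 2, "yes", NA_character_))

# Each numeric code should line up with exactly one label
check <- table(code = toy_codes, label = toy_labels, useNA = "ifany")
print(check)
```

If a code shows counts under more than one label, or under the wrong label, the mapping needs to be revisited.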

To clean things up, I will extract from the data set the variables that you indicated will be of interest to your Project, along with other descriptors of the sample that might be of interest to explore!

# name the columns directly when building the data frame
Data_final <- data.frame(
  Age            = Data_2$Age,
  Gender         = Data_2$Gender,
  Language       = Data_2$Language,
  Delivery       = Data_2$Delivery,
  Understanding  = Data_2$Validity.1,
  Validation_num = Data_2$Validity.2,
  Validation_ch  = Data_2$Validity2_mod
)

So, I gave you a head start by creating a data frame for you to use in the Project: Data_final.

I indicated above that for the categorical variable (Validation), you should think about working with contingency tables to see whether validation is associated with other factors in your data set (e.g., with Language and/or with Delivery method).
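As a minimal sketch of what such a contingency table could look like, the example below uses a small made-up data frame (`toy` is hypothetical; only the column names mirror Data_final) and base R's `table()` and `prop.table()`.

```r
# Hypothetical toy data with the same column names as Data_final (values are made up)
toy <- data.frame(
  Language      = c("Jargon", "Jargon", "Jargon", "Plain", "Plain", "Plain"),
  Validation_ch = c("yes", "no", "yes", "yes", "yes", "no")
)

# Counts of each Validation answer within each Language condition
tab <- table(toy$Language, toy$Validation_ch)
print(tab)

# Row proportions: share of "no"/"yes" answers within each Language condition
print(prop.table(tab, margin = 1))
```

Replacing `toy` with Data_final, and Language with Delivery, should give the corresponding tables for the real data.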

Let’s assume your second variable (Understanding) can be treated as numeric (1 - 5; Likert scaling?). To work with this variable, you should follow the same type of strategies I used to tell the story about Life Expectancy in Class 3. The RMDs (in class and homework) from that class are where you will find most of the code you will need.
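If Understanding is treated as numeric, one possible starting point is a grouped summary. The sketch below assumes dplyr is loaded and uses a made-up data frame (`toy` and the summary column names are hypothetical, not taken from the class materials).

```r
library(dplyr)

# Hypothetical toy data with the same column names as Data_final (values are made up)
toy <- data.frame(
  Language      = c("Jargon", "Jargon", "Jargon", "Plain", "Plain", "Plain"),
  Understanding = c(2, 3, 1, 4, 5, 3)
)

# Sample size, center, and spread of Understanding within each Language condition
understanding_summary <- toy %>%
  group_by(Language) %>%
  summarise(
    n_obs                = n(),
    mean_understanding   = mean(Understanding, na.rm = TRUE),
    median_understanding = median(Understanding, na.rm = TRUE),
    sd_understanding     = sd(Understanding, na.rm = TRUE)
  )
print(understanding_summary)
```

Grouping by Delivery, or by both Language and Delivery, follows the same pattern.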

Because of how the data set is ordered, if you wish to create a histogram of your variable “Understanding” for each level of your factors (Language and/or Delivery), you will have to first extract the subset of the data you want to plot, assign it to a data frame, and then use the name of that data frame when plotting. See example:

#get the subset of the data you want to plot (say only look at jargon)

Data_jargon <- subset(Data_final, Language =="Jargon", select = c(Language, Understanding))

## If you click on Data_jargon, you will see that the only Language condition left in this data frame is Jargon. So you can use this data set to create a histogram for this condition.

#check the range for that particular subset of the data to choose the histogram parameters

range(Data_jargon$Understanding, na.rm = TRUE)

## [1] 1 4

#plot histogram as usual for only that subset of the data 

ggplot (data = Data_jargon, aes(Understanding)) + 
  geom_histogram (breaks=seq(0.5, 5.5, by = 1), col = "black", fill ="red")+ # one bin per scale point, centered on 1 to 5
  labs(title = "Histogram: Understanding (Jargon)", x = "Understanding", y = "Count")+
  theme_classic()

## To compare how the distribution of the variable "Understanding" changes when plain language is used, just make another histogram after selecting the data for this condition.

## You can of course also select data based on delivery if the question to be addressed is about how understanding changed depending on this factor.
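Subsetting as above works. As an alternative sketch, ggplot2's `facet_wrap()` draws one panel per level of a factor without building a separate data frame for each condition. The data frame below is made up (`toy` is hypothetical), and the integer-centered bins are an assumption about how the 1 to 5 scale is best displayed.

```r
library(ggplot2)

# Hypothetical toy data with the same column names as Data_final (values are made up)
toy <- data.frame(
  Language      = rep(c("Jargon", "Plain"), each = 5),
  Understanding = c(1, 2, 2, 3, 4, 2, 3, 4, 4, 5)
)

# One histogram panel per Language level; one bin per scale point (1 to 5)
p <- ggplot(data = toy, aes(x = Understanding)) +
  geom_histogram(breaks = seq(0.5, 5.5, by = 1), col = "black", fill = "red") +
  facet_wrap(~ Language) +
  labs(title = "Histogram: Understanding by Language", x = "Understanding", y = "Count") +
  theme_classic()
print(p)
```

Because both panels share the same axes, the two distributions can be compared directly.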

You will have to do similar data subsetting to get scatterplots for each specific condition independently (I am not sure scatterplots are the best option for you, because you are not working with two continuous outcome measures). But just in case...

## For example...

#get the subset of the data you want to plot (e.g., only the jargon condition); note that here you have to select the two variables you would like to correlate

Data_jargon <- subset(Data_final, Language =="Jargon", select = c(Language, Validation_num, Validation_ch, Understanding, Delivery))


ggplot(Data_jargon, aes(x=Validation_ch, y=Understanding)) + 
  geom_point() +
  theme_classic()

## you can look at this relationship per stratum as well.

For making plots designed to investigate how your study variable “Understanding” differs by Language and Delivery (e.g., box plots, or bar or line plots of means and standard deviations), the organization of your data set is perfect. Use the RMD related to class 3 to help you!
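As one possible illustration of such a plot, the sketch below draws box plots of Understanding by Delivery, split by Language. It assumes ggplot2 is loaded and uses a made-up data frame (`toy` is hypothetical; only the column names mirror Data_final).

```r
library(ggplot2)

# Hypothetical toy data with the same column names as Data_final (values are made up)
toy <- data.frame(
  Language      = rep(c("Jargon", "Plain"), each = 6),
  Delivery      = rep(c("Face", "Phone", "Written"), times = 4),
  Understanding = c(2, 1, 3, 3, 2, 2, 4, 3, 5, 5, 4, 4)
)

# One box per Delivery x Language combination
p <- ggplot(data = toy, aes(x = Delivery, y = Understanding, fill = Language)) +
  geom_boxplot() +
  labs(title = "Understanding by Delivery and Language", x = "Delivery", y = "Understanding") +
  theme_classic()
print(p)
```

No subsetting is needed here because the grouping variables are mapped directly to the x-axis and the fill color.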

Your turn! Project starts here!

Part 1: The data set(s)

a. The data come from an experiment completed at Thomas More University. The participants were randomly selected from a population of undergraduate students who were completing research credit requirements for their general psychology class. Each received one research credit for their participation. The data set contains participant number, age, gender, major, minor, a column indicating whether the participant had previous experience with medical terminology, group/condition, PANAS score at time 1, Validity 1, Validity 2, Validity 3, Validity 4, and PANAS score at time 2. The stimulus was a narrative written by the experimenter that described a medical patient experiencing a heart attack. The experimental version included medical jargon and the control version used plain language. The narrative was presented to the participants through three methods of communication (face to face, phone, written). After listening to or reading the narrative, participants were asked to rate their level of understanding (Validity 1), their willingness to validate what they heard or read (Validity 2), and whether they felt further explanation would have been helpful in either of those choices (Validity 3 and 4).

b. The goal was to examine whether language and/or method of communication affected understanding and/or willingness to validate. Often, social workers are assigned abuse cases in which children sustain injuries. Communication between these social workers and the medical professionals involved in evaluating and treating the children often occurs by phone or fax, which are less effective means of communication than face to face. The communication process is often complicated by the use of medical jargon that the social workers are not trained to understand. While this study was not conducted with social workers and medical professionals and is not generalizable to that population, the undergraduate participants, much like social workers, were largely unfamiliar with medical terminology. The experimenter attempted to demonstrate how the use of medical jargon and less efficient means of communication affect understanding in a group of participants who were not trained in medical terminology, in the hope that significant results would prompt further research on the cross-discipline communication that occurs widely in society. Examples would be law and social work, law and medical professionals, education and psychology, and many others.

Part 2: Research questions

Research question 1:

For project 1, we concentrated on the experimental (jargon) group and isolated two levels of the independent variable, method of communication (face to face and written). The dependent variables were Validity 1 and Validity 2. Validity 1 measured the participants' level of understanding of the narrative on a 5-point Likert scale. Specifically, we want to compare the responses of participants in the experimental face-to-face condition with the responses of participants in the experimental written condition. We expect that understanding will be higher in the face-to-face condition than in the written condition.

Research question 2:

Validity 2 measured whether participants were willing to validate what they heard or read, answered yes or no. Again, we’d like to compare responses of participants in the experimental written condition with responses of participants in the experimental face-to-face condition. We expect that willingness to validate will be higher in the experimental face-to-face condition than in the experimental written condition.

Part 3: Exploratory data analysis

#Validity 1, Research Question 1: we isolated the variables and created histograms for a side-by-side comparison of understanding by written and face-to-face delivery methods.

Data_jargon_written <- subset(Data_jargon, Delivery =="Written", select = c(Language, Understanding, Delivery, Validation_ch))

Data_jargon_face <- subset(Data_jargon, Delivery =="Face", select = c(Language, Understanding, Delivery, Validation_ch))

ggplot (data = Data_jargon_written, aes(Understanding)) +
  geom_histogram (breaks=seq(1, 4, by = .3), col = "blue", fill ="yellow")+
  labs(title = "Histogram: Level of Understanding (Written)", x = "Level of Understanding", y = "Count")+
  theme_classic()

ggplot (data = Data_jargon_face, aes(Understanding)) +
  geom_histogram (breaks=seq(1, 4, by = .3), col = "black", fill ="blue")+
  labs(title = "Histogram: Level of Understanding (Face)", x = "Level of Understanding", y = "Count")+
  theme_classic()

#Validity 2, Research Question 2: we isolated the variables and created histograms for a side-by-side comparison of willingness to validate by written and face-to-face delivery methods.

Data_FaceSelfValid <- subset(Data_jargon, Delivery =="Face", select = c(Validation_num, Delivery))

Data_WrittenSelfValid <- subset(Data_jargon, Delivery =="Written", select = c(Validation_num, Delivery))

ggplot (data = Data_FaceSelfValid, aes(Validation_num)) +
  geom_histogram (breaks=seq(1, 3, by = .25), col = "black", fill ="blue")+
  labs(title = "Histogram: Willingness to Validate (Face)", x = "Willingness to Validate", y = "Count")+
  theme_classic()

ggplot (data = Data_WrittenSelfValid, aes(Validation_num)) +
  geom_histogram (breaks=seq(1, 3 , by = .25), col = "blue", fill ="yellow")+
  labs(title = "Histogram: Willingness to Validate (Written)", x = "Willingness to Validate", y = "Count")+
  theme_classic()

Step 3

Data_jargon_face %>%
  summarise(mean_face_Understanding = mean(Understanding))

##   mean_face_Understanding
## 1                2.470588

Data_jargon_written %>%
  summarise(mean_written_Understanding = mean(Understanding))

##   mean_written_Understanding
## 1                     2.5625

Data_jargon_face %>%
  summarise(face_yes_proportion = sum(Validation_ch == "yes") / n())

##   face_yes_proportion
## 1           0.7647059

Data_jargon_written %>%
  summarise(written_yes_proportion = sum(Validation_ch == "yes") / n())

##   written_yes_proportion
## 1                   0.75

Project1_Group5

Paula

9/24/2019