Psychometrics HW #1

For this homework assignment, the ‘psych’ and ‘ggplot2’ packages are used for some problems.

library(psych)
library(ggplot2)

Importing Data

For this assignment, we will be using a text file called “HW1Data.txt”. Make sure the file is saved in your working directory so that it can be read in by file name alone.

getwd() # this tells you where your current working directory is

data <- read.delim("HW1Data.txt", header = TRUE) # header = TRUE treats the first row of the file as the variable names

After we’ve imported our text file, let’s check the structure of our data to ensure it was read in properly

str(data) 
View(data)
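View() opens an interactive viewer, so it is of little use in a script that runs non-interactively. A minimal sketch of a few non-interactive checks follows. The small data frame ‘toy’ is a hypothetical stand-in for the imported data, not the actual contents of HW1Data.txt.

```r
# Hypothetical stand-in for the imported data (NOT the real HW1Data.txt)
toy <- data.frame(
  Form_number = 1:4,
  Gender      = c(1, 2, NA, 2),
  HADS1       = c(2, 3, 1, NA)
)

dim(toy)       # number of rows and columns
head(toy, 3)   # first three rows

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column
```

The same three calls can be run on ‘data’ once the real file has been read in.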

Delete cases with MORE than three missing values.

To identify which cases in our data have more than 3 missing values, we first list every row that has at least one missing value with the command below, and then count the missing values in each listed row:

data[!complete.cases(data),] #'data' is just whatever you named your dataset

##     Form_number Gender Age marital_status HADS1 HADS2 HADS3 HADS4 HADS5
## 36           37     NA  NA             NA     2     1     1     0     3
## 74           76      2  54             NA     2     1     2     0     2
## 88           90     NA  NA             NA     4     2     4     2     4
## 112         114      2  59              3     3     2     3     2     3
## 119         122     NA  NA             NA     4     2     4     2     3
## 128         131     NA  NA             NA     2     2     2     1     2
## 200         206     NA  NA             NA     2     2     2     2     2
## 202         208     NA  NA             NA     4     1     4     2     4
##     HADS6 HADS7 HADS8 HADS9 HADS10 HADS11 HADS12 HADS13 HADS14
## 36      2     0     2     1      4      2      0      1      1
## 74      1     1     2     1      4      2      1      3      1
## 88      4     1     3     2      3      4      2      4      2
## 112     4     0    NA    NA     NA     NA     NA     NA     NA
## 119     4     1     4     2      4      4      2      4      2
## 128     2     0     1     0      4      1      2      2      1
## 200     4     2     3     1      4      3      2      3      2
## 202     4     2     4     2      4      4      2      4      2

The participant with ‘Form_number’ 114 (row 112) is the only case with more than 3 missing values: items HADS8 through HADS14 are all missing (7 values). Every other listed row has 3 or fewer.
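Reading the missing values off the printout works for a handful of rows, but the count can also be computed directly. The sketch below counts the NAs in each row with rowSums(is.na(...)). The data frame ‘toy’ and its values are hypothetical and only illustrate the idea.

```r
# Hypothetical toy data: row 2 has one missing value, row 3 has five
toy <- data.frame(
  Form_number = c(10, 11, 12),
  Age   = c(30, NA, 59),
  HADS1 = c(2, 1, NA),
  HADS2 = c(1, 0, NA),
  HADS3 = c(3, 2, NA),
  HADS4 = c(0, 1, NA),
  HADS5 = c(2, 2, NA)
)

# Number of missing values in each row
n_missing <- rowSums(is.na(toy))
n_missing

# Row positions with MORE than 3 missing values
rows_to_drop <- which(n_missing > 3)
rows_to_drop

# Participant IDs that belong to those rows
toy$Form_number[rows_to_drop]
```

Printing both the row position and the ID makes it harder to confuse the two when the rows are removed later.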

Now, let’s delete this case from the data and create a new clean dataset called ‘clean.data’.

Below you will find a quick command that allows us to delete specific rows. The negative sign in front of the ‘c’ tells R that we would like these rows removed in our new dataset.

clean.data<- data[-c(112),]

Notice that the new dataset has one fewer observation. Also notice that the row number entered above differs from the participant’s ‘Form_number’. Make sure the rows you delete correspond to the people you actually want to omit.
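Because row positions and participant IDs can differ, one way to reduce the risk of deleting the wrong row is to filter on a condition instead of a hard-coded row number. This is a sketch on a hypothetical data frame ‘toy’, not the homework data.

```r
# Hypothetical toy data: the participant with Form_number 12 has 4 missing values
toy <- data.frame(
  Form_number = c(10, 11, 12, 14),
  HADS1 = c(2, NA, NA, 3),
  HADS2 = c(1, 0, NA, 2),
  HADS3 = c(3, 2, NA, 1),
  HADS4 = c(0, 1, NA, 2)
)

# Option 1: keep rows with three or fewer missing values
clean_toy <- toy[rowSums(is.na(toy)) <= 3, ]

# Option 2: drop by participant ID rather than by row position
clean_toy_by_id <- toy[toy$Form_number != 12, ]

# Both options remove exactly one row here
nrow(toy) - nrow(clean_toy)
```

Option 1 applies the "more than three missing values" rule directly, so it keeps working if the data file changes.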

Compute total scores for positive affect and life satisfaction.

There are several ways to compute total/sum scores in R. Since we have complete cases for these variables, the different approaches give the same result. For this example, the items are simply added with the ‘+’ operator. The rowSums function would work as well. Note that ‘sum’ on its own adds everything it is given into a single number rather than returning one total per row.

# For positive affect, the items are 1, 3, 5, 7, 9, 11 and 13
# 'PA_Total' is the name of our newly created variable. We add it to the dataset
# by writing the dataset name, then $, then the new variable name.
clean.data$PA_Total <- clean.data$HADS1 + clean.data$HADS3 + clean.data$HADS5 + clean.data$HADS7 + clean.data$HADS9 + clean.data$HADS11 + clean.data$HADS13
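For comparison, here is a minimal sketch of the rowSums approach on a hypothetical three-item data frame ‘toy’. It also shows how the two approaches treat a missing item: ‘+’ and the default rowSums both return NA, while na.rm = TRUE silently sums the items that are present.

```r
# Hypothetical toy data with one missing item in row 3
toy <- data.frame(
  HADS1 = c(2, 4, 3),
  HADS3 = c(1, 4, NA),
  HADS5 = c(3, 4, 3)
)

# Item names can be built instead of typed out;
# seq(1, 13, by = 2) would give all seven odd-numbered items
pa_items <- paste0("HADS", seq(1, 5, by = 2))
pa_items

total_plus    <- toy$HADS1 + toy$HADS3 + toy$HADS5        # NA if any item is missing
total_rowsums <- rowSums(toy[, pa_items])                 # same result as '+'
total_narm    <- rowSums(toy[, pa_items], na.rm = TRUE)   # ignores missing items

total_plus
total_narm
```

Whether ignoring missing items is acceptable depends on the scoring rules of the scale, so na.rm = TRUE should be a deliberate choice.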

Now, let’s check the distribution of our newly created variable with a histogram.

We use ‘xlab’ to label the x-axis and ‘main’ to set the title of the histogram.

hist(clean.data$PA_Total, xlab="Positive Affect", main= "Histogram of PA")

Let’s do the same thing, but now for ‘life satisfaction.’ For creating this sum score, the life satisfaction items in the dataset are ‘2,4,6,8,10,12 and 14’

clean.data$LS_Total<- clean.data$HADS2 + clean.data$HADS4 + clean.data$HADS6 + clean.data$HADS8 + clean.data$HADS10 + clean.data$HADS12 + clean.data$HADS14

Similarly, let’s also check over the distribution of our newly created variable with the ‘hist’ command.

hist(clean.data$LS_Total, main="Histogram of LS", xlab="Life Satisfaction Total")

Explore and report the distribution of the total scores for each of these subscales (e.g., central tendency and variability)

For this question, we will be using the ‘psych’ package to report various descriptive statistics. This package should already be loaded in your library. The ‘describe’ function gives us valuable information in a quick and easy format. This allows us to see our mean, median, standard deviation, etc., for a particular variable of interest.

library(psych) 
describe(clean.data$PA_Total)

##    vars   n  mean   sd median trimmed  mad min max range  skew kurtosis
## X1    1 207 16.79 4.36     17   17.02 4.45   5  24    19 -0.47    -0.51
##     se
## X1 0.3

Now, let’s run the same code for life satisfaction:

describe(clean.data$LS_Total)

##    vars   n  mean   sd median trimmed  mad min max range  skew kurtosis
## X1    1 207 15.45 3.77     16   15.78 4.45   5  20    15 -0.66     -0.6
##      se
## X1 0.26
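The ‘se’ column in the describe output is the standard error of the mean, sd / sqrt(n). As a quick check, the sketch below recomputes it from the rounded values printed above, so it only agrees with the output to two decimals.

```r
# Values copied from the describe() output above
n     <- 207
sd_pa <- 4.36   # positive affect
sd_ls <- 3.77   # life satisfaction

# Standard error of the mean = sd / sqrt(n)
se_pa <- sd_pa / sqrt(n)
se_ls <- sd_ls / sqrt(n)

round(c(PA = se_pa, LS = se_ls), 2)   # 0.30 and 0.26, matching the 'se' column
```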

Create a bar graph comparing total positive affect scores for men and women.

First, to make the plot easier to read, let’s label the levels of Gender so that 1 = ‘Male’ and 2 = ‘Female’.

clean.data$Gender<- factor(clean.data$Gender, levels = c(1,2), labels = c("Male","Female"))
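It is worth checking the result of factor(): any code that is not listed in ‘levels’ becomes NA without a warning. A small sketch with hypothetical codes:

```r
# Hypothetical gender codes; 3 is not one of the defined levels
gender_codes <- c(1, 2, 2, NA, 1, 3)

gender <- factor(gender_codes, levels = c(1, 2), labels = c("Male", "Female"))
gender

# Count each level, including missing values
gender_counts <- table(gender, useNA = "ifany")
gender_counts
```

Here the original NA and the undefined code 3 both end up as NA, so comparing the table before and after recoding shows whether any cases were lost.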

There are various ways to create a bar graph in R. Below is a simple example that uses the ‘ggplot2’ package, which should already be loaded. Here, we compare our variable of interest (Positive Affect Total Score) across gender.

ggplot(clean.data, aes(x = Gender, y = PA_Total, fill = Gender)) + # fill is the grouping variable
  # Plot the mean of each group; stat = "identity" would draw one bar per row on top of each other
  geom_bar(stat = "summary", fun = "mean") + # the 'fun' argument needs ggplot2 >= 3.3.0
  scale_fill_brewer(palette = "Set1") + # works like 'scale_colour_brewer' but for fill colours
  ylab("Mean Positive Affect Total Score") + # title of y-axis
  xlab("Gender") + # title of x-axis
  ggtitle("Positive Affect by Gender") # title of chart; centering can be adjusted if you like
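When comparing men and women, it helps to have the per-group numbers next to the plot. A minimal sketch using base R’s aggregate on a hypothetical data frame ‘toy’ (the values are made up):

```r
# Hypothetical toy data
toy <- data.frame(
  Gender   = factor(c("Male", "Male", "Female", "Female", "Female")),
  PA_Total = c(10, 14, 18, 20, 16)
)

# Mean positive affect total within each gender
group_means <- aggregate(PA_Total ~ Gender, data = toy, FUN = mean)
group_means
```

The formula interface of aggregate drops rows with a missing Gender or PA_Total by default, which is worth keeping in mind for the real data.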

Create a bivariate scatter plot showing the relationship between positive affect and life satisfaction scores, and compute their correlation.

The correlation between positive affect and life satisfaction can be computed in several ways in R; the simplest is the ‘cor.test’ function. By default it computes a Pearson correlation and drops any case with a missing value on either variable. A “kendall” or “spearman” correlation can be requested through the ‘method’ argument.

cor.test(clean.data$PA_Total, clean.data$LS_Total)

## 
##  Pearson's product-moment correlation
## 
## data:  clean.data$PA_Total and clean.data$LS_Total
## t = 13.121, df = 205, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5939827 0.7434942
## sample estimates:
##       cor 
## 0.6756261

The correlation between these two variables is statistically significant, r(205) = .676, p < .001.
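The t statistic and confidence interval in the output follow from r and n alone. The sketch below recomputes them with the standard formulas: t = r * sqrt(df) / sqrt(1 - r^2) with df = n - 2, and a confidence interval based on the Fisher z transformation. It uses the values printed above rather than the raw data.

```r
# Values taken from the cor.test() output above
r  <- 0.6756261
n  <- 207
df <- n - 2

# t statistic for testing whether the correlation is zero
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# 95% confidence interval via the Fisher z transformation
z    <- atanh(r)              # transform r to z
se_z <- 1 / sqrt(n - 3)       # standard error of z
ci   <- tanh(z + c(-1, 1) * qnorm(0.975) * se_z)   # back-transform to r

round(t_stat, 2)   # about 13.12
round(ci, 3)       # about 0.594 and 0.743
```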

Now, let’s create a scatterplot of positive affect and life satisfaction, also using the ggplot2 package:

Scatterplot<-ggplot(clean.data, aes(x=PA_Total, y=LS_Total)) + geom_point(colour = "blue", size = 2) #size refers to point size
print(Scatterplot + ggtitle("Relationship of Positive Affect and Life Satisfaction")
      + labs(x="Positive Affect", y="Life Satisfaction"))

Perform a linear transformation of the life satisfaction scores. Rescale them to have a mean of 100 and a standard deviation of 10.

For this question, we will be using the ‘rescale’ command in the psych package. This allows us to compute a linear transformation of life satisfaction with a specified mean and standard deviation of our choosing. Similar to when we computed sum scores, we are computing a new variable called “LS_Total_Rescale” and saving it back into our original clean dataset.

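clean.data$LS_Total_Rescale <- psych::rescale(clean.data$LS_Total, mean = 100, sd = 10, df = FALSE) # psych:: avoids clashes with 'rescale' functions from other packages; df = FALSE means a data frame is not returned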
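The same transformation can be written out by hand: standardize the scores to z scores, then multiply by the target standard deviation and add the target mean. This is a base R sketch of the arithmetic on hypothetical scores, not psych’s own implementation.

```r
# Hypothetical life satisfaction totals
x <- c(5, 12, 16, 18, 20)

# Step 1: z scores (mean 0, sd 1), using the sample sd
z <- (x - mean(x)) / sd(x)

# Step 2: linear transformation to mean 100 and sd 10
x_rescaled <- 100 + 10 * z

mean(x_rescaled)   # 100
sd(x_rescaled)     # 10
```

Running mean() and sd() on the new variable is a quick way to confirm that a rescaling did what was intended.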

Repeat step 6 using the transformed life satisfaction scores. What did you find?

Correlation between the transformed Life Satisfaction variable and Positive Affect:

cor.test(clean.data$LS_Total_Rescale, clean.data$PA_Total)

## 
##  Pearson's product-moment correlation
## 
## data:  clean.data$LS_Total_Rescale and clean.data$PA_Total
## t = 13.121, df = 205, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.5939827 0.7434942
## sample estimates:
##       cor 
## 0.6756261

Now, we will create another scatterplot using the ggplot2 package, this time with the new linearly transformed life satisfaction variable. We can use almost exactly the same code as before. The only difference below is that the ‘y’ variable is now our newly transformed variable (the color was also changed to purple to differentiate the two plots).

Scatterplot.revised <- ggplot(clean.data, aes(x = PA_Total, y = LS_Total_Rescale)) + geom_point(colour = "purple", size = 2)
print(Scatterplot.revised + ggtitle("Relationship of Positive Affect and Transformed Life Satisfaction")
      + labs(x = "Positive Affect", y = "Life Satisfaction (Rescaled)"))

So what did we find?

As the objective was to perform a linear transformation on the life satisfaction sum score, the correlation between positive affect and the transformed life satisfaction variable should still be the same as the original correlation (around .676)

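Remember, a linear transformation with a positive slope does not change the correlation between two variables or the shape of the distribution (skew and kurtosis stay the same); only the mean and standard deviation change.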
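This can be checked numerically. The sketch below uses simulated data (hypothetical, not the homework data) and a small helper ‘skew’ defined here for illustration. It also shows the one caveat: a negative slope flips the sign of the correlation.

```r
set.seed(1)

# Simulated, hypothetical scores
x <- rnorm(50)
y <- 0.6 * x + rnorm(50)

# Positive-slope linear transformation: mean 100, sd 10
y_lin <- 100 + 10 * (y - mean(y)) / sd(y)

# Negative-slope linear transformation
y_neg <- -2 * y + 5

# Illustrative helper: moment-based skewness
skew <- function(v) mean((v - mean(v))^3) / mean((v - mean(v))^2)^1.5

same_cor     <- isTRUE(all.equal(cor(x, y), cor(x, y_lin)))    # correlation unchanged
flipped_cor  <- isTRUE(all.equal(cor(x, y), -cor(x, y_neg)))   # sign flips with negative slope
same_skew    <- isTRUE(all.equal(skew(y), skew(y_lin)))        # shape unchanged

c(same_cor = same_cor, flipped_cor = flipped_cor, same_skew = same_skew)
```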