Lab 2: Regression Example

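Before we get started: this lab uses some new packages in R. Each lab may use new R packages, and you'll need to run some code to install them.

There are 2 parts to this: installing each package (done once, with install.packages()), then loading it with library() (done every time you start a new R session). The install.packages() lines below are commented out with #; remove the # and run a line if that package is not installed yet.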

#install.packages("doBy")
#install.packages("psych")
#install.packages("dplyr")
#install.packages("DT")

library(psych)
library(doBy)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following object is masked from 'package:doBy':
## 
##     order_by

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(DT)

In this Lab, we’re going to talk about (and practice) REGRESSION. Regression is a statistical technique used to PREDICT an outcome variable from one or more predictor variables.
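To make “predict” concrete, here is a minimal sketch of simple linear regression using base R's lm(). It runs on a small made-up data frame (toy, with hypothetical columns x and y; these are not the lab variables). With one predictor, the fitted line is y-hat = b0 + b1 * x, where b1 = cov(x, y) / var(x) and b0 = mean(y) - b1 * mean(x). The code checks lm()'s estimates against those formulas and then predicts y for a new value of x.

```r
# Hypothetical toy data (NOT the lab dataset): one predictor x, one outcome y
toy <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(3.1, 4.9, 7.2, 8.8, 11.1, 12.9)
)

# Fit the regression of y on x (outcome ~ predictor)
fit <- lm(y ~ x, data = toy)
coef(fit)  # intercept (b0) and slope (b1)

# The same estimates by hand, from the least-squares formulas
b1 <- cov(toy$x, toy$y) / var(toy$x)
b0 <- mean(toy$y) - b1 * mean(toy$x)

# Use the fitted line to PREDICT the outcome for a new value of x
new_point <- data.frame(x = 7)
predict(fit, newdata = new_point)  # equals b0 + b1 * 7
```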

READING IN THE DATA

First, let’s read in the dataset we’ll use for the rest of the lab. Download the Lab2RegressionExample.csv data file from ELMS if you have not already done so. If you are using R from your virtual desktop, save the file ON YOUR VIRTUAL DESKTOP; if you’re using R on your own computer, save it onto your own machine. Either way, save it somewhere you can find it easily.

Then, to import the dataset, look at the top right-hand pane of RStudio (the Environment pane). Remember that this is where your saved objects and variables appear when you run code that creates them. Click the button that says “Import Dataset” and choose the first option: “From text (base)”. Navigate to where you saved the Lab2RegressionExample.csv file and select it. Make sure the circle next to “Heading” is set to Yes, then import the data. RStudio runs a read.csv() command like the one below; the file path in yours will match wherever you saved the file.

Lab2RegressionExample <- read.csv("~/PSYC 420 DATA/Lab 2 and 3/Lab2RegressionExample-1.csv")
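The Heading = Yes option corresponds to read.csv()'s header = TRUE argument, which is also its default. The following minimal sketch uses a tiny made-up CSV written inline through the text argument, so it runs without any file. The names csv_text and toy are hypothetical.

```r
# A tiny made-up CSV (NOT the lab file), written inline so no file is needed
csv_text <- "ID,Enjoyment,Type
1,10,B
2,14,L
3,21,B"

# header = TRUE: the first row supplies the column names (like Heading = Yes)
# stringsAsFactors = FALSE keeps text as character (the default since R 4.0.0)
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)
str(toy)

# header = FALSE: the first row is read as data, and columns get names V1, V2, V3
toy_no_header <- read.csv(text = csv_text, header = FALSE)
head(toy_no_header)
```

If Heading is accidentally set to No, the column names end up as a row of data, which is easy to spot with head().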

Let’s give the dataset a shorter name that’s easier to reference later. To do that, we type the short new name, then the assignment arrow (<-), then the name of the dataset we just read in.

lab <- Lab2RegressionExample
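The following minimal sketch shows what that assignment does, using a made-up data frame (original and short are hypothetical names). The arrow stores the value on the right under the name on the left. Changing the new object afterwards leaves the original untouched, so in the lab, editing lab would not change Lab2RegressionExample.

```r
# Hypothetical stand-in for an imported dataset
original <- data.frame(Enjoyment = c(10, 20, 30))

# "<-" stores the value on the right under the name on the left
short <- original

# Modifying the new object does not modify the original one
short$Enjoyment[1] <- 99

original$Enjoyment  # 10 20 30
short$Enjoyment     # 99 20 30
```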

Let’s take a look at the dataset. Typing its name prints it to the console (View(lab) would instead open it in a spreadsheet-style tab):

lab

##      ID SharedValues Exclusion Extroversion Enjoyment Type Time
## 1     5            1       120            3        10    B   30
## 2    11            1       118            4        10    B   80
## 3     9            1       103            1        11    B   20
## 4    61            1       102            2        11    B   30
## 5   109            1       108            4        11    B   42
## 6     1            1       111            7        12    B   33
## 7     7            1       104            2        12    B   41
## 8    78            1       108            7        14    L   50
## 9    58            1        97            6        15    B  120
## 10   62            1       102            3        16    B  100
## 11   57            1       100            1        18    B   50
## 12    2            1       107            1        19    L   32
## 13   18            1       109            1        19    B   43
## 14   86            1        96            7        19    B   56
## 15   10            2        86            5        19    B   83
## 16   14            2        90            7        20    B   48
## 17   59            2        88            1        20    B   50
## 18    3            1       102            6        21    B   92
## 19    4            2        84            3        21    L   31
## 20   23            2        85            2        21    B   10
## 21   15            1       109            6        22    L   12
## 22   22            2        91            2        23    L   20
## 23   33            2        93            7        23    B   47
## 24   91            2        82            1        24    B   87
## 25    6            3        59            7        24    B   88
## 26   20            2        73            4        25    L   34
## 27   73            2        77            1        25    L   43
## 28   85            2        80            1        25    B   90
## 29   94            2        72            5        25    L   21
## 30   35            3        61            4        25    B    2
## 31   71            3        66            7        25    B   45
## 32   24            2        73            6        26    B   30
## 33   53            2        82            5        26    B   10
## 34   81            2        86            1        26    L   13
## 35  113            2        78            7        26    B    8
## 36   34            2        70            7        27    L   26
## 37   76            2        79            5        27    B   72
## 38   19            3        68            6        28    B   83
## 39   68            3        68            4        28    B   49
## 40  112            3        61            1        28    L   40
## 41   17            2        62            3        29    L   38
## 42  108            2        69            3        29    L   37
## 43   95            3        61            7        32    L   33
## 44  106            3        61            4        32    B   22
## 45  118            3        63            4        33    L   25
## 46   52            3        50            3        34    B   32
## 47   90            3        52            6        34    B   39
## 48   25            3        52            2        36    B   30
## 49   32            3        59            4        36    L   28
## 50   16            3        53            7        37    L   57
## 51   66            3        62            3        37    L   60
## 52   97            3        62            2        38    L   63
## 53   67            4        42            5        38    L   62
## 54   13            3        40            3        39    L   74
## 55   21            3        45            1        39    B   21
## 56   39            4        39            2        39    L   28
## 57   29            3        53            1        40    B   30
## 58   38            3        45            2        40    L   32
## 59   72            4        39            1        40    B   30
## 60   87            4        39            7        40    L   33
## 61  117            4        40            4        40    L   38
## 62   26            3        43            2        41    B   93
## 63   30            3        44            5        41    L   82
## 64   60            3        44            5        41    B   40
## 65    8            4        40            4        41    B   41
## 66   83            4        38            6        41    L   44
## 67  110            4        37            3        43    L   48
## 68   27            4        30            2        45    L   46
## 69   31            4        38            5        45    B   48
## 70   63            4        32            2        45    L   34
## 71   77            4        38            6        45    B   38
## 72   79            4        29            2        45    B   20
## 73   12            4        32            5        46    B   11
## 74   45            4        32            4        47    L   13
## 75   75            4        31            4        47    L    8
## 76  107            4        29            7        47    B   15
## 77   56            5        22            7        47    L   18
## 78   64            5        30            1        47    L   22
## 79   70            4        33            2        48    L   30
## 80   28            5        18            2        48    B   28
## 81   49            5        25            2        48    L   29
## 82   80            6        11            5        48    L   31
## 83   51            5        18            5        49    L   38
## 84  115            5        23            6        49    B   40
## 85   44            6        10            4        49    B   41
## 86   84            6         7            6        49    L   30
## 87   99            6         6            1        49    L   28
## 88   93            6         9            2        50    B   22
## 89  100            5        24            6        51    B   21
## 90   55            5        26            4        52    B   92
## 91   96            5        17            1        52    L   97
## 92   46            6         7            6        52    B   20
## 93   65            5        18            4        53    L   23
## 94   54            5        13            3        54    L   43
## 95  111            5        17            1        55    L   35
## 96   37            6         7            1        55    L   54
## 97   43            6         9            3        55    L   57
## 98   48            6         8            6        55    B   45
## 99   40            5        21            1        57    L   43
## 100  69            5        14            7        58    L   98
## 101  98            5        12            5        58    L  100
## 102 114            6         7            5        58    L  110
## 103  89            5        18            6        59    B   30
## 104 104            6        12            6        59    L   11
## 105  50            5        13            2        60    L    3
## 106  74            6         1            3        60    L   30
## 107 101            5        17            4        61    B   37
## 108 102            5        20            7        62    L   76
## 109  42            6         3            7        63    L   63
## 110  47            6         4            7        66    L   50
## 111 119            6         2            5        68    L   58
## 112  88            7         4            7        69    L   59
## 113  82            7         9            6        70    L   34
## 114 116            7         6            6        71    B   37
## 115 105            7         1            3        75    L   38
## 116  36            7         4            4        76    L   32
## 117  41            7         3            2        76    L   10
## 118 103            7         9            4        78    L   12
## 119  92            7         4            3        82    L   18

We can also print out some information about the dataset, such as:

The first six rows of the dataset, using head():

head(lab)

##    ID SharedValues Exclusion Extroversion Enjoyment Type Time
## 1   5            1       120            3        10    B   30
## 2  11            1       118            4        10    B   80
## 3   9            1       103            1        11    B   20
## 4  61            1       102            2        11    B   30
## 5 109            1       108            4        11    B   42
## 6   1            1       111            7        12    B   33

The column names / variables in the dataset:

colnames(lab)

## [1] "ID"           "SharedValues" "Exclusion"    "Extroversion" "Enjoyment"   
## [6] "Type"         "Time"
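colnames() shows only the names. To also see each column's type, you can use str() or sapply(..., class) on any data frame. This matters later, because Type is stored as text. Here is a sketch on a small made-up data frame (toy is hypothetical); on the real data you would run str(lab).

```r
# Hypothetical toy data frame with the same kinds of columns as the lab data
toy <- data.frame(
  ID        = c(1L, 2L, 3L),
  Enjoyment = c(10, 14, 21),
  Type      = c("B", "L", "B"),
  stringsAsFactors = FALSE
)

colnames(toy)       # "ID" "Enjoyment" "Type"
dim(toy)            # number of rows, then number of columns
sapply(toy, class)  # storage type of each column
str(toy)            # compact overview: types plus the first few values
```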

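DESCRIPTIVE STATISTICS

Let’s take a look at some descriptives for this data. The describe() function from the psych package gives summary statistics for each variable in the dataset. These include:

- the number of observations (n)
- the MEAN and MEDIAN
- the standard deviation (sd)
- the minimum and maximum scores
- skew and kurtosis
- the standard error (se)

The asterisk in Type* marks a non-numeric variable that describe() converted to numbers, so its statistics should be read with care.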

describe(lab)

##              vars   n  mean    sd median trimmed   mad min max range  skew
## ID              1 119 60.00 34.50     60   60.00 44.48   1 119   118  0.00
## SharedValues    2 119  3.68  1.81      4    3.64  1.48   1   7     6  0.16
## Exclusion       3 119 48.03 34.30     40   46.16 41.51   1 120   119  0.38
## Extroversion    4 119  3.98  2.08      4    3.98  2.97   1   7     6  0.02
## Enjoyment       5 119 39.92 17.27     40   39.34 20.76  10  82    72  0.23
## Type*           6 119  1.54  0.50      2    1.55  0.00   1   2     1 -0.15
## Time            7 119 42.41 25.28     37   39.90 19.27   2 120   118  0.97
##              kurtosis   se
## ID              -1.23 3.16
## SharedValues    -1.07 0.17
## Exclusion       -1.10 3.14
## Extroversion    -1.35 0.19
## Enjoyment       -0.67 1.58
## Type*           -1.99 0.05
## Time             0.38 2.32
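Two of the describe() columns are easy to reproduce by hand:

- se is the standard error of the mean, sd / sqrt(n).
- trimmed is a mean computed after dropping the most extreme scores. psych's describe() trims 10% from each end by default, via its trim argument.

The sketch below uses made-up scores (x is hypothetical) and also checks the Enjoyment row above.

```r
# Made-up scores (NOT the lab data)
x <- c(10, 12, 15, 20, 22, 25, 30, 41, 55, 80)

n  <- length(x)
se <- sd(x) / sqrt(n)       # standard error of the mean
tr <- mean(x, trim = 0.1)   # drops the lowest and highest 10% (here 1 score each)

# Check the se reported for Enjoyment: sd = 17.27, n = 119
round(17.27 / sqrt(119), 2)  # 1.58
```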

Suppose we’re interested in using the TYPE variable, which records whether each gathering was business (B) or leisure (L). Let’s take another look at the dataset:

lab

##      ID SharedValues Exclusion Extroversion Enjoyment Type Time
## 1     5            1       120            3        10    B   30
## 2    11            1       118            4        10    B   80
## 3     9            1       103            1        11    B   20
## 4    61            1       102            2        11    B   30
## 5   109            1       108            4        11    B   42
## 6     1            1       111            7        12    B   33
## 7     7            1       104            2        12    B   41
## 8    78            1       108            7        14    L   50
## 9    58            1        97            6        15    B  120
## 10   62            1       102            3        16    B  100
## 11   57            1       100            1        18    B   50
## 12    2            1       107            1        19    L   32
## 13   18            1       109            1        19    B   43
## 14   86            1        96            7        19    B   56
## 15   10            2        86            5        19    B   83
## 16   14            2        90            7        20    B   48
## 17   59            2        88            1        20    B   50
## 18    3            1       102            6        21    B   92
## 19    4            2        84            3        21    L   31
## 20   23            2        85            2        21    B   10
## 21   15            1       109            6        22    L   12
## 22   22            2        91            2        23    L   20
## 23   33            2        93            7        23    B   47
## 24   91            2        82            1        24    B   87
## 25    6            3        59            7        24    B   88
## 26   20            2        73            4        25    L   34
## 27   73            2        77            1        25    L   43
## 28   85            2        80            1        25    B   90
## 29   94            2        72            5        25    L   21
## 30   35            3        61            4        25    B    2
## 31   71            3        66            7        25    B   45
## 32   24            2        73            6        26    B   30
## 33   53            2        82            5        26    B   10
## 34   81            2        86            1        26    L   13
## 35  113            2        78            7        26    B    8
## 36   34            2        70            7        27    L   26
## 37   76            2        79            5        27    B   72
## 38   19            3        68            6        28    B   83
## 39   68            3        68            4        28    B   49
## 40  112            3        61            1        28    L   40
## 41   17            2        62            3        29    L   38
## 42  108            2        69            3        29    L   37
## 43   95            3        61            7        32    L   33
## 44  106            3        61            4        32    B   22
## 45  118            3        63            4        33    L   25
## 46   52            3        50            3        34    B   32
## 47   90            3        52            6        34    B   39
## 48   25            3        52            2        36    B   30
## 49   32            3        59            4        36    L   28
## 50   16            3        53            7        37    L   57
## 51   66            3        62            3        37    L   60
## 52   97            3        62            2        38    L   63
## 53   67            4        42            5        38    L   62
## 54   13            3        40            3        39    L   74
## 55   21            3        45            1        39    B   21
## 56   39            4        39            2        39    L   28
## 57   29            3        53            1        40    B   30
## 58   38            3        45            2        40    L   32
## 59   72            4        39            1        40    B   30
## 60   87            4        39            7        40    L   33
## 61  117            4        40            4        40    L   38
## 62   26            3        43            2        41    B   93
## 63   30            3        44            5        41    L   82
## 64   60            3        44            5        41    B   40
## 65    8            4        40            4        41    B   41
## 66   83            4        38            6        41    L   44
## 67  110            4        37            3        43    L   48
## 68   27            4        30            2        45    L   46
## 69   31            4        38            5        45    B   48
## 70   63            4        32            2        45    L   34
## 71   77            4        38            6        45    B   38
## 72   79            4        29            2        45    B   20
## 73   12            4        32            5        46    B   11
## 74   45            4        32            4        47    L   13
## 75   75            4        31            4        47    L    8
## 76  107            4        29            7        47    B   15
## 77   56            5        22            7        47    L   18
## 78   64            5        30            1        47    L   22
## 79   70            4        33            2        48    L   30
## 80   28            5        18            2        48    B   28
## 81   49            5        25            2        48    L   29
## 82   80            6        11            5        48    L   31
## 83   51            5        18            5        49    L   38
## 84  115            5        23            6        49    B   40
## 85   44            6        10            4        49    B   41
## 86   84            6         7            6        49    L   30
## 87   99            6         6            1        49    L   28
## 88   93            6         9            2        50    B   22
## 89  100            5        24            6        51    B   21
## 90   55            5        26            4        52    B   92
## 91   96            5        17            1        52    L   97
## 92   46            6         7            6        52    B   20
## 93   65            5        18            4        53    L   23
## 94   54            5        13            3        54    L   43
## 95  111            5        17            1        55    L   35
## 96   37            6         7            1        55    L   54
## 97   43            6         9            3        55    L   57
## 98   48            6         8            6        55    B   45
## 99   40            5        21            1        57    L   43
## 100  69            5        14            7        58    L   98
## 101  98            5        12            5        58    L  100
## 102 114            6         7            5        58    L  110
## 103  89            5        18            6        59    B   30
## 104 104            6        12            6        59    L   11
## 105  50            5        13            2        60    L    3
## 106  74            6         1            3        60    L   30
## 107 101            5        17            4        61    B   37
## 108 102            5        20            7        62    L   76
## 109  42            6         3            7        63    L   63
## 110  47            6         4            7        66    L   50
## 111 119            6         2            5        68    L   58
## 112  88            7         4            7        69    L   59
## 113  82            7         9            6        70    L   34
## 114 116            7         6            6        71    B   37
## 115 105            7         1            3        75    L   38
## 116  36            7         4            4        76    L   32
## 117  41            7         3            2        76    L   10
## 118 103            7         9            4        78    L   12
## 119  92            7         4            3        82    L   18

In RStudio, we can check a variable's type, for example by expanding lab in the Environment pane or by hovering over the Type column header in the View(lab) tab. Here, the Type variable is stored as a “character” (text) variable. We can’t do math with letters, so for many analyses we need a numeric version of a character variable. How do we handle this? We convert these letters/characters into numbers that we can use in analyses.
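Here is a quick sketch of the problem, using a made-up character vector (type is hypothetical). Arithmetic on text throws an error, and as.numeric() can't turn letters into numbers on its own.

```r
type <- c("B", "L", "B")   # made-up gathering types
class(type)                # "character"

# Arithmetic on text fails ("non-numeric argument to binary operator")
result <- tryCatch(type + 1, error = function(e) e)
inherits(result, "error")  # TRUE

# as.numeric() cannot parse "B" or "L": it returns NA (and warns)
suppressWarnings(as.numeric(type))  # NA NA NA
```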

Specifically, let’s create a NEW variable/column in the dataset. We’ll call it TypeR, where the R stands for “Revised.” We’ll recode the existing Type variable from a CHARACTER variable into a numeric 0/1 variable (often called a dummy variable), where each number stands for a category. In our case, the code below turns the two letters we previously had, B for business and L for leisure, into 0’s and 1’s: ifelse() gives a 0 wherever Type is "B" and a 1 everywhere else. We can then use the new variable, TypeR, in analyses. (R also has a FACTOR type for categorical variables, created with factor(Type), but the code below produces plain numbers rather than a factor.)

# Recode Type into a numeric 0/1 variable: B (business) = 0, L (leisure) = 1
lab2 <- lab %>%
  mutate(TypeR = ifelse(Type == "B", 0, 1))
lab2

##      ID SharedValues Exclusion Extroversion Enjoyment Type Time TypeR
## 1     5            1       120            3        10    B   30     0
## 2    11            1       118            4        10    B   80     0
## 3     9            1       103            1        11    B   20     0
## 4    61            1       102            2        11    B   30     0
## 5   109            1       108            4        11    B   42     0
## 6     1            1       111            7        12    B   33     0
## 7     7            1       104            2        12    B   41     0
## 8    78            1       108            7        14    L   50     1
## 9    58            1        97            6        15    B  120     0
## 10   62            1       102            3        16    B  100     0
## 11   57            1       100            1        18    B   50     0
## 12    2            1       107            1        19    L   32     1
## 13   18            1       109            1        19    B   43     0
## 14   86            1        96            7        19    B   56     0
## 15   10            2        86            5        19    B   83     0
## 16   14            2        90            7        20    B   48     0
## 17   59            2        88            1        20    B   50     0
## 18    3            1       102            6        21    B   92     0
## 19    4            2        84            3        21    L   31     1
## 20   23            2        85            2        21    B   10     0
## 21   15            1       109            6        22    L   12     1
## 22   22            2        91            2        23    L   20     1
## 23   33            2        93            7        23    B   47     0
## 24   91            2        82            1        24    B   87     0
## 25    6            3        59            7        24    B   88     0
## 26   20            2        73            4        25    L   34     1
## 27   73            2        77            1        25    L   43     1
## 28   85            2        80            1        25    B   90     0
## 29   94            2        72            5        25    L   21     1
## 30   35            3        61            4        25    B    2     0
## 31   71            3        66            7        25    B   45     0
## 32   24            2        73            6        26    B   30     0
## 33   53            2        82            5        26    B   10     0
## 34   81            2        86            1        26    L   13     1
## 35  113            2        78            7        26    B    8     0
## 36   34            2        70            7        27    L   26     1
## 37   76            2        79            5        27    B   72     0
## 38   19            3        68            6        28    B   83     0
## 39   68            3        68            4        28    B   49     0
## 40  112            3        61            1        28    L   40     1
## 41   17            2        62            3        29    L   38     1
## 42  108            2        69            3        29    L   37     1
## 43   95            3        61            7        32    L   33     1
## 44  106            3        61            4        32    B   22     0
## 45  118            3        63            4        33    L   25     1
## 46   52            3        50            3        34    B   32     0
## 47   90            3        52            6        34    B   39     0
## 48   25            3        52            2        36    B   30     0
## 49   32            3        59            4        36    L   28     1
## 50   16            3        53            7        37    L   57     1
## 51   66            3        62            3        37    L   60     1
## 52   97            3        62            2        38    L   63     1
## 53   67            4        42            5        38    L   62     1
## 54   13            3        40            3        39    L   74     1
## 55   21            3        45            1        39    B   21     0
## 56   39            4        39            2        39    L   28     1
## 57   29            3        53            1        40    B   30     0
## 58   38            3        45            2        40    L   32     1
## 59   72            4        39            1        40    B   30     0
## 60   87            4        39            7        40    L   33     1
## 61  117            4        40            4        40    L   38     1
## 62   26            3        43            2        41    B   93     0
## 63   30            3        44            5        41    L   82     1
## 64   60            3        44            5        41    B   40     0
## 65    8            4        40            4        41    B   41     0
## 66   83            4        38            6        41    L   44     1
## 67  110            4        37            3        43    L   48     1
## 68   27            4        30            2        45    L   46     1
## 69   31            4        38            5        45    B   48     0
## 70   63            4        32            2        45    L   34     1
## 71   77            4        38            6        45    B   38     0
## 72   79            4        29            2        45    B   20     0
## 73   12            4        32            5        46    B   11     0
## 74   45            4        32            4        47    L   13     1
## 75   75            4        31            4        47    L    8     1
## 76  107            4        29            7        47    B   15     0
## 77   56            5        22            7        47    L   18     1
## 78   64            5        30            1        47    L   22     1
## 79   70            4        33            2        48    L   30     1
## 80   28            5        18            2        48    B   28     0
## 81   49            5        25            2        48    L   29     1
## 82   80            6        11            5        48    L   31     1
## 83   51            5        18            5        49    L   38     1
## 84  115            5        23            6        49    B   40     0
## 85   44            6        10            4        49    B   41     0
## 86   84            6         7            6        49    L   30     1
## 87   99            6         6            1        49    L   28     1
## 88   93            6         9            2        50    B   22     0
## 89  100            5        24            6        51    B   21     0
## 90   55            5        26            4        52    B   92     0
## 91   96            5        17            1        52    L   97     1
## 92   46            6         7            6        52    B   20     0
## 93   65            5        18            4        53    L   23     1
## 94   54            5        13            3        54    L   43     1
## 95  111            5        17            1        55    L   35     1
## 96   37            6         7            1        55    L   54     1
## 97   43            6         9            3        55    L   57     1
## 98   48            6         8            6        55    B   45     0
## 99   40            5        21            1        57    L   43     1
## 100  69            5        14            7        58    L   98     1
## 101  98            5        12            5        58    L  100     1
## 102 114            6         7            5        58    L  110     1
## 103  89            5        18            6        59    B   30     0
## 104 104            6        12            6        59    L   11     1
## 105  50            5        13            2        60    L    3     1
## 106  74            6         1            3        60    L   30     1
## 107 101            5        17            4        61    B   37     0
## 108 102            5        20            7        62    L   76     1
## 109  42            6         3            7        63    L   63     1
## 110  47            6         4            7        66    L   50     1
## 111 119            6         2            5        68    L   58     1
## 112  88            7         4            7        69    L   59     1
## 113  82            7         9            6        70    L   34     1
## 114 116            7         6            6        71    B   37     0
## 115 105            7         1            3        75    L   38     1
## 116  36            7         4            4        76    L   32     1
## 117  41            7         3            2        76    L   10     1
## 118 103            7         9            4        78    L   12     1
## 119  92            7         4            3        82    L   18     1
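You can check the recode, and compare it with a true factor, on a made-up vector (Type, TypeR, and TypeF here are hypothetical stand-ins). A factor stores its categories as labeled levels with internal codes starting at 1. The ifelse() recode, by contrast, produces plain numbers 0 and 1.

```r
Type <- c("B", "L", "L", "B", "L")   # made-up gathering types

# 0/1 (dummy) coding, as in the mutate() step above
TypeR <- ifelse(Type == "B", 0, 1)

# A true factor: a categorical variable with labeled levels
TypeF <- factor(Type, levels = c("B", "L"), labels = c("Business", "Leisure"))
levels(TypeF)      # "Business" "Leisure"
as.integer(TypeF)  # internal codes: 1 2 2 1 2

# Cross-tabulate to confirm every B became 0 and every L became 1
table(Type, TypeR)
```

ifelse() assigns 1 to anything that is not exactly "B", including a lowercase "b" or a typo. Running table(lab2$Type, lab2$TypeR) is a cheap way to confirm the recode on the real data.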

Now, let’s look at summary statistics broken down by whether people attended a business or leisure gathering. We’ll use describeBy() from the psych package, which we loaded at the top of this file, with the categorical variable Type as the grouping variable.

describeBy(lab2, lab2$Type)

## 
##  Descriptive statistics by group 
## group: B
##              vars  n  mean    sd median trimmed   mad min max range  skew
## ID              1 55 52.67 35.33     53   51.44 47.44   1 116   115  0.21
## SharedValues    2 55  2.96  1.64      3    2.82  1.48   1   7     6  0.52
## Exclusion       3 55 62.20 33.91     61   62.56 43.00   6 120   114 -0.06
## Extroversion    4 55  4.15  2.11      4    4.18  2.97   1   7     6 -0.19
## Enjoyment       5 55 32.27 15.33     28   31.64 17.79  10  71    61  0.36
## Type*           6 55  1.00  0.00      1    1.00  0.00   1   1     0   NaN
## Time            7 55 44.76 27.34     40   42.76 14.83   2 120   118  0.83
## TypeR           8 55  0.00  0.00      0    0.00  0.00   0   0     0   NaN
##              kurtosis   se
## ID              -1.27 4.76
## SharedValues    -0.71 0.22
## Exclusion       -1.31 4.57
## Extroversion    -1.42 0.28
## Enjoyment       -0.87 2.07
## Type*             NaN 0.00
## Time            -0.17 3.69
## TypeR             NaN 0.00
## ------------------------------------------------------------ 
## group: L
##              vars  n  mean    sd median trimmed   mad min max range  skew
## ID              1 64 66.30 32.74   68.0   67.02 41.51   2 119   117 -0.14
## SharedValues    2 64  4.30  1.72    4.0    4.31  1.48   1   7     6 -0.13
## Exclusion       3 64 35.84 29.85   30.5   32.60 32.62   1 109   108  0.77
## Extroversion    4 64  3.84  2.06    4.0    3.81  2.97   1   7     6  0.20
## Enjoyment       5 64 46.48 16.19   47.0   46.04 16.31  14  82    68  0.16
## Type*           6 64  1.00  0.00    1.0    1.00  0.00   1   1     0   NaN
## Time            7 64 40.39 23.38   34.0   37.81 17.79   3 110   107  1.03
## TypeR           8 64  1.00  0.00    1.0    1.00  0.00   1   1     0   NaN
##              kurtosis   se
## ID              -1.16 4.09
## SharedValues    -1.05 0.22
## Exclusion       -0.40 3.73
## Extroversion    -1.26 0.26
## Enjoyment       -0.69 2.02
## Type*             NaN 0.00
## Time             0.81 2.92
## TypeR             NaN 0.00

Let’s look at the Enjoyment variable. Average enjoyment differs between the two gathering types: leisure gatherings (L) have a higher mean (46.48) than business gatherings (B, 32.27). However, we don’t yet know whether this difference is statistically significant! A difference in sample means alone can’t tell us that, but a statistical test like a t-test compares the two means and determines whether the difference is significant. Let’s run a t-test to examine differences in enjoyment across gathering type.

t.test(Enjoyment ~ Type, data=lab2)

## 
##  Welch Two Sample t-test
## 
## data:  Enjoyment by Type
## t = -4.9116, df = 115.88, p-value = 2.992e-06
## alternative hypothesis: true difference in means between group B and group L is not equal to 0
## 95 percent confidence interval:
##  -19.942629  -8.480666
## sample estimates:
## mean in group B mean in group L 
##        32.27273        46.48438

This output gives the results of the t-test. Notice that the p-value is printed in scientific notation, because the value is VERY SMALL: 2.992e-06 means 0.000002992. Since this is far below .05, the t-test is significant. The t-test compared the mean enjoyment of business gatherings and leisure gatherings, and this result tells us that average enjoyment differed depending on whether the gathering was a professional or a leisure one. At the bottom of the output you can see, mirroring the summary statistics we saw earlier, that leisure gatherings (L) had a higher average enjoyment than business gatherings (B).
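As a quick check on how to read this output, here is a sketch that uses the printed numbers typed in by hand (so they are rounded). The 95% confidence interval is built around the difference between the two group means, so its midpoint should match mean(B) minus mean(L). The block also writes the p-value out in ordinary decimal form.

```r
# Values copied by hand from the t.test() output above (rounded as printed)
mean_B <- 32.27273
mean_L <- 46.48438
ci     <- c(-19.942629, -8.480666)
p      <- 2.992e-06

# Difference in means (B minus L): the quantity the interval is built around
mean_diff <- mean_B - mean_L
mean_diff          # about -14.21

# Midpoint of the 95% confidence interval
ci_mid <- mean(ci)
ci_mid             # also about -14.21

# The same p-value written without scientific notation
format(p, scientific = FALSE)

# Is the result significant at the .05 level?
p < 0.05           # TRUE
```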

Let’s try another example of this t-test. Suppose a researcher thinks that people may perceive different levels of SharedValues at gatherings of business contacts compared with social gatherings for leisure purposes. That is, the type of gathering may matter for how much people feel that others share their values. Let’s examine this by testing whether SharedValues differs by gathering type.

t.test(SharedValues~Type, data=lab2)

## 
##  Welch Two Sample t-test
## 
## data:  SharedValues by Type
## t = -4.312, df = 115.72, p-value = 3.42e-05
## alternative hypothesis: true difference in means between group B and group L is not equal to 0
## 95 percent confidence interval:
##  -1.9456548 -0.7208224
## sample estimates:
## mean in group B mean in group L 
##        2.963636        4.296875

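Does gathering type predict shared values? #Yes: the p-value (3.42e-05, i.e., 0.0000342) is well below .05, so the difference is significant. Leisure gatherings (L) had higher average SharedValues (4.30) than business gatherings (B, 2.96).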
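For a two-sided test at the .05 level, a p-value below .05 goes together with a 95% confidence interval that does not contain 0. A small sketch checking that with the values printed above (typed in by hand):

```r
# 95% confidence interval for the SharedValues difference (B minus L),
# copied from the t.test() output above
ci <- c(-1.9456548, -0.7208224)
p  <- 3.42e-05

# The interval excludes 0 if both ends fall on the same side of 0
excludes_zero <- (ci[1] > 0) || (ci[2] < 0)
excludes_zero      # TRUE: both ends are negative

p < 0.05           # TRUE, consistent with the interval excluding 0
```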

CORRELATIONS Let’s take a look at the relationship between some of these other variables.

If the correlation’s p-value is below .05, we conclude that there is a significant correlation. From there, we pay attention to two specific aspects of the correlation estimate we get:

Its DIRECTION: is the value positive or negative? If the correlation is POSITIVE, as 1 variable increases, the other variable generally increases as well. If it is NEGATIVE, as 1 variable increases, the other generally decreases.
Its STRENGTH: how strong is the relationship between the two variables? Values farther from 0 (closer to 1 or -1) mean a stronger relationship; the sign does not affect strength.
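To see direction and strength side by side, here is a sketch on hypothetical, simulated data (not the lab dataset; the exact numbers depend on the random seed):

```r
# Hypothetical, simulated data (not from the lab dataset)
set.seed(420)
x <- 1:30

# y_up tends to rise with x; y_down tends to fall with x
y_up   <- 2 * x + rnorm(30, sd = 3)
y_down <- -2 * x + rnorm(30, sd = 3)

r_up   <- cor(x, y_up)     # positive, close to 1
r_down <- cor(x, y_down)   # negative, close to -1
r_up
r_down

# Strength is judged by the absolute value, ignoring the sign
abs(r_up)
abs(r_down)
```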

Here, we’ll use the cor.test() function to print the p-value and the correlation estimate (r) for a pair of variables.

Let’s first take a look at the correlation between SharedValues and Exclusion. After all, participants are reflecting back on an experience in a large social gathering. Maybe people tended to feel less excluded from the group if they believe that the group shared the same values as them. Or, thought about just a bit differently, maybe people’s experiences of a lack of exclusion led them to expect that the group shared the same values as them? To get an initial impression of the relationship between these variables, let’s run a correlation between SharedValues and Exclusion.

Note that in the cor.test() code, we reference the variables in our dataset (called lab2) by adding lab2 and a dollar sign ($) in front of each variable name. The plot() code uses a formula instead, with data=lab2 telling R where to find the variables.

plot(SharedValues ~ Exclusion, data=lab2)

cor.test(lab2$SharedValues,lab2$Exclusion)

## 
##  Pearson's product-moment correlation
## 
## data:  lab2$SharedValues and lab2$Exclusion
## t = -38.323, df = 117, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.9737200 -0.9463357
## sample estimates:
##        cor 
## -0.9623997

Note that we see in our output a few pieces of information. Let’s focus on the p-value, and the correlation estimate. The p-value should read: p-value < 2.2e-16. Note that, again, when you see p-values in this format, it’s just scientific notation for a really small number. In this case, p is a lot less than .05. Given that the p is less than .05, we’d conclude this was a significant correlation.
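Where does that p-value come from? cor.test() converts r into a t statistic with n - 2 degrees of freedom. A sketch reproducing the printed t from the rounded r above, taking df = 117 from the output (i.e., 119 participants):

```r
# Correlation and degrees of freedom copied from the cor.test() output above
r  <- -0.9623997
df <- 117          # n - 2, with n = 119 participants

# cor.test() tests r using t = r * sqrt(df) / sqrt(1 - r^2)
t_stat <- r * sqrt(df) / sqrt(1 - r^2)
t_stat             # about -38.32, matching the printed t = -38.323

# Two-sided p-value from the t distribution
p <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
p < 0.05           # TRUE
```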

Moving on to the correlation estimate, the value we get is -0.96. This value tells us 2 things about the relationship:
1. Direction: the value is negative. This means that as 1 variable increases, the other decreases.
2. Strength: the value is -0.96, very close to -1. Its absolute value (0.96) is almost 1, so this is a very strong correlation!

Based on this result, we get quite a lot of information about these 2 variables. From the DIRECTION of the relationship (negative), we know that as 1 variable, say Exclusion, increases, the other variable, SharedValues, decreases: the more excluded people felt, the less they felt the group shared their values. We could also state it the other way around: the more people felt the group shared their values, the less excluded they felt. The correlation alone cannot tell us which of these directions, if either, reflects cause and effect.

We also know, from the STRENGTH of the relationship, that information about 1 variable gives us a LOT of information about the other. For instance, knowing how much people felt the group shared their values tells us quite a lot about how excluded they likely felt, too. This perspective will help us understand regression, our main topic.

INTRO TO REGRESSION Regression is a model about prediction. You are trying to predict some outcome using some predictor.

Let’s look again at the t-test we ran earlier in this document. Here is the code: t.test(Enjoyment ~ Type, data=lab2)

Notice the format: you have 1 variable, a tilde (~), and then another variable. In R, the “~” symbol (without quotation marks) often means “predicted by.” For instance, the example of code listed above would mean:

Enjoyment predicted by gathering type

This notation is useful for remembering and understanding how we write regression models in R. Regression is a PREDICTION MODEL: we have some outcome that we are interested in (e.g., Enjoyment), and we want to determine whether certain other variables (i.e., predictors) are useful in predicting that outcome. A regression fits a line through the data so that the outcome is predicted as accurately as possible from the predictors. The more informative the predictors are, the more closely the line will fit the data. Here is an example of what data may look like:

mean(lab2$Enjoyment)

## [1] 39.91597

plot(Enjoyment ~ Exclusion, data=lab2)

reg<- lm(Enjoyment ~ Exclusion, data=lab2)

plot(Enjoyment ~ Exclusion, data=lab2)
abline(reg)  # add the fitted regression line to the existing scatterplot

Here, you can see the data points (i.e., the circles) scattered around the graph, with a line drawn through roughly the center of the data. That is the REGRESSION LINE. In this case, the graph shows that this line does a fairly good job of cutting through the data in an informative way. We know that because the line hews fairly closely to the data.

The closer the data points (circles) are to the line, the better the regression model did in predicting the actual outcome data.

The farther the data points are from the line, the worse the regression model did in predicting the actual outcome data.
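“Closer to the line” can be made precise. The vertical distance from each point to the line is a residual, and an ordinary least squares regression (what lm() fits) chooses the line that makes the sum of squared residuals as small as possible. A sketch on hypothetical, simulated data (not the lab dataset) comparing the fitted line with a flat line drawn at the mean of y:

```r
# Hypothetical, simulated data (not the lab dataset)
set.seed(1)
x <- runif(50, min = 0, max = 100)
y <- 20 + 0.3 * x + rnorm(50, sd = 5)

fit <- lm(y ~ x)

# A residual is the observed value minus the value the line predicts
res <- y - fitted(fit)
isTRUE(all.equal(unname(res), unname(residuals(fit))))   # TRUE

# Sum of squared residuals for the regression line...
rss_line <- sum(residuals(fit)^2)
# ...versus a flat line at the mean of y (a model with no predictor)
rss_flat <- sum((y - mean(y))^2)

rss_line <= rss_flat   # TRUE: the regression line sits closer to the points
```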

Let’s take a look at another example from this data:

plot(SharedValues ~Extroversion, data=lab2)

reg2 <- lm(SharedValues~Extroversion, data=lab2)

plot(SharedValues ~ Extroversion, data=lab2)
abline(reg2)  # add the fitted regression line to the existing scatterplot

In this example, what do we see? We see lots of circles spread fairly evenly in vertical columns across the graph (Extroversion is measured on a 1-to-7 scale, so the points stack up at each scale value), and the regression line runs somewhere through the middle. But how far are the dots from the line?

They’re pretty far, right? Based on this visualization, we might say the regression is NOT doing a great job of predicting the data, because the circles are pretty far, generally, from the line. There are more circles far from the line than close to it.

REGRESSION EQUATIONS The regression line is formed through a basic equation. A typical regression equation might look something like this, where we have 2 predictors, X1 and X2:

Y = b0 + b1(X1) + b2(X2)

where:
Y = the outcome we are trying to predict
b0 = the intercept, which tells us what the value of Y would be when both X1 and X2 are equal to 0
b1 = the slope (i.e., rate of change/effect) for X1 and Y
X1 = a predictor variable
b2 = the slope for X2 and Y
X2 = a predictor variable

Applied to the single-predictor example above (reg2), this reads: Predicted SharedValues = b0 (where the line crosses the Y axis) + b1 (the slope: how much predicted SharedValues changes when Extroversion increases by 1 unit) * Extroversion

Remember that the fit of the regression line (i.e., how well it maps onto the data) is dependent upon how well the predictors do in predicting the outcome. The better your predictors, the better your regression fit - i.e., the better your model. Often, your task is to try to find strong predictors that are closely related to your outcome so that the line is positioned well in your graph to make sense of the data. With useful predictors, you can infer what your outcome (e.g., Enjoyment) would be at many different values of your X variables.

Regression analyses tell you how well a variable predicts an outcome by estimating its slope and testing whether that slope is significantly different from 0.

Let’s run a simple linear regression with just 1 predictor variable. In this case, let’s look at whether SharedValues predicts Enjoyment at social gatherings. The code below creates an object, fit1, that stores the regression in which SharedValues predicts Enjoyment. Then, it prints the results.

fit1<- lm(Enjoyment ~ SharedValues, data=lab2)
summary(fit1)

## 
## Call:
## lm(formula = Enjoyment ~ SharedValues, data = lab2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0605  -3.5938   0.4062   3.8478  11.8228 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.3605     1.0773   5.904 3.54e-08 ***
## SharedValues   9.1167     0.2629  34.676  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.164 on 117 degrees of freedom
## Multiple R-squared:  0.9113, Adjusted R-squared:  0.9106 
## F-statistic:  1202 on 1 and 117 DF,  p-value: < 2.2e-16

There are 2 rows in the coefficients table of the output. The INTERCEPT row tells you the predicted outcome (Enjoyment) when the X variable (SharedValues) is 0. Here, the intercept is the expected average enjoyment for someone who believes there is absolutely no overlap in values between themselves and the other attendees of the social gathering they last went to pre-pandemic.

The second row tells us about the relationship between Y (Enjoyment) and our X variable (SharedValues). The Estimate column tells us that a 1-unit increase in X goes with a 9.12-unit increase in predicted Y. Said another way: for each 1-unit increase in participants’ perceived shared values score, their predicted enjoyment increases by 9.12 units.

There is also a p-value column (Pr(>|t|)), which tells us that the slope is significantly different from 0, i.e., that SharedValues is a useful predictor of our outcome, Y. Based on this analysis, we would conclude that, with no other predictors in the model, SharedValues gives us valuable information about people’s enjoyment at their last social gathering.
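Rather than reading numbers off the printout, you can pull them out of a fitted model: coef(fit1) returns the intercept and slope, and coef(summary(fit1)) returns the whole coefficients table. Because the lab data isn’t reproduced here, this sketch uses hypothetical, simulated data shaped loosely like SharedValues and Enjoyment (the variable names shared and enjoy are illustrative only):

```r
# Hypothetical, simulated data on a 1-7 scale; these are NOT the lab2 values
set.seed(2)
shared <- rep(1:7, each = 5)
enjoy  <- 6 + 9 * shared + rnorm(length(shared), sd = 5)

fit <- lm(enjoy ~ shared)

# Intercept and slope as a named numeric vector
b <- coef(fit)
b

# Full coefficients table: Estimate, Std. Error, t value, Pr(>|t|)
tab <- coef(summary(fit))
tab

# Pull out the slope's p-value by row and column name
slope_p <- tab["shared", "Pr(>|t|)"]
slope_p < 0.05
```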

Using the equation

Note that, for this regression with only 1 X variable in it, our regression line is defined by this equation: Y = b0 + b1X1

And, armed with the intercept and the slope of the X variable, we could predict what Y (enjoyment) would be for any value of X. We would simply plug the intercept and the slope into the equation, and multiply the slope by a value of X:

Y = 6.3605 + (9.1167)*X, where 6.3605 is the intercept and 9.1167 is the slope

Using this equation, if X = 0, then Y would yield:

y1<- 6.3605 + (9.1167)*0
y1

## [1] 6.3605

Note that this value is equal to the intercept. This is because the intercept reflects the value of Y when X = 0. So, for people who believe they share no values at all with other attendees, predicted enjoyment at the party is only 6.3605.

If X = 1, then Y would yield:

y2<- 6.3605 + (9.1167)*1
y2

## [1] 15.4772

We can see that our Y increased when X was 1, relative to when X was 0. In fact, we know exactly how much it increased: by the amount of the slope, 9.1167. That is what the slope tells us: each time X increases by 1 unit, the equation adds another 9.1167 to the value of Y. To confirm this, let’s subtract the value of Y when X was 0 from the value of Y when X was 1:

diff<- y2 - y1
diff

## [1] 9.1167
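The same check works across the whole observed range of SharedValues (1 to 7). A sketch using the fit1 coefficients copied from the summary; R’s diff() function returns the differences between neighboring values, and every step equals the slope:

```r
# Intercept and slope from fit1 (copied from the summary above)
b0 <- 6.3605
b1 <- 9.1167

# Predicted Enjoyment across the observed SharedValues range (1 to 7)
shared_values <- 1:7
predicted <- b0 + b1 * shared_values
predicted

# Differences between neighboring predictions: each one equals the slope
steps <- diff(predicted)
steps
```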

This confirms that the equation simply adds another 9.1167 to the Y value with each 1-unit increase in X. This is useful, because it means we can use this model to anticipate the average Enjoyment of people at any value of SharedValues. For example:

y3<- 6.3605 + (9.1167)*5
y3

## [1] 51.944

This value is the predicted Enjoyment for people who have a score of 5 on the SharedValues measure.

y4<- 6.3605 + (9.1167)*50
y4

## [1] 462.1955

This value is the predicted Enjoyment for people who scored 50 on the SharedValues measure.

Be cautious when drawing conclusions beyond your data. For instance, this model doesn’t account for individual differences, or for the possibility that Enjoyment levels off (plateaus) at high levels of SharedValues. It could be the case that people become bored at parties where everyone has identical values, for instance. So, while this model is useful as a prediction tool, use caution when inferring results from predictive models, and typically do not use a regression to predict outcomes for values well beyond the range of values in your dataset.

For example, SharedValues ranges from 1 to 7 in our dataset. It would likely be inappropriate to predict outcomes for values of SharedValues well beyond 7, like 50 (in our last calculation), because we have no data that far outside the observed range and cannot know whether the same straight-line relationship still holds there. As with many prediction tools, a model is only as good as the data we “feed” into the regression.
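One way to build this caution into your own code is a small wrapper that warns when a prediction falls outside the observed range. The helper below, predict_enjoyment(), is hypothetical (it is not from any package); it uses the fit1 coefficients and the 1-to-7 range of SharedValues:

```r
# Hypothetical helper (not part of any package): predicts Enjoyment from
# SharedValues with the fit1 coefficients and warns when asked to extrapolate
predict_enjoyment <- function(shared_values, b0 = 6.3605, b1 = 9.1167,
                              observed_range = c(1, 7)) {
  outside <- shared_values < observed_range[1] | shared_values > observed_range[2]
  if (any(outside)) {
    warning("Some SharedValues lie outside the observed range (",
            observed_range[1], " to ", observed_range[2],
            "); these predictions are extrapolations.")
  }
  b0 + b1 * shared_values
}

predict_enjoyment(5)    # 51.944, inside the observed range
predict_enjoyment(50)   # 462.1955, returned with a warning
```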

OTHER METRICS Also note that the summary prints other metrics besides the intercept and slopes. Let’s take a look at our initial model one more time:

summary(fit1)

## 
## Call:
## lm(formula = Enjoyment ~ SharedValues, data = lab2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -13.0605  -3.5938   0.4062   3.8478  11.8228 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    6.3605     1.0773   5.904 3.54e-08 ***
## SharedValues   9.1167     0.2629  34.676  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.164 on 117 degrees of freedom
## Multiple R-squared:  0.9113, Adjusted R-squared:  0.9106 
## F-statistic:  1202 on 1 and 117 DF,  p-value: < 2.2e-16

Notice that we also get an F-statistic and a p-value. These may look familiar from prior classes that covered ANOVAs. This is because ANOVA and regression belong to the same family of statistical models (the general linear model) and perform similar procedures on the data, even though their output looks a bit different.
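One concrete link between the two: in a regression with exactly one predictor, the model’s F-statistic equals that predictor’s t value squared. A check with the printed (rounded) values from summary(fit1):

```r
# t value for SharedValues and the overall F-statistic, copied from summary(fit1)
t_shared <- 34.676
f_model  <- 1202

# With one predictor, F is t squared (the printed F is rounded)
t_shared^2        # about 1202.4
```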

Also notice the line reporting “Multiple R-squared” and “Adjusted R-squared.” R-squared is the proportion of the variance in your outcome (in this case, Enjoyment) that is explained by your predictive model. Typically, the more useful predictors you add to the model, the more variance will be explained in your outcome.

However, not all predictors are useful for your model. Because R-squared never decreases when you add a predictor, adding many poor predictors can inflate it through random chance alone. For this reason, you should use the ADJUSTED R-SQUARED value reported in the summary. This value tells you how much variance in your outcome is explained by your model, adjusted for how many predictors you’ve added. If you add poor predictors, your Adjusted R-squared may decrease; if you add useful predictors, it should increase. We will use this information to our advantage in the next section.
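The adjustment follows the standard formula Adjusted R-squared = 1 - (1 - R-squared) * (n - 1) / (n - k - 1), where n is the number of participants and k the number of predictors. A sketch using the rounded R-squared values printed for fit1 and, further below, for fit2 and fit3, with n = 119 (the 117 residual degrees of freedom in fit1 plus its 2 estimated coefficients):

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors k
adj_r2 <- function(r2, n, k) {
  1 - (1 - r2) * (n - 1) / (n - k - 1)
}

n <- 119

adj_r2(0.9113, n, k = 1)   # fit1: about 0.9105 (printed 0.9106; R-squared here is rounded)
adj_r2(0.9206, n, k = 2)   # fit2: about 0.9192
adj_r2(0.9206, n, k = 3)   # fit3: about 0.9185, lower despite the same R-squared
```

The last two lines show the penalty at work: fit3 adds Extroversion without raising R-squared (both print 0.9206), so its Adjusted R-squared drops.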

MULTIPLE REGRESSION Multiple regression involves multiple predictors, rather than just 1 predictor. Let’s look at our model when we add a second predictor variable, Exclusion. In this case, we will have two X variables. First, let’s run a regression to examine whether SharedValues and Exclusion together predict Enjoyment:

fit2<- lm(Enjoyment ~ SharedValues + Exclusion, data=lab2)
summary(fit2)

## 
## Call:
## lm(formula = Enjoyment ~ SharedValues + Exclusion, data = lab2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1136  -2.8812   0.1041   2.8870  14.7768 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   26.9096     5.6808   4.737 6.20e-06 ***
## SharedValues   5.8610     0.9199   6.372 3.92e-09 ***
## Exclusion     -0.1784     0.0485  -3.678 0.000358 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.908 on 116 degrees of freedom
## Multiple R-squared:  0.9206, Adjusted R-squared:  0.9192 
## F-statistic: 672.3 on 2 and 116 DF,  p-value: < 2.2e-16

Based on these results, we see the intercept, and estimates and p-values for both SharedValues and for Exclusion.

Remember that the intercept is interpreted as the value of Y when X = 0, but in this case, it reflects the value of Y when both X1 (SharedValues) and X2 (Exclusion) are equal to 0. Thus, the intercept tells us that people who shared no values with other party attendees, and who were not excluded at all by those attendees, would have a predicted average enjoyment score of 26.91.

For the predictor rows, note that when two predictors are entered into the model at once, each row reflects the UNIQUE contribution of that predictor to explaining Y. This means that:
* the row for “SharedValues” tells us how Enjoyment changes with people’s SharedValues responses, when accounting for (holding constant) the other predictor, Exclusion
* the row for “Exclusion” tells us how Enjoyment changes with people’s Exclusion scores, when accounting for the other predictor, SharedValues

In interpreting these effects: the row for SharedValues shows that it is a significant predictor (its p-value is less than .05), so it provides valuable information. The row also gives the slope of SharedValues: a 1-unit increase in SharedValues goes with a 5.86-unit increase in predicted Enjoyment, when accounting for how excluded people felt by other attendees.

The row for Exclusion shows that it is also a significant predictor (its p-value is less than .05), so it also provides valuable information. The row also gives the slope of Exclusion, which in this case is negative: a 1-unit increase in Exclusion goes with a 0.1784-unit DECLINE in predicted Enjoyment, when accounting for people’s beliefs that attendees share their political and moral values.

Let’s also take a look at the Adjusted R-squared value. Here, the value is 0.9192, which tells us that about 91.92% of the variance in our outcome, Enjoyment, is explained by the 2 predictors in our model. Our previous Adjusted R-squared value with only 1 predictor was .9106, so when only SharedValues was used, the model explained slightly less (91.06%) of the variance in Enjoyment. So, our Adjusted R-squared increased a bit with the introduction of a second predictor.

This increase in Adjusted R-squared tells us that our newly introduced predictor, Exclusion, may be explaining some additional variance in our outcome (though not a whole lot). If the newly added variable were not statistically significant, or if Adjusted R-squared did not increase, it could be advisable to exclude that variable from our model or to test a different model.

We can once again use our regression equation to predict expected Enjoyment for individuals based on this model. We would once again plug in the values to our regression equation:

Y = b0 + b1X1 + b2X2
Y = 26.9096 + (5.8610 * X1) + (-0.1784 * X2)
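The equation can be wrapped in a small function so you can plug in any pair of values. predict_fit2() is a hypothetical helper that simply hard-codes the coefficients printed above:

```r
# Hypothetical helper: predicted Enjoyment from fit2's printed coefficients
predict_fit2 <- function(shared_values, exclusion) {
  26.9096 + 5.8610 * shared_values + (-0.1784) * exclusion
}

# Someone with high shared values and low exclusion...
predict_fit2(shared_values = 6, exclusion = 10)
# ...versus someone with low shared values and high exclusion
predict_fit2(shared_values = 2, exclusion = 100)
```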

What would be the expected Enjoyment for someone with average SharedValues and average Exclusion? To answer this question, let’s first get the mean SharedValues and mean Exclusion:

mean(lab2$SharedValues)

## [1] 3.680672

mean(lab2$Exclusion)

## [1] 48.02521

Then, let’s plug those means for SharedValues and Exclusion into the equation, to get the expected Enjoyment for people who experienced average exclusion and average SharedValues:

y5<- 26.9096 + (5.8610 * 3.680672) + (-0.1784 * 48.02521)
y5

## [1] 39.91432

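Here, we have the predicted enjoyment for people with average experiences of exclusion and average perceptions of shared values. Notice that it is almost exactly the overall mean of Enjoyment (39.91597, computed earlier): an ordinary least squares regression always passes through the point where every variable is at its mean, and the tiny difference here comes from rounding the coefficients.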

MODEL COMPARISONS With each regression we run, we’re testing a model. Thus far, we’ve run 2 regressions: one where SharedValues predicts Enjoyment, and another where both SharedValues and Exclusion predict Enjoyment. We saw that the Adjusted R-squared increases a bit when we add the second predictor, but not by much. The question is: is it enough?

There are two important things for us to understand: Accuracy and Parsimony. We want models that are ACCURATE - models that accurately fit the data. Remember the regression graph we had, where the regression line tries to run through the data? We want the line to accurately cut through the data and hew closely to the data.

However, we also want a model that is SIMPLE and avoids needless complexity. We call this parsimony: the extent to which a model avoids unnecessary components.

For instance, if we wanted the most accurate model possible, we could continuously throw predictors into a regression. And, even if these predictors are not very good, we might still see slight improvements in our R-squared value. But, this isn’t efficient, and it makes models needlessly complicated. Instead, we want to use a model that is accurate, but as simple as we can get it.

Long story short: we want a model that’s accurate, but simple.

How do we get this? Each time we add a predictor to a regression, we test whether the added predictor is useful enough to warrant the added complexity. Imagine that we are spending capital to add predictors to our regression models. Whenever we add another predictor to the model, we want to ask: is this predictor worth it?

How do we do this in R? We use the anova() function. We pass it the two regressions we already ran, fit1 and fit2, and it compares them.

anova(fit1, fit2)

## Analysis of Variance Table
## 
## Model 1: Enjoyment ~ SharedValues
## Model 2: Enjoyment ~ SharedValues + Exclusion
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    117 3120.5                                  
## 2    116 2794.7  1    325.82 13.524 0.0003584 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

This function tested whether the model with Exclusion (our 2nd predictor, present in fit2 but not in fit1) fits the data significantly better than the model without it. Because the p-value (0.0003584) is less than .05, the answer is yes: adding Exclusion significantly reduced the unexplained variance (the residual sum of squares, RSS). So what do we do? We use the more complex model, because it is significantly better than the simpler one. This justifies using 2 predictors rather than just 1.
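The F in this table compares how much the residual sum of squares (RSS) dropped when Exclusion was added with how much error remains in the larger model. A sketch reproducing it from the printed, rounded values in the anova() table:

```r
# Residual sums of squares (RSS) and residual df from the anova() table above
rss_1 <- 3120.5; df_1 <- 117   # fit1: SharedValues only
rss_2 <- 2794.7; df_2 <- 116   # fit2: SharedValues + Exclusion

# Drop in RSS per added predictor, divided by the larger model's error variance
f_stat <- ((rss_1 - rss_2) / (df_1 - df_2)) / (rss_2 / df_2)
f_stat                                                   # about 13.52

# p-value from the F distribution
p_value <- pf(f_stat, df_1 - df_2, df_2, lower.tail = FALSE)
p_value                                                  # about 0.00036
```

This F is also close to the square of Exclusion’s t value in fit2 ((-3.678)^2, about 13.53): when two models differ by a single predictor, the model comparison and that predictor’s t-test are equivalent.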

What about a third predictor in our model? Let’s look at what would happen if we try to use Extroversion as a third predictor:

fit3<- lm(Enjoyment ~ SharedValues + Exclusion + Extroversion, data=lab2)
summary(fit3)

## 
## Call:
## lm(formula = Enjoyment ~ SharedValues + Exclusion + Extroversion, 
##     data = lab2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1202  -2.9203   0.0703   2.8100  14.7207 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  26.97497    5.71904   4.717 6.79e-06 ***
## SharedValues  5.87463    0.92758   6.333 4.81e-09 ***
## Exclusion    -0.17782    0.04882  -3.642 0.000407 ***
## Extroversion -0.03559    0.21966  -0.162 0.871577    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.929 on 115 degrees of freedom
## Multiple R-squared:  0.9206, Adjusted R-squared:  0.9185 
## F-statistic: 444.5 on 3 and 115 DF,  p-value: < 2.2e-16

Here, we see that Extroversion is NOT a significant predictor when controlling for the other 2 predictors, SharedValues and Exclusion. Let’s compare the model we had before, fit2, with this new model, fit3:

anova(fit2, fit3)

## Analysis of Variance Table
## 
## Model 1: Enjoyment ~ SharedValues + Exclusion
## Model 2: Enjoyment ~ SharedValues + Exclusion + Extroversion
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    116 2794.7                           
## 2    115 2794.0  1   0.63776 0.0262 0.8716

Here, we see the p-value is above .05. This would suggest to us that the model that added Extroversion, fit3, was not significantly better than fit2. So, we would, in this case, stick with the simpler model - fit2.

PREDICTION Recall that we said that regression is a prediction model, and we could use our regression equation to predict an outcome of Y based upon our equation.

We could even take the regression equations we developed on this dataset and apply them to a brand new dataset, predicting the outcome for new cases.

First, download the lab2practice.csv data file, and save it - again, either to the virtual desktop, or to your own computer - preferably in the same place the other data file was saved. Then, load the data in (top right, Import dataset, browse to the file).

lab2practice <- read.csv("~/PSYC 420 DATA/Lab 2 and 3/lab2practice-1.csv")

Let’s take a look at it.

lab2practice

##     ID SharedValues Exclusion Type
## 1  200            1       120    B
## 2  201            7        59    B
## 3  202            2        28    B
## 4  203            3        84    L
## 5  204            6        12    L
## 6  205            5        66    B
## 7  206            5        19    L
## 8  207            7         3    L
## 9  208            2        54    B
## 10 209            4        29    L
## 11 210            2        33    B

Notice this data file looks similar to the last one, but: 1. Type is stored as text (the letters B and L), not as a factor, and 2. it’s missing an Enjoyment variable.

Let’s first convert Type to a factor, saved as a new variable called TypeR:

lab2practice$TypeR<- as.factor(lab2practice$Type)
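To see what as.factor() changes, here is a sketch with the Type values typed in from the printout above. With R 4.0 or later, read.csv() keeps text columns as character vectors by default, which is why Type starts out as text:

```r
# Type column for the 11 practice cases, typed in from the printout above
Type <- c("B", "B", "B", "L", "L", "B", "L", "L", "B", "L", "B")

class(Type)            # "character"

TypeR <- as.factor(Type)
class(TypeR)           # "factor"
levels(TypeR)          # "B" "L" (levels are sorted alphabetically)
table(TypeR)           # number of cases per gathering type
```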

Now, let’s apply the regression equation we developed on the lab2 dataset to this new dataset, to CREATE a column that represents the expected Enjoyment for these new cases.

How do we do this? Let’s take another look at the regression output for the fit2 model.

summary(fit2)

## 
## Call:
## lm(formula = Enjoyment ~ SharedValues + Exclusion, data = lab2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -12.1136  -2.8812   0.1041   2.8870  14.7768 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   26.9096     5.6808   4.737 6.20e-06 ***
## SharedValues   5.8610     0.9199   6.372 3.92e-09 ***
## Exclusion     -0.1784     0.0485  -3.678 0.000358 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.908 on 116 degrees of freedom
## Multiple R-squared:  0.9206, Adjusted R-squared:  0.9192 
## F-statistic: 672.3 on 2 and 116 DF,  p-value: < 2.2e-16

Let’s substitute the intercept and slopes from this output into the multiple regression equation:

Y = b0 + b1X1 + b2X2
Y = 26.9096 + (5.8610 * X1) + (-0.1784 * X2)

Now, we use our equation to calculate the expected Enjoyment for these practice cases. We can add a column to the dataset that holds a predicted Enjoyment score for each case, using the same procedure as before: the dataset name (lab2practice), followed by a dollar sign ($), and then the name of the variable we want to create.

lab2practice$Pred_Enjoyment <- 26.9096 + (5.8610 * lab2practice$SharedValues) + (-0.1784  * lab2practice$Exclusion)
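The same calculation can be written with the coefficients stored in one vector. This sketch rebuilds the practice data from the printout above so it runs on its own. If fit2 is in your session, predict(fit2, newdata = lab2practice) does the plugging-in for you with the unrounded coefficients, so its results may differ slightly in the later decimal places.

```r
# The practice cases, typed in from the printout above
lab2practice <- data.frame(
  ID           = 200:210,
  SharedValues = c(1, 7, 2, 3, 6, 5, 5, 7, 2, 4, 2),
  Exclusion    = c(120, 59, 28, 84, 12, 66, 19, 3, 54, 29, 33)
)

# Coefficients from summary(fit2), in the order b0, b1, b2
b <- c(26.9096, 5.8610, -0.1784)

# Same equation as above, applied to every row at once
lab2practice$Pred_Enjoyment <- b[1] + b[2] * lab2practice$SharedValues +
  b[3] * lab2practice$Exclusion

# With fit2 available, the built-in alternative would be:
# predict(fit2, newdata = lab2practice)

head(lab2practice, 3)
```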

Let’s take another look at the dataset, which should now have our newly created Pred_Enjoyment column:

lab2practice

##     ID SharedValues Exclusion Type TypeR Pred_Enjoyment
## 1  200            1       120    B     B        11.3626
## 2  201            7        59    B     B        57.4110
## 3  202            2        28    B     B        33.6364
## 4  203            3        84    L     L        29.5070
## 5  204            6        12    L     L        59.9348
## 6  205            5        66    B     B        44.4402
## 7  206            5        19    L     L        52.8250
## 8  207            7         3    L     L        67.4014
## 9  208            2        54    B     B        28.9980
## 10 209            4        29    L     L        45.1800
## 11 210            2        33    B     B        32.7444

Which 2 individuals are predicted to have enjoyed their last gathering pre-pandemic the most? What are their IDs? #Rows 8 (ID 207, 67.40) and 5 (ID 204, 59.93)
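With more cases, you would let R do the sorting. A sketch using order() on the ID and Pred_Enjoyment values typed in from the printout:

```r
# IDs and predicted enjoyment, typed in from the printout above
ID             <- 200:210
Pred_Enjoyment <- c(11.3626, 57.4110, 33.6364, 29.5070, 59.9348, 44.4402,
                    52.8250, 67.4014, 28.9980, 45.1800, 32.7444)

# Row positions sorted from highest to lowest predicted enjoyment
ranked <- order(Pred_Enjoyment, decreasing = TRUE)

# IDs of the top 2
ID[ranked[1:2]]          # 207 204
```

In your own session, lab2practice[order(lab2practice$Pred_Enjoyment, decreasing = TRUE), ][1:2, ] would return the same two rows straight from the data frame.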