Portfolio Assignment I

Meditation Dataset Analysis


Goals:

- Describe the dataset
- Analyze the relationship between meditation and gene expression
- Use plots to illustrate

Dataset Description: This dataset comes from an experiment exploring the effect of meditation on the expression of specific genes. It has been hypothesized that mindfulness meditation could alter biochemical processes within the body, and the goal of the experiment was to test this hypothesis.

Exploring the Dataset

There are multiple methods for exploring datasets in R. Before using them, we must set up the environment by installing the necessary packages. One of these is the tidyverse package, a collection of packages that makes working with data easier and more intuitive.
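If tidyverse is not already installed, it can be installed once from CRAN with install.packages():

install.packages("tidyverse")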

To load tidyverse and read in the dataset, we execute:

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.5     v dplyr   1.0.7
## v tidyr   1.1.4     v stringr 1.4.0
## v readr   2.0.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
Expression <- read.delim("C:/Users/Glow9/DSDocuments/Misk-DSI-2021-01/Learn_R/data/Expression.txt")

One method to gain an overview of the dataset is the glimpse() function:

glimpse(Expression)
## Rows: 21
## Columns: 24
## $ Control_RIPK2_1    <dbl> 0.70, 0.39, 1.22, 1.78, 0.59, 1.47, 1.66, 0.76, 1.4~
## $ Control_RIPK2_2    <dbl> 1.43, 1.82, 0.59, 0.47, 1.03, 1.60, 1.48, 1.67, 0.7~
## $ Meditation_RIPK2_1 <dbl> 1.05, 1.69, 1.22, 1.59, 1.20, 1.99, 0.95, 0.53, 1.3~
## $ Meditation_RIPK2_2 <dbl> 0.90, 1.02, 0.41, 0.88, 1.67, 1.82, 1.03, 1.20, 0.8~
## $ Control_COX2_1     <dbl> -0.08, 2.25, 1.10, 2.71, 1.95, 0.81, -0.17, 1.35, 1~
## $ Control_COX2_2     <dbl> 1.62, 1.01, 1.21, 1.68, 0.45, -0.04, 0.95, 0.80, 0.~
## $ Meditation_COX2_1  <dbl> 4.07, 2.33, 0.97, 0.27, 1.58, 1.32, 0.50, -0.46, 1.~
## $ Meditation_COX2_2  <dbl> -0.65, 0.12, 1.21, -0.02, 0.87, 0.49, 0.91, 0.62, 1~
## $ Control_CCR7_1     <dbl> 2.89, 1.44, 1.64, 1.91, 1.65, 1.05, 1.24, 0.73, 1.1~
## $ Control_CCR7_2     <dbl> 1.50, 0.48, 1.88, 1.86, 0.91, 1.49, 1.34, 1.10, 1.5~
## $ Meditation_CCR7_1  <dbl> 1.50, 0.88, 0.71, 0.86, 1.12, 1.49, 1.94, 1.11, 0.5~
## $ Meditation_CCR7_2  <dbl> 0.64, 0.13, 1.75, 0.26, 1.99, -0.02, 1.50, 0.28, 0.~
## $ Control_CXCR1_1    <dbl> 1.87, 0.27, 1.45, 2.11, 1.31, 1.35, 1.32, -0.11, 0.~
## $ Control_CXCR1_2    <dbl> 1.30, 1.16, 1.40, 1.24, 1.67, 2.75, 1.33, 0.80, 1.0~
## $ Meditation_CXCR1_1 <dbl> 1.01, 0.08, 1.25, 1.87, 1.78, 1.17, 1.14, 1.49, 0.2~
## $ Meditation_CXCR1_2 <dbl> 1.54, 2.21, 2.05, 1.15, 0.63, 0.62, 0.49, 1.98, 1.5~
## $ Control_IL.6_1     <dbl> -0.20, 1.39, 0.15, 1.48, 0.70, 1.07, 2.47, 0.40, 2.~
## $ Control_IL.6_2     <dbl> 2.44, 0.87, 0.31, 1.75, 1.07, 0.10, 2.80, 1.64, 0.2~
## $ Meditation_IL.6_1  <dbl> 1.41, 2.73, 0.39, 1.08, 0.47, 0.49, 2.36, 2.47, 1.9~
## $ Meditation_IL.6_2  <dbl> 1.79, 0.91, 0.78, 0.01, 0.67, 0.16, 1.84, 1.33, 0.6~
## $ Control_TNF_1      <dbl> 1.66, 1.83, 1.18, 0.70, 0.65, 0.76, 0.97, 1.08, 1.1~
## $ Control_TNF_2      <dbl> 1.27, 1.17, 1.84, 1.07, 1.26, 2.12, 1.38, 1.25, 0.5~
## $ Meditation_TNF_1   <dbl> 1.14, 1.04, 0.55, 1.06, 1.22, 0.20, 0.75, 1.14, 0.6~
## $ Meditation_TNF_2   <dbl> 1.06, 0.50, 1.08, 1.57, 0.73, 1.73, 2.42, 1.89, -0.~

This gives us an idea of the dataset's composition.

We can see that the dataset contains 21 rows and 24 columns, and each column denotes a specific gene measured in either the meditation (intervention) group or the control group. We also see that every column is of type "dbl", or "double", meaning the data are numeric; in general computing terms, numeric vectors stored in double precision are called doubles.
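These properties can also be checked directly; for example, dim() returns the number of rows and columns, and sapply() with class() reports each column's type:

dim(Expression)
sapply(Expression, class)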

Another method to explore the data is to compute summary statistics using the summary() function:

summary(Expression)
##  Control_RIPK2_1  Control_RIPK2_2 Meditation_RIPK2_1 Meditation_RIPK2_2
##  Min.   :-0.030   Min.   :0.080   Min.   :-0.270     Min.   :0.1500    
##  1st Qu.: 0.670   1st Qu.:0.760   1st Qu.: 0.975     1st Qu.:0.8650    
##  Median : 1.110   Median :1.190   Median : 1.265     Median :0.9700    
##  Mean   : 1.064   Mean   :1.059   Mean   : 1.274     Mean   :0.9661    
##  3rd Qu.: 1.470   3rd Qu.:1.320   3rd Qu.: 1.685     3rd Qu.:1.1775    
##  Max.   : 1.890   Max.   :1.820   Max.   : 2.340     Max.   :1.8200    
##                                   NA's   :3          NA's   :3         
##  Control_COX2_1   Control_COX2_2    Meditation_COX2_1 Meditation_COX2_2
##  Min.   :-0.250   Min.   :-0.0400   Min.   :-0.460    Min.   :-0.6500  
##  1st Qu.: 0.810   1st Qu.: 0.6500   1st Qu.: 0.735    1st Qu.: 0.2750  
##  Median : 1.340   Median : 0.9500   Median : 1.580    Median : 0.4900  
##  Mean   : 1.331   Mean   : 0.9624   Mean   : 1.502    Mean   : 0.6137  
##  3rd Qu.: 1.950   3rd Qu.: 1.3500   3rd Qu.: 2.230    3rd Qu.: 0.9200  
##  Max.   : 2.970   Max.   : 1.8600   Max.   : 4.070    Max.   : 1.6800  
##                                     NA's   :2         NA's   :2        
##  Control_CCR7_1  Control_CCR7_2  Meditation_CCR7_1 Meditation_CCR7_2
##  Min.   :0.340   Min.   :0.210   Min.   :0.260     Min.   :-0.020   
##  1st Qu.:1.050   1st Qu.:0.710   1st Qu.:0.750     1st Qu.: 0.690   
##  Median :1.350   Median :1.100   Median :1.110     Median : 0.850   
##  Mean   :1.406   Mean   :1.123   Mean   :1.139     Mean   : 1.044   
##  3rd Qu.:1.820   3rd Qu.:1.500   3rd Qu.:1.495     3rd Qu.: 1.595   
##  Max.   :2.890   Max.   :1.880   Max.   :2.280     Max.   : 1.990   
##                                  NA's   :2         NA's   :2        
##  Control_CXCR1_1  Control_CXCR1_2  Meditation_CXCR1_1 Meditation_CXCR1_2
##  Min.   :-0.110   Min.   :-0.040   Min.   :-0.250     Min.   :0.310     
##  1st Qu.: 0.690   1st Qu.: 0.925   1st Qu.: 0.665     1st Qu.:0.770     
##  Median : 1.070   Median : 1.230   Median : 1.160     Median :1.540     
##  Mean   : 1.139   Mean   : 1.171   Mean   : 1.117     Mean   :1.311     
##  3rd Qu.: 1.400   3rd Qu.: 1.335   3rd Qu.: 1.635     3rd Qu.:1.760     
##  Max.   : 2.490   Max.   : 2.750   Max.   : 2.150     Max.   :2.210     
##  NA's   :2        NA's   :2        NA's   :2          NA's   :2         
##  Control_IL.6_1    Control_IL.6_2   Meditation_IL.6_1 Meditation_IL.6_2
##  Min.   :-0.9800   Min.   :-0.130   Min.   :-0.770    Min.   :0.010    
##  1st Qu.: 0.4000   1st Qu.: 0.320   1st Qu.: 0.480    1st Qu.:0.325    
##  Median : 0.7000   Median : 0.910   Median : 1.380    Median :0.670    
##  Mean   : 0.8886   Mean   : 1.075   Mean   : 1.211    Mean   :0.870    
##  3rd Qu.: 1.4800   3rd Qu.: 1.640   3rd Qu.: 1.950    3rd Qu.:1.440    
##  Max.   : 2.4700   Max.   : 2.800   Max.   : 2.730    Max.   :2.060    
##                                     NA's   :2         NA's   :2        
##  Control_TNF_1   Control_TNF_2   Meditation_TNF_1 Meditation_TNF_2 
##  Min.   :0.430   Min.   :0.230   Min.   :0.2000   Min.   :-0.4900  
##  1st Qu.:0.760   1st Qu.:1.070   1st Qu.:0.6100   1st Qu.: 0.5900  
##  Median :1.010   Median :1.270   Median :1.0400   Median : 0.7900  
##  Mean   :1.122   Mean   :1.303   Mean   :0.9247   Mean   : 0.9742  
##  3rd Qu.:1.500   3rd Qu.:1.570   3rd Qu.:1.1800   3rd Qu.: 1.4200  
##  Max.   :2.320   Max.   :2.120   Max.   :1.8000   Max.   : 2.4200  
##                                  NA's   :2        NA's   :2

This produces a set of summary statistics for each column, including the minimum, first and third quartiles, median, mean, and maximum, as well as a count of missing values (NA's) where present.
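Since several columns contain missing values, it can be useful to count them per column before any further analysis, for example with:

colSums(is.na(Expression))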

Note that this dataset does not follow the structure of tidy data, where each column holds one variable, each row holds one observation, and each data frame holds one observational unit.

To transform the data from its untidy wide form to a tidy long form, we will use the pivot_longer() function from the tidyr package:

Expression %>%
  pivot_longer(everything()) -> expression_tidy

View(expression_tidy)

The View() function from the utils package opens the data frame in a new tab, giving us a visual check of the data and the changes we made.
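Because View() only works in an interactive session, head() and nrow() give a quick non-interactive look at the result and its size:

head(expression_tidy)
nrow(expression_tidy)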

Now that we have more manageable long data, we can further tidy it by grouping the variables in a way that makes working with the 504 rows produced by the pivot easier. To begin, we will create new columns for the before and after values of each treatment group, plus a column for the change, filled with placeholder values for now.

expression_tidy$Control_before <- "."
expression_tidy$Control_after <- "."
expression_tidy$Meditation_before <- "."
expression_tidy$Meditation_after <- "."
expression_tidy$Change <- "."

View(expression_tidy)

Now we need to fill the columns with values.
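Below is a minimal sketch of one way to derive these values, starting again from the wide Expression table rather than filling the placeholder columns one by one. It assumes that the "_1"/"_2" suffix in the original column names corresponds to the before and after measurements (an interpretation of the naming scheme, not something stated in the dataset itself), and the change columns Control_change and Meditation_change are illustrative names:

Expression %>%
  mutate(Subject = row_number()) %>%                # keep track of each row (presumably one participant)
  pivot_longer(
    -Subject,
    names_to  = c("Group", "Gene", "Timepoint"),    # split names like "Control_RIPK2_1"
    names_sep = "_",
    values_to = "Expression"
  ) %>%
  mutate(Timepoint = if_else(Timepoint == "1", "before", "after")) %>%
  pivot_wider(names_from = c(Group, Timepoint), values_from = Expression) %>%
  mutate(Control_change    = Control_after - Control_before,
         Meditation_change = Meditation_after - Meditation_before) -> expression_change

View(expression_change)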