Reading Data on People with Schizophrenia/Schizoaffective Disorder

  1. Use the summary function to gain an overview of the data set.
# Load required library
library(readr)

# Load the dataset from the URL without specifying col_types
url <- "https://vincentarelbundock.github.io/Rdatasets/csv/heplots/NeuroCog.csv"
neuro_cog <- read_csv(url)
## New names:
## • `` -> `...1`
## Rows: 242 Columns: 11
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Dx, Sex
## dbl (9): ...1, Speed, Attention, Memory, Verbal, Visual, ProbSolv, SocialCog...
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
#Use the summary function to gain an overview of the data set
summary(neuro_cog)
##       ...1             Dx                Speed         Attention    
##  Min.   : 14.00   Length:242         Min.   :-1.00   Min.   : 1.00  
##  1st Qu.: 81.25   Class :character   1st Qu.:33.00   1st Qu.:32.00  
##  Median :142.50   Mode  :character   Median :41.00   Median :41.00  
##  Mean   :140.42                      Mean   :40.31   Mean   :39.75  
##  3rd Qu.:202.75                      3rd Qu.:48.75   3rd Qu.:50.00  
##  Max.   :263.00                      Max.   :74.00   Max.   :67.00  
##      Memory          Verbal          Visual         ProbSolv    
##  Min.   : 3.00   Min.   :20.00   Min.   :12.00   Min.   :29.00  
##  1st Qu.:36.00   1st Qu.:33.00   1st Qu.:29.00   1st Qu.:39.00  
##  Median :44.00   Median :40.00   Median :38.00   Median :45.00  
##  Mean   :42.44   Mean   :41.37   Mean   :37.12   Mean   :45.83  
##  3rd Qu.:51.00   3rd Qu.:48.00   3rd Qu.:44.00   3rd Qu.:52.00  
##  Max.   :71.00   Max.   :78.00   Max.   :65.00   Max.   :65.00  
##    SocialCog          Age            Sex           
##  Min.   :10.00   Min.   :18.00   Length:242        
##  1st Qu.:34.00   1st Qu.:32.00   Class :character  
##  Median :44.00   Median :40.00   Mode  :character  
##  Mean   :43.93   Mean   :40.89                     
##  3rd Qu.:53.00   3rd Qu.:50.00                     
##  Max.   :72.00   Max.   :66.00
  1. (Continued) Then display the mean and median for at least two attributes of your data. The two attributes used here are Age and Memory.
# Calculate the mean and median for the "Age" attribute
mean_age <- mean(neuro_cog$Age)
median_age <- median(neuro_cog$Age)

# Calculate the mean and median for the "Memory" attribute
mean_memory <- mean(neuro_cog$Memory)
median_memory <- median(neuro_cog$Memory)

# Display the mean and median for the selected attributes
cat("Mean Age:", mean_age, "\n")
## Mean Age: 40.89256
cat("Median Age:", median_age, "\n")
## Median Age: 40
cat("Mean Memory:", mean_memory, "\n")
## Mean Memory: 42.44215
cat("Median Memory:", median_memory, "\n")
## Median Memory: 44
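The four separate calls above can also be collapsed into a single pass over the chosen columns. The following is a minimal sketch, not the method used in this report: `toy` is a made-up stand-in for `neuro_cog` with invented values, so the numbers it prints are not the study's results.

```r
# Toy stand-in for neuro_cog; these values are invented for illustration only.
toy <- data.frame(
  Age    = c(25, 40, 33, 58, 44),
  Memory = c(38, 44, 51, 29, 47)
)

# Apply mean() and median() to each selected column in one pass.
# sapply() returns a matrix with one column per attribute.
center_stats <- sapply(toy[c("Age", "Memory")], function(x) {
  c(mean = mean(x), median = median(x))
})

print(center_stats)
```

Replacing `toy` with `neuro_cog` should give the same Age and Memory figures as the `cat()` calls above, provided neither column contains missing values.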
  2. Create a new data frame with a subset of the columns AND rows. Give the new data set a different name so that it does not overwrite the original. The subset here keeps Memory scores greater than 41, a cutoff slightly below the overall median Memory score of 44.
# Select Columns
selected_columns <- c("Age", "Memory", "Dx")

# Create the New Data Frame with Subset of Rows and Columns using subset()
new_data_subset <- subset(neuro_cog, subset = Memory > 41, select = selected_columns)

# Display enough rows to see examples of Step 2
head(new_data_subset, 10)
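The cutoff of 41 in the code above is hard-coded. If the intent is to split at the median itself, the cutoff can be computed from the data instead. This is a sketch under that assumption: `toy`, `memory_cutoff`, and `toy_subset` are hypothetical names, and the values are invented.

```r
# Toy stand-in for neuro_cog; values are invented for illustration only.
toy <- data.frame(
  Age    = c(25, 40, 33, 58, 44, 61),
  Memory = c(38, 44, 51, 29, 47, 41),
  Dx     = c("Control", "Schizophrenia", "Control",
             "Schizoaffective", "Control", "Schizophrenia"),
  Sex    = c("Female", "Male", "Male", "Female", "Male", "Female")
)

# Compute the cutoff from the data rather than typing a number.
memory_cutoff <- median(toy$Memory)

# Keep rows above the cutoff and only three of the four columns.
# The result gets its own name, so `toy` is left untouched.
toy_subset <- subset(toy, subset = Memory > memory_cutoff,
                     select = c(Age, Memory, Dx))

print(memory_cutoff)
print(dim(toy))         # original dimensions are unchanged
print(dim(toy_subset))  # fewer rows and fewer columns
```

Comparing `dim()` before and after is a quick way to confirm that both rows and columns were reduced and that the original data frame was not overwritten.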
  3. Create new column names for each column in the new data frame created in step 2.
# Create new column names
new_column_names <- c("ParticipantAge", "MemoryScore", "GroupType")

# Assign the new column names to the data frame
colnames(new_data_subset) <- new_column_names

# Display 5 rows to see examples of Step 3
head(new_data_subset, 5)
  4. Use the summary function to create an overview of your new data frame created in step 2. Then print the mean and median for the same two attributes. Please compare (i.e., tell me how the values changed and why).

Comparison of Original Data and New Subset Data: The original data set contains all 242 participants (people with schizophrenia or schizoaffective disorder and the control group), while the subset keeps only the 145 participants with Memory scores greater than 41, along with the Age, Memory, and Dx columns. Because every score of 41 or lower was removed, the Memory statistics necessarily rise: the mean increases from 42.44 to 50.43 and the median from 44 to 49. Age was not used in the filter, yet its mean falls slightly from 40.89 to 39.62 and its median from 40 to 38. This may be due to chance or to higher Memory scores being somewhat more common among younger participants in this sample.

#Use the summary function to create an overview of the new data frame
summary(new_data_subset)
##  ParticipantAge   MemoryScore     GroupType        
##  Min.   :19.00   Min.   :42.00   Length:145        
##  1st Qu.:31.00   1st Qu.:45.00   Class :character  
##  Median :38.00   Median :49.00   Mode  :character  
##  Mean   :39.62   Mean   :50.43                     
##  3rd Qu.:48.00   3rd Qu.:54.00                     
##  Max.   :64.00   Max.   :71.00
#Calculate Mean and Median for Selected Attributes in the new data frame
mean_participant_age <- mean(new_data_subset$ParticipantAge)
median_participant_age <- median(new_data_subset$ParticipantAge)

mean_memory_score <- mean(new_data_subset$MemoryScore)
median_memory_score <- median(new_data_subset$MemoryScore)

#Print Mean and Median for Selected Attributes
cat("Mean Participant Age (New Data Frame):", mean_participant_age, "\n")
## Mean Participant Age (New Data Frame): 39.62069
cat("Median Participant Age (New Data Frame):", median_participant_age, "\n")
## Median Participant Age (New Data Frame): 38
cat("Mean Memory Score (New Data Frame):", mean_memory_score, "\n")
## Mean Memory Score (New Data Frame): 50.43448
cat("Median Memory Score (New Data Frame):", median_memory_score, "\n")
## Median Memory Score (New Data Frame): 49
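To make the before/after comparison easier to read, the statistics for both data frames can be placed side by side with their differences. The sketch below uses invented values in hypothetical objects (`original`, `filtered`, `comparison`), so it only illustrates the layout and does not reproduce the figures reported above.

```r
# Toy stand-ins for the original data and the filtered subset.
# Values are invented for illustration only.
original <- data.frame(
  Age    = c(25, 40, 33, 58, 44, 61),
  Memory = c(38, 44, 51, 29, 47, 41)
)
filtered <- subset(original, Memory > 41)

# Helper (hypothetical): mean and median of Age and Memory as one vector.
center_of <- function(df) {
  c(mean(df$Age), median(df$Age), mean(df$Memory), median(df$Memory))
}

# One row per statistic, with a column for each data frame.
comparison <- data.frame(
  statistic = c("mean Age", "median Age", "mean Memory", "median Memory"),
  original  = center_of(original),
  filtered  = center_of(filtered)
)

# Positive values mean the statistic is higher in the filtered data.
comparison$change <- comparison$filtered - comparison$original

print(comparison)
```

Filtering on Memory forces the Memory rows of `change` to be positive, while the Age rows can move in either direction depending on how Age and Memory are related in the data.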
  5. For at least 3 different/distinct values in a column, rename them so that every value in that column is renamed. Here, “Schizophrenia” becomes “SZ”, “Schizoaffective” becomes “SA”, and “Control” becomes “CTL”.
#Rename values in the "GroupType" column
new_data_subset$GroupType <- gsub("Schizophrenia", "SZ", new_data_subset$GroupType)
new_data_subset$GroupType <- gsub("Schizoaffective", "SA", new_data_subset$GroupType)
new_data_subset$GroupType <- gsub("Control", "CTL", new_data_subset$GroupType)

#Display the updated data frame
new_data_subset
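An alternative to three chained `gsub()` calls is a named lookup vector, which recodes every value in one step and makes unmapped values visible as `NA`. This is a sketch of that alternative, not the approach used above; `dx`, `dx_labels`, and `dx_short` are hypothetical names, and `dx` holds made-up example values.

```r
# Example values standing in for the GroupType column (invented order).
dx <- c("Schizophrenia", "Control", "Schizoaffective", "Control")

# Lookup vector: names are the old values, elements are the new labels.
dx_labels <- c(Schizophrenia = "SZ", Schizoaffective = "SA", Control = "CTL")

# Index the lookup by the old values; unname() drops the old names.
dx_short <- unname(dx_labels[dx])

# A value missing from the lookup would come back as NA, so check for that.
stopifnot(!anyNA(dx_short))

# Tabulate the result to confirm every value was renamed.
print(table(dx_short))
```

Calling `table()` on the recoded column is also a compact way to show that all three original values were replaced, without printing the whole data frame.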
  6. Display enough rows to see examples of all of steps 1-5 above. This means use a function to show me enough row values that I can see the changes. This has been implemented throughout the code. It could not be applied to steps 1 and 4 because those steps required summary output rather than row displays.