library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.1.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.1.2
## Warning: package 'stringr' was built under R version 4.1.2
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(dplyr)
library(plotly)
## Warning: package 'plotly' was built under R version 4.1.3
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyr)
library(stringr)
library(psych)
## Warning: package 'psych' was built under R version 4.1.2
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(ggplot2)
metadata_df <- read.delim("https://raw.githubusercontent.com/rfpoulos/pymaceuticals/master/data/Mouse_metadata.csv", header=T, sep=",")
head(metadata_df)
## Mouse.ID Drug.Regimen Sex Age_months Weight..g.
## 1 k403 Ramicane Male 21 16
## 2 s185 Capomulin Female 3 17
## 3 x401 Capomulin Female 16 15
## 4 m601 Capomulin Male 22 17
## 5 g791 Ramicane Male 11 16
## 6 s508 Ramicane Male 1 17
df <- metadata_df %>%
group_by(Drug.Regimen)
head(df)
## # A tibble: 6 x 5
## # Groups: Drug.Regimen [2]
## Mouse.ID Drug.Regimen Sex Age_months Weight..g.
## <chr> <chr> <chr> <int> <int>
## 1 k403 Ramicane Male 21 16
## 2 s185 Capomulin Female 3 17
## 3 x401 Capomulin Female 16 15
## 4 m601 Capomulin Male 22 17
## 5 g791 Ramicane Male 11 16
## 6 s508 Ramicane Male 1 17
results_df <- read.delim("https://raw.githubusercontent.com/rfpoulos/pymaceuticals/master/data/Study_results.csv", header=T, sep=",")
head(results_df)
## Mouse.ID Timepoint Tumor.Volume..mm3. Metastatic.Sites
## 1 b128 0 45 0
## 2 f932 0 45 0
## 3 g107 0 45 0
## 4 a457 0 45 0
## 5 c819 0 45 0
## 6 h246 0 45 0
Pymaceuticals Inc., a fictional burgeoning pharmaceutical company based out of San Diego, CA, specializes in drug-based, anti-cancer pharmaceuticals.They have provided the data to test the efficacy of potential drug treatments for squamous cell carcinoma. In this study, 249 mice identified with Squamous cell carcinoma (SCC) tumor growth, kind of skin cancer, were treated through a variety of drug regimens. Over the course of 45 days, tumor development was observed and measured.The objective is to analyze the data to show how four treatments (Capomulin, Infubinol, Ketapril, and Placebo) compare.
You should phrase your research question in a way that matches up with the scope of inference your dataset allows for.
Question 1: Is Capomulin more or less effective in reducing the tumor size than Infubinol, or Ketapril drugs categories?
Question 2: Is there a correlation between the age, weight and the tumor size growth for each drug category?
Null Hypothesis: There is no difference between the mean percent change in tumor volume for the four drug categories.
Alternate Hypothesis: There is a difference between the mean percent change in tumor volume for the four drug categories.
Approach for answering the research question will be:
1- Calculate the mean percent change in tumor volume for the four drug categories.
2- Perform the Hypothesis test to find out whether or not the difference exist between the mean tumor size for all four drug categories.
3- Perform linear regression to study the correlation between various variables by calculating the correlation coefficient.
4- Finally analyze the results to find out if Capomulin more or less effective in reducing the tumor size of sample mice than Infubinol, or Ketapril drugs categories.
Answer: The metadata_df contain 249 unique mouse id and so are the number of cases that treated with variety of drug regimem .The results_df dataset holds the tumor growth measurments observed for each Mouse ID and carries 1,893 rows results. There are 10 different drug treatments. The total sample size of mouse_id for four treatments (Capomulin, Infubinol, Ketapril, and Placebo) is 100 and the sample size of mouse_id by drug treatments is 25 each.
Answer: Data is collected by the fictitious pharmaceutical company who was testing the efficacy of potential drug treatments for squamous cell carcinoma. I import the data into my .Rmd file from github.
Answer: This is a experimental study.A group of 249 mice were monitored after administration of a variety of drug regimens over a 45-day treatment period. The impact of Capomulin on tumor growth, metastasis and survival rates were monitored, along with Infubinol, Ketapril, and Placebo.
Answer: The citation and data collection links are as follows.
In my search for the experimental datasets, I found the Mouse_metadata and the Study_results on the GitHub link provided below:
https://raw.githubusercontent.com/rfpoulos/pymaceuticals/master/data/Mouse_metadata.csv
https://raw.githubusercontent.com/rfpoulos/pymaceuticals/master/data/Study_results.csv
Upon further research in finding the original source of the the dataset, I found that these datasets are provided by Pymaceuticals Inc., a fictional burgeoning pharmaceutical company based out of San Diego, CA, specializes in drug-based, anti-cancer pharmaceuticals. Below is the link for the original source of the datasets.
https://c-l-nguyen.github.io/web-design-challenge/index.html
Answer: The response variable is the size of tumor, “Tumor.Volume..mm3.” and it holds a numerical data.
What is the explanatory variable, and what type is it (numerical/categorical)?
Answer: The explanatory variable is the “Drug.Regimen” and it holds a categorical data and “Timepoint” which holds numerical data. The ‘Timepoint’ unit is ‘days’.
Provide summary statistics relevant to your research question. For example, if you’re comparing means across groups provide means, SDs, sample sizes of each group. This step requires the use of R, hence a code chunk is provided below. Insert more code chunks as needed.
summary(metadata_df)
## Mouse.ID Drug.Regimen Sex Age_months
## Length:249 Length:249 Length:249 Min. : 1.00
## Class :character Class :character Class :character 1st Qu.: 6.00
## Mode :character Mode :character Mode :character Median :13.00
## Mean :12.73
## 3rd Qu.:19.00
## Max. :24.00
## Weight..g.
## Min. :15.00
## 1st Qu.:25.00
## Median :27.00
## Mean :26.12
## 3rd Qu.:29.00
## Max. :30.00
summary(results_df)
## Mouse.ID Timepoint Tumor.Volume..mm3. Metastatic.Sites
## Length:1893 Min. : 0.00 Min. :22.05 Min. :0.000
## Class :character 1st Qu.: 5.00 1st Qu.:45.00 1st Qu.:0.000
## Mode :character Median :20.00 Median :48.95 Median :1.000
## Mean :19.57 Mean :50.45 Mean :1.022
## 3rd Qu.:30.00 3rd Qu.:56.29 3rd Qu.:2.000
## Max. :45.00 Max. :78.57 Max. :4.000
nrow(metadata_df)
## [1] 249
nrow(results_df)
## [1] 1893
drug_count <- unique(metadata_df$Drug.Regimen)
drug_count
## [1] "Ramicane" "Capomulin" "Infubinol" "Placebo" "Ceftamin" "Stelasyn"
## [7] "Zoniferol" "Ketapril" "Propriva" "Naftisol"
length(drug_count)
## [1] 10
capomulin_df <- filter(metadata_df, Drug.Regimen=="Capomulin")
head(capomulin_df)
## Mouse.ID Drug.Regimen Sex Age_months Weight..g.
## 1 s185 Capomulin Female 3 17
## 2 x401 Capomulin Female 16 15
## 3 m601 Capomulin Male 22 17
## 4 f966 Capomulin Male 16 17
## 5 u364 Capomulin Male 18 17
## 6 y793 Capomulin Male 17 17
nrow(capomulin_df)
## [1] 25
infubinol_df <- filter(metadata_df, Drug.Regimen=="Infubinol")
nrow(infubinol_df)
## [1] 25
ketapril_df <- filter(metadata_df, Drug.Regimen=="Ketapril")
nrow(ketapril_df)
## [1] 25
placebo_df <- filter(metadata_df, Drug.Regimen=="Placebo")
nrow(placebo_df)
## [1] 25
merge_df <- merge(x = metadata_df, y = results_df, all = TRUE)
head(merge_df)
## Mouse.ID Drug.Regimen Sex Age_months Weight..g. Timepoint
## 1 a203 Infubinol Female 20 23 20
## 2 a203 Infubinol Female 20 23 25
## 3 a203 Infubinol Female 20 23 15
## 4 a203 Infubinol Female 20 23 10
## 5 a203 Infubinol Female 20 23 35
## 6 a203 Infubinol Female 20 23 0
## Tumor.Volume..mm3. Metastatic.Sites
## 1 55.17334 1
## 2 56.79321 1
## 3 52.77787 1
## 4 51.85244 1
## 5 61.93165 2
## 6 45.00000 0
glimpse(merge_df)
## Rows: 1,893
## Columns: 8
## $ Mouse.ID <chr> "a203", "a203", "a203", "a203", "a203", "a203", "a2~
## $ Drug.Regimen <chr> "Infubinol", "Infubinol", "Infubinol", "Infubinol",~
## $ Sex <chr> "Female", "Female", "Female", "Female", "Female", "~
## $ Age_months <int> 20, 20, 20, 20, 20, 20, 20, 20, 20, 20, 21, 21, 21,~
## $ Weight..g. <int> 23, 23, 23, 23, 23, 23, 23, 23, 23, 23, 25, 25, 25,~
## $ Timepoint <int> 20, 25, 15, 10, 35, 0, 30, 5, 45, 40, 5, 40, 35, 45~
## $ Tumor.Volume..mm3. <dbl> 55.17334, 56.79321, 52.77787, 51.85244, 61.93165, 4~
## $ Metastatic.Sites <int> 1, 1, 1, 1, 2, 0, 1, 0, 2, 2, 0, 1, 1, 1, 1, 1, 1, ~
merge_df <- merge_df %>% drop_na()
head(merge_df)
## Mouse.ID Drug.Regimen Sex Age_months Weight..g. Timepoint
## 1 a203 Infubinol Female 20 23 20
## 2 a203 Infubinol Female 20 23 25
## 3 a203 Infubinol Female 20 23 15
## 4 a203 Infubinol Female 20 23 10
## 5 a203 Infubinol Female 20 23 35
## 6 a203 Infubinol Female 20 23 0
## Tumor.Volume..mm3. Metastatic.Sites
## 1 55.17334 1
## 2 56.79321 1
## 3 52.77787 1
## 4 51.85244 1
## 5 61.93165 2
## 6 45.00000 0
colnames(merge_df)[1] <- c("Mouse_Id")
colnames(merge_df)[2] <- c("Drug_Regimen")
colnames(merge_df)[5] <- c("Weight_g")
colnames(merge_df)[7] <- c("Tumor_Volume_mm3")
colnames(merge_df)[8] <- c("Metastatic_Sites")
head(merge_df)
## Mouse_Id Drug_Regimen Sex Age_months Weight_g Timepoint Tumor_Volume_mm3
## 1 a203 Infubinol Female 20 23 20 55.17334
## 2 a203 Infubinol Female 20 23 25 56.79321
## 3 a203 Infubinol Female 20 23 15 52.77787
## 4 a203 Infubinol Female 20 23 10 51.85244
## 5 a203 Infubinol Female 20 23 35 61.93165
## 6 a203 Infubinol Female 20 23 0 45.00000
## Metastatic_Sites
## 1 1
## 2 1
## 3 1
## 4 1
## 5 2
## 6 0
merge_df %>% group_by(Mouse_Id, Timepoint)
## # A tibble: 1,893 x 8
## # Groups: Mouse_Id, Timepoint [1,888]
## Mouse_Id Drug_Regimen Sex Age_months Weight_g Timepoint Tumor_Volume_mm3
## <chr> <chr> <chr> <int> <int> <int> <dbl>
## 1 a203 Infubinol Female 20 23 20 55.2
## 2 a203 Infubinol Female 20 23 25 56.8
## 3 a203 Infubinol Female 20 23 15 52.8
## 4 a203 Infubinol Female 20 23 10 51.9
## 5 a203 Infubinol Female 20 23 35 61.9
## 6 a203 Infubinol Female 20 23 0 45
## 7 a203 Infubinol Female 20 23 30 59.5
## 8 a203 Infubinol Female 20 23 5 48.5
## 9 a203 Infubinol Female 20 23 45 68.0
## 10 a203 Infubinol Female 20 23 40 63.6
## # ... with 1,883 more rows, and 1 more variable: Metastatic_Sites <int>
head(merge_df)
## Mouse_Id Drug_Regimen Sex Age_months Weight_g Timepoint Tumor_Volume_mm3
## 1 a203 Infubinol Female 20 23 20 55.17334
## 2 a203 Infubinol Female 20 23 25 56.79321
## 3 a203 Infubinol Female 20 23 15 52.77787
## 4 a203 Infubinol Female 20 23 10 51.85244
## 5 a203 Infubinol Female 20 23 35 61.93165
## 6 a203 Infubinol Female 20 23 0 45.00000
## Metastatic_Sites
## 1 1
## 2 1
## 3 1
## 4 1
## 5 2
## 6 0
df1 <- select(merge_df, Drug_Regimen, Tumor_Volume_mm3, Age_months, Weight_g)
head(df1)
## Drug_Regimen Tumor_Volume_mm3 Age_months Weight_g
## 1 Infubinol 55.17334 20 23
## 2 Infubinol 56.79321 20 23
## 3 Infubinol 52.77787 20 23
## 4 Infubinol 51.85244 20 23
## 5 Infubinol 61.93165 20 23
## 6 Infubinol 45.00000 20 23
df1 <- group_by(df1, Drug_Regimen)
head(df1)
## # A tibble: 6 x 4
## # Groups: Drug_Regimen [1]
## Drug_Regimen Tumor_Volume_mm3 Age_months Weight_g
## <chr> <dbl> <int> <int>
## 1 Infubinol 55.2 20 23
## 2 Infubinol 56.8 20 23
## 3 Infubinol 52.8 20 23
## 4 Infubinol 51.9 20 23
## 5 Infubinol 61.9 20 23
## 6 Infubinol 45 20 23
stats_df <- df1 %>% summarise(
Tumor_Volume_mean = mean(Tumor_Volume_mm3), Tumor_Volume_median = median(Tumor_Volume_mm3), Tumor_Volume_sd = sd(Tumor_Volume_mm3), Tumor_Volume_se = sd(Tumor_Volume_mm3)/sqrt(length((Tumor_Volume_mm3))))
head(stats_df)
## # A tibble: 6 x 5
## Drug_Regimen Tumor_Volume_me~ Tumor_Volume_me~ Tumor_Volume_sd Tumor_Volume_se
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Capomulin 40.7 41.6 4.99 0.329
## 2 Ceftamin 52.6 51.8 6.27 0.470
## 3 Infubinol 52.9 51.8 6.57 0.492
## 4 Ketapril 55.2 53.7 8.28 0.604
## 5 Naftisol 54.3 52.5 8.13 0.596
## 6 Placebo 54.0 52.3 7.82 0.581
library(ggplot2)
# plot mean salaries
ggplot(stats_df,
aes(x = Drug_Regimen,
y = Tumor_Volume_mean)) +
geom_bar(stat = "identity", fill = "cornflowerblue")
ggplot(merge_df,
aes(x = Drug_Regimen,
y = Tumor_Volume_mm3)) +
geom_boxplot() +
labs(title = "Mean distribution by Drug_Regimen")
count_df <- df1 %>% count(Drug_Regimen)
count_df
## # A tibble: 10 x 2
## # Groups: Drug_Regimen [10]
## Drug_Regimen n
## <chr> <int>
## 1 Capomulin 230
## 2 Ceftamin 178
## 3 Infubinol 178
## 4 Ketapril 188
## 5 Naftisol 186
## 6 Placebo 181
## 7 Propriva 161
## 8 Ramicane 228
## 9 Stelasyn 181
## 10 Zoniferol 182
barplot(c(230, 178, 178, 188, 186, 181, 161, 228, 181, 182),
names.arg=c("Capomulin","Ceftamin","Infubinol","Ketapril","Naftisol", "Placebo", "Propriva", "Ramicane", "Stelasyn", "Zoniferol"),
ylim=c(0,250),
col=c("beige","orange","lightgreen","lightblue","yellow", "blue", "green", "pink", "purple", "red"),
ylab="Count of Mice per Drug Regimen")
merge_df <- merge_df[!duplicated(merge_df), ]
head(merge_df)
## Mouse_Id Drug_Regimen Sex Age_months Weight_g Timepoint Tumor_Volume_mm3
## 1 a203 Infubinol Female 20 23 20 55.17334
## 2 a203 Infubinol Female 20 23 25 56.79321
## 3 a203 Infubinol Female 20 23 15 52.77787
## 4 a203 Infubinol Female 20 23 10 51.85244
## 5 a203 Infubinol Female 20 23 35 61.93165
## 6 a203 Infubinol Female 20 23 0 45.00000
## Metastatic_Sites
## 1 1
## 2 1
## 3 1
## 4 1
## 5 2
## 6 0
capomulin_df <- filter(merge_df, Drug_Regimen == "Capomulin")
infubinol_df <- filter(merge_df, Drug_Regimen == "Infubinol")
ketapril_df <- filter(merge_df, Drug_Regimen == "Ketapril")
placebo_df <- filter(merge_df, Drug_Regimen == "Placebo")
head(capomulin_df)
## Mouse_Id Drug_Regimen Sex Age_months Weight_g Timepoint Tumor_Volume_mm3
## 1 b128 Capomulin Female 9 22 5 45.65133
## 2 b128 Capomulin Female 9 22 25 43.26214
## 3 b128 Capomulin Female 9 22 35 37.96764
## 4 b128 Capomulin Female 9 22 10 43.27085
## 5 b128 Capomulin Female 9 22 0 45.00000
## 6 b128 Capomulin Female 9 22 40 38.37973
## Metastatic_Sites
## 1 0
## 2 1
## 3 1
## 4 0
## 5 0
## 6 2
capo_df1 <- select(capomulin_df, Mouse_Id, Timepoint, Tumor_Volume_mm3) %>%
group_by(Mouse_Id) %>%
filter(Timepoint == max(Timepoint, na.rm=TRUE))
head(capo_df1)
## # A tibble: 6 x 3
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3
## <chr> <int> <dbl>
## 1 b128 45 39.0
## 2 b742 45 38.9
## 3 f966 20 30.5
## 4 g288 45 37.1
## 5 g316 45 40.2
## 6 i557 45 47.7
capo_df2 <- select(capomulin_df, Mouse_Id, Weight_g) %>%
group_by(Mouse_Id) %>%
summarise(Average_weight = mean(Weight_g, na.rm=TRUE))
head(capo_df2)
## # A tibble: 6 x 2
## Mouse_Id Average_weight
## <chr> <dbl>
## 1 b128 22
## 2 b742 21
## 3 f966 17
## 4 g288 19
## 5 g316 22
## 6 i557 24
capo_df <- capo_df1 %>% inner_join(capo_df2, by = "Mouse_Id")
head(capo_df)
## # A tibble: 6 x 4
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3 Average_weight
## <chr> <int> <dbl> <dbl>
## 1 b128 45 39.0 22
## 2 b742 45 38.9 21
## 3 f966 20 30.5 17
## 4 g288 45 37.1 19
## 5 g316 45 40.2 22
## 6 i557 45 47.7 24
capo_df3 <- select(capomulin_df, Mouse_Id, Age_months) %>%
group_by(Mouse_Id) %>%
summarise(Average_age = mean(Age_months, na.rm=TRUE))
head(capo_df3)
## # A tibble: 6 x 2
## Mouse_Id Average_age
## <chr> <dbl>
## 1 b128 9
## 2 b742 7
## 3 f966 16
## 4 g288 3
## 5 g316 22
## 6 i557 1
capo_df <- capo_df %>% inner_join(capo_df3, by = "Mouse_Id")
head(capo_df)
## # A tibble: 6 x 5
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3 Average_weight Average_age
## <chr> <int> <dbl> <dbl> <dbl>
## 1 b128 45 39.0 22 9
## 2 b742 45 38.9 21 7
## 3 f966 20 30.5 17 16
## 4 g288 45 37.1 19 3
## 5 g316 45 40.2 22 22
## 6 i557 45 47.7 24 1
capo_df$Tumor_Volume_mm3 %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23.34 32.38 38.13 36.67 40.16 47.69
capo_df$Tumor_Volume_mm3 %>% sd()
## [1] 5.715188
# Creating the plot
plot(capo_df$Average_age, capo_df$Tumor_Volume_mm3, pch = 19, col = "blue")
# Regression line
abline(lm(capo_df$Tumor_Volume_mm3 ~ capo_df$Average_age), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(capo_df$Average_age, capo_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
# Creating the plot
plot(capo_df$Average_weight, capo_df$Tumor_Volume_mm3, pch = 19, col = "blue")
# Regression line
abline(lm(capo_df$Tumor_Volume_mm3 ~ capo_df$Average_weight), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(capo_df$Average_weight, capo_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
pairs(capo_df[,2:5], pch = 19, col = "blue")
infu_df1 <- select(infubinol_df, Mouse_Id, Timepoint, Tumor_Volume_mm3) %>%
group_by(Mouse_Id) %>%
filter(Timepoint == max(Timepoint, na.rm=TRUE))
### Find the average weight by mice_id in Infubinol_df
infu_df2 <- select(infubinol_df, Mouse_Id, Weight_g) %>%
group_by(Mouse_Id) %>%
summarise(Average_weight = mean(Weight_g, na.rm=TRUE))
### Joining the two df's for adding average weight
infu_df <- infu_df1 %>% inner_join(infu_df2, by = "Mouse_Id")
### Find the average age by mice_id in Capomulin_df
infu_df3 <- select(infubinol_df, Mouse_Id, Age_months) %>%
group_by(Mouse_Id) %>%
summarise(Average_age = mean(Age_months, na.rm=TRUE))
### Joining the two df's for adding average age
infu_df <- infu_df %>% inner_join(infu_df3, by = "Mouse_Id")
head(infu_df)
## # A tibble: 6 x 5
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3 Average_weight Average_age
## <chr> <int> <dbl> <dbl> <dbl>
## 1 a203 45 68.0 23 20
## 2 a251 45 65.5 25 21
## 3 a577 30 57.0 25 6
## 4 a685 45 66.1 30 8
## 5 c139 45 72.2 28 11
## 6 c326 5 36.3 25 18
infu_df$Tumor_Volume_mm3 %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 36.32 54.05 60.17 58.18 65.53 72.23
infu_df$Tumor_Volume_mm3 %>% sd()
## [1] 8.602957
# Creating the plot
plot(infu_df$Average_age, infu_df$Tumor_Volume_mm3, pch = 19, col = "green")
# Regression line
abline(lm(infu_df$Tumor_Volume_mm3 ~ infu_df$Average_age), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(infu_df$Average_age, infu_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
# Creating the plot
plot(infu_df$Average_weight, infu_df$Tumor_Volume_mm3, pch = 19, col = "green")
# Regression line
abline(lm(infu_df$Tumor_Volume_mm3 ~ infu_df$Average_weight), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(infu_df$Average_weight, infu_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
pairs(infu_df[,2:5], pch = 19, col = "green")
keta_df1 <- select(ketapril_df, Mouse_Id, Timepoint, Tumor_Volume_mm3) %>%
group_by(Mouse_Id) %>%
filter(Timepoint == max(Timepoint, na.rm=TRUE))
### Find the average weight by mice_id in Infubinol_df
keta_df2 <- select(ketapril_df, Mouse_Id, Weight_g) %>%
group_by(Mouse_Id) %>%
summarise(Average_weight = mean(Weight_g, na.rm=TRUE))
### Joining the two df's for adding average weight
keta_df <- keta_df1 %>% inner_join(keta_df2, by = "Mouse_Id")
### Find the average age by mice_id in Capomulin_df
keta_df3 <- select(ketapril_df, Mouse_Id, Age_months) %>%
group_by(Mouse_Id) %>%
summarise(Average_age = mean(Age_months, na.rm=TRUE))
### Joining the two df's for adding average age
keta_df <- keta_df %>% inner_join(keta_df3, by = "Mouse_Id")
head(keta_df)
## # A tibble: 6 x 5
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3 Average_weight Average_age
## <chr> <int> <dbl> <dbl> <dbl>
## 1 a457 10 49.8 30 11
## 2 c580 30 58.0 25 22
## 3 c819 40 62.2 25 21
## 4 c832 45 65.4 29 18
## 5 d474 40 60.2 27 18
## 6 f278 5 48.2 30 12
keta_df$Tumor_Volume_mm3 %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 45.00 56.72 64.49 62.81 69.87 78.57
keta_df$Tumor_Volume_mm3 %>% sd()
## [1] 9.94592
# Creating the plot
plot(keta_df$Average_age, keta_df$Tumor_Volume_mm3, pch = 19, col = "purple")
# Regression line
abline(lm(keta_df$Tumor_Volume_mm3 ~ keta_df$Average_age), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(keta_df$Average_age, keta_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
# Creating the plot
plot(keta_df$Average_weight, keta_df$Tumor_Volume_mm3, pch = 19, col = "purple")
# Regression line
abline(lm(keta_df$Tumor_Volume_mm3 ~ keta_df$Average_weight), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(keta_df$Average_weight, keta_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
pairs(keta_df[,2:5], pch = 19, col = "purple")
plac_df1 <- select(placebo_df, Mouse_Id, Timepoint, Tumor_Volume_mm3) %>%
group_by(Mouse_Id) %>%
filter(Timepoint == max(Timepoint, na.rm=TRUE))
### Find the average weight by mice_id in Infubinol_df
plac_df2 <- select(placebo_df, Mouse_Id, Weight_g) %>%
group_by(Mouse_Id) %>%
summarise(Average_weight = mean(Weight_g, na.rm=TRUE))
### Joining the two df's for adding average weight
plac_df <- plac_df1 %>% inner_join(plac_df2, by = "Mouse_Id")
### Find the average age by mice_id in Capomulin_df
plac_df3 <- select(placebo_df, Mouse_Id, Age_months) %>%
group_by(Mouse_Id) %>%
summarise(Average_age = mean(Age_months, na.rm=TRUE))
### Joining the two df's for adding average age
plac_df <- plac_df %>% inner_join(plac_df3, by = "Mouse_Id")
head(plac_df)
## # A tibble: 6 x 5
## # Groups: Mouse_Id [6]
## Mouse_Id Timepoint Tumor_Volume_mm3 Average_weight Average_age
## <chr> <int> <dbl> <dbl> <dbl>
## 1 a262 45 70.7 29 17
## 2 a897 45 72.3 28 7
## 3 c282 45 65.8 27 12
## 4 c757 45 69.0 27 9
## 5 c766 45 69.8 26 13
## 6 e227 45 73.2 30 1
plac_df$Tumor_Volume_mm3 %>%
summary()
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 45.00 52.94 62.03 60.51 68.13 73.21
plac_df$Tumor_Volume_mm3 %>% sd()
## [1] 8.874672
# Creating the plot
plot(plac_df$Average_age, plac_df$Tumor_Volume_mm3, pch = 19, col = "lightblue")
# Regression line
abline(lm(plac_df$Tumor_Volume_mm3 ~ plac_df$Average_age), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(plac_df$Average_age, plac_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
# Creating the plot
plot(plac_df$Average_weight, plac_df$Tumor_Volume_mm3, pch = 19, col = "lightblue")
# Regression line
abline(lm(plac_df$Tumor_Volume_mm3 ~ plac_df$Average_weight), col = "red", lwd = 3)
# Pearson correlation
text(paste("Correlation:", round(cor(plac_df$Average_weight, plac_df$Tumor_Volume_mm3), 2)), x = 25, y = 95)
pairs(plac_df[,2:5], pch = 19, col = "lightblue")
From the plots above, there seems a correlation between weight and Tumor size for capomulin drug regimen but will be checked by calculating the correlation coefficient.