The following dataset for my project is about a “Smart Pill” and its ability to record real-time information on gastric emptying, bowel transit time, and total intestinal transit time. The data come from a group of ill trauma patients and a group of healthy patients. The goal of the dataset is to compare bowel transit across different patients and their respective demographics. The variables I plan to use for my project are Weight, Gender, Age, Group, and Small Bowel Mean pH. I chose these variables to see whether weight and gender, taken together for the healthy Group, show any correlation with Small Bowel Mean pH. This is what my data visualization will seek to find. The source for the dataset I’ll be using is as follows: https://www.causeweb.org/tshs/smart-pill/
# Load the dataset (ggplot2 and dplyr are needed for the plots further down)
library(ggplot2)
library(dplyr)
data <- read.csv("/Users/jasonlaucel/Data 110 Folder/SmartPill3.csv")
str(data)
'data.frame': 95 obs. of 22 variables:
$ Group : int 0 0 0 0 0 0 0 0 1 1 ...
$ Gender : int 1 1 1 1 0 1 1 0 1 0 ...
$ Race : int NA NA NA NA NA NA NA NA 1 1 ...
$ Height : num 183 180 180 175 152 ...
$ Weight : num 102.1 102.1 68 69.9 44.9 ...
$ Age : int 25 39 44 53 57 43 38 23 21 24 ...
$ GE.Time : num 74.3 73.3 4.3 NA 13.9 23.3 7.5 5.6 2.73 5.02 ...
$ SB.Time : num 8.4 13.8 6.7 NA 5.1 8.7 3.7 3.4 5.12 3.3 ...
$ C.Time : num NA NA NA NA NA ...
$ WG.Time : num 816 168 240 216 120 ...
$ S.Contractions : int NA NA NA NA NA NA NA NA 145 114 ...
$ S.Sum.of.Amplitudes : num NA NA NA NA NA ...
$ S.Mean.Peak.Amplitude : num NA NA NA NA NA ...
$ S.Mean.pH : num NA NA NA NA NA NA NA NA 2.07 2.28 ...
$ SB.Contractions : int NA NA NA NA NA NA NA NA 298 782 ...
$ SB.Sum.of.Amplitudes : num NA NA NA NA NA ...
$ SB.Mean.Peak.Amplitude : num NA NA NA NA NA ...
$ SB.Mean.pH : num NA NA NA NA NA NA NA NA 7.26 7.21 ...
$ Colon.Contractions : int NA NA NA NA NA NA NA NA 507 50 ...
$ Colon.Sum.of.Amplitudes: num NA NA NA NA NA ...
$ C.Mean.Peak.Amplitude : num NA NA NA NA NA ...
$ C.Mean.pH : num NA NA NA NA NA NA NA NA 7.58 7.21 ...
head(data)
Group Gender Race Height Weight Age GE.Time SB.Time C.Time WG.Time
1 0 1 NA 182.88 102.05820 25 74.3 8.4 NA 816
2 0 1 NA 180.34 102.05820 39 73.3 13.8 NA 168
3 0 1 NA 180.34 68.03880 44 4.3 6.7 NA 240
4 0 1 NA 175.26 69.85317 53 NA NA NA 216
5 0 0 NA 152.40 44.90561 57 13.9 5.1 NA 120
6 0 1 NA 185.42 94.80073 43 23.3 8.7 NA 384
S.Contractions S.Sum.of.Amplitudes S.Mean.Peak.Amplitude S.Mean.pH
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
SB.Contractions SB.Sum.of.Amplitudes SB.Mean.Peak.Amplitude SB.Mean.pH
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
Colon.Contractions Colon.Sum.of.Amplitudes C.Mean.Peak.Amplitude C.Mean.pH
1 NA NA NA NA
2 NA NA NA NA
3 NA NA NA NA
4 NA NA NA NA
5 NA NA NA NA
6 NA NA NA NA
# Remove every row that contains at least one NA value
data <- na.omit(data)
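Because `na.omit()` drops a row when any column is missing, it can remove more rows than a given analysis needs. The sketch below is one possible alternative, not what this project did: it uses `complete.cases()` on only the columns an analysis uses. The `toy` data frame and its values are made up for illustration; only the column names come from the dataset.

```r
# Toy data frame standing in for the Smart Pill data (values are made up)
toy <- data.frame(
  Group      = c(0, 0, 1, 1, 1),
  Weight     = c(102.1, 68.0, 70.3, 81.6, NA),
  Age        = c(25, 44, 21, 24, 30),
  Gender     = c(1, 1, 1, 0, 0),
  SB.Time    = c(8.4, 6.7, 5.12, 3.3, 4.1),
  SB.Mean.pH = c(NA, NA, 7.26, 7.21, 7.40),
  Race       = c(NA, NA, NA, 1, 1)
)

# na.omit() drops a row if ANY column is NA
all_cols <- na.omit(toy)

# complete.cases() on selected columns keeps rows that are complete
# for the regression variables, ignoring NAs elsewhere (e.g. Race)
model_vars <- c("Weight", "Age", "Gender", "SB.Mean.pH")
model_rows <- toy[complete.cases(toy[, model_vars]), ]

# The same idea for a transit-time comparison keeps both groups
time_vars <- c("Group", "SB.Time")
time_rows <- toy[complete.cases(toy[, time_vars]), ]

# Compare how many rows survive each approach
c(all_cols = nrow(all_cols), model_rows = nrow(model_rows), time_rows = nrow(time_rows))
```

In this toy example the column-wise filter keeps more rows than `na.omit()`, and the transit-time subset still contains Group 0 rows.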
# Linear regression analysis
# Make a linear model for SB Mean pH using weight, age, and gender as predictors
model <- lm(`SB.Mean.pH` ~ Weight + Age + Gender, data = data)
# Display summary of the regression model
summary(model)
Call:
lm(formula = SB.Mean.pH ~ Weight + Age + Gender, data = data)
Residuals:
Min 1Q Median 3Q Max
-2.16517 -0.16456 0.07069 0.22328 1.66224
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.310523 0.299028 24.448 <2e-16 ***
Weight -0.004123 0.003884 -1.062 0.292
Age 0.002625 0.004888 0.537 0.593
Gender -0.194384 0.121809 -1.596 0.115
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.4897 on 74 degrees of freedom
Multiple R-squared: 0.07642, Adjusted R-squared: 0.03898
F-statistic: 2.041 on 3 and 74 DF, p-value: 0.1155
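To make the coefficient table easier to read, the sketch below recomputes the t values and p-values from the printed estimates and standard errors, and adds approximate 95% confidence intervals. The numbers are copied from the output above; the variable names (`est`, `se`, `df_resid`, and so on) are just illustrative.

```r
# Estimates and standard errors copied from the summary(model) output
est <- c(Weight = -0.004123, Age = 0.002625, Gender = -0.194384)
se  <- c(Weight =  0.003884, Age = 0.004888, Gender =  0.121809)
df_resid <- 74  # residual degrees of freedom reported in the output

# t value = estimate / standard error
t_value <- est / se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_value), df = df_resid)

# Approximate 95% confidence interval: estimate +/- critical value * SE
crit  <- qt(0.975, df = df_resid)
lower <- est - crit * se
upper <- est + crit * se

round(cbind(t_value, p_value, lower, upper), 4)
```

Each interval includes zero, which is consistent with all three p-values being above 0.05. With the fitted model object available, `confint(model)` should give the same intervals directly.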
# Group 1 scatterplot
colors_group1 <- c("navy")
scatterplot_group1 <- ggplot(data %>% filter(Group == 1), aes(x = Weight, y = `SB.Mean.pH`, color = factor(Group))) +
  geom_point() +
  labs(
    title = "Weight and Small Bowel Mean pH for Group 1",
    x = "Weight (kg)",
    y = "Small Bowel Mean pH",
    color = "Group",
    caption = "Data source: Smart Pill Dataset: TSHS"
  ) +
  theme_minimal() +
  scale_color_manual(values = colors_group1) +
  theme(legend.position = "bottom")
# Gender scatterplot
scatterplot_both <- scatterplot_group1 +
  geom_point(data = data %>% filter(Group == 1), aes(color = factor(Gender))) +
  scale_color_manual(values = c("navy", "magenta")) +
  labs(color = "Gender")
Scale for colour is already present.
Adding another scale for colour, which will replace the existing scale.
# Display scatterplot
scatterplot_both
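The "Scale for colour is already present" message appears because `scale_color_manual()` is added to the same plot twice, so the second call replaces the first. One way to avoid it, sketched below, is to map color to Gender from the start and add a single color scale. The `toy` data frame and the name `scatterplot_gender` are made up for illustration, and the sketch assumes ggplot2 is installed. The source does not say which Gender code is male or female, so the legend keeps the raw 0/1 codes.

```r
library(ggplot2)

# Toy data standing in for the Group 1 rows (values are made up)
toy <- data.frame(
  Weight     = c(70.3, 81.6, 65.8, 90.7),
  SB.Mean.pH = c(7.26, 7.21, 7.40, 7.05),
  Gender     = c(1, 0, 1, 0)
)

# Map color to Gender once and add exactly one manual color scale,
# so no existing scale needs to be replaced
scatterplot_gender <- ggplot(toy, aes(x = Weight, y = SB.Mean.pH, color = factor(Gender))) +
  geom_point() +
  scale_color_manual(values = c("navy", "magenta")) +
  labs(
    title = "Weight and Small Bowel Mean pH by Gender",
    x = "Weight (kg)",
    y = "Small Bowel Mean pH",
    color = "Gender"
  ) +
  theme_minimal() +
  theme(legend.position = "bottom")

scatterplot_gender
```

This version also draws each point once, whereas adding a second `geom_point()` layer on top of an existing plot draws every point twice.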
# Create scatterplot with age on the x-axis
scatterplot_age <- ggplot(data %>% filter(Group == 1), aes(x = Age, y = `SB.Mean.pH`, color = factor(Group))) +
  geom_point() +
  labs(
    title = "Age and Small Bowel Mean pH",
    x = "Age",
    y = "Small Bowel Mean pH",
    color = "Group",
    caption = "Data source: Smart Pill Dataset: TSHS"
  ) +
  theme_minimal() +
  scale_color_manual(values = c("cyan")) +
  theme(legend.position = "bottom")
# Display scatterplot
scatterplot_age
In the visualizations above, I created scatterplots comparing Small Bowel (SB) Mean pH with Weight and with Age. Group 1 was specified as the healthy patients of the dataset. Unfortunately, Group 0, the ill patients, has no corresponding SB Mean pH data. The Weight scatterplot also distinguishes between male and female patients by color. Here we can see a clear pattern in both male and female patients. To clean the data, I used na.omit to remove rows containing NA values that could cause issues during compiling. I also had to rename some variables from their original names in the CSV/Excel file I have. Something I wish I could have included in this data visualization was Group 0, but the variable I chose wasn’t available for the ill patients. Perhaps I will revisit this project in the future and use bowel transit time instead of mean pH, as it was available for the Group 0 ill patients.
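As a rough sketch of that follow-up idea, the code below summarizes SB.Time by group. It uses only the first ten Group and SB.Time values printed by `str(data)` earlier, so it illustrates the approach rather than giving a result for the full dataset. The names `first10`, `GroupLabel`, and `sb` are made up for the example.

```r
# First ten rows of Group and SB.Time as printed by str(data)
first10 <- data.frame(
  Group   = c(0, 0, 0, 0, 0, 0, 0, 0, 1, 1),
  SB.Time = c(8.4, 13.8, 6.7, NA, 5.1, 8.7, 3.7, 3.4, 5.12, 3.3)
)

# Label the groups as described in the text: 0 = ill, 1 = healthy
first10$GroupLabel <- factor(first10$Group, levels = c(0, 1),
                             labels = c("Ill", "Healthy"))

# Keep only rows where SB.Time is recorded, instead of na.omit() on all columns
sb <- first10[!is.na(first10$SB.Time), ]

# Number of usable rows and median SB.Time per group
counts  <- table(sb$GroupLabel)
medians <- tapply(sb$SB.Time, sb$GroupLabel, median)

print(counts)
print(medians)
```

On the full dataset, the same filtered data frame could be passed to `ggplot()` with `geom_boxplot()` to compare the two groups visually.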