The dataset comes from a science experiment on color and heat absorption. The experiment is described below in the experimenter's own words. In the experiment, thermometers were placed on 5 T-shirts of different colors, and the temperature was recorded at 10-minute intervals for 1 hour while the shirts were exposed to heat. The heater was then turned off, and the temperatures were measured again at 10-minute intervals as the garments cooled. The attached table represents our findings. This data can be used to analyze the rate at which different colors absorb and release heat.
This assignment uses the following packages for data analysis and visualization.
library("tidyr")
library("dplyr")
library("kableExtra")
library("ggplot2")
library("stringr")
The data is stored in .csv format and was uploaded to GitHub. As shown below, the data is not in a form that makes analysis easy, so the data set needs to be tidied.
theURL <- "https://raw.githubusercontent.com/DataScienceAR/Cuny-Assignments/master/Data-607/Data-Sets/science%20proj%20data.csv"
# read.csv() already returns a data frame, so no data.frame() wrapper is needed
untidyraw1 <- read.csv(file = theURL, header = TRUE, sep = ",")
#Table Structure
glimpse(untidyraw1)
## Observations: 10
## Variables: 9
## $ color     <fct> white, red, pink, black, green, white, red, pink, bl...
## $ minute.0  <int> 78, 78, 78, 78, 78, 98, 109, 102, 121, 105
## $ minute.10 <int> 81, 82, 82, 88, 81, 96, 106, 96, 108, 94
## $ minute.20 <int> 83, 90, 84, 92, 85, 93, 95, 90, 98, 90
## $ minute.30 <int> 88, 93, 90, 98, 91, 80, 87, 83, 90, 82
## $ minute.40 <int> 93, 98, 96, 108, 95, 78, 82, 80, 84, 80
## $ minute.50 <int> 96, 106, 99, 116, 102, 78, 80, 78, 79, 78
## $ minute.60 <int> 98, 109, 102, 121, 105, 78, 78, 78, 78, 78
## $ phase     <fct> H, H, H, H, H, C, C, C, C, C
#Top 6 rows of the table
head(untidyraw1)
##   color minute.0 minute.10 minute.20 minute.30 minute.40 minute.50
## 1 white       78        81        83        88        93        96
## 2   red       78        82        90        93        98       106
## 3  pink       78        82        84        90        96        99
## 4 black       78        88        92        98       108       116
## 5 green       78        81        85        91        95       102
## 6 white       98        96        93        80        78        78
##   minute.60 phase
## 1        98     H
## 2       109     H
## 3       102     H
## 4       121     H
## 5       105     H
## 6        78     C
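The data needs to be cleaned and reshaped from wide to long format before it can be analyzed.
# Gather the minute.* columns into key/value pairs.
# Note: the value column is named "Minutes" but it holds the temperature readings;
# the elapsed time is stored in "Time Interval".
tidy1 <- gather(untidyraw1,"Time Interval","Minutes",-color,-phase)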
# Change column name
names(tidy1)[1] <- "T-shirt_Color"
tidy1$phase<- str_replace_all(tidy1$phase,"C","Cooling")
tidy1$phase<- str_replace_all(tidy1$phase,"H","Heating")
head(tidy1)
##   T-shirt_Color   phase Time Interval Minutes
## 1         white Heating      minute.0      78
## 2           red Heating      minute.0      78
## 3          pink Heating      minute.0      78
## 4         black Heating      minute.0      78
## 5         green Heating      minute.0      78
## 6         white Cooling      minute.0      98
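The same reshaping can also be sketched with `pivot_longer()`, which tidyr now recommends over `gather()`. The example below is only an illustration under a few assumptions: the wide table is re-entered by hand from the `glimpse()` output instead of being read from GitHub, and the names `untidy`, `tidy_long`, `minute`, and `temperature` are hypothetical choices, not the ones used in this assignment. It also strips the `minute.` prefix so that elapsed time becomes an integer column.

```r
library(tidyr)

# Wide table re-entered from the glimpse() output (assumed identical to the CSV)
untidy <- data.frame(
  color     = rep(c("white", "red", "pink", "black", "green"), times = 2),
  minute.0  = c(78, 78, 78, 78, 78, 98, 109, 102, 121, 105),
  minute.10 = c(81, 82, 82, 88, 81, 96, 106, 96, 108, 94),
  minute.20 = c(83, 90, 84, 92, 85, 93, 95, 90, 98, 90),
  minute.30 = c(88, 93, 90, 98, 91, 80, 87, 83, 90, 82),
  minute.40 = c(93, 98, 96, 108, 95, 78, 82, 80, 84, 80),
  minute.50 = c(96, 106, 99, 116, 102, 78, 80, 78, 79, 78),
  minute.60 = c(98, 109, 102, 121, 105, 78, 78, 78, 78, 78),
  phase     = rep(c("H", "C"), each = 5)
)

# One row per shirt, phase, and time point
tidy_long <- pivot_longer(
  untidy,
  cols            = starts_with("minute."),
  names_to        = "minute",
  names_prefix    = "minute\\.",                 # drop the "minute." prefix
  names_transform = list(minute = as.integer),   # "10" -> 10L
  values_to       = "temperature"
)

# Spell out the phase codes
tidy_long$phase <- ifelse(tidy_long$phase == "H", "Heating", "Cooling")

head(tidy_long)
```

Keeping elapsed time as a number and naming the value column for what it measures makes later grouping and plotting less error-prone.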
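# Sum of all temperature readings per shirt color, across both phases
# (the grouping is by color only, despite the variable name)
total_by_color_phase <- tidy1 %>%
  group_by(`T-shirt_Color`) %>%
  summarise(sum = sum(Minutes)) %>%
  arrange(desc(sum))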
total_by_color_phase
## # A tibble: 5 x 2
##   `T-shirt_Color`   sum
##   <fct>           <int>
## 1 black            1359
## 2 red              1293
## 3 green            1244
## 4 pink             1238
## 5 white            1218
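The totals above combine the heating and cooling phases, so they do not directly show how quickly each color gains or loses heat. One possible way to approximate that is the average change per minute between the first and last reading of each phase. The sketch below assumes this simple endpoint-based definition of "rate" (it is not part of the original analysis), and the names `endpoints`, `start_temp`, `end_temp`, and `rate_per_minute` are hypothetical. The values are copied from the `minute.0` and `minute.60` columns shown earlier.

```r
library(dplyr)

# First (minute 0) and last (minute 60) reading per shirt and phase,
# copied from the glimpse() output
endpoints <- data.frame(
  color      = rep(c("white", "red", "pink", "black", "green"), times = 2),
  phase      = rep(c("Heating", "Cooling"), each = 5),
  start_temp = c(78, 78, 78, 78, 78, 98, 109, 102, 121, 105),
  end_temp   = c(98, 109, 102, 121, 105, 78, 78, 78, 78, 78)
)

# Average change in the reading per minute over the 60-minute phase;
# positive values mean warming, negative values mean cooling
rates <- endpoints %>%
  mutate(rate_per_minute = (end_temp - start_temp) / 60) %>%
  arrange(phase, desc(abs(rate_per_minute)))

rates
```

Because this uses only the endpoints, it ignores the shape of the curve in between; the cooling readings, for example, level off well before minute 60.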
# Distribution of temperature readings (stored in the "Minutes" column)
hist(tidy1$Minutes,main = "Distribution of Temperature Readings",xlab = "Temperature")
pie(total_by_color_phase$sum, labels = total_by_color_phase$`T-shirt_Color`)
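Since the readings are a time series, a line chart per color may show the heating and cooling behavior more directly than a histogram or pie chart. The following is a minimal sketch using ggplot2, which is loaded at the top of this assignment. It assumes the readings are re-entered by hand in long form from the table above; the names `readings`, `shirt_palette`, and `temperature_plot`, as well as the chosen line colors, are illustrative only.

```r
library(ggplot2)

shirts  <- c("white", "red", "pink", "black", "green")
minutes <- seq(0, 60, by = 10)

# Long-form readings copied from the table: 7 time points per shirt and phase
readings <- data.frame(
  color  = rep(rep(shirts, each = 7), times = 2),
  phase  = factor(rep(c("Heating", "Cooling"), each = 35),
                  levels = c("Heating", "Cooling")),
  minute = rep(minutes, times = 10),
  temperature = c(
    78, 81, 83, 88, 93, 96, 98,        # white, heating
    78, 82, 90, 93, 98, 106, 109,      # red, heating
    78, 82, 84, 90, 96, 99, 102,       # pink, heating
    78, 88, 92, 98, 108, 116, 121,     # black, heating
    78, 81, 85, 91, 95, 102, 105,      # green, heating
    98, 96, 93, 80, 78, 78, 78,        # white, cooling
    109, 106, 95, 87, 82, 80, 78,      # red, cooling
    102, 96, 90, 83, 80, 78, 78,       # pink, cooling
    121, 108, 98, 90, 84, 79, 78,      # black, cooling
    105, 94, 90, 82, 80, 78, 78        # green, cooling
  )
)

# Line colors chosen to resemble the shirts (white drawn as light grey)
shirt_palette <- c(white = "grey70", red = "red", pink = "hotpink",
                   black = "black", green = "forestgreen")

temperature_plot <- ggplot(readings,
                           aes(x = minute, y = temperature, colour = color)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ phase) +
  scale_colour_manual(values = shirt_palette) +
  labs(title  = "Temperature readings by T-shirt color",
       x      = "Minutes since start of phase",
       y      = "Temperature reading",
       colour = "T-shirt color")

temperature_plot
```

Faceting by phase keeps the heating and cooling curves on the same time axis, so the steeper rise and fall of the black shirt can be compared with the other colors at a glance.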