Overview

The dataset comes from a science experiment on color and heat absorption, described below in the experimenters' own words. In the experiment, thermometers were placed on five T-shirts of different colors, and the temperature was recorded at 10-minute intervals for 1 hour while the shirts were exposed to heat. The heater was then turned off, and the temperatures were measured again at 10-minute intervals as the garments cooled. The attached table represents our findings. This data can be used to analyze the rate at which different colors absorb and release heat.


R Packages Used

The following packages were used in this assignment for data analysis and visualization.

library("tidyr")
library("dplyr")
library("kableExtra")
library("ggplot2")
library("stringr")

The DataSet

The data was captured in .csv format and uploaded to GitHub. As the output below shows, the data is not in a form that makes analysis easy, so the data set needs to be tidied.

theURL <- "https://raw.githubusercontent.com/DataScienceAR/Cuny-Assignments/master/Data-607/Data-Sets/science%20proj%20data.csv"
# read.csv() already returns a data frame, so no data.frame() wrapper is needed
untidyraw1 <- read.csv(file = theURL, header = TRUE, sep = ",")
# Table structure
glimpse(untidyraw1)
## Observations: 10
## Variables: 9
## $ color     <fct> white, red, pink, black, green, white, red, pink, bl...
## $ minute.0  <int> 78, 78, 78, 78, 78, 98, 109, 102, 121, 105
## $ minute.10 <int> 81, 82, 82, 88, 81, 96, 106, 96, 108, 94
## $ minute.20 <int> 83, 90, 84, 92, 85, 93, 95, 90, 98, 90
## $ minute.30 <int> 88, 93, 90, 98, 91, 80, 87, 83, 90, 82
## $ minute.40 <int> 93, 98, 96, 108, 95, 78, 82, 80, 84, 80
## $ minute.50 <int> 96, 106, 99, 116, 102, 78, 80, 78, 79, 78
## $ minute.60 <int> 98, 109, 102, 121, 105, 78, 78, 78, 78, 78
## $ phase     <fct> H, H, H, H, H, C, C, C, C, C
# First 6 rows of the table
head(untidyraw1)
##   color minute.0 minute.10 minute.20 minute.30 minute.40 minute.50
## 1 white       78        81        83        88        93        96
## 2   red       78        82        90        93        98       106
## 3  pink       78        82        84        90        96        99
## 4 black       78        88        92        98       108       116
## 5 green       78        81        85        91        95       102
## 6 white       98        96        93        80        78        78
##   minute.60 phase
## 1        98     H
## 2       109     H
## 3       102     H
## 4       121     H
## 5       105     H
## 6        78     C

Data Manipulation

The data needs to be cleaned and reshaped before it can be analyzed.


Cleaning the table

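  • Reshaping the wide table into a long one. Note that the value column is named Minutes but holds the temperature readings; the elapsed time is stored in Time Interval.
tidy1 <- gather(untidyraw1, "Time Interval", "Minutes", -color, -phase)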

# Rename the color column and spell out the phase labels
names(tidy1)[1] <- "T-shirt_Color"
tidy1$phase <- str_replace_all(tidy1$phase, "C", "Cooling")
tidy1$phase <- str_replace_all(tidy1$phase, "H", "Heating")
head(tidy1)
##   T-shirt_Color   phase Time Interval Minutes
## 1         white Heating      minute.0      78
## 2           red Heating      minute.0      78
## 3          pink Heating      minute.0      78
## 4         black Heating      minute.0      78
## 5         green Heating      minute.0      78
## 6         white Cooling      minute.0      98
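The same reshaping step can also be sketched with `pivot_longer()`, which tidyr now recommends over `gather()`. This is only an illustration, not the code used for the results in this report: the data frame is transcribed by hand from the `glimpse()` output above so the block runs without network access, and the column names `minute` and `temperature` are assumptions chosen here to make clear which column holds the time and which holds the reading. It also assumes a tidyr version that supports `names_transform` (1.1.0 or later).

```r
library(tidyr)

# Data transcribed from the glimpse() output above (assumed to match the CSV).
untidy <- data.frame(
  color     = rep(c("white", "red", "pink", "black", "green"), times = 2),
  minute.0  = c(78, 78, 78, 78, 78, 98, 109, 102, 121, 105),
  minute.10 = c(81, 82, 82, 88, 81, 96, 106, 96, 108, 94),
  minute.20 = c(83, 90, 84, 92, 85, 93, 95, 90, 98, 90),
  minute.30 = c(88, 93, 90, 98, 91, 80, 87, 83, 90, 82),
  minute.40 = c(93, 98, 96, 108, 95, 78, 82, 80, 84, 80),
  minute.50 = c(96, 106, 99, 116, 102, 78, 80, 78, 79, 78),
  minute.60 = c(98, 109, 102, 121, 105, 78, 78, 78, 78, 78),
  phase     = rep(c("H", "C"), each = 5)
)

# Wide to long: one row per color, phase, and time point.
# names_prefix strips "minute." so the time can be stored as an integer.
long <- pivot_longer(
  untidy,
  cols            = -c(color, phase),
  names_to        = "minute",
  names_prefix    = "minute\\.",
  names_transform = list(minute = as.integer),
  values_to       = "temperature"
)

# Spell out the phase codes.
long$phase <- ifelse(long$phase == "H", "Heating", "Cooling")

head(long)
```

Storing the elapsed time as an integer rather than a label such as `minute.10` makes it usable directly as a numeric axis or in arithmetic.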

Analysis

# Sum of all temperature readings (stored in Minutes) per T-shirt color, across both phases
total_by_color_phase <- tidy1 %>%
  group_by(`T-shirt_Color`) %>%
  summarise(sum = sum(Minutes)) %>%
  arrange(desc(sum))
total_by_color_phase
## # A tibble: 5 x 2
##   `T-shirt_Color`   sum
##   <fct>           <int>
## 1 black            1359
## 2 red              1293
## 3 green            1244
## 4 pink             1238
## 5 white            1218
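Since the overview mentions the rate at which colors absorb and release heat, a rough rate can be sketched alongside the totals above. The block below is an assumption-laden illustration rather than part of the original analysis: it rebuilds the table by hand from the `glimpse()` output, and it approximates the rate as the change between the first and last reading of each phase divided by 60 minutes. The names `change` and `rate_per_min` are introduced here, and the temperature unit is not stated in the source.

```r
library(dplyr)

# Data transcribed from the glimpse() output above (assumed to match the CSV).
untidy <- data.frame(
  color     = rep(c("white", "red", "pink", "black", "green"), times = 2),
  minute.0  = c(78, 78, 78, 78, 78, 98, 109, 102, 121, 105),
  minute.10 = c(81, 82, 82, 88, 81, 96, 106, 96, 108, 94),
  minute.20 = c(83, 90, 84, 92, 85, 93, 95, 90, 98, 90),
  minute.30 = c(88, 93, 90, 98, 91, 80, 87, 83, 90, 82),
  minute.40 = c(93, 98, 96, 108, 95, 78, 82, 80, 84, 80),
  minute.50 = c(96, 106, 99, 116, 102, 78, 80, 78, 79, 78),
  minute.60 = c(98, 109, 102, 121, 105, 78, 78, 78, 78, 78),
  phase     = rep(c("H", "C"), each = 5)
)

# Endpoint-based average rate: (last reading - first reading) / 60 minutes.
# Positive values mean the shirt warmed up, negative values mean it cooled.
rates <- untidy %>%
  mutate(
    phase        = if_else(phase == "H", "Heating", "Cooling"),
    change       = minute.60 - minute.0,
    rate_per_min = change / 60
  ) %>%
  select(color, phase, change, rate_per_min) %>%
  arrange(phase, desc(abs(rate_per_min)))

rates
```

An endpoint difference ignores the shape of the curve in between, so it is only a coarse summary; fitting a slope per color and phase would be one alternative.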
# Distribution of the temperature readings (stored in the Minutes column)

hist(tidy1$Minutes, main = "Distribution of Temperature Readings", xlab = "Temperature")

pie(total_by_color_phase$sum, labels = total_by_color_phase$`T-shirt_Color`)
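The ggplot2 package is loaded at the top of this report, so the heating and cooling curves can also be drawn as a line chart, which shows change over time more directly than a histogram or pie chart. The following is a sketch under stated assumptions, not the author's plot: the data is transcribed by hand from the `glimpse()` output, the column names `minute` and `temperature` are introduced here, and the line colors (including grey standing in for white so it stays visible) are arbitrary choices.

```r
library(tidyr)
library(ggplot2)

# Data transcribed from the glimpse() output above (assumed to match the CSV).
untidy <- data.frame(
  color     = rep(c("white", "red", "pink", "black", "green"), times = 2),
  minute.0  = c(78, 78, 78, 78, 78, 98, 109, 102, 121, 105),
  minute.10 = c(81, 82, 82, 88, 81, 96, 106, 96, 108, 94),
  minute.20 = c(83, 90, 84, 92, 85, 93, 95, 90, 98, 90),
  minute.30 = c(88, 93, 90, 98, 91, 80, 87, 83, 90, 82),
  minute.40 = c(93, 98, 96, 108, 95, 78, 82, 80, 84, 80),
  minute.50 = c(96, 106, 99, 116, 102, 78, 80, 78, 79, 78),
  minute.60 = c(98, 109, 102, 121, 105, 78, 78, 78, 78, 78),
  phase     = rep(c("H", "C"), each = 5)
)

# Long format with the elapsed time as an integer column.
long <- pivot_longer(
  untidy,
  cols            = -c(color, phase),
  names_to        = "minute",
  names_prefix    = "minute\\.",
  names_transform = list(minute = as.integer),
  values_to       = "temperature"
)

# Order the panels so heating appears before cooling.
long$phase <- factor(
  ifelse(long$phase == "H", "Heating", "Cooling"),
  levels = c("Heating", "Cooling")
)

# One line per shirt color, one panel per phase.
p <- ggplot(long, aes(x = minute, y = temperature, colour = color)) +
  geom_line() +
  geom_point() +
  facet_wrap(~ phase) +
  scale_colour_manual(values = c(
    black = "black", green = "darkgreen", pink = "hotpink",
    red = "red", white = "grey70"
  )) +
  labs(
    title  = "Temperature over time by T-shirt color",
    x      = "Minutes elapsed in phase",
    y      = "Temperature",
    colour = "T-shirt color"
  )

print(p)
```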