For this “tidying” exercise we will use Christopher Ayre’s example dataset of Heating and Cooling Absorption.
The discussion board post can be found here: https://bbhosted.cuny.edu/webapps/discussionboard/do/message?action=list_messages&course_id=_1705328_1&nav=discussion_board&conf_id=_1845527_1&forum_id=_1908779_1&message_id=_31283025_1
Christopher provided the data in a .csv file, which I’ve uploaded to GitHub:
heating <- read.csv("https://raw.githubusercontent.com/murphystout/data-607/master/heating_cooling.csv")
head(heating)
## color minute.0 minute.10 minute.20 minute.30 minute.40 minute.50
## 1 white 78 81 83 88 93 96
## 2 red 78 82 90 93 98 106
## 3 pink 78 82 84 90 96 99
## 4 black 78 88 92 98 108 116
## 5 green 78 81 85 91 95 102
## 6 white 98 96 93 80 78 78
## minute.60 phase
## 1 98 heating
## 2 109 heating
## 3 102 heating
## 4 121 heating
## 5 105 heating
## 6 78 cooling
Christopher adeptly pointed out several issues with the data set that make it “untidy”. These are:
The variable for “time elapsed” does not have its own column. Instead, there is a separate column for each ten-minute interval. These should be collapsed into one column.
Each color should have its own column. We will treat one “observation” as the temperature across all the colors at a given time, so each color is merely a variable of that observation and all of the colors should sit on a single row.
Multiple observational units are observed in the same table. In particular the “heating” and “cooling” data are in one table.
After we have tidied it up, we will do some exploratory analysis and visualizations on the data.
For this step we will gather the multiple timestamp columns into a single column.
As an intermediate step, let’s first rename the columns so that they take on numerical values. This will help with plotting the values later on.
library(tidyr)  # provides gather() and spread()
colnames(heating) <- c("color", 0, 10, 20, 30, 40, 50, 60, "phase")
heating <- gather(heating, time, temperature, "0":"60")
head(heating, 20)
## color phase time temperature
## 1 white heating 0 78
## 2 red heating 0 78
## 3 pink heating 0 78
## 4 black heating 0 78
## 5 green heating 0 78
## 6 white cooling 0 98
## 7 red cooling 0 109
## 8 pink cooling 0 102
## 9 black cooling 0 121
## 10 green cooling 0 105
## 11 white heating 10 81
## 12 red heating 10 82
## 13 pink heating 10 82
## 14 black heating 10 88
## 15 green heating 10 81
## 16 white cooling 10 96
## 17 red cooling 10 106
## 18 pink cooling 10 96
## 19 black cooling 10 108
## 20 green cooling 10 94
Finally, let’s make sure the time is stored as numeric values:
heating$time <- as.numeric(heating$time)
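As an aside, newer versions of tidyr offer pivot_longer() as the successor to gather(). The sketch below is an alternative, not what was run above. It assumes tidyr 1.1.0 or later and uses a small hand-typed data frame (raw, a name made up for this example) holding two rows and three time columns from the head(heating) output. It strips the "minute." prefix and converts the time to numeric in a single call, so the separate rename and as.numeric() steps are not needed.

```r
library(tidyr)

# Two rows copied from the head(heating) output above, trimmed to three time columns
raw <- data.frame(
  color = c("white", "red"),
  minute.0 = c(78, 78),
  minute.10 = c(81, 82),
  minute.20 = c(83, 90),
  phase = c("heating", "heating")
)

long <- pivot_longer(
  raw,
  cols = starts_with("minute."),
  names_to = "time",
  names_prefix = "minute\\.",                  # drop the prefix from the old column names
  names_transform = list(time = as.numeric),   # store the remaining "0", "10", "20" as numbers
  values_to = "temperature"
)
long
```

The result has one row per color and time combination, which is the same shape gather() produced above.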
Now that we’ve gathered up the temperature columns, let’s spread out the color columns so that there is one temperature reading for each color + timestamp combination.
heating <- spread(heating, color, temperature)
heating
## phase time black green pink red white
## 1 cooling 0 121 105 102 109 98
## 2 cooling 10 108 94 96 106 96
## 3 cooling 20 98 90 90 95 93
## 4 cooling 30 90 82 83 87 80
## 5 cooling 40 84 80 80 82 78
## 6 cooling 50 79 78 78 80 78
## 7 cooling 60 78 78 78 78 78
## 8 heating 0 78 78 78 78 78
## 9 heating 10 88 81 82 82 81
## 10 heating 20 92 85 84 90 83
## 11 heating 30 98 91 90 93 88
## 12 heating 40 108 95 96 98 93
## 13 heating 50 116 102 99 106 96
## 14 heating 60 121 105 102 109 98
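For readers on a newer tidyr, the same reshaping can be sketched with pivot_wider(), the successor to spread(). This is an illustration under assumptions rather than the code used above: the data frame long below is hand-typed from the first two time points of the heating phase in the gathered output.

```r
library(tidyr)

# Hand-typed long-format rows from the gathered output above (heating, minutes 0 and 10)
long <- data.frame(
  color = rep(c("white", "red", "pink", "black", "green"), times = 2),
  phase = "heating",
  time = rep(c(0, 10), each = 5),
  temperature = c(78, 78, 78, 78, 78, 81, 82, 82, 88, 81)
)

# One column per color, one row per phase + time combination
wide <- pivot_wider(long, names_from = color, values_from = temperature)
wide
```

One difference worth knowing: by default pivot_wider() keeps the new columns in order of first appearance, whereas the spread() output above is alphabetical. Passing names_sort = TRUE should give the alphabetical ordering.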
Thankfully, separating the heating and cooling observations into their own tables is as simple as subsetting the data on the “phase” column. I saved this for the last step so that we would not have to perform the tidying operations twice.
cooling <- subset(heating, phase == "cooling")
heating <- subset(heating, phase == "heating")
head(cooling)
## phase time black green pink red white
## 1 cooling 0 121 105 102 109 98
## 2 cooling 10 108 94 96 106 96
## 3 cooling 20 98 90 90 95 93
## 4 cooling 30 90 82 83 87 80
## 5 cooling 40 84 80 80 82 78
## 6 cooling 50 79 78 78 80 78
head(heating)
## phase time black green pink red white
## 8 heating 0 78 78 78 78 78
## 9 heating 10 88 81 82 82 81
## 10 heating 20 92 85 84 90 83
## 11 heating 30 98 91 90 93 88
## 12 heating 40 108 95 96 98 93
## 13 heating 50 116 102 99 106 96
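If there were more than two phases, writing one subset() call per phase would get tedious. A possible alternative in base R is split(), sketched here on a small hand-typed excerpt of the wide table (temps and by_phase are names made up for this example):

```r
# Hand-typed excerpt of the wide table above: two rows per phase, two colors
temps <- data.frame(
  phase = c("cooling", "cooling", "heating", "heating"),
  time = c(0, 10, 0, 10),
  black = c(121, 108, 78, 88),
  white = c(98, 96, 78, 81)
)

# split() returns a named list holding one data frame per phase
by_phase <- split(temps, temps$phase)
names(by_phase)
by_phase$cooling
```

Each element can then be pulled out by name, for example by_phase$heating.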
Let’s plot these data in a line graph to get a visual representation of how the colors responded to heating and cooling:
plot(x = heating$time, y = heating$black, type = "l", col = "black", xlab = "Time Elapsed (Minutes)", ylab = "Temp (Fahrenheit)", main = "Heating/Color Absorption")
lines(x = heating$time, y = heating$red, col = "red")
lines(x = heating$time, y = heating$green, col = "green")
lines(x = heating$time, y = heating$pink, col = "pink")
lines(x = heating$time, y = heating$white, col = "grey")
plot(x = cooling$time, y = cooling$black, type = "l", col = "black", xlab = "Time Elapsed (Minutes)", ylab = "Temp (Fahrenheit)", main = "Cooling/Color Absorption")
lines(x = cooling$time, y = cooling$red, col = "red")
lines(x = cooling$time, y = cooling$green, col = "green")
lines(x = cooling$time, y = cooling$pink, col = "pink")
lines(x = cooling$time, y = cooling$white, col = "grey")
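The plot() plus repeated lines() pattern works, but it can be condensed. Here is a minimal sketch using base R's matplot(), which draws one line per column of a matrix. The data frame heating_demo is hand-typed from the heating rows of the wide table above, and the variable names are made up for this example.

```r
# Heating-phase values copied from the wide table above
heating_demo <- data.frame(
  time  = seq(0, 60, 10),
  black = c(78, 88, 92, 98, 108, 116, 121),
  green = c(78, 81, 85, 91, 95, 102, 105),
  pink  = c(78, 82, 84, 90, 96, 99, 102),
  red   = c(78, 82, 90, 93, 98, 106, 109),
  white = c(78, 81, 83, 88, 93, 96, 98)
)

color_cols <- c("black", "green", "pink", "red", "white")
line_cols  <- c("black", "green", "pink", "red", "grey")  # grey stands in for white

# matplot() draws one line per column of the y matrix
matplot(
  x = heating_demo$time, y = as.matrix(heating_demo[color_cols]),
  type = "l", lty = 1, col = line_cols,
  xlab = "Time Elapsed (Minutes)", ylab = "Temp (Fahrenheit)",
  main = "Heating/Color Absorption"
)
legend("topleft", legend = color_cols, col = line_cols, lty = 1)
```

A side benefit is that matplot() picks the y-axis range from all of the columns, so the plot does not depend on the first series drawn having the widest range.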
The graphs look neat, and we can see that black is the fastest heat absorber.
The graphs also look symmetrical, but now that we see it in this form, it might make sense to view the cooling and heating data in one graph.
However, the data requires a bit more finagling to get this correct, such as:
1: Minute “60” of the heating data is equivalent to minute “0” of the cooling data, so one of the two rows must be dropped.
2: Minutes elapsed in the cooling data need to be increased by 60 in order to create one continuous time series.
Let’s do it!
# Remove minute 0 of the cooling dataset:
heat_cool <- cooling[-1,]
# Add 60 to the time column.
heat_cool$time <- as.numeric(heat_cool$time) + 60
# Stack this underneath the heating data
heat_cool <- rbind(heating, heat_cool)
heat_cool
## phase time black green pink red white
## 8 heating 0 78 78 78 78 78
## 9 heating 10 88 81 82 82 81
## 10 heating 20 92 85 84 90 83
## 11 heating 30 98 91 90 93 88
## 12 heating 40 108 95 96 98 93
## 13 heating 50 116 102 99 106 96
## 14 heating 60 121 105 102 109 98
## 2 cooling 70 108 94 96 106 96
## 3 cooling 80 98 90 90 95 93
## 4 cooling 90 90 82 83 87 80
## 5 cooling 100 84 80 80 82 78
## 6 cooling 110 79 78 78 80 78
## 7 cooling 120 78 78 78 78 78
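A quick sanity check can confirm that the stitched series is continuous. The sketch below does not touch heat_cool itself. It re-types the time and phase columns from the output above, so treat it as an illustration of the check rather than part of the pipeline.

```r
# Time and phase columns copied from the heat_cool output above
time  <- c(seq(0, 60, 10), seq(70, 120, 10))
phase <- c(rep("heating", 7), rep("cooling", 6))

# Every step should be exactly 10 minutes, with no repeated timestamp
stopifnot(all(diff(time) == 10), !anyDuplicated(time))

# The phase should switch exactly once, right after minute 60
switch_at <- time[which(phase[-1] != phase[-length(phase)])]
switch_at
```

If the duplicate minute had not been removed, or if the 60-minute offset had been skipped, the stopifnot() call would fail.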
Now we have a nice, neat and tidy dataset showing heating and cooling times. Let’s revisit those graphs we generated previously:
plot(x = heat_cool$time, y = heat_cool$black, type = "l", col = "black", xlab = "Time Elapsed (Minutes)", ylab = "Temp (Fahrenheit)", main = "Heating and Cooling/Color Absorption")
lines(x = heat_cool$time, y = heat_cool$red, col = "red")
lines(x = heat_cool$time, y = heat_cool$green, col = "green")
lines(x = heat_cool$time, y = heat_cool$pink, col = "pink")
lines(x = heat_cool$time, y = heat_cool$white, col = "grey")
Let’s get a bit more quantitative. Let’s calculate the rates of heating and cooling for each of the colors:
heating_rate <- (heating[7,3:7] - heating[1,3:7])/60
heating_rate
## black green pink red white
## 14 0.7166667 0.45 0.4 0.5166667 0.3333333
cooling_rate <- (cooling[7,3:7] - cooling[1,3:7])/60
cooling_rate
## black green pink red white
## 7 -0.7166667 -0.45 -0.4 -0.5166667 -0.3333333
Since cooling starts where heating ends and every color returns to 78 degrees, the overall heating and cooling rates mirror one another.
According to this test, a color’s heating rate also dictates its cooling rate (or heat retention), at least on average over the full 120 minutes.
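The arithmetic behind these averages is easy to check by hand. Here is a small sketch using only the start and end temperatures from the tables above (the vector names are made up for this example):

```r
# Start and end temperatures of the heating phase, copied from the tables above
start_temp <- c(black = 78, green = 78, pink = 78, red = 78, white = 78)
end_temp   <- c(black = 121, green = 105, pink = 102, red = 109, white = 98)

# Average rate = total change / elapsed minutes
heating_avg <- (end_temp - start_temp) / 60
round(heating_avg, 4)

# Cooling runs between the same endpoints in reverse, so its average is the exact negative
cooling_avg <- (start_temp - end_temp) / 60
all.equal(heating_avg, -cooling_avg)
```

For black this works out to (121 - 78) / 60, or about 0.72 degrees per minute. It also shows that the symmetry of the averages follows directly from the shared endpoints, whatever happens in between.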
Looking at these visually:
heating_rate <- gather(heating_rate, color, temp)
barplot(heating_rate$temp, col = heating_rate$color, names.arg = heating_rate$color, main = 'Heating Rates (by Color)', xlab = "Color", ylab = "Rate (Degrees per minute)")
However, these are averages over each hour. What does the temperature change look like within each 10-minute interval?
We can find this programmatically:
black_ht <- diff(heating$black)/10
green_ht <- diff(heating$green)/10
pink_ht <- diff(heating$pink)/10
red_ht <- diff(heating$red)/10
white_ht <- diff(heating$white)/10
black_cl <- diff(cooling$black)/10
green_cl <- diff(cooling$green)/10
pink_cl <- diff(cooling$pink)/10
red_cl <- diff(cooling$red)/10
white_cl <- diff(cooling$white)/10
ht_rates <- data.frame(black_ht, black_cl, green_ht, green_cl, pink_ht, pink_cl, red_ht, red_cl, white_ht, white_cl)
ht_rates
## black_ht black_cl green_ht green_cl pink_ht pink_cl red_ht red_cl
## 1 1.0 -1.3 0.3 -1.1 0.4 -0.6 0.4 -0.3
## 2 0.4 -1.0 0.4 -0.4 0.2 -0.6 0.8 -1.1
## 3 0.6 -0.8 0.6 -0.8 0.6 -0.7 0.3 -0.8
## 4 1.0 -0.6 0.4 -0.2 0.6 -0.3 0.5 -0.5
## 5 0.8 -0.5 0.7 -0.2 0.3 -0.2 0.8 -0.2
## 6 0.5 -0.1 0.3 0.0 0.3 0.0 0.3 -0.2
## white_ht white_cl
## 1 0.3 -0.2
## 2 0.2 -0.3
## 3 0.5 -1.3
## 4 0.5 -0.2
## 5 0.3 0.0
## 6 0.2 0.0
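The ten near-identical diff() lines could also be collapsed into one call. A minimal sketch with sapply(), using a hand-typed copy of the heating columns from the wide table above (heating_demo and ht_demo are names made up for this example):

```r
# Heating-phase temperatures copied from the wide table above
heating_demo <- data.frame(
  black = c(78, 88, 92, 98, 108, 116, 121),
  green = c(78, 81, 85, 91, 95, 102, 105),
  pink  = c(78, 82, 84, 90, 96, 99, 102),
  red   = c(78, 82, 90, 93, 98, 106, 109),
  white = c(78, 81, 83, 88, 93, 96, 98)
)

# diff() gives the change between consecutive readings;
# dividing by 10 converts it to degrees per minute
ht_demo <- sapply(heating_demo, function(temps) diff(temps) / 10)
ht_demo
```

The result is a matrix with one row per interval and one column per color. Running the same call on the cooling columns would give the cooling rates.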
Let’s take a look at these visually:
plot(x = seq(10, 60, 10), y = black_ht, type = "l", col = "black", ylim = c(-1.5, 1.5), main = "Heating and Cooling Rates", sub = "Positive Values are Heating Rates, Negative are Cooling Rates", xlab = "Time Elapsed (Minutes)", ylab = "Heating and Cooling Rates")
lines(x = seq(10, 60, 10), y = black_cl, col = "black")
lines(x = seq(10, 60, 10), y = green_ht, col = "green")
lines(x = seq(10, 60, 10), y = green_cl, col = "green")
lines(x = seq(10, 60, 10), y = pink_ht, col = "pink")
lines(x = seq(10, 60, 10), y = pink_cl, col = "pink")
lines(x = seq(10, 60, 10), y = red_ht, col = "red")
lines(x = seq(10, 60, 10), y = red_cl, col = "red")
lines(x = seq(10, 60, 10), y = white_ht, col = "grey")
lines(x = seq(10, 60, 10), y = white_cl, col = "grey")
This chart shows both heating and cooling rates. The heating rates are positive (top of chart), while the cooling rates are negative (bottom of chart).
Matching like colors shows how each color behaved in its heating and cooling phases.
Since our overall averages were symmetrical (i.e. over the full 120-minute span), we might expect each smaller interval to be symmetrical too.
However, that doesn’t always seem to be the case in this data. Note that one green line is often at least twice the magnitude of its counterpart: in the first interval, for example, green heats at 0.3 degrees per minute but cools at 1.1.
We can also see that all colors tend to converge to low values at the end of both periods. Perhaps this speaks to a type of heating saturation paired with a similar flatline of cooling.
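To put a number on that asymmetry, here is a small sketch for green alone. The two vectors are copied from the ht_rates table above. Nothing else is assumed.

```r
# Green's interval rates, copied from the ht_rates table above
green_ht <- c(0.3, 0.4, 0.6, 0.4, 0.7, 0.3)
green_cl <- c(-1.1, -0.4, -0.8, -0.2, -0.2, 0.0)

# Interval by interval, the heating and cooling magnitudes differ
abs(green_cl) - green_ht

# Summed over the hour they cancel (up to floating-point rounding)
all.equal(sum(green_ht), -sum(green_cl))
```

So the interval rates can differ quite a bit even though the hourly totals must match, because both phases move between the same two temperatures.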
Some initial conclusions from our exploratory data analysis:
Black has the fastest heating rate, at ~0.72 degrees per minute. White has the slowest heating rate, at ~0.33 degrees per minute. This was probably expected based on known heuristics, and the data seems to confirm it.
Heating rates and cooling rates were symmetrical over the full 120-minute span. However, they don’t seem to be symmetric over the smaller 10-minute spans, where the rates vary quite a bit.
Some questions it raised:
There seems to be a wide variance in temperature changes across the 10-minute intervals. Is this typical? Do temperature changes “slow” or otherwise change based on when they occur in the time series?
Heat absorption may very well not be a linear process, and perhaps more detailed data is needed to really understand the dynamics of these rates.