Abstract
Note: Code folding is turned on for ease of reading. All code blocks can be shown or hidden by pressing the appropriate buttons. All at once at the top or block-by-block at the local code buttons.The USDA publishes data on Agricultural Productivity in the US. Table 1a of this dataset contains price indices and implicit quantities of 29 farm outputs and inputs between 1948 and 2015. The data-set is untidy as it has multiple columns for both price and quantity for the various inputs and outputs. The file is also decorated with merged columns and footnotes. Looking at the data before ingestion will help determine where the names are and how to reshape it. The result of the ETL will be a tidy data-set where every column is a distinct variable, aggregate columns will be identified, as well as the direction of the commodity flow (input or output).
The first step will be to process the names. They are stored in a combination of merged columns and multiple rows, so the process will be very manual and based on inspection. There are also many columns with the same names, as they are total or aggregate columns. Giving these columns unique names will greatly ease the ETL process, as it will allow the separation of price and quantity data, to be later joined on commodity name. These aggregate columns will be identified so that analysis can be done without the fear of overcounting. Lastly, the type of commodity will be added as well, also for ease of downstream analysis.
# Ingest
T1RawN <- unlist(fread("./table01a.csv", skip = 5L, nrows = 1L, drop = 60:90))
# Remove Empty entries due to merged columns
T1RawN <- T1RawN[nzchar(T1RawN)]
# Remove names
names(T1RawN) <- NULL
# Split into Input and Output to handle names on PRIOR row
T1RawNa <- T1RawN[1:11]
T1RawNb <- T1RawN[12:length(T1RawN)]
# Some of these names are added by inspection and not explicit ingestion
T1RawNa <- c("Total output", T1RawNa, "Farm-related output")
# Give unique names to totals for later joins or add multi-layered headers
T1RawNa[2] <- "All Livestock"
T1RawNa[6] <- "All Crops"
T1RawNa[12] <- "Other crops"
T1RawNb[1] <- "All Capital"
T1RawNb[6] <- "All Labor"
T1RawNb[9] <- "All Intermediate Goods"
T1RawNb <- c("Total farm input", T1RawNb)
# Combine proper input and output names
T1RawNames <- c(T1RawNa, T1RawNb)
# Identify the inputs and the outputs
T1Flow <- c(rep("Output", length(T1RawNa)), rep("Input", length(T1RawNb)))
# Identify the commodity type. This is by inspection.
T1Types <- c("Farm Output", rep("Livestock", 4L), rep("Crops", 7L), "Other",
"Farm Input", rep("Capital", 5L), rep("Labor", 3L),
rep("Intermediate goods", 7L))
# Identify if the column is an aggregate or total column—by inspection
T1Agg <- c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE,
FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE,
TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE)
# Create a table for later joining
T1Names <- data.table(Commodity = T1RawNames,
Flow = T1Flow, Type = T1Types, Aggregate = T1Agg)
The next step is to process the data. Now that each commodity is uniquely named, the data-set can be split into two tables: one for price and one for quantity. Once split, these are both valid wide tables, since every row means something unique and every column is also a unique data element, although most are the same kind of variable. These will both be reshaped to long format, so that each column is a unique variable type, and the commodity identification is in the rows as well. The to long tables will be inner-joined to each other on the year and the commodity type, and the resulting table will be joined to the table with the flow, commodity type, and aggregate identifier table.
# Process Data - Split into Price and Quantity
T1Raw <- fread("./table01a.csv", skip = 6L, nrows = 68L, drop = 60:90,
data.table = FALSE)
# Take advantage that every other column is a price or a quantity
T1RawPrice <- T1Raw[, c(1, seq(2, 58, 2))]
T1RawQuant <- T1Raw[, c(1, seq(3, 59, 2))]
# Same order and same names
names(T1RawPrice) <- names(T1RawQuant) <- c("Year", T1RawNames)
# Turn data frames into data tables by reference
setDT(T1RawPrice)
setDT(T1RawQuant)
# Now reshape the Price & Quantity tables from "WIDE" to "LONG"
T1P <- melt(T1RawPrice, id.vars = "Year", variable.name = "Commodity",
value.name = "Price", variable.factor = FALSE)
T1Q <- melt(T1RawQuant, id.vars = "Year", variable.name = "Commodity",
value.name = "Quantity", variable.factor = FALSE)
# Now Inner Join the Price to the Quantity on both YEAR and COMMODITY
T1 <- T1P[T1Q, on = c(Year = "Year", Commodity = "Commodity"),
nomatch = NULL][T1Names, on = "Commodity==Commodity", nomatch = NULL]
setkey(T1, Year)
The suggested analyses in the post included:
The dataset is already on an equivalent dollar basis pegged to 2005, so prices are comparable over time.
The analysis is clearly restricted to farm outputs. The first question is to decide which output is considered a “food”. Looking at the commodities, it is reasonable to consider all non-aggregate outputs as food except for Feed Crops, Oil Crops, and Farm-related output.
Below is a plot of the annual prices of the food commodities with a locally smoothed (loess) trend overlay. The slopes and inflection points look similar for most commodities, with Food grains having a greater amplitude. Interestingly, the smoothed trendline for Fruits and nuts increased its slope around 1991 and continues to climb faster than other food commodities.
Food <- T1RawNa[c(3, 4, 5, 7, 10, 11, 12)]
ggplot(T1[Commodity %chin% Food], aes(x = Year, y = Price, color = Commodity)) +
geom_line() + stat_smooth(method = "loess", se = FALSE) +
scale_colour_brewer(type = "qual", palette = "Dark2") +
scale_x_continuous(breaks = seq(1950, 2015, 5))
This question depends on what is meant by a jump. Total change from 1948 to 2015 is shown below:
knitr::kable(T1[Commodity %chin% Food,
{Growth = Price[.N] / Price[1];
.(`Total Growth` = Growth,
`Annual Trend` = Growth ^ (1 / 68) - 1)},
by = Commodity][order(-`Annual Trend`)])
| Commodity | Total Growth | Annual Trend |
|---|---|---|
| Fruits and nuts | 7.335386 | 0.0297382 |
| Meat animals | 5.402912 | 0.0251182 |
| Vegetables and melons | 4.621757 | 0.0227667 |
| Other crops | 3.943467 | 0.0203823 |
| Dairy | 3.658259 | 0.0192564 |
| Food grains | 2.255163 | 0.0120309 |
| Poultry and eggs | 1.652258 | 0.0074118 |
The only two comestible commodities with average annual trend rates of over 2.5% are Fruits and nuts and Meat animals. We can compare those to their respective quantities on the same graph by scaling them to be mean 0 and variance 1.
Expensive <- T1[Commodity %chin% c("Fruits and nuts", "Meat animals"),
c("Year", "Commodity", "Price", "Quantity")]
Expensive[,
c("Scaled Price", "Scaled Quantity") :=
lapply(.SD, function(x) as.vector(scale(x))),
by = Commodity,
.SDcols = c("Price", "Quantity")
][, `:=`(c("Price", "Quantity"), NULL)]
Expensive <- melt(Expensive, id.vars = c("Year", "Commodity"))
ggplot(Expensive, aes(x = Year, y = value, color = Commodity, lty = variable)) +
geom_path() + scale_colour_brewer(type = "qual", palette = "Dark2") +
scale_x_continuous(breaks = seq(1950, 2015, 5))
The plot shows that there is a pretty strong correlation between the prices and available quantities of these items. This would imply that demand is driving supply. For if supply was leading demand, prices should drop.
knitr::kable(T1[Commodity %chin% c("Fruits and nuts", "Meat animals"),
.(`Price-Quantity Correlation` = cor(Price, Quantity)),
by = Commodity])
| Commodity | Price-Quantity Correlation |
|---|---|
| Meat animals | 0.7208833 |
| Fruits and nuts | 0.9510602 |
An analysis of price jumping is more complicated. It is related to change-point analysis, about which the interested reader may find more information in Adler (2015)1 and the bibliography therein.
The simplest approach to answering this question would be to compare the aggregate labor price and quantity with that of the aggregate capital variable. This will be similar to the analysis done on the expensive foods.
LabCap <- T1[Commodity %chin% c("All Labor", "All Capital"),
c("Year", "Commodity", "Price", "Quantity")]
LabCap[,
c("Scaled Price", "Scaled Quantity") :=
lapply(.SD, function(x) as.vector(scale(x))),
by = Commodity,
.SDcols = c("Price", "Quantity")
][, `:=`(c("Price", "Quantity"), NULL)]
LabCap <- melt(LabCap, id.vars = c("Year", "Commodity"))
ggplot(LabCap, aes(x = Year, y = value, color = Commodity, lty = variable)) +
geom_path() + scale_colour_brewer(type = "qual", palette = "Dark2") +
scale_x_continuous(breaks = seq(1950, 2015, 5))
Here we see an inverse relationship between price and quantity. As alluded to earlier, this makes sense when supply drives demand, so when supply drops the cost of labor and capital increase. There is also a positive correlation between labor and capital of 86%.
There are a number of datasets and tables provided by the UN on worldwide immigration. I will be focusing on data in the “Total international migrant stock” files—2019 revision.
The questions posed in the post suggesting this dataset cannot be answered in their entirety from this data. For example, how many immigrants can the US accept is not something the UN has, or even can, determine. What the first table in the file contains is the quinquennial estimates of non-native populations within countries—denoted as the migrant stock—with an additional estimate for 2019, broken into males and females. It also contains the total population estimates by country in the same Excel file. I will save the tabs as individual CSV files and work with those
From a tidy perspective this data is awful. Not only does it repeat column headings for years three times, it also has a three-to-four level heading structure for the countries. Moreover, I can discern no logic in the structure that could be utilized for a programmatic decision as to what row is a major area, a World Bank income group, a geographic region, a Sustainable Development Goal (SDG) region, a sub-region, or a country. Therefore, I deemed it easiest and most efficient to manually create a table of countries in tidy format to be used for all the data, and to extract the numerical data and tidy them accordingly.
Another problem is that not every country has data broken into male and female. Therefore, some questions will not be able to be answered by sex.
# Manually created country list from UN file. This file has the SDG region and
# subregion as additional variables.
Countries <- fread("Countries.csv", encoding = 'UTF-8')
# Read Migrant Data (Excel file table 1 saved as CSV)
RawMig <- fread("UN_MigrantStockTotal_2019.csv", skip = 17L, encoding = 'UTF-8')
# Isolate the male and female numbers, together with region
RawMigBoth <- RawMig[, c(2, 6:12)]
RawMigMale <- RawMig[, c(2, 13:19)]
RawMigFemale <- RawMig[, c(2, 20:26)]
# Rename them
names(RawMigBoth) <- names(RawMigMale) <- names(RawMigFemale) <- c(
"Country", seq(1990, 2015, 5), 2019)
# Remove aggregates keeping only individual countries
MigMale <- RawMigMale[Country %chin% Countries$Country]
# Reshape from wide to long
MigMale <- melt(MigMale, id.vars = "Country", variable.name = "Year",
value.name = "MigCount")
# Add Sex in preparation for joining to female
MigMale[, Sex := "Male"]
# Perform same adjustments to female data
MigFemale <- RawMigFemale[Country %chin% Countries$Country]
MigFemale <- melt(MigFemale, id.vars = "Country", variable.name = "Year",
value.name = "MigCount")
MigFemale[, Sex := "Female"]
# Perform same adjustments to combined data
MigBoth <- RawMigBoth[Country %chin% Countries$Country]
MigBoth <- melt(MigBoth, id.vars = "Country", variable.name = "Year",
value.name = "MigCount")
MigBoth[, Sex := "Combined"]
# Combine the data
MigrantData <- rbind(MigMale, MigFemale, MigBoth)
# Ensure that count is a number and not character; missing will be NA
MigrantData$MigCount <- as.integer(MigrantData$MigCount)
# Join to countries table
MigrantData <- Countries[MigrantData, on = "Country", nomatch = NULL]
# Perform equivalent ETL on the population data
RawPop <- fread("UN_PopulationTotal_2019.csv", skip = 17L, encoding = 'UTF-8')
RawPopBoth <- RawPop[, c(2, 5:11)]
RawPopMale <- RawPop[, c(2, 12:18)]
RawPopFemale <- RawPop[, c(2, 19:25)]
names(RawPopBoth) <- names(RawPopMale) <- names(RawPopFemale) <- c(
"Country", seq(1990, 2015, 5), 2019)
PopMale <- RawPopMale[Country %chin% Countries$Country]
PopMale <- melt(PopMale, id.vars = "Country", variable.name = "Year",
value.name = "PopCount")
PopMale[, Sex := "Male"]
PopFemale <- RawPopFemale[Country %chin% Countries$Country]
PopFemale <- melt(PopFemale, id.vars = "Country", variable.name = "Year",
value.name = "PopCount")
PopFemale[, Sex := "Female"]
PopBoth <- RawPopBoth[Country %chin% Countries$Country]
PopBoth <- melt(PopBoth, id.vars = "Country", variable.name = "Year",
value.name = "PopCount")
PopBoth[, Sex := "Combined"]
PopData <- rbind(PopMale, PopFemale, PopBoth)
# Population data was in 1000s so need to multiply by 1000 too
PopData$PopCount <- as.integer(PopData$PopCount) * 1000
# Inner Join the migrant data to the population data
UNData <- PopData[MigrantData, on = c(Year = "Year", Country = "Country",
Sex = "Sex"), nomatch = NULL]
# Order the columns---really unnecessary
setcolorder(UNData, c("Year", "Country", "SDG Region", "SubRegion", "Continent",
"Sex", "PopCount", "MigCount"))
# Index by Year and Country. This breaks the UN ordering, but should make
# searches more efficient
setkey(UNData, Year, Country)
The only question in the original post that can be answered from this data is the one asking in which survey year was there the largest migrant stock (highest number of foreign-born immigrants). The first question cannot be answered from this file, although there is another file which does contain the necessary information. The third and fifth questions are identical, and cannot be answered by this data, or the UN at all for this matter. The fourth question is one of flow and this data is one of stock. The data does not reflect the year of entry, in only reflects the total number of foreign-born immigrants living in the country at that time. Whether they have been there for 2 or 20 years is not in the data. Therefore, I will answer some other questions. Note that for all of these questions, missing data will be ignored (treated as 0).
In which survey year was there the largest migrant stock? Looking at the total migrant stock numbers, the answer is 2019.
# Add up all migrant stock values for which there is data, group and order by
# year, and pass it to kable to make a prettier table
knitr::kable(x = {UNData[Sex == "Combined",
.(TotMigStock = sum(MigCount, na.rm = TRUE)),
keyby = Year]}, format.args = list(big.mark = ","))
| Year | TotMigStock |
|---|---|
| 1990 | 159,207,561 |
| 1995 | 166,953,713 |
| 2000 | 179,006,588 |
| 2005 | 197,633,172 |
| 2010 | 227,828,148 |
| 2015 | 256,932,403 |
| 2019 | 281,914,164 |
In which survey year was there the greatest disparity between males and females? Looking at the migrant stock numbers by survey year, the answer is 2019.
# First add up all the migrant stock, this time by sex and by year, ignoring the
# combined numbers. Then cast it to a wide format. Then calculate the absolute
# difference between the female and male numbers and return that via kable
knitr::kable(x = dcast(UNData[Sex != "Combined",
.(MigStock = sum(MigCount, na.rm = TRUE)),
keyby = c("Year", "Sex")],
Year ~ Sex, value.var = "MigStock"
)[, Diff := abs(Female - Male)],
format.args = list(big.mark = ","))
| Year | Female | Male | Diff |
|---|---|---|---|
| 1990 | 78,451,915 | 80,755,646 | 2,303,731 |
| 1995 | 82,460,396 | 84,493,317 | 2,032,921 |
| 2000 | 88,285,870 | 90,720,718 | 2,434,848 |
| 2005 | 96,795,340 | 100,837,832 | 4,042,492 |
| 2010 | 110,276,690 | 117,551,458 | 7,274,768 |
| 2015 | 124,077,275 | 132,855,128 | 8,777,853 |
| 2019 | 135,326,598 | 146,587,566 | 11,260,968 |
Which country has the largest population of migrant stock for each survey year?
# Limited to combined data, return a sub-data table of Country & Migrant Count
# for the country with the largest migrant count, by survey year, to kable
knitr::kable(UNData[Sex == "Combined",
.SD[which.max(MigCount)],
by = Year,
.SDcols = c("Country", "MigCount")],
format.args = list(big.mark = ","))
| Year | Country | MigCount |
|---|---|---|
| 1990 | United States of America | 23,251,026 |
| 1995 | United States of America | 28,451,053 |
| 2000 | United States of America | 34,814,053 |
| 2005 | United States of America | 39,258,293 |
| 2010 | United States of America | 44,183,643 |
| 2015 | United States of America | 48,178,877 |
| 2019 | United States of America | 50,661,149 |
Which country has the smallest population of migrant stock for each survey year?
# Limited to combined data, return a sub-data table of Country & Migrant Count
# for the country with the smallest migrant count, by survey year, to kable
knitr::kable(UNData[Sex == "Combined",
.SD[which.min(MigCount)],
by = Year,
.SDcols = c("Country", "MigCount")],
format.args = list(big.mark = ","))
| Year | Country | MigCount |
|---|---|---|
| 1990 | Saint Helena | 178 |
| 1995 | Saint Helena | 117 |
| 2000 | Saint Helena | 108 |
| 2005 | Saint Helena | 163 |
| 2010 | Tuvalu | 220 |
| 2015 | Tuvalu | 230 |
| 2019 | Tuvalu | 238 |
Which country has the highest population percentage of migrant stock for each survey year?
# Limited to combined data, return a sub-data table of Country & Migrant Count
# for the country with the smallest migrant count, by survey year, to kable
knitr::kable(UNData[Sex == "Combined" & PopCount > 0,
.SD[which.max(MigCount / PopCount)],
keyby = Year,
.SDcols = c("Country", "MigCount", "PopCount")
][, Ratio := MigCount / PopCount],
format.args = list(big.mark = ","))
| Year | Country | MigCount | PopCount | Ratio |
|---|---|---|---|---|
| 1990 | Bonaire, Sint Eustatius and Saba | 38,410 | 13,000 | 2.9546154 |
| 1995 | Bonaire, Sint Eustatius and Saba | 42,258 | 15,000 | 2.8172000 |
| 2000 | Bonaire, Sint Eustatius and Saba | 46,105 | 14,000 | 3.2932143 |
| 2005 | Bonaire, Sint Eustatius and Saba | 62,205 | 14,000 | 4.4432143 |
| 2010 | United Arab Emirates | 7,316,697 | 8,549,000 | 0.8558541 |
| 2015 | United Arab Emirates | 7,995,126 | 9,262,000 | 0.8632181 |
| 2019 | United Arab Emirates | 8,587,256 | 9,770,000 | 0.8789412 |
Which country has the lowest population percentage of migrant stock for each survey year?
# Limited to combined data, return a sub-data table of Country & Migrant Count
# for the country with the smallest migrant count, by survey year, to kable
knitr::kable(UNData[Sex == "Combined" & PopCount > 0,
.SD[which.min(MigCount / PopCount)],
keyby = Year,
.SDcols = c("Country", "MigCount", "PopCount")
][, Ratio := MigCount / PopCount],
format.args = list(big.mark = ","))
| Year | Country | MigCount | PopCount | Ratio |
|---|---|---|---|---|
| 1990 | China | 376,361 | 1,197,362,000 | 0.0003143 |
| 1995 | China | 442,198 | 1,262,276,000 | 0.0003503 |
| 2000 | China | 508,034 | 1,312,517,000 | 0.0003871 |
| 2005 | China | 678,947 | 1,353,482,000 | 0.0005016 |
| 2010 | Cuba | 6,640 | 11,225,000 | 0.0005915 |
| 2015 | Cuba | 5,312 | 11,324,000 | 0.0004691 |
| 2019 | Cuba | 4,886 | 11,333,000 | 0.0004311 |
Which SDG Region has the highest population percentage of migrant stock for each survey year?
# Limited to combined data, first calculate a ratio by year and by SDG region.
# Then chain that to a call returning the largest ratio by year piped to kable.
knitr::kable(UNData[
Sex == "Combined" & PopCount > 0,
.(Ratio = sum(MigCount, na.rm = TRUE) / sum(PopCount, na.rm = TRUE)),
by = c("Year", "SDG Region")
][,.SD[which.max(Ratio)], by = "Year"])
| Year | SDG Region | Ratio |
|---|---|---|
| 1990 | Oceania | 0.1450105 |
| 1995 | Oceania | 0.1419453 |
| 2000 | Oceania | 0.1403243 |
| 2005 | Oceania | 0.1462045 |
| 2010 | Oceania | 0.1571951 |
| 2015 | Oceania | 0.1638907 |
| 2019 | Oceania | 0.1706701 |
Which SDG Region has the lowest population percentage of migrant stock for each survey year?
# Limited to combined data, first calculate a ratio by year and by SDG region.
# Then chain that to a call returning the smallest ratio by year piped to kable.
knitr::kable(UNData[
Sex == "Combined" & PopCount > 0,
.(Ratio = sum(MigCount, na.rm = TRUE) / sum(PopCount, na.rm = TRUE)),
by = c("Year", "SDG Region")
][,.SD[which.min(Ratio)], by = "Year"])
| Year | SDG Region | Ratio |
|---|---|---|
| 1990 | Eastern and South-Eastern Asia | 0.0037196 |
| 1995 | Eastern and South-Eastern Asia | 0.0042764 |
| 2000 | Eastern and South-Eastern Asia | 0.0051381 |
| 2005 | Eastern and South-Eastern Asia | 0.0060918 |
| 2010 | Eastern and South-Eastern Asia | 0.0071524 |
| 2015 | Eastern and South-Eastern Asia | 0.0078413 |
| 2019 | Eastern and South-Eastern Asia | 0.0078371 |
This dataset is collated by OpenEI, and contains the sources of renewable energy by sector for the years 1989–2008. Like most of these untidy datasets, it is in wide format with multi-level headings. However, like the others, it is small, so a combination of intelligent ingestion and programmatic cleaning will be the quickest and most efficient way to perform the ETL.
The file will be read in its entirety, and then parsed to individual sector tables by inspection. The sectors will be added to the data, and then all will be combined into a single wide-format table. At this point, the data will be cleaned from footnotes and markers which are effectively zero finally, the data will be transformed into tidy-long format and any remaining numbers stored as character will be converted back to numbers. At this point the data is ready for analysis.
# Read
Raw <- fread("historicalrenewableenergyconsumptionbysectorandenergysource19892008.csv",
skip = 6L)
# Set column names
names(Raw) <- c("Source", 1989:2008)
# Parse by sector and add sector identifier
Residential <- Raw[15:17, ]
Residential[, Sector := "Residential"]
Commercial <- Raw[21:26, ]
Commercial[, Sector := "Commercial"]
Industrial <- Raw[30:36, ]
Industrial[, Sector := "Industrial"]
Transportation <- Raw[39, ]
Transportation[, Sector := "Transportation"]
ElecUtilities <- Raw[46:51, ]
ElecUtilities[, Sector := "ElecUtilities"]
ElecIPP <- Raw[55:60, ]
ElecIPP[, Sector := "ElecIPP"]
# Recombine
Power <- rbind(Residential, Commercial, Industrial, Transportation,
ElecUtilities, ElecIPP)
# Clean Values from * & - and extraneous footnotes
Power <- Power[, lapply(.SD, function(x) gsub("[*-]", "0", x))]
Power$Source <- gsub("[[:digit:]]", "", Power$Source)
# Reshape to long format
Power <- melt(Power, id.vars = c("Sector", "Source"),
variable.name = "Year",
variable.factor = FALSE,
value.name = "BTU_Q")
# Ensure numbers are numbers
Power$Year <- as.double(Power$Year)
Power$BTU_Q <- as.double(Power$BTU_Q)
The analysis asked for in the post was to “…analyze the data to figure out what type of renewable energy is used the most or measure the amount of renewable energy that each sector consumes per year.”
ggplot(Power[, .(BTU_Quad = sum(BTU_Q)), keyby = c("Year", "Sector")],
aes(x = Year, y = BTU_Quad, color = Sector)) + geom_path() +
scale_color_brewer(type = "qual", palette = "Dark2")
ggplot(Power[, .(BTU_Quad = sum(BTU_Q)), keyby = c("Year", "Source")],
aes(x = Year, y = BTU_Quad, color = Source)) + geom_path() +
scale_color_brewer(type = "qual", palette = "Set1")
The heaviest using sector is the electrical power utilities The heaviest used resource is hydroelectricity—whose usage tracks the utility usage closely. This stands to reason as the data indicates that the power utilities use hydro far and away as their largest source.
Another interesting point is that the rise in biomass usage mirrors the rise in the transportation sector. What was surprising to me was that the correspondence is 1:1. Only the transportation sector uses biomass, and it uses nothing else. This may be a miscoding or an older nomenclature, as more recent documentation indicates that biofuels are used in the transportation industry. See this briefing by the Environmental and Energy Study Institute, as an example.
Avraham Adler, A Survey of Approaches to a Changepoint Problem in an Actuarial Context, Variance, 9(1): 64–100, 2015. http://www.variancejournal.org/issues/?fa=article&abstrID=7179↩︎