The main question at hand is: how can we analyze the drivers of climate change, and which remedies might have the largest impact on reducing global CO2 emissions? In particular, how can we target specific industries based on their contribution to CO2 emissions? We dove into the Energy sector and the transition to sustainable energy sources, particularly the role that electrification and electric vehicles may play in reducing these emissions.
A note up front: our work plan does not necessarily follow the flow of our presentation. We wanted the report to read clearly and be easy to understand, so the order of the steps below differs from the order of the presentation.
Our work plan was as follows:
1. Data Acquisition and Preparation:
- Collect diverse data sources including EIA reports, CO2 emissions data, and EV sales figures.
- Preprocess textual data using NLP techniques to extract key insights and topics for analysis.
2. Analysis of CO2 Emissions Trends:
- Analyze historical CO2 emissions trends in both developed and emerging economies.
- Visualize data to identify patterns and variations over time, correlating with socio-economic factors.
3. Evaluation of EV Impact on Emissions Reduction:
- Assess the impact of EV sales trends on CO2 emissions reduction: is there any visible trend?
4. Clustering and Categorization of Electric Vehicles:
- Employ clustering (k-means), visualized with PCA, to categorize EVs based on their attributes.
- Evaluate the characteristics of each EV cluster to understand performance metrics.
5. Synthesis and Interpretation of Findings (we analyzed our findings at each step of the report):
- Synthesize insights from the CO2 emissions analysis, the EV impact assessment, and the EV clustering.
6. Documentation and Reporting:
- Document methodology, data sources, and findings comprehensively.
- Present findings through visualizations and presentations for transparency and knowledge dissemination.
Climate change is a heavily debated topic in modern society. Here we attempt to look at the role specific industries play in contributing to climate change. This will allow us to select the key contributing industries and tailor solutions specifically to them, in hopes of slowing the rate at which the Earth is warming.
We started by web scraping data from “climatedata.imf.org” in order to find information on greenhouse gas emissions by country and industry in million metric tons of CO2 from 1970 to 2022. After doing some cleaning on the data we were left with the following:
Code
library(tidyverse)
library(readr)

# Retrieving data on emissions by country
url2 <- "https://opendata.arcgis.com/datasets/72e94bc71f4441d29710a9bea4d35f1d_0.csv"
destfile <- "National_GreenhouseGas_Emissions_Country"
curl::curl_download(url2, destfile)

# Cleaning and converting the dataset to a long dataframe
National_GreenhouseGas_Emissions_Country1 <- read_csv(destfile) %>%
  dplyr::select(-ObjectId, -ISO2, -ISO3, -Source, -CTS_Name, -CTS_Full_Descriptor,
                -Indicator, -Scale, -CTS_Code, -F2023, -F2024, -F2025, -F2026,
                -F2027, -F2028, -F2029, -F2030, -Unit) %>%
  dplyr::rename("Gas Type" = Gas_Type)

colnames(National_GreenhouseGas_Emissions_Country1) <- gsub("F", "", colnames(National_GreenhouseGas_Emissions_Country1))
National_GreenhouseGas_Emissions_Country1$Country <- gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", National_GreenhouseGas_Emissions_Country1$Country)
National_GreenhouseGas_Emissions_Country1$Industry <- gsub("^\\S*\\s", "", National_GreenhouseGas_Emissions_Country1$Industry)

Cleaned_Emissions <- National_GreenhouseGas_Emissions_Country1 %>%
  tidyr::pivot_longer(cols = -c('Country', 'Gas Type', 'Industry'),
                      names_to = "Year",
                      values_to = "Emissions",
                      values_drop_na = FALSE) %>%
  dplyr::mutate(Combined_Industry = case_when(
    Industry %in% c("Energy", "Energy Industries") ~ "Energy",
    Industry %in% c("Transport", "Road Transportation", "Domestic Navigation",
                    "Other Transportation", "Domestic Aviation", "Railways",
                    "CO2 Transport and Storage", "Fuel Combustion Activities") ~ "Transportation",
    Industry %in% c("Manufacturing Industries and Construction",
                    "Other Product Manufacture and Use") ~ "Manufacturing Industry",
    Industry %in% c("Industrial Processes and Product Use",
                    "Other Industrial Processes") ~ "Industrial Processes",
    Industry %in% c("Other", "Other (Not specified elsewhere)",
                    "Non-energy Products from Fuels and Solvent Use",
                    "Fugitive Emissions from Fuels",
                    "Product Uses as Substitutes for ODS", "Applicable") ~ "Other",
    Industry %in% "Buildings and other Sectors" ~ "Buildings and other Sectors",
    Industry %in% "Chemical Industry" ~ "Chemical Industry",
    Industry %in% "Land-use, land-use change and forestry" ~ "Land-use, land-use change and forestry",
    Industry %in% "Mineral Industry" ~ "Mineral Industry",
    Industry %in% "Metal Industry" ~ "Metal Industry",
    Industry %in% "Electronics Industry" ~ "Electronics Industry",
    Industry %in% "Agriculture" ~ "Agriculture",
    Industry %in% "Waste" ~ "Waste")) %>%
  dplyr::group_by(Country, Combined_Industry, Year) %>%
  dplyr::summarize(Total_Emissions = sum(Emissions, na.rm = TRUE), .groups = "keep") %>%
  dplyr::rename("Industry" = Combined_Industry) %>%
  dplyr::select(Year, everything()) %>%
  dplyr::filter(Total_Emissions != 0) %>%
  dplyr::arrange(Year)

library(ggplot2)
library(plotly)

Cleaned_Emission_graph2 <- Cleaned_Emissions %>%
  dplyr::filter(Country %in% c("United States", "China", "India", "Australia and New Zealand",
                               "Canada", "United Kingdom", "France", "Mexico")) %>%
  dplyr::filter(Industry != "Land-use, land-use change and forestry") %>%
  ggplot(aes(x = Country, y = Total_Emissions, fill = Industry)) +
  geom_bar(stat = "identity") +
  labs(title = "Total Emissions by Country and Industry") +
  ylab("Million Metric Tons of CO2") +
  scale_y_continuous(labels = scales::number_format()) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

Cleaned_Emission_graph2
The resulting dataset contained emissions for 235 different countries, broken down by industry and by the type of gas producing the emissions, from 1970 to 2022. We aggregated the different gas types into one metric, "Total Emissions", to simplify the analysis, since our goal is to target specific industries. For visual purposes we selected a few developed countries and noticed that China, India, and the United States have significantly higher emissions, likely due to the size of their economies and populations.
We then proceeded to scrape more data from “climatedata.imf.org” where we found data on surface temperature changes from 1970 to 2022 by country. This allowed us to then compare these changes in surface temperature to the emissions data above.
Code
# Retrieving data on temperature change by country
library(rvest)
library(RSelenium)

url <- "https://climatedata.imf.org/datasets/4063314923d74187be9596f10d034914_0/explore"
rD <- rsDriver(browser = 'firefox', chromever = NULL, verbose = FALSE)
remDr <- rD[["client"]]
Sys.sleep(2)
remDr$navigate(url)
Sys.sleep(8)

# Scroll the infinite-scroll table so that all rows are loaded
for (i in 1:10){
  remDr$executeScript("document.querySelector('.infinite-scroll-container').scrollTop += 5000;")
  Sys.sleep(2)
}
html <- remDr$getPageSource()
remDr$close()

# Cleaning the data and converting it into a long data frame
surface_temp_data <- rvest::read_html(html[[1]]) %>%
  rvest::html_elements(css = "table") %>%
  rvest::html_table() %>%
  as.data.frame() %>%
  dplyr::select(-CTS.Full.Descriptor, -CTS.Code, -Source, -Indicator, -ISO2,
                -CTS.Name, -ISO3, -X1961, -X1962, -X1963, -X1964, -X1965,
                -X1966, -X1967, -X1968, -X1969, -Unit)

colnames(surface_temp_data) <- gsub("X", "", colnames(surface_temp_data))

surface_temp_data <- surface_temp_data %>%
  tidyr::pivot_longer(cols = -c("Country"),
                      names_to = "Year",
                      values_to = "Temp Change",
                      values_drop_na = FALSE) %>%
  dplyr::select(Year, everything())

surface_temp_data$Country <- gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", surface_temp_data$Country)

surface_temp_data_graph <- surface_temp_data %>%
  dplyr::filter(Country %in% c("United States", "India", "Australia and New Zealand",
                               "Canada", "United Kingdom", "France", "Mexico")) %>%
  ggplot(aes(x = Year, y = `Temp Change`, color = Country)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Temperature Change by Country 1970 - 2022",
       x = "Year",
       y = "Temperature Change (Degrees C)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

surface_temp_data_graph
After joining our two data sets we grouped by country, industry, and year, and transformed the data from a long data frame into a wide one with one column of emissions per industry. We noticed a significant upward trend in temperature change from 1970 to 2022 and decided to investigate this further.
Code
# Joining the two data sets together
cleaned_data <- Cleaned_Emissions %>%
  tidyr::pivot_wider(names_from = Industry, values_from = Total_Emissions) %>%
  dplyr::right_join(surface_temp_data, by = c("Year", "Country")) %>%
  tidyr::drop_na() %>%
  distinct(Country, Year, .keep_all = TRUE)

cleaned_data$Year <- as.Date(paste0(cleaned_data$Year, "-01-01"))

cleaned_data_graph <- Cleaned_Emissions %>%
  dplyr::right_join(surface_temp_data, by = c("Year", "Country")) %>%
  dplyr::filter(Country %in% c("United States", "India", "Australia and New Zealand",
                               "Canada", "United Kingdom", "France", "Mexico")) %>%
  ggplot(aes(x = Year, y = `Temp Change`, color = Country, size = Total_Emissions)) +
  geom_point() +
  labs(title = "Temp Change by Country, Industry, and Emission") +
  ylab("Temperature Change (Degrees C)") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

cleaned_data_graph
Above is another visual representation that allowed us to easily see this trend, with the emissions generated by each country encoded in the size of each data point. To quantify the relationship, we ran a linear regression to see how well our emissions data explain the change in temperature. Before fitting, we split the data into training and test sets and normalized the predictors, as shown below.
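The following preprocessing step, taken from the report's source code, produces the train_baked and test_baked data used by the models in the rest of this section.
Code
# Feature engineering for the regression models
library(rsample)
cleaned_data_split <- cleaned_data %>%
  rsample::initial_split(prop = 0.8, strata = Country)
training_data <- training(cleaned_data_split)
testing_data  <- testing(cleaned_data_split)

library(recipes)
# Drop identifiers and normalize all numeric predictors on the training set
recipe_pipeline_train <- recipes::recipe(`Temp Change` ~ ., data = training_data) %>%
  recipes::step_rm(Year) %>%
  recipes::step_rm(Country) %>%
  recipes::step_normalize(all_numeric()) %>%
  recipes::prep()
train_baked <- recipes::bake(recipe_pipeline_train, training_data)

# Same preprocessing applied to the test set
recipe_pipeline_test <- recipes::recipe(`Temp Change` ~ ., data = testing_data) %>%
  recipes::step_rm(Year) %>%
  recipes::step_rm(Country) %>%
  recipes::step_normalize(all_numeric()) %>%
  recipes::prep()
test_baked <- recipes::bake(recipe_pipeline_test, testing_data)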
Code
# Linear model
library(parsnip)
model_lm <- parsnip::linear_reg(mode = "regression") %>%
  parsnip::set_engine("lm") %>%
  parsnip::fit(`Temp Change` ~ ., data = train_baked)
summary(model_lm$fit)
Call:
stats::lm(formula = `Temp Change` ~ ., data = data)
Residuals:
Min 1Q Median 3Q Max
-2.7177 -0.6852 -0.0301 0.5788 3.8480
Coefficients:
                                           Estimate Std. Error t value Pr(>|t|)    
(Intercept)                               6.070e-16  3.520e-02   0.000 1.000000    
Agriculture                              -1.655e+00  1.267e+00  -1.306 0.191818    
`Buildings and other Sectors`            -4.768e-01  4.666e-01  -1.022 0.307162    
`Chemical Industry`                      -4.932e-01  4.947e-01  -0.997 0.319050    
`Electronics Industry`                    1.839e-01  1.398e-01   1.315 0.188884    
Energy                                   -1.141e+01  3.718e+00  -3.069 0.002232 ** 
`Industrial Processes`                    1.887e+00  1.186e+00   1.591 0.112092    
`Manufacturing Industry`                 -2.592e+00  7.294e-01  -3.553 0.000405 ***
`Metal Industry`                          3.875e-01  2.695e-01   1.438 0.150964    
`Mineral Industry`                       -3.496e-01  9.565e-01  -0.366 0.714830    
Other                                     1.794e+01  6.497e+00   2.761 0.005910 ** 
Transportation                           -4.602e-01  1.641e+00  -0.281 0.779160    
Waste                                    -2.970e+00  8.073e-01  -3.679 0.000252 ***
`Land-use, land-use change and forestry` -3.479e-01  1.568e-01  -2.219 0.026834 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9466 on 709 degrees of freedom
Multiple R-squared: 0.1201, Adjusted R-squared: 0.1039
F-statistic: 7.442 on 13 and 709 DF, p-value: 7.585e-14
We noticed that certain industries, namely Energy, the Manufacturing Industry, Waste, and the "Other" category, appear statistically significant in our model. However, the adjusted R-squared is quite low at roughly 10%. This signifies that there are other factors contributing to changes in surface temperature that we have not accounted for in our model, and further research into those factors is needed. We did, however, want to push the data we already collected further, so we applied the feature engineering above and used the fitted model to predict how surface temperatures may respond to changes in emissions from these industries; the prediction step and its metrics are shown below.
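The prediction code below is taken from the report's source and produces the metrics table that follows.
Code
# Running predictions of the linear model on the held-out test set
library(yardstick)
predictions_lm <- model_lm %>%
  stats::predict(new_data = test_baked) %>%
  dplyr::bind_cols(`Temp Change` = test_baked %>% ungroup() %>% dplyr::select("Temp Change")) %>%
  yardstick::metrics(truth = `Temp Change`, estimate = .pred) %>%
  dplyr::arrange(.metric)
predictions_lm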
# A tibble: 3 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 mae standard 0.750
2 rmse standard 0.967
3 rsq standard 0.0660
After analyzing these predictions we found that the model has little ability to form accurate predictions of surface temperature: the MAE and RMSE are relatively high and the test-set R-squared is only about 0.07.
We then performed a Bayesian regression to see whether it would fit our data better and give a better representation of how these industries affect changes in surface temperature, and whether predictions from this model would yield better results. The model specification is shown below.
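The Bayesian model below is reproduced from the report's source; it fits the same formula with rstanarm through parsnip and then evaluates test-set predictions, producing the metrics table that follows.
Code
# Bayesian regression with rstanarm via parsnip
library(rstanarm)
options(mc.cores = parallel::detectCores())
model_bayes <- parsnip::linear_reg(mode = "regression") %>%
  parsnip::set_engine("stan",
                      prior_intercept = rstanarm::normal(),
                      prior = rstanarm::student_t(df = 1),
                      iter = 10000,
                      seed = 123) %>%
  parsnip::fit(`Temp Change` ~ ., data = train_baked)

# Coefficient estimates with intervals
library(broom.mixed)
broom.mixed::tidy(model_bayes, conf.int = TRUE, conf.level = 0.2) %>%
  dplyr::mutate(dplyr::across(where(is.numeric), ~round(.x, 5)))

# Test-set predictions and metrics
out_bayes <- model_bayes %>%
  stats::predict(new_data = test_baked) %>%
  dplyr::bind_cols(`Temp Change` = test_baked %>% ungroup() %>% dplyr::select(`Temp Change`)) %>%
  yardstick::metrics(truth = `Temp Change`, estimate = .pred) %>%
  dplyr::arrange(.metric)
out_bayes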
# A tibble: 3 × 3
.metric .estimator .estimate
<chr> <chr> <dbl>
1 mae standard 0.736
2 rmse standard 0.964
3 rsq standard 0.0673
This gave a very similar output to our linear regression, suggesting that our data does indeed fit a linear model best. We did notice that the intervals around the estimates are generally wide, given that the range of temperature changes in the original data is quite small.
Plotting these estimates along with their confidence interval in a whisker plot gave us some more insight into which industries have the largest confidence intervals. We noticed that the Industrial Processes, Energy, Transportation, and Waste industries had the widest confidence intervals.
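The whisker plot is produced by the following code, taken from the report's source.
Code
# Whisker plot of the Bayesian coefficient estimates and their intervals
broom.mixed::tidy(model_bayes, conf.int = TRUE) %>%
  ggplot2::ggplot(aes(x = term)) +
  geom_point(aes(y = estimate)) +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 1.5) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))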
We then went back to the original data to investigate these 4 industries further. After graphing them on a scatter plot (the code is shown below) we noticed that the industries with the largest emissions were in fact Transportation and Energy. Without identifying further factors that may be connected to changes in surface temperature, we relied on the empirical evidence that emissions do have an impact on surface temperature changes. From this we dived deeper into both the Energy and Transportation industries and looked at potential remedies that may reduce emissions from both sectors, namely the transition towards electrification and electric vehicles.
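Both chunks below are taken from the report's source. The first produces the emissions-by-industry scatter plot referenced above.
Code
# Scatter plot of total emissions by industry for the largest emitters
Cleaned_Emissions_graph <- Cleaned_Emissions %>%
  dplyr::filter(Country %in% c("China", "United States", "India", "World")) %>%
  dplyr::filter(Industry != "Other") %>%
  ungroup()
Cleaned_Emissions_graph$Year <- as.Date(paste0(Cleaned_Emissions_graph$Year, "-01-01"))

Emissions_graph <- Cleaned_Emissions_graph %>%
  plotly::plot_ly(x = ~Industry, y = ~Total_Emissions, type = "scatter")
Emissions_graph

The second chunk is the link-collection step that builds href_elements_html, the list of EIA analysis pages consumed by the scraping function below.
Code
# Collecting links to EIA "Total Energy" analysis pages
library(tm)
library(SnowballC)
library(topicmodels)
library(tidytext)
library(stringr)

url <- "https://www.eia.gov/totalenergy/reports.php"
remDr$open()
remDr$navigate(url)
Sys.sleep(2)
html_content_data <- remDr$getPageSource()[[1]]
reading_html <- read_html(html_content_data)

href_elements_html <- reading_html %>%
  html_nodes("a.ico.html") %>%
  html_attr("href")
href_elements_html <- paste0("https://www.eia.gov/", href_elements_html)
href_elements_html <- href_elements_html[-c(1, 2, 3, 54:59)]   # index selection as in the source
href_elements_html <- href_elements_html[!grepl("pdf", href_elements_html)]
remDr$close()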
Code
remDr$open()

# Function to extract paragraphs from a webpage
extract_paragraphs <- function(url) {
  remDr$navigate(url)
  page_source <- remDr$getPageSource()[[1]]
  page <- read_html(page_source)
  paragraphs <- html_nodes(page, "p")
  html_text(paragraphs)
}

# List of links
links <- href_elements_html

# Extract paragraphs from each link
paragraphs_list <- lapply(links, extract_paragraphs)
remDr$close()

# Basic text normalization before the cleaning phase
paragraphs_list <- gsub("(united states|United states|United States)", "US", paragraphs_list, ignore.case = TRUE)
paragraphs_list <- text_cleaned <- gsub("(energy consumption)", "EC", paragraphs_list, ignore.case = TRUE)
paragraphs_list <- text_cleaned <- gsub("\\\\r\\\\n\\\\t", " ", paragraphs_list, ignore.case = TRUE)
paragraphs_list <- gsub("(\\\\| tttttttt | \n\ | \ |)", "", paragraphs_list)
pattern <- "(t{3,})"
paragraphs_list <- gsub(pattern, "", paragraphs_list)
paragraphs_list_for_cleaning <- tolower(paragraphs_list)
Code
# Cleaning phase: remove stop words, numbers, punctuation, and extra whitespace
corpus <- VCorpus(VectorSource(paragraphs_list_for_cleaning))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords())
corpus <- tm_map(corpus, stemDocument)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, removeNumbers)
cleaned_text <- sapply(corpus, as.character)
Sys.sleep(2)

# The two blocks of site-navigation text removed below were repeated on every page.
# Because word frequency is the bedrock of the NLP model, leaving them in would distort
# the topics, so we strip them out. Note that all words are stemmed to their root form
# (e.g. included / including / include all become "includ"), which gives a fair
# representation of where the words fall.
cleaned_text <- gsub("cmenu crude oil gasolin heat oil diesel propan liquid includ biofuel natur gas liquid explor reserv storag import export product price sale sale revenu price power plant fuel use stock generat trade demand emissionsn energi use home commerci build manufactur transport reserv product price employ product distribut stock import export includ hydropow solar wind geotherm biomass ethanol uranium fuel nuclear reactor generat spent fuel comprehens data summari comparison analysi project integr across energi sourc month year energi forecast analysi energi topic financi analysi congression reportsn", " ", cleaned_text)
Sys.sleep(2)
cleaned_text <- gsub("financi market analysi financi data major energi compani greenhous gas data voluntari report electr power plant emiss map tool resourc relat energi disrupt infrastructur state energi inform includ overview rank data analys map energi sourc topic includ forecast map intern energi inform includ overview rank data analys region energi inform includ dashboard map data analys tool custom search view specif data set studi detail document access timeseri data free open data avail api excel addin bulk file widget come test product still develop let us know think form use collect energi data includ descript link survey instruct addit inform sign email subcript receiv messag specif product subscrib feed updat product includ today energi what new short time articl graphic energi fact issu trend lesson plan scienc fair experi field trip teacher guid career corner report request congress otherwis deem import", " ", cleaned_text)
Sys.sleep(2)

# Drop very long tokens (16+ characters), which are mostly concatenation artifacts
pattern <- "\\b\\w{16,}\\b"
cleaned_text <- gsub(pattern, "", cleaned_text)
Sys.sleep(2)
cleaned_text[-c(2, 6, 8, 10, 41, 42, 43, 44, 45, 46, 48)]
str_count(cleaned_text)

# Remove repetitive abbreviations ("eia", "ieo"), which are meaningless for the analysis
cleaned_text <- gsub("eia | ieo", "", cleaned_text)
Code
# Build the document-term matrix and fit the LDA topic model
dtm <- DocumentTermMatrix(Corpus(VectorSource(cleaned_text)))
dtm <- removeSparseTerms(dtm, 0.999)
datasett <- as.data.frame(as.matrix(dtm))
filtered_matrix <- datasett[rowSums(datasett != 0) > 0, ]

# Running our topic model
ap_topic_model <- topicmodels::LDA(filtered_matrix, k = 18, control = list(seed = 321))
Sys.sleep(2)

AP_topics <- tidytext::tidy(ap_topic_model, matrix = "beta")
ap_top_terms <- AP_topics %>%
  group_by(topic) %>%
  top_n(10, beta) %>%
  ungroup() %>%
  arrange(topic, -beta)
Sys.sleep(2)

first_plot <- ap_top_terms %>%
  mutate(term = reorder(term, beta)) %>%
  mutate(topic = paste("Topic #", topic)) %>%
  ggplot(aes(term, beta, fill = factor(topic))) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5, size = 18),
        axis.text.y = element_text(size = 5),
        axis.text.x = element_text(size = 5)) +
  labs(title = "Most relevant terms grouped together",
       caption = "Top terms by topic (betas)") +
  ylab("") +
  xlab("") +
  coord_flip()

# AEO - the full form is Annual Energy Outlook
Energy is the cornerstone of modern civilization, driving economic prosperity, technological innovation, and societal well-being. Despite its paramount importance, energy remains a sensitive topic often overshadowed by geopolitical tensions, environmental concerns, and social inequalities. Historical incidents, such as the oil crises of the 1970s and nuclear disasters like Chernobyl and Fukushima, underscore the complexities and risks associated with energy production and consumption. As we confront the challenges of climate change and strive for a sustainable future, it is imperative to recognize the centrality of energy in shaping our world and engage in open, informed dialogue to navigate the complexities and opportunities it presents.
With that in mind, we web scraped the EIA website to understand what its analyses say about the energy sector in the USA. These analyses are generally long, and a single analysis covers many different areas, so it is difficult to get a general idea of what each one is about. We therefore used natural language processing (NLP) to reduce the text to root words and group them into topic brackets using topic modelling; after cleaning and editing the text we generated the chart below. All words are in their root (stemmed) form, meaning that for the purposes of the NLP analysis words such as included/including/include all become "includ". This lets us fairly compare and categorize words into related topics.
Code
first_plot
In the chart above we used a Latent Dirichlet Allocation (LDA) model to find a better fit for each word across all the articles we web scraped. It is a mixed-membership model in which a single word can be shared by one or many topics. We chose the number of topics and obtained a list of words associated with one or more topics. (For context, AEO stands for Annual Energy Outlook.)
How did we do it?
We cleaned the data and converted it into a document-term matrix, which is a matrix representation of a corpus (collection of texts): the rows are the paragraphs/articles and the columns are the individual words/phrases. In the graph, the horizontal axis is the beta value: the probability of each word belonging to each topic.
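As a minimal illustration of the document-term matrix idea (toy sentences, not from our corpus), consider the following sketch:
Code
# Toy example: two tiny "documents" and their document-term matrix
library(tm)
toy_docs <- c("electric cars reduce emissions",
              "energy demand and electric power")
toy_dtm <- DocumentTermMatrix(Corpus(VectorSource(toy_docs)))
inspect(toy_dtm)  # rows = documents, columns = terms, cells = term counts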
We see that Topic 3 relates to the electric sector and fuel, so let's dive deeper and explore that specific topic.
From the chart, the topic modeling analysis reveals several key insights into the energy sector of the USA. Contrary to the usual mass-media framing, the analyses say something rather different: the focus is more on policy change and on the increase in demand for energy in the form of electricity.
Overall, the LDA model’s findings highlight the complexity and multifaceted nature of the energy sector in the USA, encompassing production, economic implications, resource management, and policy impact.
Speaking of energy, CO2 emissions are the main concern driving the shift towards sustainable energy, whether framed as a political agenda or otherwise. From here onwards we will measure the progress of countries that advocate for decarbonization.
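The maps below start from a fresh, wide-format copy of the emissions data; this step, from the report's source code, reloads the table used by the code that follows.
Code
# Reloading the emissions dataset, keeping the raw yearly columns in wide format
National_GreenhouseGas_Emissions_Country <- read_csv(destfile) %>%
  dplyr::select(-ObjectId, -ISO2, -ISO3, -Source, -CTS_Name, -CTS_Full_Descriptor,
                -Indicator, -Scale, -CTS_Code, -F2023, -F2024, -F2025, -F2026,
                -F2027, -F2028, -F2029, -F2030)
colnames(National_GreenhouseGas_Emissions_Country) <- gsub("F", "", colnames(National_GreenhouseGas_Emissions_Country))
National_GreenhouseGas_Emissions_Country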
Code
# We filtered out China and India because their emissions are so large they would distort the scale.
National_GreenhouseGas_Emissions_Country <- National_GreenhouseGas_Emissions_Country %>%
  mutate(Country = gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", Country)) %>%
  as.data.frame() %>%
  filter(!Country %in% c("China", "India")) %>%
  select(-c(2, 3, 4))
Code
# Aggregate 2020 emissions by country
filteredd <- National_GreenhouseGas_Emissions_Country %>%
  select("Country", "2020")
filteredd <- filteredd %>%
  rename(Emission = "2020")
filteredd <- aggregate(Emission ~ Country, data = filteredd, FUN = sum) %>%
  rename(region = Country)

# Get world map data
world_map <- map_data("world")
merged_data <- left_join(world_map, filteredd, by = "region")

# Plot the map with emissions
map2020 <- merged_data %>%
  ggplot(aes(x = long, y = lat, group = group, text = region)) +
  geom_polygon(aes(fill = Emission), color = "Black") +
  scale_fill_gradient(name = "Emission 2020", low = "yellow", high = "red", na.value = "grey50") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank())
map2020 <- ggplotly(map2020)
Code
# Aggregate 2008 emissions by country
filteredd1 <- National_GreenhouseGas_Emissions_Country %>%
  select("Country", "2008")
filteredd1 <- filteredd1 %>%
  rename(Emission = "2008")
filteredd1 <- aggregate(Emission ~ Country, data = filteredd1, FUN = sum) %>%
  rename(region = Country)

# Get world map data
world_map1 <- map_data("world")
merged_data1 <- left_join(world_map, filteredd1, by = "region")

# Plot the map with emissions
map2008 <- merged_data1 %>%
  ggplot(aes(x = long, y = lat, group = group, text = region)) +
  geom_polygon(aes(fill = Emission), color = "Black") +
  scale_fill_gradient(name = "Emission", low = "yellow", high = "red", na.value = "grey50") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank())
map2008 <- ggplotly(map2008)
Code
subplot(map2020, map2008, nrows = 2) %>%
  layout(title = "World emission in Million metric tons of CO2 equivalent (2020 on top, 2008 on bottom)")
We purposefully excluded India and China: their emissions are so high that other countries would not even seem close, which would greatly skew our scale. Interestingly, the emissions of the USA appear to be reported under 'Advanced economies' rather than as a separate country, so we assumed its emissions are aggregated with other economies (perhaps with Congo or central Africa, which we note with some irony, since the USA tends to shift the blame for its mess elsewhere) and did not include it.
We can see that most developed economies' emissions decreased, whereas emerging economies' emissions have increased over time. Now let's look at a different aspect of CO2 emissions. Before moving on, please play around with the graph below; it looks like it carries less information, but we deliberately simplified it for a clearer result. The code that assembles the underlying data follows.
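The data behind the plot, final_dataa, combines yearly emission totals for a set of EV-selling economies with global EV sales scraped from Wikipedia. The steps below are lightly adapted from the report's source code (the aggregation is condensed into data.frame() calls, but the logic is the same).
Code
# Total emissions for the EV-selling economies, 2017-2021
Emission2017_2021 <- National_GreenhouseGas_Emissions_Country %>%
  select(`Country`, `2017`, `2018`, `2019`, `2020`, `2021`) %>%
  filter(Country %in% c("China", "United States", "Germany", "France", "United Kingdom",
                        "Norway", "Netherlands", "Sweden", "Japan", "Canada"))
Emission2017_2021 <- Emission2017_2021[-1]
column_aggregates <- data.frame(
  Emission = sapply(Emission2017_2021, function(x) sum(x, na.rm = TRUE)),
  Year     = c("2017", "2018", "2019", "2020", "2021")
)

# Scraping EV sales by country (2017-2021) from Wikipedia
linkkwiki <- "https://en.wikipedia.org/wiki/Electric_car_use_by_country"
remDr$open()
remDr$navigate(linkkwiki)
Sys.sleep(1)
reading_html_wiki <- read_html(remDr$getPageSource()[[1]])
remDr$close()

table <- html_nodes(reading_html_wiki, "table.wikitable") %>%
  html_table(fill = TRUE)

EV_sales2 <- table[[3]] %>%
  as.data.frame() %>%
  select(1, 3, 5, 7, 9, 11)
EV_sales2 <- EV_sales2[-1, ]
EV_sales2 <- EV_sales2 %>%
  lapply(function(x) gsub("\\[.*?\\]", "", x)) %>%
  as.data.frame()
EV_sales2$Country <- gsub("\\(.*?\\)", "", EV_sales2$Country)
EV_sales2 <- EV_sales2[1:(nrow(EV_sales2) - 6), ]
EV_sales2 <- EV_sales2 %>%
  mutate(across(everything(), ~ gsub(",", "", .))) %>%
  mutate(across(2:6, as.numeric))
EV_sales2 <- EV_sales2[-1]

# Yearly global totals of EV sales, joined with the emission aggregates
EV_sales <- data.frame(
  evsales = sapply(EV_sales2, function(x) sum(x, na.rm = TRUE)),
  Year    = c("2021", "2020", "2019", "2018", "2017")
) %>%
  arrange(Year)

final_dataa <- left_join(EV_sales, column_aggregates, by = "Year")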
Code
plot <- final_dataa %>%
  ggplot(aes(x = evsales, y = Emission, col = factor(Year))) +
  geom_point() +
  labs(title = "Global EV sales and Emission", col = "Year")
plotly_plottt <- ggplotly(plot)
Code
plotly_plottt
For this part we took the emissions of the countries for which we gathered EV sales. This way we could see whether EV adoption over time has actually correlated with a decrease in their emissions.
EV sales in these economies move in the opposite direction to CO2 emissions, though not dramatically, and the only time emissions decreased significantly was during the COVID-19 lockdown in 2020. Given that we had only 5 data points, we did not formally compute a correlation; a visual conveys the message better (though a one-line check is sketched below).
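For completeness, a minimal check we did not include in the original analysis: the Pearson correlation between yearly EV sales and the aggregated emissions over the five years.
Code
# Quick sanity check on the 5-point relationship (not part of the original analysis)
cor(final_dataa$evsales, final_dataa$Emission)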
Therefore, from both graphs (the maps and the scatter of points) we can conclude that developed economies are indeed heading towards decarbonization; however, the progress is slow and gradual.
Discussions of the energy sector inevitably involve fossil fuels (such as oil and gas). When considering EVs, the crucial shift is the transition from fossil fuel-based energy sources to cleaner alternatives. EVs primarily rely on electricity, which can be generated from renewable sources like solar, wind, or hydroelectric power. This transition impacts both the energy sector and the EV industry.
In the universe of energy, electric cars aren’t just a new vehicle option; they’re a game-changer. By tapping into the power grid instead of the gas station, these cars are reshaping our approach to transportation and energy consumption. They’re not just driving us forward; they’re propelling us toward a more sustainable future.
Despite their growing popularity, there is still much we don't know about electric cars. While they hold promise for reducing emissions and dependence on fossil fuels, their widespread adoption raises a host of unanswered questions. How will they impact the electricity grid? What are the long-term environmental implications of their production and disposal? These are all worthwhile questions; the more important question, however, is how to understand these cars, because they are not the same as combustion engine cars; not even close.
Within the realm of electric cars, a significant challenge lies in the absence of direct comparisons to traditional combustion engine vehicles. Unlike their gasoline-powered counterparts, electric cars possess unique characteristics that defy conventional metrics of comparison. Factors such as range, charging time, and efficiency take on new dimensions in the context of electric propulsion, requiring novel methodologies for evaluation.
From the graph above we can see that electric vehicles are different beasts when it comes to understanding their nature. If you double-click the bubble next to a car model in the legend, the other vehicles are removed, and you can then select the models you want to compare. All of the EVs shown are models that are on the market or will reach the market in the coming years; so far we have ~345 models in this graph.
So why not categorize them based on their attributes? We used attributes such as usable battery capacity, 0-100 acceleration, top speed, range, efficiency, charging speed, price, and drive-train to cluster the models. This way we get a better understanding of which model falls where.
Code
Sys.sleep(1)

# Scale the numeric EV attributes before clustering
clustered <- car_results %>%
  replace(is.na(.), 0) %>%
  select(-c("Model", "Variant"))
clustered <- clustered %>% scale()

if (!require(factoextra)) {
  install.packages("factoextra")
  install.packages("gridExtra")
}
library(factoextra)
library(cluster)
library(gridExtra)

# Calculate the clustering indices
alphaa   <- fviz_nbclust(clustered, kmeans, method = "wss") + theme(plot.title = element_blank())
charliee <- fviz_nbclust(clustered, kmeans, method = "silhouette") + theme(plot.title = element_blank())
betaa    <- fviz_nbclust(clustered, kmeans, method = "gap_stat") + theme(plot.title = element_blank())

# Arrange the plots next to each other
suppressMessages({
  k <- grid.arrange(alphaa, charliee, betaa, ncol = 3)
})
Code
k
TableGrob (1 x 3) "arrange": 3 grobs
z cells name grob
1 1 (1-1,1-1) arrange gtable[layout]
2 2 (1-1,2-2) arrange gtable[layout]
3 3 (1-1,3-3) arrange gtable[layout]
Looking at the analysis, we can cluster our list of electric cars into 3 main categories.
Code
fviz_cluster(kmeans(clustered, centers = 3, iter.max = 150, nstart = 150), data = clustered)
We can see from the graph above that the first principal component is on the vertical axis and the second principal component is on the horizontal axis. Since the first PC explains more than half of the variance in our data, we divide the data primarily along it; the result is three clusters. (A quick check of the variance explained is sketched below.)
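As a sanity check (a small sketch we added, assuming the scaled attribute matrix clustered from the chunk above, not part of the original output), we can inspect how much variance the first two principal components explain:
Code
# Proportion of variance explained by the first two principal components
pca_res <- prcomp(clustered)
summary(pca_res)$importance["Proportion of Variance", 1:2]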
Both Range and Efficiency are vital considerations for electric car buyers, and they are relatively independent of each other. While Range focuses on the total distance a car can travel on a single charge, Efficiency provides insight into how efficiently the car utilizes its battery capacity to cover that distance. These two variables offer complementary information for evaluating the practicality, cost-effectiveness, and environmental footprint of electric vehicles.
Efficiency in electric cars is expressed as energy consumption per unit distance travelled; in our case it is measured in watt-hours per kilometre (Wh/km). For example, an efficiency figure of 286 Wh/km means the car consumes 286 watt-hours of energy to travel 1 km.
Lower Wh/km values (i.e., higher efficiency) mean the car can travel longer distances on the same battery capacity, which is desirable for maximizing range and reducing the operating costs of electric vehicles; a small worked example follows below. Efficiency is influenced by various factors such as vehicle weight, aerodynamics, tire rolling resistance, driving behavior, and environmental conditions.
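A minimal sketch with hypothetical numbers (not taken from our dataset) showing how range follows from battery capacity and consumption:
Code
# Estimated range = usable battery capacity / energy consumption
battery_wh  <- 77 * 1000                                                # hypothetical 77 kWh battery, in Wh
consumption <- c(efficient = 160, average = 200, inefficient = 286)     # Wh/km (illustrative values)
round(battery_wh / consumption)                                         # approximate range in km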
The range of an electric car refers to the distance it can travel on a single charge of its battery. It is a critical specification because it directly impacts usability and practicality for everyday driving and long-distance travel. For instance, a professor who lives in Calgary and needs to travel to Edmonton (300 km) would need an electric car whose range exceeds that distance. The best car for such a person would be the Lucid Air Grand Touring, priced at 132,000 EUR; the catch is the trade-off between range and efficiency, since in our data highly efficient cars tend to have lower range and vice versa. So the optimal models are those in cluster 1; a person who can afford more luxurious cars should go for cluster 2, and someone looking for affordability for cluster 3. This shows how clustering makes decisions easier by letting us choose based on the constraints we have.
Hence, our findings can be summarized in the following points, which answer the questions we asked at the beginning:
a. Electrification and the adoption of electric vehicles can significantly impact CO2 emissions. By shifting from fossil fuel-based energy sources to cleaner alternatives and promoting the use of electric vehicles, there is potential for substantial reductions in emissions.
b. Understanding consumer attitudes and behaviors towards electric vehicles is crucial for analyzing their uptake and identifying barriers to adoption. Factors such as vehicle cost, range anxiety, charging infrastructure availability (given charging speed and range), and perceived environmental benefits influence consumer decisions; these factors can inform strategies to promote electric vehicle adoption.
Electric cars may be a crucial part of reducing our emissions and slowing climate change in the near future. This will require serious thought about how to lay out the infrastructure so we can smoothly transition into a world with more electrification. Research into other renewable sources of energy may yet overtake electrification before we can create a sustainable, viable infrastructure to support the demand these vehicles require. Today, rolling blackouts are already becoming more common in places like California as consumers put too much load on the existing power grid. Car manufacturers like Toyota are putting substantial focus into researching hydrogen-powered vehicles as an alternative to EVs. Some speculators say the EV is simply a transition vehicle while we search for a more sustainable method of power generation, much like ATRAC was a short-lived transition towards CDs and eventually digital media.
Source Code
---title: "Climate Change and the Push Towards Electrificaion"author: "Aftikhar Mominzada and Carter Vereschagin"format: html: code-fold: true code-tools: trueeditor: visualself-contained: true---**The main question at hand is**: How can we analyze the inputs towards climate change and what possible remedies might have the largest impact on reducing our global CO2 emissions? Namely, how can we target specific industries, particularly focusing on the impact they will have on CO2 emissions. We dived into the Energy sector and the transition to sustainable energy sources, particularly the role electrification and electric vehicles may play in reducing these emissions.\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_Therefore we came up with a work plan: heads up that our work plan is not necessarily following the flow of our presentation. It is because we wanted to write a report that makes sense and is easy to understand; therefore the steps which we took in our work plan is not the same as the flow of our presentation.[We came up with the following work plan:]{.underline}1. Data Acquisition and Preparation:- Collect diverse data sources including EIA reports, CO2 emissions data, and EV sales figures.- Preprocess textual data using NLP techniques to extract key insights and topics for analysis.2. Analysis of CO2 Emissions Trends:- Analyze historical CO2 emissions trends in both developed and emerging economies.- Visualize data to identify patterns and variations over time, correlating with socio-economic factors.3. Evaluation of EV Impact on Emissions Reduction:- Assess the impact of EV sales trends on CO2 emissions reduction: would there be any trend?.4. Clustering and Categorization of Electric Vehicles:- Employ clustering algorithms like PCA to categorize EVs based on attributes.- Evaluate characteristics of each EV cluster to understand performance metrics.5. Synthesis and Interpretation of Findings (in each step in our report we analyzed our findings):- Synthesize insights from CO2 emissions analysis, EV impact assessment, and EV clustering.6. Documentation and Reporting:- Document methodology, data sources, and findings comprehensively.- Present findings through visualizations and presentations for transparency and knowledge dissemination.\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_Climate change is a heavily debated topic in our modern society. Here we will attempt to look at the role specific industries play in contributing to climate change. This will allow us to be able to select key industries that contribute towards climate change, giving us the opportunity to cater solutions specifically to these industries in hopes of slowing the rate at which we are warming the Earth.We started by web scraping data from "climatedata.imf.org" in order to find information on greenhouse gas emissions by country and industry in million metric tons of CO2 from 1970 to 2022. 
After doing some cleaning on the data we were left with the following:```{r, message = FALSE, warning = FALSE}library(tidyverse)library(readr)# Retrieving Data on Emissions by Countryurl2 = "https://opendata.arcgis.com/datasets/72e94bc71f4441d29710a9bea4d35f1d_0.csv"destfile = "National_GreenhouseGas_Emissions_Country"curl::curl_download(url2, destfile)# Cleaning and converting dataset to a long dataframeNational_GreenhouseGas_Emissions_Country1 <- read_csv(destfile) %>% dplyr::select(-ObjectId, -ISO2, -ISO3, -Source, -CTS_Name, -CTS_Full_Descriptor, -Indicator, -Scale, -CTS_Code, -F2023, -F2024, -F2025, -F2026, -F2027, -F2028, -F2029, -F2030, -Unit) %>% dplyr::rename("Gas Type" = Gas_Type)colnames(National_GreenhouseGas_Emissions_Country1) <- gsub("F","",colnames(National_GreenhouseGas_Emissions_Country1))National_GreenhouseGas_Emissions_Country1$Country <- gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", National_GreenhouseGas_Emissions_Country1$Country)National_GreenhouseGas_Emissions_Country1$Industry <- gsub("^\\S*\\s", "", National_GreenhouseGas_Emissions_Country1$Industry)Cleaned_Emissions <- National_GreenhouseGas_Emissions_Country1 %>% tidyr::pivot_longer(cols = -c('Country', 'Gas Type', 'Industry'), names_to = "Year", values_to = "Emissions", values_drop_na = FALSE) %>% dplyr::mutate(Combined_Industry = case_when( Industry %in% c("Energy", "Energy Industries") ~ "Energy", Industry %in% c("Transport", "Road Transportation", "Domestic Navigation", "Other Transportation", "Domestic Aviation", "Railways", "Other Transportation", "CO2 Transport and Storage", "Fuel Combustion Activities") ~ "Transportation", Industry %in% c("Manufacturing Industries and Construction", "Other Product Manufacture and Use") ~ "Manufacturing Industry", Industry %in% c("Industrial Processes and Product Use", "Other Industrial Processes") ~ "Industrial Processes", Industry %in% c("Other", "Other (Not specified elsewhere)", "Non-energy Products from Fuels and Solvent Use", "Fugitive Emissions from Fuels", "Product Uses as Substitutes for ODS", "Applicable") ~ "Other", Industry %in% "Buildings and other Sectors" ~ "Buildings and other Sectors", Industry %in% "Chemical Industry" ~ "Chemical Industry", Industry %in% "Land-use, land-use change and forestry" ~ "Land-use, land-use change and forestry", Industry %in% "Mineral Industry" ~ "Mineral Industry", Industry %in% "Metal Industry" ~ "Metal Industry", Industry %in% "Electronics Industry" ~ "Electronics Industry", Industry %in% "Agriculture" ~ "Agriculture", Industry %in% "Waste" ~ "Waste")) %>% dplyr::group_by(Country, Combined_Industry, Year) %>% dplyr::summarize(Total_Emissions = sum(Emissions, na.rm = TRUE), .groups = "keep") %>% dplyr::rename("Industry" = Combined_Industry) %>% dplyr::select(Year, everything()) %>% dplyr::filter(Total_Emissions != 0) %>% dplyr::arrange(Year)library(ggplot2)library(plotly)Cleaned_Emission_graph2 <- Cleaned_Emissions %>% dplyr::filter(Country %in% c("United States", "China", "India", "Australia and New Zealand", "Canada", "United Kingdom", "France", "Mexico")) %>% dplyr::filter(Industry != "Land-use, land-use change and forestry") %>% ggplot( aes( x = Country, y = Total_Emissions, fill = Industry )) + geom_bar(stat = "identity") + labs( title = "Total Emissions by Country and Industry", ) + ylab("Million Metric Tons of CO2") + scale_y_continuous(labels = scales::number_format()) + theme(axis.text.x = element_text(angle = 45, hjust = 1))Cleaned_Emission_graph2```This contained emissions data on 235 different countries around 
the world based on industry and also the type of gas producing these emissions from 1970 to 2022. We aggregated all the different types of gases into one metric, "Total Emissions" to simplify our analysis as we are trying to target specific industries. For visual purposes, we selected a few developed countries and noticed that China, India, and the United States have significantly higher global emissions, likely due to the size of their economies and population.We then proceeded to scrape more data from "climatedata.imf.org" where we found data on surface temperature changes from 1970 to 2022 by country. This allowed us to then compare these changes in surface temperature to the emissions data above.```{r, message = FALSE, warning = FALSE}# Retrieving Data on Temperature change by countrylibrary(rvest)library(RSelenium)url = "https://climatedata.imf.org/datasets/4063314923d74187be9596f10d034914_0/explore"rD <- rsDriver(browser = 'firefox', chromever = NULL, verbose = FALSE)remDr <- rD[["client"]]Sys.sleep(2)remDr$navigate(url)Sys.sleep(8)for (i in 1:10){ remDr$executeScript("document.querySelector('.infinite-scroll-container').scrollTop += 5000;") Sys.sleep(2)}html <- remDr$getPageSource()remDr$close()# Cleaning the data and converting it into a long data framesurface_temp_data <- rvest::read_html(html[[1]]) %>% rvest::html_elements(css = "table") %>% rvest::html_table() %>% as.data.frame() %>% dplyr::select(-CTS.Full.Descriptor, -CTS.Code, -Source, -Indicator, -ISO2, ISO3, -CTS.Name, -ISO3, -X1961, -X1962, -X1963, -X1964, -X1965, -X1966, -X1967, -X1968, -X1969, -Unit)colnames(surface_temp_data) <- gsub("X","",colnames(surface_temp_data))surface_temp_data <- surface_temp_data %>% tidyr::pivot_longer(cols = -c("Country"), names_to = "Year", values_to = "Temp Change", values_drop_na = FALSE) %>% dplyr::select(Year, everything())surface_temp_data$Country <- gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", surface_temp_data$Country)surface_temp_data_graph <- surface_temp_data %>% dplyr::filter(Country %in% c("United States", "India", "Australia and New Zealand", "Canada", "United Kingdom", "France", "Mexico")) %>% ggplot( aes( x = Year, y = `Temp Change`, color = Country )) + geom_bar(stat = "identity", position = "dodge") + labs( title = "Temperature Change by Country 1970 - 2022", x = "Year", y = "Temperature Change (Degrees C)" ) + theme(axis.text.x = element_text(angle = 90, hjust = 1))surface_temp_data_graph```After joining our two data sets together we then grouped by country, industry, and year. This allowed us to transform our data from a long data frame into a wide one illustrating the emissions produced by different industries shown on the graph below. 
We noticed a significant trend upwards in temperature change from 1970 to 2022 and decided to investigate this further.```{r, message = FALSE, warning = FALSE}# Joining the two data sets togethercleaned_data <- Cleaned_Emissions %>% tidyr::pivot_wider(names_from = Industry, values_from = Total_Emissions) %>% dplyr::right_join(surface_temp_data, by = c("Year", "Country")) %>% tidyr::drop_na() %>% distinct(Country, Year, .keep_all = TRUE)cleaned_data$Year <- as.Date(paste0(cleaned_data$Year, "-01-01"))cleaned_data_graph <- Cleaned_Emissions %>% dplyr::right_join(surface_temp_data, by = c("Year", "Country")) %>% dplyr::filter(Country %in% c("United States", "India", "Australia and New Zealand", "Canada", "United Kingdom", "France", "Mexico")) %>% ggplot(aes( x = Year, y = `Temp Change`, color = Country, size = Total_Emissions )) + geom_point() + labs( title = "Temp Change by Country, Industry, and Emission", ) + ylab("Temperature Change (Degrees C)") + theme(axis.text.x = element_text(angle = 90, hjust = 1))cleaned_data_graph```Above is another visual representation we created that allowed us to easily see this trend and also incorporated the emissions generated by each country in the respective size of each data point. To try and capture more of this trend we decided to run a linear regression analysis to see how well our emissions data captured this change in temperature.```{r, message = FALSE, warning = FALSE}# Feature engineering our linear regression modellibrary(rsample)cleaned_data_split <- cleaned_data %>% rsample::initial_split(prop = 0.8, strata = Country)training_data <- training(cleaned_data_split)testing_data <- testing(cleaned_data_split)library(recipes)recipe_pipeline_train <- recipes::recipe(`Temp Change` ~ ., data = training_data) %>% recipes::step_rm(Year) %>% recipes::step_rm(Country) %>% recipes::step_normalize(all_numeric()) %>% recipes::prep()train_baked <- recipes::bake(recipe_pipeline_train, training_data)recipe_pipeline_test <- recipes::recipe(`Temp Change` ~ ., data = testing_data) %>% recipes::step_rm(Year) %>% recipes::step_rm(Country) %>% recipes::step_normalize(all_numeric()) %>% recipes::prep()test_baked <- recipes::bake(recipe_pipeline_test, testing_data)``````{r, message = FALSE, warning = FALSE}# Linear Modellibrary(parsnip)model_lm <- parsnip::linear_reg(mode = "regression") %>% parsnip::set_engine("lm") %>% parsnip::fit(`Temp Change` ~ ., data = train_baked)summary(model_lm$fit)```We noticed that certain industries, namely the Manufacturing Industry as well as Industrial Processes seem to be statistically significant in our model. However, we also noticed that our adjusted R-squared is particularly low at 6.37%. This signifies that there are other factors contributing to the changes in surface temperature that we have not accounted for in our model and from here we should try and do some more research into what other factors may be causing these changes in surface temperature. 
We did however want to try and expand further on the data we already collected, in doing so we attempted to perform some machine learning and feature engineering in order to try and predict how the surface temperatures may react to changes in emissions from these industries.```{r, message = FALSE, warning = FALSE}# Running predictionslibrary(yardstick)predictions_lm <- model_lm %>% stats::predict(new_data = test_baked) %>% dplyr::bind_cols(`Temp Change` = test_baked %>% ungroup() %>% dplyr::select("Temp Change")) %>% yardstick::metrics(truth = `Temp Change`, estimate = .pred) %>% dplyr::arrange(.metric)predictions_lm```After performing some predictions and further analysis on these predictions we found that they had little significance in their ability to form accurate predictions of surface temperature with relatively high MAE and RMSE metrics.We then looked at performing a Bayesian regression in order to see if that would fit our data better and give us a better representation on how these industries affect changes in surface temperature, including seeing if predictions from this model would yield better results.```{r, message = FALSE, warning = FALSE}# Bayesian regressionlibrary(rstanarm)options(mc.cores = parallel::detectCores())model_bayes <- parsnip::linear_reg(mode = "regression") %>% parsnip::set_engine("stan", prior_intercept = rstanarm::normal(), prior = rstanarm::student_t(df = 1), iter = 10000, seed = 123) %>% parsnip::fit(`Temp Change` ~ ., data = train_baked)# Resultslibrary(broom.mixed)broom.mixed::tidy( model_bayes, conf.int = TRUE, conf.level = 0.2) %>% dplyr::mutate(dplyr::across(where(is.numeric), ~round(.x,5)))out_bayes <- model_bayes %>% stats::predict(new_data = test_baked) %>% dplyr::bind_cols(`Temp Change` = test_baked %>% ungroup() %>% dplyr::select(`Temp Change`)) %>% yardstick::metrics(truth = `Temp Change`, estimate = .pred) %>% dplyr::arrange(.metric)out_bayes```This proved to give us a similar output as our linear regression siggesting that our data does indeed fit a linear model the best. We did notice that the confidence intervals between estimates generally have large ranges given our range of temperature changes in the original data is quite small.Plotting these estimates along with their confidence interval in a whisker plot gave us some more insight into which industries have the largest confidence intervals. We noticed that the Industrial Processes, Energy, Transportation, and Waste industries had the widest confidence intervals.```{r, message = FALSE, warning = FALSE}broom.mixed::tidy(model_bayes, conf.int = T) %>% ggplot2::ggplot(aes(x = term)) + geom_point(aes(y = estimate)) + geom_errorbar(aes(ymin = conf.low, ymax = conf.high), width = 1.5) + theme(axis.text.x = element_text(angle = 45, hjust = 1))```We then decided to go back to the original data to see try and investigate these 4 industries further. After graphing it on a scatter plot we noticed that the industries with the largest emissions were in fact the Transportation and Energy industries. Without finding more factors that may be connected to changes in surface temperature we relied on empirical evidence that suggests emissions do in fact have an impact on surface temperature changes. 
From this we dived deeper into both the Energy and Transportation industries where we looked at potential remedies that may reduce emissions from both of these sectors, namely the transition towards electrification and electric vehicles.```{r, message = FALSE, warning = FALSE}Cleaned_Emissions_graph <- Cleaned_Emissions %>% dplyr::filter(Country %in% c("China", "United States", "India", "World")) %>% dplyr::filter(Industry != "Other") %>% ungroup()Cleaned_Emissions_graph$Year <- as.Date(paste0(Cleaned_Emissions_graph$Year, "-01-01"))Emissions_graph <- Cleaned_Emissions_graph %>% plotly::plot_ly( x = ~ Industry, y = ~ Total_Emissions, type = "scatter")Emissions_graph``````{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}library(xml2)library(jsonlite)library(readxl)library(stringr)library(factoextra)library(gridExtra)library(tm)library(SnowballC)library(purrr)library(furrr)library(topicmodels)library(tidytext)#install.packages("factoextra")#install.packages("maps")#install.packages("topicmodels")#install.packages("tidytext")url <- "https://www.eia.gov/totalenergy/reports.php"# Start Selenium serverremDr$open()remDr$navigate(url)Sys.sleep(2)html_content_data <- remDr$getPageSource()[[1]]Sys.sleep(2)reading_html <- read_html(html_content_data)Sys.sleep(2)href_elements_html <- reading_html %>% html_nodes("a.ico.html") %>% html_attr("href")Sys.sleep(2)href_elements_html <- paste0("https://www.eia.gov/", href_elements_html)href_elements_html <- href_elements_html[-c(1,2,3, 54:59)]Sys.sleep(2)href_elements_html <- href_elements_html[!grepl(c("pdf"), href_elements_html)]remDr$close()``````{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}remDr$open()# Function to extract paragraphs from a webpageextract_paragraphs <- function(url) { remDr$navigate(url) page_source <- remDr$getPageSource()[[1]] page <- read_html(page_source) paragraphs <- html_nodes(page, "p") html_text(paragraphs)}# List of linkslinks <- href_elements_html# Extract paragraphs from each linkparagraphs_list <- lapply(links, extract_paragraphs)remDr$close()paragraphs_list <- gsub("(united states|United states|United States)", "US", paragraphs_list, ignore.case = TRUE)paragraphs_list <- text_cleaned <- gsub("(energy consumption)", "EC", paragraphs_list, ignore.case = TRUE)paragraphs_list <- text_cleaned <- gsub("\\\\r\\\\n\\\\t", " ", paragraphs_list, ignore.case = TRUE) paragraphs_list <- gsub("(\\\\| tttttttt | \n\ | \ |)", "", paragraphs_list)pattern <- "(t{3,})"paragraphs_list <- gsub(pattern, "", paragraphs_list)paragraphs_list_for_cleaning <- tolower(paragraphs_list)``````{r message=FALSE, warning=FALSE, error=FALSE , results='hide' }#cleaning phase - in this phase we remove the stop words, the numbers, puncuation, and extra space in the sentences.corpus <- VCorpus(VectorSource(paragraphs_list_for_cleaning))corpus <- tm_map(corpus, removePunctuation)corpus <- tm_map(corpus, removeWords, stopwords())corpus <- tm_map(corpus, stemDocument)corpus <- tm_map(corpus, stripWhitespace)corpus <- tm_map(corpus, removeNumbers)cleaned_text <- sapply(corpus, as.character)Sys.sleep(2)# the two below paragraphs were repeated everywhere, so this would cause a problem in my NLP model because the number of occurance is very the bedrock of NLP so I have to get rid of these paragraphs. 
Also all the words are in its root format (ie included, including include - all of them might be as includ)# this way we could get a good representation of where the words fall.cleaned_text <- gsub(c("cmenu crude oil gasolin heat oil diesel propan liquid includ biofuel natur gas liquid explor reserv storag import export product price sale sale revenu price power plant fuel use stock generat trade demand emissionsn energi use home commerci build manufactur transport reserv product price employ product distribut stock import export includ hydropow solar wind geotherm biomass ethanol uranium fuel nuclear reactor generat spent fuel comprehens data summari comparison analysi project integr across energi sourc month year energi forecast analysi energi topic financi analysi congression reportsn"), " ",cleaned_text)Sys.sleep(2)cleaned_text <- gsub(c("financi market analysi financi data major energi compani greenhous gas data voluntari report electr power plant emiss map tool resourc relat energi disrupt infrastructur state energi inform includ overview rank data analys map energi sourc topic includ forecast map intern energi inform includ overview rank data analys region energi inform includ dashboard map data analys tool custom search view specif data set studi detail document access timeseri data free open data avail api excel addin bulk file widget come test product still develop let us know think form use collect energi data includ descript link survey instruct addit inform sign email subcript receiv messag specif product subscrib feed updat product includ today energi what new short time articl graphic energi fact issu trend lesson plan scienc fair experi field trip teacher guid career corner report request congress otherwis deem import"), " ",cleaned_text)Sys.sleep(2)pattern <- "\\b\\w{16,}\\b"cleaned_text <- gsub(pattern, "", cleaned_text)Sys.sleep(2)cleaned_text[-c(2, 6, 8, 10, 41, 42, 43, 44, 45, 46, 48)]str_count(cleaned_text)#here I got rid of the repeatitive words and short formats both of these are meaningless in the context of our analysis.cleaned_text <- gsub(c("eia | ieo"), "", cleaned_text)``````{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}dtm <- DocumentTermMatrix(Corpus(VectorSource(cleaned_text)))dtm <- removeSparseTerms(dtm, 0.999)datasett <- as.data.frame(as.matrix(dtm))filtered_matrix <- datasett[rowSums(datasett != 0) > 0, ]ap_topic_model <- topicmodels::LDA(filtered_matrix, k = 18, control = list(seed = 321))#Running our topic modelSys.sleep(2)AP_topics <- tidytext::tidy(ap_topic_model, matrix = "beta")ap_top_terms <- AP_topics %>% group_by(topic) %>% top_n(10, beta) %>% ungroup() %>% arrange(topic, -beta)Sys.sleep(2)first_plot <- ap_top_terms %>% mutate(term = reorder(term, beta)) %>% mutate(topic = paste("Topic #", topic)) %>% ggplot(aes(term, beta, fill = factor(topic))) + geom_col(show.legend = FALSE)+ facet_wrap(~ topic, scales = "free")+ theme_minimal()+ theme(plot.title = element_text(hjust = 0.5, size=18), axis.text.y = element_text(size = 5), axis.text.x = element_text(size = 5))+ labs(title = "Most relevent terms grouped together", caption = "Top terms by topic (betas)")+ ylab("")+ xlab("")+ coord_flip()# AEO - the full form is annual energy outlook ```Energy is the cornerstone of modern civilization, driving economic prosperity, technological innovation, and societal well-being. Despite its paramount importance, energy remains a sensitive topic often overshadowed by geopolitical tensions, environmental concerns, and social inequalities. 
Historical incidents, such as the oil crises of the 1970s and nuclear disasters like Chernobyl and Fukushima, underscore the complexities and risks associated with energy production and consumption. As we confront the challenges of climate change and strive for a sustainable future, it is imperative to recognize the centrality of energy in shaping our world and engage in open, informed dialogue to navigate the complexities and opportunities it presents.With that in mind we web scraped the EIA website to understand the prospects of what does the analysis say about energy sector in USA. Generally, these analysis are long and one analysis talks about different areas, hence difficult to get a general idea about what it is all about. Therefore, we used natural language processing (NLP) to break them into root words, and put them in different topic brackets. We used topic modelling in this case and after cleaning and editing the analysis we generated this chart. All the words are in its root form meaning that for the purpose of NLP analysis we changed the words ( included/including/include = includ) to its root form. This way we would be able to fairly compare and catagorize the words to related topic```{r message=FALSE, warning=FALSE, error=FALSE}first_plot```In the above chart we used Latent Dirichlet Allocation (LDA) model to create a better fit for each word across all articles we web scrapped. It is a mixed membership model where one word could be shared with one or many topics. As a result we chose the number of topics and have a list of words associated with one or more topics. (for the context AEO stands for annual energy outlook.)How did we do it?We cleaned the data and changed it to document term matrix: which is a representation of a corpus (collection of texts) in a matrix format. the rows are the paragraphs/articles, and the columns are each words/phrase. Looking at the graph, the horizontal axis is the Beta: the probability of each word being related to each topicwe see that topic 3 has something related to electric sector and fuel, lets dive deep and explore that specific topicFrom the chart description, it seems that the topic modeling analysis has revealed several key insights into the energy sector of the USA. 
From the chart, the topic modeling analysis reveals several key insights into the energy sector of the USA. Contrary to what mass-media coverage might suggest, the analyses say something rather different: the focus is more on policy change and on the growing demand for energy in the form of electricity.

Overall, the LDA model's findings highlight the complexity and multifaceted nature of the energy sector in the USA, encompassing production, economic implications, resource management, and policy impact.

Speaking of energy, CO2 emissions are the main concern driving the shift towards sustainable energy, whether framed as a political agenda or otherwise; from here onwards we will measure the progress of countries that advocate for decarbonization.

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
National_GreenhouseGas_Emissions_Country <- read_csv(destfile) %>%
  dplyr::select(-ObjectId, -ISO2, -ISO3, -Source, -CTS_Name, -CTS_Full_Descriptor,
                -Indicator, -Scale, -CTS_Code,
                -F2023, -F2024, -F2025, -F2026, -F2027, -F2028, -F2029, -F2030)
colnames(National_GreenhouseGas_Emissions_Country) <- gsub("F", "", colnames(National_GreenhouseGas_Emissions_Country))
National_GreenhouseGas_Emissions_Country
```

```{r Emission, message=FALSE, warning=FALSE, error=FALSE, results='hide'}
# Filter out China and India; their emissions are so large they would dominate the colour scale.
National_GreenhouseGas_Emissions_Country <- National_GreenhouseGas_Emissions_Country %>%
  mutate(Country = gsub(",.*| Rep\\. of| Rep\\.|Arab Rep\\. of", "", Country)) %>%
  as.data.frame() %>%
  filter(!Country %in% c("China", "India")) %>%
  select(-c(2, 3, 4))
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
filteredd <- National_GreenhouseGas_Emissions_Country %>% select("Country", "2020")
filteredd <- filteredd %>% rename(Emission = "2020")
filteredd <- aggregate(Emission ~ Country, data = filteredd, FUN = sum) %>% rename(region = Country)

# Get world map data
world_map <- map_data("world")
merged_data <- left_join(world_map, filteredd, by = "region")

# Plot the map with emissions
map2020 <- merged_data %>%
  ggplot(aes(x = long, y = lat, group = group, text = region)) +
  geom_polygon(aes(fill = Emission), color = "Black") +
  scale_fill_gradient(name = "Emission 2020", low = "yellow", high = "red", na.value = "grey50") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank())
map2020 <- ggplotly(map2020)
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
filteredd1 <- National_GreenhouseGas_Emissions_Country %>% select("Country", "2008")
filteredd1 <- filteredd1 %>% rename(Emission = "2008")
filteredd1 <- aggregate(Emission ~ Country, data = filteredd1, FUN = sum) %>% rename(region = Country)

# Get world map data
world_map1 <- map_data("world")
merged_data1 <- left_join(world_map1, filteredd1, by = "region")

# Plot the map with emissions
map2008 <- merged_data1 %>%
  ggplot(aes(x = long, y = lat, group = group, text = region)) +
  geom_polygon(aes(fill = Emission), color = "Black") +
  scale_fill_gradient(name = "Emission", low = "yellow", high = "red", na.value = "grey50") +
  theme(axis.text.x = element_blank(),
        axis.text.y = element_blank())
map2008 <- ggplotly(map2008)
```

```{r message=FALSE, warning=FALSE, error=FALSE}
subplot(map2020, map2008, nrows = 2) %>%
  layout(title = "World emission in million metric tons of CO2 equivalent (2020 top, 2008 bottom)")
```

We purposefully excluded India and China: their emissions are so high that other countries would not even seem close, so including them would greatly skew our colour scale.
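A minimal sketch, assuming the `filteredd` and `world_map` objects from the chunks above, of how to list the country names that fail to match the map's region names (and therefore render grey on the choropleth):

```{r message=FALSE, warning=FALSE, error=FALSE}
# Hedged sketch: emissions rows whose country name has no matching region in
# map_data("world"); these appear grey on the maps above.
dplyr::anti_join(filteredd, dplyr::distinct(world_map, region), by = "region")
```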
Interestingly, the United States is not reported under its own name in this dataset; its emissions appear to be aggregated under an "Advanced Economies" entry together with other economies, so we did not include it.

We can see that most developed economies' emissions decreased, whereas emerging economies' emissions increased over time. Now let's look at a different aspect of CO2 emissions. Before moving on, please play around with the graph below; it looks like it carries less information, but we deliberately simplified it for a clearer result.

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
Emission2017_2021 <- National_GreenhouseGas_Emissions_Country %>%
  select(`Country`, `2017`, `2018`, `2019`, `2020`, `2021`) %>%
  filter(Country %in% c("China", "United States", "Germany", "France", "United Kingdom",
                        "Norway", "Netherlands", "Sweden", "Japan", "Canada"))

# Drop the Country column and sum each year's emissions across these countries.
Emission2017_2021 <- Emission2017_2021[-1]
column_aggregates <- sapply(Emission2017_2021, FUN = function(x) sum(x, na.rm = TRUE)) %>%
  as.data.frame()
column_aggregates <- column_aggregates %>% mutate(Year = c("2017", "2018", "2019", "2020", "2021"))
column_aggregates <- column_aggregates %>% rename(Emission = ".")
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
# Scrape yearly EV sales by country from Wikipedia with RSelenium.
linkkwiki <- "https://en.wikipedia.org/wiki/Electric_car_use_by_country"
remDr$open()
Sys.sleep(1)
remDr$navigate(linkkwiki)
Sys.sleep(1)
html_content_data_wiki <- remDr$getPageSource()
html_content <- html_content_data_wiki[[1]]
reading_html_wiki <- read_html(html_content)
remDr$close()

table <- html_nodes(reading_html_wiki, "table.wikitable") %>% html_table(fill = TRUE)
Sys.sleep(1)

EV_sales2 <- table[[3]] %>% as.data.frame() %>% select(1, 3, 5, 7, 9, 11)
EV_sales2 <- EV_sales2[-1, ]

# Strip footnote markers and parenthetical notes, then clean the numbers.
EV_sales2 <- EV_sales2 %>% lapply(function(x) gsub("\\[.*?\\]", "", x)) %>% as.data.frame()
EV_sales2$Country <- gsub("\\(.*?\\)", "", EV_sales2$Country) %>% na.omit()
EV_sales2 <- EV_sales2[1:(nrow(EV_sales2) - 6), ]
EV_sales2 <- EV_sales2 %>% mutate(across(everything(), ~ gsub(",", "", .)))
EV_sales2 <- EV_sales2 %>% mutate(across(2:6, as.numeric))
Sys.sleep(1)

# Sum sales across countries for each year.
EV_sales2 <- EV_sales2[-1]
EV_sales2 <- sapply(EV_sales2, FUN = function(x) sum(x, na.rm = TRUE)) %>% as.data.frame()
Sys.sleep(1)
EV_sales <- EV_sales2 %>% mutate(Year = c("2021", "2020", "2019", "2018", "2017")) %>% arrange(Year)
EV_sales <- EV_sales %>% rename(evsales = ".")
Sys.sleep(1)

final_dataa <- left_join(EV_sales, column_aggregates, by = "Year")
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
plot <- final_dataa %>%
  ggplot(aes(x = evsales, y = Emission, col = factor(Year))) +
  geom_point() +
  labs(title = "Global EV sales and Emission", col = "Year")
plotly_plottt <- ggplotly(plot)
```

```{r message=FALSE, warning=FALSE, error=FALSE}
plotly_plottt
```

For this part we took the emissions of the countries whose EV sales we gathered, so we could see whether EV adoption over time actually correlates with a decrease in their emissions. EV sales in these economies move in the opposite direction to CO2 emissions, though not dramatically, and the only time emissions dropped sharply was during the COVID-19 lockdowns in 2020; a quick correlation check is sketched below.
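As a minimal sketch (indicative at best, given only five yearly points, and assuming the `final_dataa` data frame built above), the correlation between aggregate EV sales and aggregate emissions can be computed directly:

```{r message=FALSE, warning=FALSE, error=FALSE}
# Hedged sketch: Pearson correlation between yearly EV sales and emissions,
# 2017-2021, using the final_dataa object created in the chunk above.
cor(final_dataa$evsales, final_dataa$Emission, use = "complete.obs")
```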
With only five data points we relied on the visual rather than a formal statistic to convey the message. From both graphs (the maps and the scatter plot) we can conclude that developed economies are indeed heading towards decarbonization, but the progress is slow and gradual.

Discussions of the energy sector inevitably involve fossil fuels such as oil and gas. When considering EVs, the crucial shift is the transition from fossil fuel-based energy sources to cleaner alternatives. EVs primarily rely on electricity, which can be generated from renewable sources like solar, wind, or hydroelectric power, so this transition affects both the energy sector and the EV industry.

In the universe of energy, electric cars aren't just a new vehicle option; they're a game-changer. By tapping into the power grid instead of the gas station, these cars are reshaping our approach to transportation and energy consumption. They're not just driving us forward; they're propelling us toward a more sustainable future.

Despite their growing popularity, there is still much we don't know about electric cars. While they hold promise for reducing emissions and dependence on fossil fuels, their widespread adoption raises a host of unanswered questions. How will they impact the electricity grid? What are the long-term environmental implications of their production and disposal? These are all worthwhile questions, but the more important one is how to understand these cars in the first place, because they are not the same as combustion-engine cars; not even close.

Within the realm of electric cars, a significant challenge lies in the absence of direct comparisons to traditional combustion-engine vehicles. Unlike their gasoline-powered counterparts, electric cars possess unique characteristics that defy conventional metrics of comparison.
Factors such as range, charging time, and efficiency take on new dimensions in the context of electric propulsion, requiring novel methodologies for evaluation.

```{r}
# Load required libraries
if (!require("rvest")) install.packages("rvest")
if (!require("dplyr")) install.packages("dplyr")
if (!require("foreach")) install.packages("foreach")
if (!require("doParallel")) install.packages("doParallel")
library(rvest)
library(dplyr)
library(foreach)
library(doParallel)

# Helper function: remove non-numeric characters (except comma and dot),
# replace comma with dot, and convert to numeric.
parse_numeric <- function(text) {
  num <- gsub("[^0-9,\\.]", "", text)
  num <- gsub(",", ".", num)
  as.numeric(num)
}

# URL to scrape
url <- "https://ev-database.org/"

# Read HTML content
page <- read_html(url)

# Extract all car blocks with class 'list-item'
car_nodes <- page %>% html_nodes(".list-item")

# Convert each node to a character string for serialization across workers
car_html_list <- lapply(car_nodes, as.character)

# Set up parallel backend using available cores
numCores <- parallel::detectCores() - 1  # leave one core free
cl <- makeCluster(numCores)
registerDoParallel(cl)

# Process each car block in parallel using foreach
car_results <- foreach(i = seq_along(car_html_list), .combine = rbind,
                       .packages = c("rvest", "dplyr", "xml2")) %dopar% {
  # Rebuild the HTML from the character string
  car_block <- read_html(car_html_list[[i]])

  # Basic information
  model_name    <- car_block %>% html_node(".title span[class]") %>% html_text(trim = TRUE)
  model_variant <- car_block %>% html_node(".title .model") %>% html_text(trim = TRUE)

  # Additional attributes
  efficiency_text    <- car_block %>% html_node("span.efficiency") %>% html_text(trim = TRUE)
  weight_text        <- car_block %>% html_node("span.weight_p") %>% html_text(trim = TRUE)
  acceleration_text  <- car_block %>% html_node("span.acceleration_p") %>% html_text(trim = TRUE)
  longdistance_text  <- car_block %>% html_node("span.long_distance_total") %>% html_text(trim = TRUE)
  price_text         <- car_block %>% html_node("span.country_uk") %>% html_text(trim = TRUE)
  battery_text       <- car_block %>% html_node("span.battery_p") %>% html_text(trim = TRUE)
  fastcharge_text    <- car_block %>% html_node("span.fastcharge_speed_print") %>% html_text(trim = TRUE)
  priceperrange_text <- car_block %>% html_node("span.priceperrange_p") %>% html_text(trim = TRUE)
  seats_text         <- car_block %>% html_node("i.seats-5 + span") %>% html_text(trim = TRUE)

  # Convert extracted texts to numeric values where appropriate
  efficiency      <- if (!is.na(efficiency_text)) parse_numeric(efficiency_text) else NA
  weight          <- if (!is.na(weight_text)) parse_numeric(weight_text) else NA
  acceleration    <- if (!is.na(acceleration_text)) parse_numeric(acceleration_text) else NA
  long_distance   <- if (!is.na(longdistance_text)) parse_numeric(longdistance_text) else NA
  price           <- if (!is.na(price_text)) parse_numeric(price_text) else NA
  battery         <- if (!is.na(battery_text)) parse_numeric(battery_text) else NA
  fastcharge      <- if (!is.na(fastcharge_text)) parse_numeric(fastcharge_text) else NA
  price_per_range <- if (!is.na(priceperrange_text)) parse_numeric(priceperrange_text) else NA
  seats           <- if (!is.na(seats_text)) parse_numeric(seats_text) else NA

  # Return a data frame row for this car
  data.frame(
    Model = model_name,
    Variant = model_variant,
    Efficiency = efficiency,
    Weight = weight,
    Acceleration = acceleration,
    Range = long_distance,
    Price = price,
    Battery = battery,
    FastChargeSpeed = fastcharge,
    PricePerRange = price_per_range,
    Seats = seats,
    stringsAsFactors = FALSE
  )
}

# Stop the parallel cluster
stopCluster(cl)

# Rename columns to include units for clarity
colnames(car_results) <- c("Model", "Variant", "Efficiency (Wh/km)", "Weight (kg)",
                           "Acceleration (sec)", "Range (km)", "Price (£)", "Battery (kWh)",
                           "FastChargeSpeed (kW)", "PricePerRange (€/km)", "Seats")
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
plot_car <- plot_ly(
  data = car_results,
  x = ~`Acceleration (sec)`,
  y = ~`Price (£)`,
  text = ~paste(`Model`, `Variant`, sep = " - "),
  size = ~`FastChargeSpeed (kW)`,
  color = ~`Model`,
  colors = "Set1",
  type = "scatter",
  mode = "markers",
  marker = list(symbol = "circle"),
  hoverinfo = "text+x+y+size"
) %>%
  layout(
    title = "Electric Car Model Comparison (Bubble size = Fast Charge Speed in kW)",
    xaxis = list(title = "Acceleration (sec)"),
    yaxis = list(title = "Price (£)"),
    legend = list(title = list(text = "Car Model"))
  )
```

```{r message=FALSE, warning=FALSE, error=FALSE}
plot_car
```

The graph above shows that electric vehicles are quite different beasts when it comes to understanding their nature. If you double-click the bubble next to a car model in the legend, the other vehicles are removed and you can then select the models you want to compare. All of these EVs are models that are already on the market or will reach it in the coming years; so far there are roughly 345 models in this graph.

So why not categorize them based on their attributes? We used the numeric attributes we scraped, such as usable battery capacity, 0-100 km/h acceleration, range, efficiency, fast-charge speed, weight, price, price per range, and seat count, to cluster the models. This gives a better understanding of which model falls where.

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
Sys.sleep(1)
# Drop the identifier columns, replace missing values with 0, and standardize.
clustered <- car_results %>%
  replace(is.na(.), 0) %>%
  select(-c("Model", "Variant"))
clustered <- clustered %>% scale()

if (!require(factoextra)) {
  install.packages("factoextra")
  install.packages("gridExtra")
}
library(factoextra)
library(cluster)
library(gridExtra)

# Calculate the clustering indices (elbow, silhouette, gap statistic)
alphaa   <- fviz_nbclust(clustered, kmeans, method = "wss") + theme(plot.title = element_blank())
charliee <- fviz_nbclust(clustered, kmeans, method = "silhouette") + theme(plot.title = element_blank())
betaa    <- fviz_nbclust(clustered, kmeans, method = "gap_stat") + theme(plot.title = element_blank())

# Arrange the plots next to each other
suppressMessages({
  k <- grid.arrange(alphaa, charliee, betaa, ncol = 3)
})
```

```{r message=FALSE, warning=FALSE, error=FALSE}
k
```

Based on these indices, we can cluster our list of electric cars into three main categories.

```{r message=FALSE, warning=FALSE, error=FALSE}
fviz_cluster(kmeans(clustered, centers = 3, iter.max = 150, nstart = 150), data = clustered)
```

In the plot above the first principal component is on the horizontal axis and the second principal component is on the vertical axis; a quick check of how much variance each component captures is sketched below.
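A minimal sketch, assuming the scaled attribute matrix `clustered` from the chunk above, of how to inspect the variance captured by the leading principal components:

```{r message=FALSE, warning=FALSE, error=FALSE}
# Hedged sketch: proportion of variance explained by the first few principal
# components of the scaled EV attribute matrix.
pca <- prcomp(clustered)
summary(pca)$importance[, 1:3]
```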
Since the first principal component explains more than half of the variance in our data, the split along it drives the grouping into three main categories; the result is three clusters.

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
Clusters <- kmeans(clustered, centers = 3, iter.max = 150, nstart = 150)
car_results <- car_results %>% mutate(cluster = Clusters$cluster)
```

```{r message=FALSE, warning=FALSE, error=FALSE, results='hide'}
plott_V <- car_results %>%
  plot_ly(
    x = ~`Range (km)`,
    y = ~`Efficiency (Wh/km)`,
    color = ~factor(cluster),
    size = ~as.numeric(`Price (£)`),
    text = ~paste("Model:", Model, "<br>",
                  "Variant:", Variant, "<br>",
                  "Battery (kWh):", `Battery (kWh)`, "<br>",
                  "Efficiency (Wh/km):", `Efficiency (Wh/km)`, "<br>",
                  "Acceleration (0-100 km/h, sec):", `Acceleration (sec)`, "<br>",
                  "Weight (kg):", `Weight (kg)`, "<br>",
                  "FastCharge Speed (kW):", `FastChargeSpeed (kW)`, "<br>",
                  "Price in £:", `Price (£)`, "<br>",
                  "Price Per Range (€/km):", `PricePerRange (€/km)`, "<br>",
                  "Seats:", Seats)
  ) %>%
  add_markers() %>%
  layout(
    title = "Electric Vehicle Data: Clusters, Price, and Efficiency",
    xaxis = list(title = "Range (km)"),
    yaxis = list(title = "Efficiency (Wh/km)"),
    coloraxis = list(title = "Cluster"),
    sizeaxis = list(title = "Price (£)")
  )
```

```{r message=FALSE, warning=FALSE, error=FALSE}
plott_V
```

Both range and efficiency are vital considerations for electric car buyers, and they are relatively independent of each other. Range is the total distance a car can travel on a single charge, while efficiency shows how well the car uses its battery capacity to cover that distance. Together they offer complementary information for evaluating the practicality, cost-effectiveness, and environmental footprint of electric vehicles.

Efficiency in electric cars is expressed as energy consumption per unit distance traveled; in our data it is measured in watt-hours per kilometer. For example, an efficiency of 286 Wh/km means the car consumes 286 watt-hours of energy to travel 1 km. Lower Wh/km values (i.e., higher efficiency) mean the car can travel farther on the same battery capacity, which is desirable for maximizing range and reducing operating costs. As an illustrative calculation, a car with a 90 kWh usable battery and a consumption of 180 Wh/km can cover roughly 90,000 / 180 = 500 km. Efficiency is influenced by factors such as vehicle weight, aerodynamics, tire rolling resistance, driving behavior, and environmental conditions.

The range of an electric car is the distance it can travel on a single charge of its battery. It is a critical specification because it directly determines usability for everyday driving and long-distance travel. For instance, a professor who lives in Calgary and needs to travel to Edmonton (about 300 km) needs an electric car whose range comfortably exceeds that distance. In our data the best fit for such a person would be the Lucid Air Grand Touring, priced at about 132,000 EUR; the catch is the trade-off between range and efficiency. In this dataset the most efficient cars tend to have shorter ranges, and vice versa. The balanced models sit in cluster 1; a buyer who can afford more luxurious cars should look at cluster 2, and a buyer looking for affordability at cluster 3. This is how clustering makes decisions easier: it lets us choose based on the constraints we have.

Hence, our findings can be summarized in the following points, which also answer the questions we asked at the beginning:

a\. Electrification and the adoption of electric vehicles can significantly impact CO2 emissions.
By shifting from fossil fuel-based energy sources to cleaner alternatives and promoting the use of electric vehicles, there is potential for substantial reductions in emissions.

b\. Understanding consumer attitudes and behaviors towards electric vehicles is crucial for analyzing uptake and identifying barriers to adoption. Factors such as vehicle cost, range anxiety, the availability of charging infrastructure (relative to charging speed and range), and perceived environmental benefits influence consumer decisions; these factors can inform strategies to promote electric vehicle adoption.

Electric cars may be a crucial part of reducing our emissions and slowing climate change in the near future. This may require serious thought about how to lay out the infrastructure so we can transition smoothly into a world with more electrification. Research into other renewable sources of energy may yet overtake electrification before we can build a sustainable, viable infrastructure platform to support the demand these vehicles create. Already, rolling blackouts are becoming more common in places like California as consumers put increasing load on the existing power grid. Car manufacturers such as Toyota are investing heavily in hydrogen-powered vehicles as an alternative to EVs. Some speculate that the EV is simply a transition vehicle while we search for a more sustainable method of power generation, much as the 8-track was a short-lived transition towards CDs and, eventually, digital media.