Data is extremely valuable. Oil has long been considered one of the world’s most valuable resources, but in today’s “data economy,” data could potentially be more valuable because, like oil, it can be refined into an essential commodity [1]. It is the key to understanding the modern world and the myriad factors that make contemporary society possible. The world today is more complex than ever, which requires us to digest the many things happening around us into useful information. In practice, this means gathering the available data and analyzing it to reach the best possible understanding of how and why things are.
We must study and analyze data simply because it is ubiquitous. Everything we do daily creates a mass of information that remains meaningless until it is transformed into insights [2]. Data analysis cleans and organizes raw data, which can be scattered or presented in different forms, into a more consistent form that can be studied further, thus unlocking its potential [3]. By making sense of these large amounts of data, we can recognize patterns of behavior and begin to reason about whether certain choices are beneficial or detrimental to a person, object, or place. In this sense, we can think of data analysis as a way to account for history. Studying the data of the past can help us prevent historical misfortunes from repeating themselves and build further advancements on historical milestones. In short, learning how to study and analyze data helps us make decisions, with the outcome of the analysis serving as their basis. Studying and analyzing data can also help us anticipate the future. We are not fortune tellers, but given enough data we can find patterns that point to likely future outcomes [4].
For example, data analysis has been used in studies of economic development [5]. One of these studies looked into the effects of the Mita forced labor system and whether it may have caused a population collapse, and compared it with other labor systems around the world to draw inferences about the role a historical event can play in economic development [6]. Data by itself is not useful until it is analyzed and studied in pursuit of a certain goal. Data science allows innovation to occur: companies and individuals can take advantage of current trends, anticipate future ones, and possibly even create their own trends through new products and services [7]. This is not restricted to businesses or industries. It also applies to science and politics, where the focus of innovation may shift toward the greater good rather than the personal gain of companies or individuals.
Analyzing data from the past allows us to understand the “whys” of a past phenomenon and, therefore, to make better decisions based on historical data. Aside from helping us predict possible outcomes, learning how to analyze data also develops skills that can be used in everyday activities. These include problem solving and deductive reasoning, that is, being able to draw conclusions from various pieces of information. Another is communication: the ability to present and explain complex data to those who are not familiar with it [8]. These skills are applicable not only in job settings but also in the other everyday challenges we face.
Furthermore, modern technology, namely the internet and mobile computers, has made gathering data and information easier than ever. Choosing not to study and analyze data today is therefore a disadvantage, as it prevents us from making use of widely available data that, when analyzed, can help us better understand the world around us.
A photo of India amid a COVID-19 lockdown
Having established why studying and analyzing data matters, here are five (5) applications of data science:
- Advertising to specific target audiences is one application of data science that companies such as Facebook utilize. It may affect our everyday decisions, such as buying a product “that you didn’t know you wanted yet.” Despite the controversy surrounding Facebook and its handling of its users’ data and privacy [9], the company analyzes your data, such as your interests and hobbies, to decide which kinds of advertisements would suit you best [10]. This allows Facebook to retain and attract more sponsors that use its platform to advertise their products or services, as this method may be more effective than conventional advertising.
- Fuel prices have risen sharply in recent times, forcing some transportation companies to spend much more money. Airlines are among the companies affected. In response, they make use of data science to optimize fuel efficiency [11]. Not only does this reduce total fuel cost, but it also reduces anthropogenic carbon dioxide (CO2) emissions. Airlines use systems with built-in machine learning algorithms to collect and analyze flight data on each route’s distance and altitudes, aircraft type and weight, weather, and so on. Based on findings from this data, the systems estimate the optimal amount of fuel needed for a flight [12].
- Beyond the applications already mentioned, data science has also played its part in the medical field. Studies that use data analysis allow us to identify key risk factors of illnesses and help with prevention and awareness. To cite a specific example, age-related macular degeneration (AMD) is one of the leading causes of vision loss in elderly individuals [13]. Through data science, researchers have shown that a family history of the illness increases the risk of having it. Specifically, the risk of developing AMD is greatly increased by having an affected first-degree relative, a finding based on data gathered from hundreds of AMD-affected individuals and their family members [14]. Data science can thus raise awareness of the likelihood of developing an illness and push individuals to take preventive steps, or at least to be prepared for the possibility.
- Data science can also guide policy-making. It processes raw data into understandable information, which reduces uncertainty when choosing a particular course of action [15]. Policy-makers today are under pressure to use data science when drafting policies that affect the public, so that the best possible policy solution rests on reliable and accurate research [16]. For instance, data science can be used to inform policy-makers of the consequences of lockdowns in a given area. A study conducted in India showed that social distancing alone, while capable of preventing a number of cases, was not as effective in slowing the spread of COVID-19 as a lockdown combined with social distancing [17]. It found that longer lockdowns would prevent more cases, with the caveat that the economic damage would also be most severe. It should be noted, however, that this was in the early stages of the pandemic, when the whole point of the lockdowns was to slow the spread of the disease and give the country more time to prepare its medical facilities. A year later, a harsh lockdown was again implemented. This time, however, data analysis showed that it was no longer as effective as before [18]. Analyses such as these, especially when repeated over time, can reduce the uncertainty policy-makers may feel about their decisions and hopefully allow them to draft the best possible policy solution for the public.
- Some companies use data science to predict the performance of different cryptocurrencies. Data science comes into play in forecasting cryptocurrency prices because it aims to figure out what changes the value of these digital coins [19]. If you are a ‘crypto enthusiast,’ data science can help you judge which currencies are likely to do well and which are not, which can guide you toward better-informed investment decisions and a higher return on investment.
Road widening project in Cotabato
One data science topic that we would like to pursue concerns gas-powered and electric-powered cars: comparing their availability and price in the Philippines across various manufacturers, as well as the carbon footprint of each kind. Data on these aspects of gas and electric cars is available from studies and articles that delve into the topic. Most of these, however, are based in foreign countries such as the United States [20] [21] [22] [23] [24]. A statistical method that may prove useful is correlation analysis between these aspects, especially between the availability of electric cars and the emissions released in manufacturing and using them, compared against gas-powered cars. Consumers, the companies that sell these cars, and environmentalists may benefit from this study.
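To make the idea of correlation analysis concrete, here is a minimal sketch of the Pearson correlation coefficient in Python. The variable names and all of the numbers are made up purely for illustration; they are not real Philippine data, and an actual study would need properly sourced figures.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values: number of electric car models on sale in a year,
# and an average emissions figure (g CO2 per km) for cars sold that year.
ev_models_available = [2, 3, 5, 8, 12]
avg_emissions_g_per_km = [150.0, 148.0, 141.0, 135.0, 126.0]

r = pearson_r(ev_models_available, avg_emissions_g_per_km)
print(f"Pearson r = {r:.3f}")
```

A value of r close to -1 or +1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. Correlation alone would not show that one variable causes the other.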
Another data science topic that we may want to pursue is the analysis of flight prices across multiple destinations, since fares depend on where you are going. A flight price prediction would forecast the best time to book your flight and get the cheapest fare.
As expected, airline companies want to sell their tickets at a higher rate, especially when your destination is holding an event or a celebration and the demand for flights is high. Flight price prediction would help passengers get cheaper flights without having to worry about missing the event.
Data or datasets regarding this topic are available online. There are a lot of options: (1) the website of your travel platform, where you can check the available flights and their fares; (2) the Airline Tariff Publishing Company (ATPCO), which keeps a large volume of fare data from multiple airlines in its database; (3) Travel Insight by Skyscanner, which also keeps information in its database; and (4) public datasets, which can be found on websites such as Kaggle.
Factors affecting the flight price prediction include purchase and departure dates, seasonality, holidays, the number of available airlines and flights, fare class, the current market demand, and flight distance [25].
A statistical method that can be used in this study is predictive analysis, in which a wide range of techniques such as data mining, big data, predictive modeling, artificial intelligence, and simulations [26] can be used to predict the best time to book. Specifically, statistical hypothesis tests and linear regression can be applied to the different factors or variables that affect the flight fare. Overall, the population benefiting from this study is passengers looking for a cheaper fare.
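As a rough illustration of the linear regression mentioned above, the sketch below fits a straight line by ordinary least squares to a single factor, the number of days between purchase and departure. The fares are invented for illustration, the function names are our own, and a real model would likely need several factors and may not be linear at all.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: days booked before departure and fare paid (PHP).
days_before_departure = [1, 7, 14, 30, 60]
fare_php = [9500.0, 7800.0, 6400.0, 5200.0, 4900.0]

slope, intercept = fit_line(days_before_departure, fare_php)


def predict_fare(days):
    """Predict a fare from the fitted line; only meaningful near the observed range."""
    return intercept + slope * days


print(f"fare is about {intercept:.0f} + ({slope:.1f}) * days_before_departure")
print(f"predicted fare 21 days out: {predict_fare(21):.0f} PHP")
```

With these made-up numbers the slope comes out negative, which would mean earlier bookings tend to be cheaper. Whether that holds for real fares is exactly what the proposed study would need to test.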
Furthermore, another data science topic we would like to pursue is small-scale traffic efficiency and management. By small scale, we mean a case-by-case basis depending on the city and the location of its intersections and roads. We would like to look into how we can predict traffic congestion and manage it so that traffic flows adequately and at improved speeds. We believe traffic management can be optimized by better synchronizing multiple traffic lights according to road congestion.
With that said, being able to predict when and where traffic congestion will occur can help reduce harmful emissions, pollution, and vehicular accidents and improve transportation efficiency altogether. The availability of data would depend on the target location. We would need to account for traffic data such as road occupancy, vehicle density, and the average speed of the cars in traffic. The methods required to analyze this issue would include evaluating the congestion factor, which indicates the general congestion state of the road [27]. It would also be crucial to identify the mean vehicle speed and density at every given hour [28]. Through this and other methods, we can create an algorithm for adaptive traffic management at every intersection depending on the time of day and the vehicular density. Ideally, it will help everyone in the city, from private and public vehicles to bystanders. Especially with the rise of gas prices, lighter traffic congestion will go a long way toward helping individuals save on fuel and toward reducing emissions, pollutants, accidents, and possible gridlocks.
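The hourly mean speed and density described above could be computed along the following lines. This is only a sketch: the observations are invented, and the `congestion_ratio` function with its jam-density constant is our own simplified stand-in for illustration, not the congestion factor defined in the cited study [27].

```python
from collections import defaultdict

# Hypothetical sensor readings: (hour_of_day, speed_kph, density_vehicles_per_km)
observations = [
    (7, 22.0, 85.0),
    (7, 18.0, 95.0),
    (8, 15.0, 110.0),
    (8, 13.0, 120.0),
    (13, 38.0, 40.0),
    (13, 42.0, 36.0),
]


def hourly_means(obs):
    """Group readings by hour and return {hour: (mean_speed, mean_density)}."""
    grouped = defaultdict(list)
    for hour, speed, density in obs:
        grouped[hour].append((speed, density))
    result = {}
    for hour, rows in sorted(grouped.items()):
        speeds = [s for s, _ in rows]
        densities = [d for _, d in rows]
        result[hour] = (sum(speeds) / len(speeds), sum(densities) / len(densities))
    return result


# Assumed density at which traffic stops moving; a placeholder value.
ASSUMED_JAM_DENSITY = 150.0


def congestion_ratio(mean_density, jam_density=ASSUMED_JAM_DENSITY):
    """Illustrative indicator in [0, 1]: share of the assumed jam density in use."""
    return min(mean_density / jam_density, 1.0)


means = hourly_means(observations)
for hour, (speed, density) in means.items():
    print(f"{hour:02d}:00  speed={speed:.1f} kph  density={density:.1f} veh/km  "
          f"ratio={congestion_ratio(density):.2f}")
```

An adaptive traffic light scheme could, for instance, lengthen green phases on approaches whose ratio exceeds some threshold at a given hour, though choosing that threshold would itself require data.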
In addition to this, as a group, we have also expressed our interest in studying any correlation between road-widening projects, car sales, and increased traffic congestion in the country. Data analysis can be used to draw conclusions from the data sets gathered. Ideally, the results of the study will guide policy-makers in deciding whether continued road-widening is a sustainable solution to Philippine transportation needs and whether we should continue to emphasize car ownership in the country. Data on road-widening projects can be found on the Department of Public Works and Highways website [29], and data on Philippine car sales can be found online. Data on traffic congestion might be more difficult to find on a national scale but should be accessible at the city level. A statistical method that might be used to analyze this is Locality-Preserving Non-negative Matrix Factorization, as described in a study by Han and Moutarde [30]. The general public would benefit from this, although the benefit might be localized to specific areas of the country depending on the scale of the analysis.
Lastly, a topic that we would like to pursue is determining the frequency, intensity, and other characteristics of approaching tropical storms. PAGASA records and stores past climatological data. Excel sheets of this raw data are readily available and can be requested for research or academic purposes. A statistical method that data scientists may use in studying the behavior of typhoons is Exploratory Data Analysis (EDA) [31], which aids in discovering patterns and checking assumptions. Because the Philippines is located in the typhoon belt, we experience an average of 20 typhoons every year [32]. Typhoons, especially those of high intensity, can severely damage Filipinos’ resources, such as our crops, homes, and infrastructure. Nearly everyone in the Philippines can benefit from these types of studies, since the findings can improve our emergency response.
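A first pass at exploratory data analysis on storm records might look like the sketch below, which counts storms per year and per month and summarizes wind speeds. The records are entirely made up for illustration and do not come from PAGASA; the tuple layout is also an assumption about how such data could be arranged.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical storm records: (year, month, max_sustained_wind_kph)
storms = [
    (2019, 7, 95), (2019, 9, 150), (2019, 11, 185),
    (2020, 8, 110), (2020, 10, 220), (2020, 11, 195),
    (2021, 9, 170), (2021, 11, 195),
]

# Frequency: how many storms per year and per calendar month
storms_per_year = Counter(year for year, _, _ in storms)
storms_per_month = Counter(month for _, month, _ in storms)

# Intensity: simple summary statistics of maximum wind speed
winds = [wind for _, _, wind in storms]
summary = {
    "count": len(storms),
    "mean_wind": mean(winds),
    "median_wind": median(winds),
    "max_wind": max(winds),
}

busiest_month, busiest_count = storms_per_month.most_common(1)[0]

print("storms per year:", dict(sorted(storms_per_year.items())))
print("wind summary:", summary)
print(f"busiest month in this sample: {busiest_month} ({busiest_count} storms)")
```

Summaries like these are only a starting point. They help spot patterns worth testing, such as whether certain months consistently see stronger storms, before any formal modeling is attempted.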
[1] T. Fauerbach, “Data reigns in today’s data economy,” The Northridge Group, 28-Sep-2020. [Online]. Available: https://www.northridgegroup.com/blog/more-valuable-than-oil-data-reigns-in-todays-data-economy/. [Accessed: 21-Jun-2022].
[2] A. Ho, A. Nguyen, J. Pafford, and R. Slater, “A Data Science Approach to Defining a Data Scientist,” SMU Data Science Review, vol. 2, no. 3, Article 4, 2019. [Online]. Available: https://scholar.smu.edu/datasciencereview/vol2/iss3/4. [Accessed: 22-Jun-2022].
[3] E. Amadebai, “The importance of data analysis in Research,” Analytics for Decisions, 28-Mar-2021. [Online]. Available: https://www.analyticsfordecisions.com/importance-of-data-analysis-in-research/#:~:text=Data%20analysis%20is%20important%20in,them%20derive%20insights%20from%20it. [Accessed: 21-Jun-2022].
[4] J. Edwards, “What is predictive analytics? transforming data into future insights,” CIO, 16-Aug-2019. [Online]. Available: https://www.cio.com/article/228901/what-is-predictive-analytics-transforming-data-into-future-insights.html. [Accessed: 23-Jun-2022].
[5] N. Nunn, “The Importance of History for Economic Development,” National Bureau of Economic Research, April 2009. [Online]. Available: https://www.nber.org/system/files/working_papers/w14899/w14899.pdf. [Accessed: 22-Jun-2022].
[6] M. Carpio and M. Guerrero, “Did the Colonial mita Cause a Population Collapse? What Current Surnames Reveal in Peru,” The Journal of Economic History, vol. 81, no. 4, December 2021. [Online]. Available: https://doi.org/10.1017/S0022050721000498. [Accessed: 22-Jun-2022].
[7] V. Grossi, F. Giannotti, D. Pedreschi, P. Manghi, P. Pagano, and M. Assante, “Data science: A game changer for science and innovation,” International Journal of Data Science and Analytics, vol. 11, no. 4, pp. 263–278, 2021.
[8] “5 reasons why everybody should learn data analytics,” SAS. [Online]. Available: https://www.sas.com/en_au/insights/articles/analytics/5-reasons-why-everybody-should-learn-data-analytics.html. [Accessed: 23-Jun-2022].
[9] “Data Privacy Scandal: Facebook: Terranova security,” Cyber Security Awareness, 02-Mar-2022. [Online]. Available: https://terranovasecurity.com/data-privacy-scandal-facebook/. [Accessed: 23-Jun-2022].
[10] “Data Privacy Scandal: Facebook: Terranova security,” Cyber Security Awareness, 02-Mar-2022. [Online]. Available: https://terranovasecurity.com/data-privacy-scandal-facebook/. [Accessed: 23-Jun-2022].
[11] L. Moro, “Data Science to improve operations: Fuel efficiency optimization,” Datascience.aero, 07-Jan-2021. [Online]. Available: https://datascience.aero/data-science-operations-fuel-efficiency-optimization/. [Accessed: 23-Jun-2022].
[12] “10 Ways Airlines use artificial intelligence and data science to improve operations,” AltexSoft, 21-Feb-2020. [Online]. Available: https://www.altexsoft.com/blog/engineering/ai-airlines/#:~:text=Airlines%20use%20AI%20systems%20with,fuel%20needed%20for%20a%20flight. [Accessed: 23-Jun-2022].
[13] D. Quillen, “Common causes of vision loss in elderly patients,” Am Fam Physician, July 1999. [Online]. Available: https://pubmed.ncbi.nlm.nih.gov/10414631/. [Accessed: 22-Jun-2022].
[14] H. Shahid, J. Khan, V. Cipriani, T. Sepp, B. Matharu, C. Bunce, S. Harding, D. Clayton, A. Moore, and J. Yates, “Age-related macular degeneration: the importance of family history as a risk factor,” British Journal of Ophthalmology, 2012. [Online]. Available: https://bjo.bmj.com/content/96/3/427. [Accessed: 22-Jun-2022].
[15] SciDevNet, “Data Visualisation: Contributions to evidence-based decision-making,” Shorthand. [Online]. Available: https://social.shorthand.com/SciDevNet/3geA2Kw4B5c/data-visualisation-contributions-to-evidence-based-decision-making. [Accessed: 23-Jun-2022].
[16] A. Numanović, “Data science: The next frontier for data-driven policy making?,” Medium, 11-Jul-2017. [Online]. Available: https://medium.com/@numanovicamar/https-medium-com-numanovicamar-data-science-the-next-frontier-for-data-driven-policy-making-8abe98159748#:~:text=Data%20Science%20in%20Policy%20Making,and%20more%20effective%20public%20policies. [Accessed: 23-Jun-2022].
[17] D. Ray, M. Salvatore, R. Bhattacharyya, L. Wang, S. Mohammed, S. Purkayastha, A. Halder, A. Rix, D. Barker, M. Kleinsasser, Y. Zhou, P. Song, D. Bose, M. Banerjee, V. Baladandayuthapani, P. Ghosh, and B. Mukherjee, “Predictions, role of interventions and effects of a historic National Lockdown in India’s response to the COVID-19 pandemic: Data Science Call To Arms,” medRxiv, 01-Jan-2020. [Online]. Available: https://www.medrxiv.org/content/10.1101/2020.04.15.20067256v1. [Accessed: 23-Jun-2022].
[18] “Increasing public health measures could have helped prevent thousands of COVID-19 deaths in India,” University of Michigan News, 22-Jun-2022. [Online]. Available: https://news.umich.edu/increasing-public-health-measures-could-have-helped-prevent-thousands-of-covid-19-deaths-in-india/. [Accessed: 23-Jun-2022].
[19] S. Karamat, “How data science is used in cryptocurrency predictions,” Yahoo! Finance, 29-Apr-2022. [Online]. Available: https://finance.yahoo.com/news/data-science-used-cryptocurrency-predictions-090019899.html. [Accessed: 21-Jun-2022].
[20] “Emissions from electric vehicles,” Alternative Fuels Data Center: Emissions from Electric Vehicles. [Online]. Available: https://afdc.energy.gov/vehicles/electric_emissions.html. [Accessed: 23-Jun-2022].
[21] IEA, “Electric vehicles – analysis,” IEA, 01-Nov-2021. [Online]. Available: https://www.iea.org/reports/electric-vehicles. [Accessed: 23-Jun-2022].
[22] C. L., “Electric vs. gas cars: Is it cheaper to drive an EV?,” NRDC, 25-May-2022. [Online]. Available: https://www.nrdc.org/stories/electric-vs-gas-it-cheaper-drive-ev#:~:text=Without%20spark%20plugs%20to%20replace,repair%20as%20gas-powered%20cars. [Accessed: 23-Jun-2022].
[23] V. Penney, “Electric cars are better for the planet – and often your budget, too,” The New York Times, 15-Jan-2021. [Online]. Available: https://www.nytimes.com/interactive/2021/01/15/climate/electric-car-cost.html. [Accessed: 23-Jun-2022].
[24] “Costs and benefits of electric cars vs. conventional vehicles,” EnergySage. [Online]. Available: https://www.energysage.com/electric-vehicles/costs-and-benefits-evs/evs-vs-fossil-fuel-vehicles/. [Accessed: 23-Jun-2022].
[25] “Flight price predictor: Training models to pinpoint the best time for booking,” AltexSoft, 18-Aug-2021. [Online]. Available: https://www.altexsoft.com/blog/flight-price-predictor/. [Accessed: 23-Jun-2022].
[26] R. Ali, “Predicting your business’s future,” Oracle NetSuite. [Online]. Available: https://www.netsuite.com/portal/resource/articles/financial-management/predictive-modeling.shtml. [Accessed: 23-Jun-2022].
[27] X. Yung, S. Luo, K. Gao, T. Qiao, and X. Cheng, “Application of Data Science Technologies in Intelligent Prediction of Traffic Congestion,” Journal of Advanced Transportation, April 2019. [Online]. Available: https://www.hindawi.com/journals/jat/2019/2915369/. [Accessed: 22-Jun-2022].
[28] D. Patel, S. John, and F. Kaliangra, “Managing Traffic Flow Based on Predictive Data Analysis,” Advances in Intelligent Systems and Computing, vol. 174, 2013. [Online]. Available: https://link.springer.com/chapter/10.1007/978-81-322-0740-5_130. [Accessed: 22-Jun-2022].
[29] “Program of work,” Program of Work | Department of Public Works and Highways. [Online]. Available: https://www.dpwh.gov.ph/DPWH/projects/infrastructure/program_of_work?data=&data_1=All&data_2=widening. [Accessed: 23-Jun-2022].
[30] Y. Han and F. Moutarde, “Statistical Traffic State Analysis in large‐scale transportation networks using locality‐preserving non‐negative matrix factorisation,” IET Intelligent Transport Systems, vol. 7, no. 3, pp. 283–295, 2013.
[31] S. Stoltzman, “Exploratory data analysis of tropical storms in R: R-bloggers,” R, 21-Sep-2017. [Online]. Available: https://www.r-bloggers.com/2017/09/exploratory-data-analysis-of-tropical-storms-in-r/. [Accessed: 21-Jun-2022].
[32] “Information on disaster risk reduction of the member countries,” Asian Disaster Reduction Center (ADRC). [Online]. Available: https://www.adrc.asia/nationinformation.php?NationCode=608&Lang=en. [Accessed: 21-Jun-2022].