Image from WT Grant Foundation
Vast amounts of data are continuously being generated and collected daily. Grus (2019) enumerates that data is collected by websites that track every click a user makes, pedometers that record heart rate and movement, smart gadgets that collect information about human behavior, and many more. Existing data may not look useful at first glance, but studying, analyzing, and drawing insights from it helps humans make sense of our changing lifestyle patterns; data analysis, and the human actions that follow, help us adapt to our ever-changing world. Analyzing data also allows humans to improve and optimize existing systems to respond to these modern changes and find effective solutions to contemporary issues. Ultimately, analyzing data leads our world to greater progress.
Poorly handled data costs the United States around USD 3.1 trillion each year (Cooper, 2019). Although the emergence of digital technology has saturated the world with data, most data remains unhelpful unless it is analyzed and processed. Its wide-ranging utility when analyzed has prompted businesses and organizations to adopt data science. The ability of data science to identify opportunities is one specific instance. Cooper (2019) elaborates that the exploratory nature of data scientists compels them to question existing assumptions and processes in order to come up with additional methods and analytical algorithms as they continually interact with the business’s analytics system.
The analysis of data is important because it allows us to make sense of events in our society. By learning data science, analysts are taught fundamental principles for gathering useful information from datasets (Provost & Fawcett, 2013). Data analysis breaks down large sets of numbers and information into manageable, comprehensible pieces in which trends and patterns can be observed. People can then use this knowledge to make informed, data-driven decisions from a new perspective. Because of the power of data analysis, the technique is prominent in many different fields encountered in daily life.
The world nowadays boasts an abundance of information that not only continuously grows but is also easy to access. The challenge then comes in the form of filtering and processing all this information into relevant and valuable statistics from which various inferences can be made. This process of handling data is an invaluable step in interpreting the information that is already available.
In research, data analysis gives professionals the ability to turn raw observations and information into conclusions. Historically, the study and analysis of data has provided the academe with evidence-based breakthroughs across all fields of study. Research and development continue to revolutionize the world we live in today, using data to strengthen their conceptual basis and translating findings into innovations that can further the efficiency and quality of human life.
Studying and analyzing data, regardless of the field in which it is implemented, helps predict trends and streamlines the flow of information. As advancements continue to accumulate, the information and knowledge needed to support such endeavors will be collected and analyzed more efficiently, contributing to a much faster cycle of research and development.
Image from Geospatial World
In response to the global pandemic, contact tracing has proven to be an effective strategy for counteracting the quick spread of the virus. Thus, there has been an evident increase in the use of contact tracing apps in numerous establishments. Often, these apps rely on QR code scanning to automate manual contact tracing; Ateneo’s Blue Pass system is one example. Levy and Stewart (2020) note that digital contact tracing draws upon several fields, including data science. Indeed, one can observe data science being applied in digital contact tracing through the collection and storage of personal user data and health information. This data, once analyzed, becomes valuable to national leaders and policy-makers responding to the COVID-19 situation in the country.
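To make this concrete, here is a minimal, purely hypothetical sketch of how scan records from a QR-based system could be queried for possible contacts. The field names, venues, and time window are placeholders of my own, not how Blue Pass actually works.

```python
from datetime import datetime, timedelta

# Hypothetical QR scan log: each entry records who scanned in, where, and when.
scan_log = [
    {"user": "student_001", "venue": "Library",   "time": datetime(2021, 9, 1, 10, 5)},
    {"user": "student_002", "venue": "Library",   "time": datetime(2021, 9, 1, 10, 20)},
    {"user": "student_003", "venue": "Cafeteria", "time": datetime(2021, 9, 1, 12, 0)},
]

def possible_contacts(log, case_user, window=timedelta(hours=1)):
    """List users who scanned into the same venue within `window` of the case's scans."""
    case_scans = [s for s in log if s["user"] == case_user]
    contacts = set()
    for case in case_scans:
        for s in log:
            if (s["user"] != case_user
                    and s["venue"] == case["venue"]
                    and abs(s["time"] - case["time"]) <= window):
                contacts.add(s["user"])
    return sorted(contacts)

print(possible_contacts(scan_log, "student_001"))  # ['student_002']
```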
Image from Small Business Mentor
The utility of data science also extends to the mundane. Even before a user finishes typing a query in the search bar, Google already predicts what they want to know and displays a dropdown of possible search queries based on data from users’ search histories and general search trends (Beam & Kohane, 2018). This feature eases the user experience, as reading suggested queries is generally faster than typing them, streamlining internet browsing, which has become a day-to-day task in today’s era (Brysbaert, 2019; Majaranta, MacKenzie, Aula & Räihä, 2006). The availability of suggestions also provides opportunities to browse what users might need but did not initially think of, ultimately expanding their choices.
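As a toy illustration of the idea (not Google’s actual system, which draws on vastly more data and far more sophisticated models), suggestions can be ranked simply by how often past queries beginning with the typed prefix were searched:

```python
from collections import Counter

# Hypothetical log of past search queries (a stand-in for search histories and trends).
query_log = [
    "data science", "data science jobs", "data analysis",
    "data science", "database design", "data science course",
]
query_counts = Counter(query_log)

def suggest(prefix, k=3):
    """Return the k most frequent past queries that start with the typed prefix."""
    matches = [(q, c) for q, c in query_counts.items() if q.startswith(prefix)]
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return [q for q, _ in matches[:k]]

print(suggest("data s"))  # ['data science', 'data science jobs', 'data science course']
```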
Image from Global Reach
Advertisements are extremely prevalent in today’s digital age, with product placement showing up across many different media, most notably on sites like YouTube and Facebook. A lot of work goes on behind the scenes in the creation and distribution of these materials. Data science is heavily used in today’s advertising landscape, particularly through targeted advertising. Product companies invest a significant amount of resources in paying market research firms to collect information about their potential consumers (Iyer et al., 2004). With this knowledge, companies can make informed decisions on which types of advertisements would work best in promoting their products, as well as which groups of people to target with their promotional materials.
Image from Roboticsbiz
One application of data science is medical image analysis and computer vision. Through machine learning and various other methods, inferences can be drawn from the information in an image, which is beneficial in identifying diverse medical conditions such as tumors and artery stenosis.
Image from Florida Politics
In the modern world, data science has become a key player in the election process. Aside from the simulation polls conducted, demographic data has revolutionized the way candidates tailor their campaign strategies to populations according to their beliefs and principles. Factors such as social class, religion, and age can influence the type of politician that societies aspire to put in power.
Image from Analytic Steps
Data science is also used in weather forecasting. Because the weather is a factor that can affect almost everyone, if not all of us, systems are built to predict it, helping people, businesses, and other entities prepare accordingly for incoming weather. One example of this dynamic is when weather predictions help farmers plan for their harvest seasons.
Image from FreshDesk
Description: “I ate dinner with a fork. I ate dinner with a friend.”
Chowdhary (2020) illustrates how the two preceding sentences share almost the same structure, differing in only one word. Nevertheless, most readers can tell that their meanings differ by more than a single word change. The first sentence describes how the speaker used a fork to facilitate the eating process; the same cannot be said for the second, where the speaker means that a friend was nearby during dinner rather than that a friend was used as a tool for eating.
In a contemporary world progressing towards timeliness and efficiency, manually analyzing the abundant volume of natural language data becomes increasingly difficult, especially within any given time limits (Manning, 1999). Humans readily sense nuances in language that machines cannot comprehend with basic programming.
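A small illustration of the problem, assuming the spaCy library and its small English model en_core_web_sm are installed: both sentences receive nearly identical dependency parses, which is exactly why shallow processing alone cannot capture the difference in meaning.

```python
import spacy

# Small pretrained English pipeline (must be downloaded beforehand).
nlp = spacy.load("en_core_web_sm")

for text in ["I ate dinner with a fork.", "I ate dinner with a friend."]:
    doc = nlp(text)
    print(text)
    for token in doc:
        # Print each word with its dependency label and the word it attaches to.
        print(f"  {token.text:<7} {token.dep_:<6} -> {token.head.text}")
```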
Addressing the complexities of human language requires a specific application of data science called Natural Language Processing (NLP). NLP covers wide-ranging applications, but one specific application that demonstrates feasibility and utility is a Messenger chatbot that can address the university-related concerns or student life queries of Ateneo students without the need for mobile data or internet, given Messenger’s ability to function on Free Data. NLP allows the chatbot to understand the nuances of students’ queries to an extent and to generate an appropriate response.
Availability of the Data: Information about Ateneo de Manila University lies across its various websites, such as Ateneo’s official website and LS One. The NLP-based chatbot can also be supplied with relevant data about student life from the ADMU Freedom Wall on Facebook and the ADMU subreddit.
Statistical Method/s Needed: Manning (2019) points out that NLP combines the swift computational power of machines with repositories of language data and probability theory to determine the common patterns that occur in language use and decide on an appropriate response. The NLP-based chatbot can also fine-tune its performance, vocabulary, and responses by employing statistical models for concept and structure prediction based on neural network architectures (Cahn, 2017).
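A minimal sketch of that statistical core, using TF-IDF similarity from scikit-learn to match a student’s question to the closest known answer. The questions and replies here are hypothetical placeholders, not an actual Ateneo FAQ, and a real chatbot would use far richer models.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical FAQ: known questions and canned answers the chatbot can give.
faq = {
    "When does enlistment start?": "Enlistment dates are posted on the Registrar's page.",
    "How do I access LS One?": "Log in to LS One with your OBF account.",
    "Where can I see the academic calendar?": "The academic calendar is on the official Ateneo website.",
}

questions = list(faq.keys())
vectorizer = TfidfVectorizer().fit(questions)
question_vectors = vectorizer.transform(questions)

def answer(query):
    """Return the canned answer whose known question is most similar to the query."""
    query_vector = vectorizer.transform([query])
    similarities = cosine_similarity(query_vector, question_vectors)[0]
    best = similarities.argmax()
    return faq[questions[best]]

print(answer("when is enlistment"))
```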
Beneficiaries: The NLP-based chatbot streamlines the information-seeking process for Ateneo students by uniting various datasets about the university and its student life into one entity that can be accessed with a quick Messenger chat and question. The chatbot also enables students, especially those in the online setting who experience momentary or constant internet issues, to get their basic concerns addressed.
Image from Making Music Magazine
Description: One field of data science that I would like to look into is the analysis of trends in music composition and consumption. I would study the different decisions made by composers, songwriters, and producers in the formation of their music, which of these prevail as the most common compositional choices, and which of these choices result in the most commercial success and favor from the audience. An important concept in composition is song structure, and this will be a focus of the study I would pursue.
Availability of the Data: Web scraping will be an essential tool for this study, as a large amount of data can be collected through online music pages. One of the most popular music websites is last.fm, which holds a vast database of music artists and their respective songs, genres, tempos, key centers, and more. This database can be scraped using existing code such as the “lastfm-scraper” repository on GitHub.
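As a rough sketch of what the collection step could look like in Python, the snippet below queries the public Last.fm API for top tracks. The method name and response fields are assumptions to be checked against the official API documentation, and the cited lastfm-scraper repository would be the more complete starting point.

```python
import requests

# Hypothetical sketch: querying the public Last.fm API for chart-topping tracks.
# An API key is required; endpoint and fields should be verified against the docs.
API_KEY = "YOUR_LASTFM_API_KEY"
URL = "http://ws.audioscrobbler.com/2.0/"

params = {
    "method": "chart.gettoptracks",
    "api_key": API_KEY,
    "format": "json",
    "limit": 10,
}
response = requests.get(URL, params=params, timeout=10)
response.raise_for_status()

for track in response.json().get("tracks", {}).get("track", []):
    # Collect basic fields for later analysis of popularity and trends.
    print(track.get("name"), "-", track.get("artist", {}).get("name"))
```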
Statistical Method/s Needed: One tool to be used in the analysis for this topic is the tuneR package for the programming language R. “The freely available R (R Development Core Team 2007) package tuneR (Ligges 2006) is a framework for statistical analysis and transcription of music time series which provides many tools (e.g. for reading Wave files, estimating fundamental frequencies, etc.)” (Weihs et al., 2007, p. 24).
As seen on pages 7, 13, and 25 of Brad Osborn’s study on the terminally climactic form in rock music, waveform analysis can reveal a lot about the structural progression of songs (Osborn, 2013). This can be performed in different DAWs (digital audio workstations) such as Cakewalk, Reaper, Logic, and Ableton, as well as simpler and cheaper (even free) alternatives like Audacity. Other tests like vocal pitch tracking (p. 6) can also be done through different functions in these DAWs.
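A rough Python analogue of these analyses can be sketched with the librosa library (an assumption on my part; the study itself would use tuneR in R or a DAW, and the file name below is a placeholder). RMS loudness over time hints at structural sections, while a pitch track approximates the fundamental-frequency estimation that tuneR provides.

```python
import librosa
import numpy as np

# Hypothetical local audio file; any WAV of a song would do.
y, sr = librosa.load("song.wav", sr=None)

# Track loudness over time with RMS energy: quiet verses vs. loud choruses
# often show up as clear level changes, hinting at the song's structure.
rms = librosa.feature.rms(y=y)[0]
times = librosa.frames_to_time(np.arange(len(rms)), sr=sr)
print(f"Loudest moment around {times[np.argmax(rms)]:.1f} s")

# Estimate the fundamental frequency (pitch) over time, roughly analogous
# to what the text describes tuneR doing in R.
f0 = librosa.yin(y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
print(f"Median estimated pitch: {np.nanmedian(f0):.1f} Hz")
```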
Beneficiaries: One group of beneficiaries would be general music listeners, as they will better understand the decisions that go into the creation of the music they listen to. Music composers and producers would especially benefit from this study, as they can analyze the commonalities between different works and compare their own to the many others around. They can then adjust their work as they see fit, whether they want to capitalize on trends and attempt to make hit songs, or go in the opposite direction and find unique sounds and tools to incorporate into their musical structure.
Image from Ilija Mihajlovic
Description: According to Western Governors University (2021), computer vision is a method of enabling a machine to extract visual information from materials such as pictures and videos and make inferences and decisions based on this information with the help of machine learning.
Availability of the Data: The data used in computer vision mostly consists of pictures and videos, the availability of which varies depending on the field and industry.
Statistical Method/s Needed: Various methods are used in the field of computer vision, mainly involving the use of machine learning as well as image analysis.
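As a generic sketch of the machine learning side (not a medical-grade system), a pretrained classifier from torchvision (version 0.13 or later) can label an arbitrary photo; the image path below is a placeholder.

```python
import torch
from torchvision.io import read_image
from torchvision.models import resnet18, ResNet18_Weights

# Load a pretrained classifier and its matching preprocessing transforms.
weights = ResNet18_Weights.DEFAULT
model = resnet18(weights=weights)
model.eval()
preprocess = weights.transforms()

# Hypothetical input image; replace with any photo on disk.
image = read_image("sample_photo.jpg")
batch = preprocess(image).unsqueeze(0)

with torch.no_grad():
    probabilities = model(batch).softmax(dim=1)[0]

# Report the most likely label according to the pretrained model.
top = probabilities.argmax().item()
print(weights.meta["categories"][top], f"{probabilities[top].item():.2%}")
```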
Beneficiaries: As computer vision encompasses a broad scope, a number of different industries would benefit from it. The medical industry would gain a more advanced method for gathering and analyzing information from real-world sources such as photos, which can be taken in real time. It would also be beneficial in security, with cameras able to detect potential threats more efficiently.
Image from BBC
Description: As much as technology can be maneuvered to spread and enhance the accessibility of information, the same powerful platforms can be weaponized to foster misinformation and disinformation. As a countermeasure, data science has been utilized to detect patterns of fraudulent behavior through linguistic cues such as “word patterns, syntax construction, readability features” (Kulkarni, 2021).
Availability of the Data: Relevant data for this study can be easily acquired through online repositories such as Nishit Patel’s Fake News Detection repositories accessible on GitHub.
Statistical Method/s Needed: To facilitate the study of fake news detection, the following methods can be utilized: machine learning algorithms, deep learning, natural language processing techniques, and blockchain (Singh et al., 2020). Although there is a vast range of procedural resources to consult, Singh et al. (2020) were able to narrow down the best technologies for approaching fake news detection: blockchain for critical fake news detection, and machine learning with natural language processing at the commercial level. Additionally, Wang et al. (2020) mention that the accuracy of these methods has improved through reinforcement learning techniques that filter both low- and high-quality samples, specifically citing deep learning as the focus of their study.
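A minimal sketch of the machine learning and NLP side of this, assuming a hypothetical labeled CSV of articles (a real study would use a corpus such as the one in the repository mentioned above): TF-IDF features feed a logistic regression classifier that learns which word patterns are associated with fake versus real news.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Hypothetical CSV with columns "text" (article body) and "label" (0 = real, 1 = fake).
data = pd.read_csv("news_articles.csv")
X_train, X_test, y_train, y_test = train_test_split(
    data["text"], data["label"], test_size=0.2, random_state=42
)

# TF-IDF turns word patterns into features; logistic regression learns
# which patterns are associated with fake versus real articles.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

print("Accuracy:", accuracy_score(y_test, model.predict(X_test)))
```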
Beneficiaries: Media Sites and General Public. Although this study does not have a single direct beneficiary, journalists along with their respective media sites can indirectly benefit from the detection of fake news, as it distinguishes the truth from rumors and false information. In the long run, battling misinformation wards off attacks against the credibility of journalists and correspondents, safeguarding the integrity of honorable news sites.
Image from TMT International Observatory
Description: According to the Space Telescope Science Institute (2018), data-driven astronomy (DDA) is the production of astronomical knowledge built on pre-existing databases, somewhat similar to industrial data science in that the data sets are a byproduct of other investigations rather than collected with the experiment in mind.
Availability of the Data: The data come in astronomically (pun intended) large sets and are available from research-oriented astronomy databases such as the Sloan Digital Sky Survey and the Hubble Legacy Archive.
Statistical Method/s Needed: This field relies heavily on machine learning and algorithms, given the large data sets that need to be processed. Research and development often involve astronomical data awareness, a deep understanding of selection biases, and sophisticated multipoint statistics (STScI, 2018).
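As a small sketch of the kind of machine learning workflow this involves, the snippet below classifies objects as stars, galaxies, or quasars from photometric magnitudes in a hypothetical CSV exported from a survey like the Sloan Digital Sky Survey; the file name and column layout are placeholders.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical export from a sky survey: magnitudes in five bands plus a class label
# ("STAR", "GALAXY", or "QSO"), similar to columns available in SDSS catalogs.
catalog = pd.read_csv("sdss_sample.csv")
features = catalog[["u", "g", "r", "i", "z"]]
labels = catalog["class"]

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0
)

# Random forests handle large tabular catalogs well and give a quick baseline.
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

print(classification_report(y_test, model.predict(X_test)))
```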
Beneficiaries: The communities that would primarily benefit from these studies are astrophysicists and researchers in the field. The accumulated data can help scientists better understand otherworldly phenomena as well as build on our current knowledge of physics and other sciences.