Undergraduate Student in Data Science at Institut Teknologi Sains Bandung
Data science is currently one of the fastest-growing fields in the digital age. Behind all the technology we use every day is data intelligently processed using programming. However, data can’t speak for itself, programming skills are needed to transform raw data into meaningful information. In this assignment, I’ll answer several questions about the purpose of data science programming, why it’s important, what tools need to expert , and the domains of knowledge that I’m interest.
The main purpose of programming in data science is to transform raw data into information that can support decision-making. Programming also aims to process and clean data, discover patterns, trends, and insights from data, create data visualizations that are understandable to commoner, build models that can predict events and automate decision-making (machine learning), and manage large data sets with computation (big data processing).
We learn about Data Science Programming because programming is the key base for data science. Without programming skills, it will be hard to handle and make sense of big data sets efficiently. Through this learning, we learn how data is handled, starting from gathering it, making it clean, looking at it, then analyzing it and creating models from it. Programing helps automate the analysis process, which makes the work quicker, more precise, and more effective. By learning Data Science Programming, we can turn complicated data into useful information that helps make better decisions.
The tools we have to become experts in data science programming are Python and R, each with its own robust libraries. Key Python libraries such as pandas and numpy for data processing; matplotlib and seaborn for data visualization; and scikit-learn for machine learning. Python is newbie-friendly and flexible, that’s why Phyton is popular in industry. Python is compatible for AI, machine learning, and large-scale data analysis.
R’s core libraries such as tidyverse, dplyr, ggplot2, and plotly for data analysis and visualization; and caret for model building. R is popular in academia and research because of its focus on specialized statistics.
In the domain knowledge that I’m most interested in, I choose healthcare because in this field, data science is used to analyze patient medical records, laboratory results, and even disease outbreak reports. With data science, raw health data can be transformed into useful information. For example, machine learning is used to predict a patient’s likelihood of developing a chronic disease based on their medical history, which can provide early detection and help hospitals allocate resources more efficiently. Data visualization also plays a crucial role in communicating disease trends and patterns to medical personnel, presenting data clearly and easily understood, and making decisions that are aligned with both clinical and policy levels.
Data science programming is the key to unlocking the untapped potential of data. This includes understanding why we need to learn programming, what tools we need to master, such as Python and R, and how it can be applied to areas of interest like healthcare. Most importantly, data science isn’t just about coding or crunching numbers. It’s about how we can use data to answer real questions, solve real problems, and make better decisions.
[1] Siregar, B. (n.d.). Data Science Programming: Introduction To Programming. dsciencelabs. https://bookdown.org/dsciencelabs/data_science_programming/00-Introduction-to-Programming.html
[2] Dhar, V. (2013). Data science and prediction. Communications of the ACM, vol. 56(12). https://doi.org/10.1145/2500499