QnA
Week 2
INSTITUT TEKNOLOGI SAINS BANDUNG
Name : Dhefio Alim Muzakki
Student ID : 52250014
Major : Data Science
Lecturer : Mr. Bakti Siregar, M.Sc., CDS.
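library(tidyverse)  # attaches the core tidyverse: ggplot2, dplyr, tidyr, readr, and others
library(readr)      # already attached by tidyverse; harmless to load again
library(ggplot2)    # already attached by tidyverse
library(dplyr)      # already attached by tidyverse
library(ggridges)   # ridgeline plots built on ggplot2
library(knitr)      # report rendering and kable() tables
library(DT)         # interactive HTML tables with datatable()
Questions and Answers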
- What is the main purpose of studying data science programming?
Beyond practical data cleaning, the academic purpose of this study is to establish a new way of knowing.
Unified Inquiry: It seeks to unify statistics, computer science, and domain expertise into a single framework to understand “actual phenomena” through data.
The Fourth Paradigm: Many researchers, following Jim Gray, view data science as the next stage of science, after experimental, theoretical, and computational science. Its purpose is to handle the “deluge of data” that traditional methods can no longer process, allowing us to ask questions that were previously unanswerable.
Data-Driven Smart Computing: The goal is to move beyond simple “analytics” toward “intelligent systems” that can simulate, predict, and automate complex human and natural environments.
- Why do we learn about it?
Programming is considered the foundation of Data Science. We learn it because it enables professionals to:
Process and clean raw data efficiently: Managing data so it is ready for analysis.
Perform Exploratory Data Analysis (EDA): Uncovering trends and hidden patterns.
Build Machine Learning models: Creating systems that can make accurate predictions.
Create Data Visualizations: Building compelling charts and dashboards for effective storytelling.
Bridge Technical Skills with Domain Knowledge: Apply tools such as Python and R to solve real-world problems in specific industries.
Solve “Wicked Problems”: The most pressing challenges of our century, such as climate change, precision medicine, and resource scarcity, are too complex for human intuition alone.
- What tools should we master to become experts?
The Core Languages
Python: The most versatile tool for AI, machine learning, and automation.
R (and RStudio): Essential for academic research, complex statistical modeling, and high-quality data visualization (using ggplot2).
SQL: The “Key to the Vault.” Necessary for communicating with databases to extract and filter large-scale structured data.
Specialized Libraries
Expertise is defined by how well you use these libraries to do the heavy lifting:
Data Wrangling: Pandas and NumPy (Python) or dplyr and tidyr (R). These allow you to clean “messy” data into “tidy” data.
Machine Learning: Scikit-learn for traditional models; TensorFlow or PyTorch for Deep Learning and AI.
Visualization: Matplotlib and Seaborn for static plots; Plotly or Shiny for interactive web-based dashboards.
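To make the “messy” to “tidy” idea concrete, here is a minimal R sketch using dplyr and tidyr, the same packages loaded at the top. The table, its column names (`country`, `sales`), and its values are hypothetical toy data, not a real dataset. In the messy version the year is stored in the column headers. `pivot_longer()` moves it into its own column, so each row becomes one observation.

```r
library(dplyr)
library(tidyr)

# Hypothetical "messy" table: one column per year, so the variable "year"
# lives in the column headers instead of in its own column.
messy <- tibble(
  country = c("A", "B"),
  `2023`  = c(10, 20),
  `2024`  = c(12, NA)
)

tidy <- messy %>%
  # Move the year headers into a "year" column and their values into "sales".
  pivot_longer(cols = -country, names_to = "year", values_to = "sales") %>%
  # Headers are character strings; store the year as an integer.
  mutate(year = as.integer(year)) %>%
  # Drop the observation whose value is missing.
  filter(!is.na(sales))

print(tidy)
```

The result has one row per country-year pair and one column per variable. Most dplyr and ggplot2 code expects this shape.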
Integrated Development Environments
Experts use environments that allow for experimentation and documentation:
Jupyter Notebook / JupyterLab: The industry standard for “literate programming,” where you combine live code, equations, and narrative text.
VS Code: Preferred for production-level coding, debugging, and integrating with cloud services.
- Give your view on the role of domain knowledge in data science.
Defining the Problem
In data science, the hardest part isn’t usually the math; it’s asking the right question.
- Domain expertise allows you to transform a vague business or scientific curiosity into a testable hypothesis. Example: In healthcare, a “high correlation” between two variables might be medically impossible. A coder might miss this, but a domain expert knows it’s likely a data entry error or a “confounding variable.”
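The data-entry point can be illustrated with a small simulation. It assumes a hypothetical dataset in which two measurements are generated independently, so their true correlation is zero. One record then receives the placeholder value 9999 in both fields. The variable names and the 9999 code are illustrative assumptions, not real medical data.

```r
set.seed(42)

# Hypothetical measurements simulated independently (true correlation = 0).
heart_rate  <- rnorm(50, mean = 75,  sd = 10)
cholesterol <- rnorm(50, mean = 190, sd = 30)

r_clean <- cor(heart_rate, cholesterol)

# Simulate one data entry error: an assumed "missing value" code 9999
# typed into both fields of a single patient record.
heart_rate_err  <- c(heart_rate, 9999)
cholesterol_err <- c(cholesterol, 9999)

r_error <- cor(heart_rate_err, cholesterol_err)

cat("Correlation without the bad row:", round(r_clean, 2), "\n")
cat("Correlation with one bad row:   ", round(r_error, 2), "\n")
```

A single bad row is enough to push the correlation close to 1. Someone who knows that 9999 is not a possible heart rate would remove or recode that row before interpreting the result.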
The Art of Science
Feature engineering, the process of selecting and transforming raw variables into meaningful inputs for a model, is where domain knowledge shines.
- Raw data is noisy. Domain knowledge acts as a “signal amplifier.” It tells you that in the airline industry, “delayed minutes” matters less than “missed connections,” or that in finance, “volatility” is more telling than “price.”
Impact: A simple model built on well-chosen, domain-informed features often outperforms a complex “black box” model fed with raw, uncontextualized data.
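The airline example can be sketched as a feature-engineering step in dplyr. This is a minimal illustration under assumed conditions. The three passengers, the times (minutes after midnight), and the 45-minute minimum transfer time are hypothetical values, not airline data.

```r
library(dplyr)

# Hypothetical flight legs (times in minutes after midnight).
legs <- tibble(
  passenger         = c("p1", "p2", "p3"),
  scheduled_arrival = c(540, 630, 690),
  actual_arrival    = c(600, 640, 700),
  connection_depart = c(660, 650, 690),
  min_connect_time  = 45  # assumed minimum transfer time, recycled to every row
)

features <- legs %>%
  mutate(
    # Raw feature: how late the first flight landed.
    delay_minutes = actual_arrival - scheduled_arrival,
    # Time left between landing and the connecting departure.
    slack = connection_depart - actual_arrival,
    # Domain-informed feature: the connection is missed when the slack
    # is shorter than the minimum time needed to transfer.
    missed_connection = slack < min_connect_time
  )

print(features)
```

In this toy data, passenger p1 has the largest delay but still makes the connection. Passengers p2 and p3 land only 10 minutes late and miss it. The raw delay column hides exactly this information.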
Avoiding “Spurious Correlations”
Statistically, you can find patterns in almost anything if the dataset is large enough, especially when many variables are compared at once.
- The Expert’s Role: Domain knowledge is your ethical and logical guardrail. It prevents you from presenting “discoveries” that are actually just coincidences, saving your organization from making expensive, misguided decisions.
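A quick simulation shows how chance patterns appear when many variables are compared. This sketch assumes a purely random dataset of 20 rows and 200 columns of independent normal noise. The sizes and the 0.5 threshold are arbitrary choices for illustration.

```r
set.seed(123)

n_rows <- 20    # few observations...
n_vars <- 200   # ...but many candidate variables

# Every column is independent noise, so any correlation between columns
# is a coincidence by construction.
noise <- matrix(rnorm(n_rows * n_vars), nrow = n_rows, ncol = n_vars)

r     <- cor(noise)          # 200 x 200 correlation matrix
pairs <- r[upper.tri(r)]     # each distinct pair of columns counted once

max_abs_r <- max(abs(pairs))
n_strong  <- sum(abs(pairs) > 0.5)

cat("Pairs tested:", length(pairs), "\n")
cat("Largest |correlation| found:", round(max_abs_r, 2), "\n")
cat("Pairs with |r| > 0.5:", n_strong, "\n")
```

Every column is noise, so each “strong” correlation the simulation reports is a coincidence. Testing thousands of pairs makes such coincidences almost guaranteed. For this reason, a domain expert should ask whether a relationship makes sense before anyone presents it as a discovery.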