QnA
Week 2
INSTITUT TEKNOLOGI SAINS BANDUNG
Name : Dhefio Alim Muzakki
Student ID : 52250014
Major : Data Science
Lecturer : Mr. Bakti Siregar, M.Sc., CDS.
library(tidyverse)
library(readr)
library(ggplot2)
library(dplyr)
library(ggridges)
library(knitr)
library(DT)
Questions and Answers
- What is the main purpose of our study, Data Science Programming?
Beyond practical data cleaning, the academic purpose of this study is to establish a new way of knowing.
Unified Inquiry: It seeks to unify statistics, computer science, and domain expertise into a single framework to understand “actual phenomena” through data.
The Fourth Paradigm: Academics view data science as the next stage of science (following experimental, theoretical, and computational science). Its purpose is to handle the “deluge of data” that traditional methods can no longer process, allowing us to ask questions that were previously unanswerable.
Data-Driven Smart Computing: The goal is to move beyond simple “analytics” toward “intelligent systems” that can simulate, predict, and automate complex human and natural environments.
- Why do we learn about it?
Programming is considered the foundation of Data Science. We learn it because it enables professionals to:
Process and clean raw data efficiently: Managing data so it is ready for analysis.
Perform Exploratory Data Analysis (EDA): Uncovering trends and hidden patterns.
Build Machine Learning models: Creating systems that can make accurate predictions.
Create Data Visualizations: Building compelling charts and dashboards for effective storytelling.
Bridge Technical Skills with Domain Knowledge: Allowing the use of tools (like Python and R) to solve real-world problems in specific industries.
Solve “Wicked Problems”: The most pressing challenges of our century (climate change, precision medicine, and resource scarcity) are too complex for human intuition alone.
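The first steps in the list above (cleaning, exploring, and visualizing) can be sketched in a few lines of R. This is only a minimal illustration: the data frame `raw`, its columns, and all of its values are made up for the example and do not come from any real dataset.

```r
library(dplyr)
library(ggplot2)

# Hypothetical raw data (made-up values): inconsistent labels and one missing score
raw <- data.frame(
  student = c("A", "B", "C", "D"),
  major   = c("data science", "Data Science", "Statistics", "statistics"),
  score   = c(80, NA, 70, 90)
)

# Clean: standardise the labels and drop rows with a missing score
clean <- raw %>%
  mutate(major = tolower(major)) %>%
  filter(!is.na(score))

# Explore: count students and average the score within each major
summary_by_major <- clean %>%
  group_by(major) %>%
  summarise(n = n(), mean_score = mean(score), .groups = "drop")

print(summary_by_major)

# Visualize: a simple bar chart of the average score per major
p <- ggplot(summary_by_major, aes(x = major, y = mean_score)) +
  geom_col()
```

Printing `p` draws the chart. A real project would read the data from a file or a database instead of typing it in by hand.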
- Which tools do we need to master?
- The Core Languages (The Foundation) You cannot be an expert without a deep command of these three:
Python: The most versatile tool for AI, machine learning, and automation.
R (and RStudio): Essential for academic research, complex statistical modeling, and high-quality data visualization (using ggplot2).
SQL: The “Key to the Vault.” Necessary for communicating with databases to extract and filter large-scale structured data.
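To show how SQL and R can work together, here is a small sketch that runs a SQL query from R. It assumes the DBI and RSQLite packages are installed, and it uses a temporary in-memory database with a hypothetical `students` table (made-up values), so it is an illustration rather than a real database setup.

```r
library(DBI)

# Open a temporary in-memory SQLite database (assumes RSQLite is installed)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table with made-up values
students <- data.frame(
  name  = c("A", "B", "C"),
  score = c(80, 60, 90)
)
dbWriteTable(con, "students", students)

# Extract and filter with SQL: keep scores of 75 or more, highest first
result <- dbGetQuery(
  con,
  "SELECT name, score FROM students WHERE score >= 75 ORDER BY score DESC"
)
print(result)

# Always close the connection when finished
dbDisconnect(con)
```

The query returns an ordinary R data frame, so the result can go straight into dplyr or ggplot2.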
- Specialized Libraries (The Engine) Expertise is defined by how well you use these libraries to perform heavy lifting:
Data Wrangling: Pandas and NumPy (Python) or dplyr and tidyr (R). These allow you to clean “messy” data into “tidy” data.
Machine Learning: Scikit-learn for traditional models; TensorFlow or PyTorch for Deep Learning and AI.
Visualization: Matplotlib and Seaborn for static plots; Plotly or Shiny for interactive web-based dashboards.
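As a small illustration of turning “messy” data into “tidy” data, the sketch below uses `pivot_longer()` from tidyr. The table `messy` and its values are hypothetical. The point is only to show one common case, where values are spread across one column per year.

```r
library(tidyr)

# Hypothetical "messy" table (made-up values): one column per year
messy <- data.frame(
  country = c("X", "Y"),
  `2023`  = c(10, 20),
  `2024`  = c(15, 25),
  check.names = FALSE  # keep the year column names exactly as written
)

# Tidy form: one row per country and year, one column per variable
tidy <- pivot_longer(
  messy,
  cols      = c(`2023`, `2024`),
  names_to  = "year",
  values_to = "value"
)

print(tidy)
```

After reshaping, each row is a single observation, which is the layout that dplyr and ggplot2 work with most easily.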
- Integrated Development Environments (The Workshop) Experts use environments that allow for experimentation and documentation:
Jupyter Notebook / JupyterLab: The industry standard for “literate programming,” where you combine live code, equations, and narrative text.
VS Code: Preferred for production-level coding, debugging, and integrating with cloud services.