Assignment Week 2: Data Science Programming I
Chandra Rizal Alamsyah
Student Majoring in Data Science at ITSB
NIM: 52250068
Email: chandra240205@gmail.com
1 What is the main purpose of data science programming?
The main purpose of data science programming is to process and analyze data and to extract meaningful insights from it in order to support data-driven decision-making.
More specifically, it aims to:
Collect and clean raw data (data collection and preprocessing)
Perform exploratory data analysis (EDA)
Build statistical and machine learning models
Generate predictions (predictive modeling)
Automate analytical processes
Communicate findings through visualization and reports
In essence, data science programming transforms raw data into valuable information that can guide strategic decisions.
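The steps listed above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive workflow: the `raw_records` data, the `ad_spend` and `sales` field names, and the helper functions `clean`, `fit_line`, and `predict` are all hypothetical, and the sketch uses only the Python standard library so that it runs without extra installs. In practice, libraries such as Pandas and Scikit-learn would usually handle these steps.

```python
import statistics

# Data collection: hypothetical raw records; None marks a missing value.
raw_records = [
    {"ad_spend": 1.0, "sales": 3.1},
    {"ad_spend": 2.0, "sales": 4.9},
    {"ad_spend": None, "sales": 6.0},  # incomplete row
    {"ad_spend": 3.0, "sales": 7.2},
    {"ad_spend": 4.0, "sales": 8.8},
]


def clean(records):
    """Preprocessing: drop rows that contain a missing value."""
    return [r for r in records if None not in r.values()]


def fit_line(xs, ys):
    """Modeling: ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = clean(raw_records)
xs = [r["ad_spend"] for r in rows]
ys = [r["sales"] for r in rows]

# Exploratory data analysis: simple summary statistics.
summary = {
    "n": len(rows),
    "mean_sales": statistics.fmean(ys),
    "stdev_sales": statistics.stdev(ys),
}

slope, intercept = fit_line(xs, ys)


def predict(ad_spend):
    """Predictive modeling: apply the fitted line to a new input."""
    return slope * ad_spend + intercept


# Communicating findings: a short text report.
print(summary)
print(f"sales = {slope:.2f} * ad_spend + {intercept:.2f}")
print(f"predicted sales at ad_spend=5.0: {predict(5.0):.2f}")
```

Each stage is a separate function or block, so any one of them (for example, the cleaning rule) can be changed without touching the others, which is also what makes the process easy to automate.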
2 Why do we learn data science?
There are several fundamental reasons why learning data science is important:
- The modern world is data-driven
Major companies such as Google, Amazon, and Netflix rely heavily on data to:
Recommend products and content
Optimize advertisements
Predict customer behavior
- High career demand
Professions such as Data Scientist, Data Analyst, and Machine Learning Engineer are in high global demand and offer competitive compensation.
- Objective decision-making
Data science enables organizations to make decisions based on statistical evidence and quantitative analysis rather than intuition alone.
- Broad applicability
Data science is relevant in nearly every sector, including finance, healthcare, education, government, technology, and sports.
3 What tools should be mastered to become an expert?
To become proficient in data science, the following tools and skills are essential:
- Programming Languages
Python (most widely used)
R (strong in statistical analysis)
- Python Libraries
NumPy (numerical computing)
Pandas (data manipulation)
Matplotlib / Seaborn (data visualization)
Scikit-learn (machine learning)
TensorFlow / PyTorch (deep learning)
- Databases
SQL (query language for relational databases)
PostgreSQL / MySQL (relational database systems)
- Big Data Technologies (advanced level)
Apache Spark
Hadoop
- Supporting Tools
Jupyter Notebook
Git & GitHub
Microsoft Excel (for initial analysis)
Tableau / Power BI (business visualization tools)
In addition, a strong foundation in statistics, probability, linear algebra, and basic calculus is crucial for achieving expertise.
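To show how the database tools listed above combine with Python, the following sketch runs a SQL aggregation from a script. It assumes a hypothetical `orders` table with made-up rows, and it uses `sqlite3` from the Python standard library in place of PostgreSQL or MySQL so that it runs without a database server. The `GROUP BY` query itself uses standard SQL, so it should carry over to those systems with little or no change.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 80.0), ("alice", 30.0), ("carol", 50.0)],
)

# Aggregate total spend per customer, largest total first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for customer, total in totals:
    print(customer, total)
```

Passing the rows through `?` placeholders, rather than formatting them into the SQL string, keeps the data separate from the query text.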
4 What are the main domains in data science?
Data science is a broad interdisciplinary field. Some key domains include:
🔹 Machine Learning
Developing predictive models that learn from data.
🔹 Artificial Intelligence (AI)
Building systems that perform tasks normally associated with human intelligence, such as reasoning, perception, and learning.
🔹 Computer Vision
Analyzing and interpreting visual data such as images and videos.
🔹 Natural Language Processing (NLP)
Enabling computers to understand and process human language.
🔹 Data Engineering
Designing and maintaining data pipelines and infrastructure.
🔹 Business Intelligence
Using data analytics to support strategic business decisions.
🔹 Financial Analytics
Modeling risk and forecasting markets.
🔹 Healthcare Analytics
Predicting disease and analyzing medical data.
5 Reference List
Davenport, T. H., & Patil, D. J. (2012). Data scientist: The sexiest job of the 21st century. Harvard Business Review, 90(10), 70–76.
Provost, F., & Fawcett, T. (2013). Data science for business: What you need to know about data mining and data-analytic thinking. O’Reilly Media.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2021). An introduction to statistical learning: With applications in R (2nd ed.). Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning (2nd ed.). Springer.
VanderPlas, J. (2016). Python data science handbook. O’Reilly Media.
McKinney, W. (2017). Python for data analysis (2nd ed.). O’Reilly Media.
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Kelleher, J. D., Mac Namee, B., & D’Arcy, A. (2020). Fundamentals of machine learning for predictive data analytics (2nd ed.). MIT Press.