ULIN NIKMAH (52250042)

INSTITUT TEKNOLOGI SAINS BANDUNG

Course: Data Science Programming

Study Program: Data Science

Lecturer: Bakti Siregar, M.Sc., CDS.

Introduction

In this assignment, I will discuss several questions about Data Science Programming, including its main purpose, the reasons for studying it, the tools that should be mastered, and the interesting domains within Data Science. The answers are based on the introductory material from the book Data Science Programming.

1. Question 1

What is the main purpose of our study (Data Science Programming)?

Answer:

Based on the book Data Science Programming, the main purpose of studying this course is to understand how programming is used in the overall Data Science process.

In the introduction section, it is explained that Data Science is an interdisciplinary field that combines:

  • Statistics
  • Mathematics
  • Programming
  • Machine Learning
  • Domain Knowledge (understanding the context of the problem)

So, the main goal is not only to learn how to code, but to understand how coding is used to process data and turn it into useful information.

The book also explains that Data Science has several main stages, such as:

  • Data collection
  • Data cleaning and preprocessing
  • Exploratory Data Analysis (EDA)
  • Modeling
  • Visualization
  • Interpretation and decision making

Through Data Science Programming, we are trained to perform these processes using programming languages instead of doing them manually.

In my opinion, the main purpose is to equip us with the ability to process data systematically and generate insights that can be used for decision making.
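As a small illustration of these stages (my own sketch, not taken from the book), the code below walks through collection, cleaning, exploration, modeling, and interpretation on a tiny made-up dataset. The numbers and variable names are assumptions for the example, visualization is left out, and only Python's standard library is used so that it runs without extra packages.

```python
from statistics import mean

# Data collection: hypothetical (advertising spend, sales) pairs.
# None marks a missing value.
raw_records = [(1.0, 2.1), (2.0, 3.9), (3.0, None), (4.0, 8.2), (5.0, 9.8)]

# Data cleaning: drop records that contain a missing value.
clean = [(x, y) for x, y in raw_records if x is not None and y is not None]

# Exploratory analysis: basic summary statistics.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
mean_x, mean_y = mean(xs), mean(ys)

# Modeling: fit y = slope * x + intercept by ordinary least squares.
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in clean)
    / sum((x - mean_x) ** 2 for x in xs)
)
intercept = mean_y - slope * mean_x

# Interpretation: estimate sales for a new spend value.
predicted = slope * 6.0 + intercept
print(f"rows kept: {len(clean)}, slope: {slope:.2f}, prediction: {predicted:.2f}")
```

In real projects these steps would usually be done with libraries such as pandas and scikit-learn, but the order of the steps stays the same.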

2. Question 2

Why do we learn about it?

Answer:

The book explains that today almost every field generates a large amount of data. However, data will not be useful if it is not analyzed properly.

There are several reasons why we study Data Science Programming:

  • Data is becoming larger and more complex

    Modern data is not just small tables with a few numbers. It can contain millions of rows. Without programming, it would be very difficult to manage and analyze such data.

  • Programming makes the process more efficient

    With programming, we can:

    1. Automate data analysis processes
    2. Reduce human errors
    3. Repeat analysis easily
    4. Handle large scale datasets
  • Programming supports data-driven decision making

    The book emphasizes that Data Science is used to generate insights that support real-world decision making.

In my opinion, we study this subject so that we do not only become data users, but also people who are capable of processing and understanding data professionally.
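To show what automating and repeating an analysis can look like, here is a minimal sketch of my own. The function name summarize and the store sales figures are hypothetical. The same analysis is written once and then applied to every dataset, which reduces manual work and copy-paste errors.

```python
from statistics import mean, median


def summarize(values):
    """Return basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


# Hypothetical monthly sales figures for three stores.
datasets = {
    "store_a": [120, 135, 128, 150],
    "store_b": [80, 95, 90, 110],
    "store_c": [200, 180, 210, 190],
}

# The same analysis is repeated for every dataset automatically.
reports = {name: summarize(values) for name, values in datasets.items()}
for name, report in reports.items():
    print(name, report)
```

If a new store is added, only the data changes; the analysis code stays the same.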

3. Question 3

What tools do we need to become expert in?

Answer:

The book explains that programming is a core component of Data Science. Some important tools that should be mastered include:

  • Programming Languages

    Python

    1. Widely used in Data Science
    2. Has many libraries for data analysis and machine learning
    3. Beginner friendly

    R

    1. Strong in statistical analysis
    2. Widely used in research and data visualization
  • Libraries or Packages

    In Python:

    1. pandas : data manipulation
    2. numpy : numerical computation
    3. matplotlib & seaborn : data visualization
    4. scikit-learn : machine learning

    In R:

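    1. tidyverse : a collection of packages for data science (it includes dplyr and ggplot2)
    2. dplyr : data manipulation
    3. ggplot2 : data visualization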
  • Supporting Tools

    1. Jupyter Notebook or RStudio : coding environments
    2. SQL : for retrieving data from databases
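As an example of retrieving data with SQL, the sketch below uses Python's built-in sqlite3 module with an in-memory database. The table name sales, its columns, and the rows are made up for illustration. It is only a small demonstration of my own, not an example from the book.

```python
import sqlite3

# Create a temporary in-memory database (nothing is written to disk).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")

# Insert a few hypothetical transactions.
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("Ani", 50.0), ("Budi", 75.0), ("Ani", 25.0)],
)

# Retrieve the total amount spent by each customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM sales "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The query result comes back as a list of tuples, which can then be analyzed further in Python.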

The book highlights that programming helps organize the data analysis workflow in a structured and efficient way.

In my opinion, to become an expert, it is not enough to only know the names of the tools. We also need to understand the concepts and practice regularly using real datasets.

4. Question 4

What is your domain of interest in Data Science?

Answer:

The book explains that Data Science is not only about technical skills, but also about domain knowledge, which means understanding the real world context where the data is applied. Some interesting domains in Data Science are:

  • Exploratory Data Analysis (EDA)

    The process of understanding patterns, distributions, and relationships in data before building models. This is important to avoid incorrect conclusions.

  • Statistics & Mathematics

    They are the foundation for understanding probability, distributions, and statistical inference.

  • Machine Learning

    Used to build predictive and classification models based on data.

  • Data Visualization

    Used to present analysis results in graphical form so they are easier to understand.

  • Domain Applications

    For example:

    1. Business
    2. Healthcare
    3. Finance
    4. Education

Without understanding the context, the results of data analysis can be misinterpreted.
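To make the idea of Exploratory Data Analysis more concrete, here is a short sketch of my own that checks the spread of one variable and the relationship between two variables. The study hours and exam scores are hypothetical, and the helper function pearson is written by hand for the example.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data: hours studied and exam scores for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 80]


def pearson(xs, ys):
    """Compute the Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Distribution: center and spread of the scores.
print(f"mean score: {mean(scores):.2f}, std dev: {stdev(scores):.2f}")

# Relationship: how strongly hours and scores move together.
r = pearson(hours, scores)
print(f"correlation between hours and scores: {r:.3f}")
```

A correlation close to 1 suggests a strong positive relationship, but it does not prove that one variable causes the other, which is why context is still needed.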

I am interested in Machine Learning and Data Visualization because I find it fascinating how data can be used to make predictions and be transformed into clear and understandable information. These fields show how complex data can be processed into something more structured and meaningful.

Among the domain applications, I am most interested in the business field, especially in analyzing customer behavior and sales data. I find this interesting because data can be used to predict trends, improve marketing strategies, and help companies make better decisions. By combining technical skills and domain knowledge, a Data Scientist can provide more accurate and meaningful insights.

Conclusion

From the discussion, it can be concluded that Data Science Programming is important for processing data into meaningful information and supporting data-driven decision making. Therefore, understanding the concepts, tools, and domain knowledge is essential in this field.

References