Assignment Week 2


Naisya Hafizh Mufidah

Student ID (NIM) = 52250040

Lecturer = Mr. Bakti Siregar, M.Sc., CDS.

Institut Teknologi Sains Bandung 🔬 Data Science 📈 Basic Statistics


1 What is the main purpose of studying Data Science Programming?

The main purpose of studying data science programming is to learn how to use programming to collect, clean, process, analyze, and interpret large amounts of data so that we can extract useful insights and make better data-driven decisions. Programming enables data scientists to manage data accurately and efficiently, automate repetitive tasks, and solve complex real-world problems using data.

If data science is a car, then programming is the engine that powers it.

Programming makes data science much more efficient and effective, saving a lot of time and effort that would otherwise be spent on manual work. With programming, we can manipulate and analyze large datasets with ease, and build and test machine learning models. Without programming skills, dealing with large and complex datasets would be extremely time-consuming and difficult. In short, programming is an essential tool in the data scientist’s toolbox: by mastering programming languages and tools, data scientists can unlock the full potential of data, generate valuable insights that drive business decisions and improve outcomes, and make data-driven decisions with confidence.

2 What do we learn in it?

When we study data science programming, we learn several important skills and concepts:

1. Data Collection

We learn methods for gathering data from different sources, such as databases, websites, APIs, or files (CSV, Excel, etc.), and the steps each method involves. The collected data is used to answer questions, analyze performance, and predict business direction. Programming helps automate the process of retrieving and organizing data efficiently.
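As a minimal sketch of automated retrieval, the Python snippet below parses CSV data with the standard `csv` module and converts each column to a usable type. The CSV text and its columns (`order_id`, `region`, `amount`) are hypothetical; reading a real file would use `open("file.csv", newline="")` instead of `io.StringIO`.

```python
import csv
import io

# Hypothetical CSV content standing in for a real file.
RAW_CSV = """order_id,region,amount
1,North,120.5
2,South,80.0
3,North,200.25
"""

def load_orders(text):
    """Parse CSV text into a list of dicts with numeric columns converted."""
    reader = csv.DictReader(io.StringIO(text))
    orders = []
    for row in reader:
        row["order_id"] = int(row["order_id"])  # text -> integer
        row["amount"] = float(row["amount"])    # text -> decimal number
        orders.append(row)
    return orders

orders = load_orders(RAW_CSV)
print(len(orders), orders[0])
```

The same loop works unchanged whether the text comes from a file, a download, or an API response saved as CSV.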

2. Data Cleaning and Preparation

Raw data is often incomplete, messy, or inconsistent. We learn how to:

  • Handle missing values.
  • Remove duplicates.
  • Correct errors.
  • Transform data into the correct format.

This step is very important because clean data leads to more accurate analysis.
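The four cleaning steps above can be sketched in plain Python. The records are hypothetical, and the choices made here (keeping the first of any exact duplicate, treating a negative age as an error, and filling missing ages with the median) are illustrative assumptions; in practice the same steps are often done with Pandas.

```python
from statistics import median

# Hypothetical raw records with typical problems.
raw = [
    {"id": "1", "city": "Bandung ", "age": "21"},
    {"id": "2", "city": "jakarta", "age": ""},       # missing value
    {"id": "1", "city": "Bandung ", "age": "21"},    # exact duplicate
    {"id": "3", "city": "BANDUNG", "age": "-5"},     # impossible value
    {"id": "4", "city": "Surabaya", "age": "23"},
]

def clean(records):
    # 1. Remove duplicates: keep only the first copy of identical rows.
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)

    # 2. Transform formats and correct errors (invalid ages become missing).
    rows = []
    for rec in unique:
        age = int(rec["age"]) if rec["age"].strip() else None
        if age is not None and age < 0:
            age = None
        rows.append({"id": int(rec["id"]),
                     "city": rec["city"].strip().title(),  # "BANDUNG" -> "Bandung"
                     "age": age})

    # 3. Handle missing values: fill them with the median of the known ages.
    known = [r["age"] for r in rows if r["age"] is not None]
    fill = median(known) if known else None
    for r in rows:
        if r["age"] is None:
            r["age"] = fill
    return rows

cleaned = clean(raw)
for row in cleaned:
    print(row)
```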

3. Data Exploration and Analysis

We learn how to explore datasets to understand patterns, trends, and relationships. This includes:

  • Calculating statistical measures (mean, median, variance, etc.).
  • Identifying correlations.
  • Detecting outliers.

Programming allows us to analyze large datasets quickly and accurately.

4. Data Visualization

We learn how to present data in visual forms such as:

  • Charts.
  • Graphs.
  • Dashboards.

Visualization helps communicate insights clearly to stakeholders or decision-makers.
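To keep this sketch dependency-free, it draws a plain-text horizontal bar chart from hypothetical sales figures. With Matplotlib, the same idea would be a bar chart created with `ax.bar(labels, values)`; the helper `text_bar_chart` below is a hypothetical name for illustration only.

```python
def text_bar_chart(data, width=30, char="#"):
    """Return a horizontal bar chart as text.

    Bars are scaled so the largest (non-negative) value fills `width` characters.
    """
    longest_label = max(len(label) for label in data)
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = char * round(value / peak * width) if peak else ""
        lines.append(f"{label.ljust(longest_label)} | {bar} {value}")
    return "\n".join(lines)

# Hypothetical sales per region (made-up numbers).
sales_by_region = {"North": 320, "South": 180, "East": 240, "West": 90}
print(text_bar_chart(sales_by_region))
```

Even in this crude form, the chart makes the gap between the North and West regions visible at a glance, which is the point of visualization for decision-makers.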

5. Building Predictive Models

We learn how to apply algorithms and machine learning techniques to:

  • Predict future outcomes.
  • Classify data.
  • Detect patterns automatically.

Programming is essential to implement these models and evaluate their performance.

3 What tools should we learn to become experts?

Programming Languages

  • Python = the most widely used language for data science because it is easy to learn and use, versatile and flexible, backed by a large community and many libraries, and excellent for machine learning and data visualization.

    • Pandas & NumPy: Core for data manipulation and numerical computation.
    • Scikit-learn: Essential for machine learning.
    • Matplotlib/Seaborn: For data visualization.
  • R = specialized for statistical analysis, with a large community and libraries (e.g., dplyr, ggplot2, tidyr), and flexible and easy to use for data manipulation and visualization.

  • SQL = excellent for managing and querying relational databases, and useful for data cleaning and transformation.
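To show SQL's role in cleaning and querying, here is a small sketch that runs SQL through Python's built-in `sqlite3` module on an in-memory database. The `orders` table and its rows are hypothetical; the query drops rows with a missing amount and totals sales per region.

```python
import sqlite3

# In-memory database with a hypothetical "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("North", 120.5), ("South", 80.0), ("North", 200.25), ("South", None)],
)

# Cleaning + aggregation in one query: skip missing amounts,
# then count orders and total sales per region.
rows = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    WHERE amount IS NOT NULL
    GROUP BY region
    ORDER BY total DESC
    """
).fetchall()

for region, n_orders, total in rows:
    print(region, n_orders, total)
conn.close()
```

The `?` placeholders let the database insert values safely instead of building SQL strings by hand.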

Other Tools

  • Jupyter Notebooks = An interactive environment for blending code, text, and visualizations.
  • Tableau / Power BI = Popular tools for creating dashboards and reports.
  • Excel = Crucial for quick data cleaning, pivot tables, and basic analysis.
  • Big Data & Cloud Tools = tools such as Hadoop, Spark, and AWS/Azure for handling huge datasets and deploying models (advanced stage).

4 What is your domain of interest in data science?

My domain of interest in data science is business analytics, particularly how data can support strategic decision-making and improve organizational performance. Since my study program focuses on business-oriented data science, I am interested in learning how data analysis, predictive modeling, and data visualization can be applied to solve real business problems, optimize operations, and deepen financial and customer insights. As a second-semester student, I am still exploring deeper specializations, but I am especially interested in the role of data in large business environments.

5 Conclusion

In conclusion, studying data science programming is essential for understanding how to transform raw data into meaningful insights that support data-driven decision-making. Through data collection, data cleaning, analysis, visualization, and predictive modeling, students develop both technical and analytical skills that are crucial in solving real-world problems.

Additionally, domain knowledge plays an important role in ensuring that data analysis is relevant and applicable within specific industries, especially in business-oriented environments. As data science continues to grow across various sectors, mastering programming skills and understanding business contexts will help future data professionals contribute effectively to organizational performance and strategic decision-making.
