Data Science Programming (Pemrograman Sains Data)

2026-03-02

By Morris Alexander Pangaribuan, student at Institut Teknologi Sains Bandung


1. What is the main purpose of our study?

Data science is a discipline that combines statistical, mathematical, and computer science techniques to process and analyze large amounts of data. The goal is to identify patterns, trends, and relationships in the data that can be used to make better, more informed decisions.
Data Science Programming also teaches how to use programming languages (such as Python and R) to collect, clean, analyze, and visualize large amounts of data.

The main focus is on applying coding logic to information extraction, predictive modeling, and machine learning in support of data-driven decision making.
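As a small illustration of the collect, clean, analyze, and model steps described above, here is a minimal sketch. It uses only the Python standard library so that it runs without installing anything; in practice Pandas and Scikit-learn would usually do this work. The data (hours studied and exam scores) and the function names load_rows, clean_rows, fit_line, and predict are made up for this example.

```python
import csv
import io
import statistics

# Hypothetical raw data: hours studied vs. exam score, with one missing score.
RAW_CSV = """hours,score
1,52
2,58
3,
4,71
5,75
"""


def load_rows(text):
    """Collect: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_rows(rows):
    """Clean: drop rows with a missing field, then convert strings to floats."""
    cleaned = []
    for row in rows:
        if all((value or "").strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned


def fit_line(xs, ys):
    """Model: ordinary least squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rows = clean_rows(load_rows(RAW_CSV))
hours = [row["hours"] for row in rows]
scores = [row["score"] for row in rows]

# Analyze: a simple descriptive statistic.
mean_score = statistics.mean(scores)

slope, intercept = fit_line(hours, scores)


def predict(h):
    """Predict a score for a given number of study hours."""
    return slope * h + intercept


print(f"rows kept: {len(rows)}, mean score: {mean_score:.1f}")
print(f"predicted score for 6 hours: {predict(6):.1f}")
```

The row with the missing score is dropped before the line is fitted, which shows why cleaning comes before modeling: the fit cannot use a row that has no score.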


2. Why do we learn about it?

Why do we study data science? Because it allows us to transform raw data into valuable insights for more accurate, faster, and more efficient decision making.

Programming is crucial in data science for efficiently accessing, cleaning, analyzing, and visualizing large volumes of data. Languages like Python and R are used to manipulate data, build predictive models (machine learning), and automate repetitive tasks. Coding also trains the logical thinking and problem-solving skills needed to solve complex data problems.


3. What tools do we need to become expert in it?

Here are the main data science tools, grouped by function:

  • Programming Languages & Working Environments:

Python: The most popular language with comprehensive libraries (Pandas for data manipulation, NumPy for computation, Scikit-learn for machine learning).

R: Powerful for statistical analysis, with packages such as dplyr for data manipulation and ggplot2 for graphics.

Jupyter Notebook: An interactive environment for writing and running code alongside text and visualizations.

SQL: The standard language for retrieving and managing data from relational databases.
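To show how SQL and Python from the list above work together, here is a minimal sketch that runs a SQL query from Python using the built-in sqlite3 module. The sales table, its columns, and its values are invented for this example, and an in-memory SQLite database stands in for a real relational database.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [
        ("West", 120.0),
        ("East", 80.0),
        ("West", 60.0),
        ("East", 40.0),
        ("North", 50.0),
    ],
)

# Retrieve: total and average sales per region, largest total first.
query = """
    SELECT region,
           SUM(amount) AS total_amount,
           AVG(amount) AS average_amount
    FROM sales
    GROUP BY region
    ORDER BY total_amount DESC
"""
results = conn.execute(query).fetchall()
conn.close()

for region, total_amount, average_amount in results:
    print(f"{region}: total={total_amount:.1f}, average={average_amount:.1f}")
```

The grouping and sorting happen inside the database, and Python only receives the summarized rows. This is the usual division of work when the data is too large to load in full.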


  • Data Visualization & BI (Business Intelligence):

Tableau: A popular tool for building interactive visualizations and dashboards.

Power BI: Microsoft’s visualization platform.

Looker Studio (formerly Google Data Studio): Web-based data visualization.


  • Big Data & Distributed Processing:

Apache Hadoop: A framework for distributed storage and processing of large datasets across clusters of machines.

Apache Spark: Fast in-memory computing for big data.


  • Collaboration & More:

GitHub: A hosting platform for Git repositories, used for version control and code collaboration.

SAS: Commercial analytics software for large enterprises.

Excel: A spreadsheet tool for simple analysis.

4. What are your domains of interest and knowledge in data science?

  • Primary Interests in Data Science:

Natural Language Processing (NLP): I am particularly interested in methods that understand, analyze, and generate human language in context.

Predictive Analytics & Machine Learning (ML): Using historical data to predict future trends and automate decision making.

Data Visualization & Communication: Transforming complex raw data into understandable, engaging, and actionable insights.

Data Automation (Data Pipeline): Automating data workflows, from data cleaning to modeling, so that they run faster and more consistently.
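The data pipeline idea in the last item can be sketched as a chain of small functions applied in order. This is only a minimal illustration using the Python standard library. The step names, the run_pipeline helper, and the sample readings are all hypothetical, and the pipeline stops at the scaled values that would normally be passed on to a model.

```python
from functools import reduce


def strip_whitespace(values):
    """Remove leading and trailing spaces from each raw string."""
    return [v.strip() for v in values]


def drop_missing(values):
    """Drop empty strings and the placeholder 'NA'."""
    return [v for v in values if v not in ("", "NA")]


def to_float(values):
    """Convert the remaining strings to numbers."""
    return [float(v) for v in values]


def min_max_scale(values):
    """Rescale numbers to the range 0..1 (all zeros if they are identical)."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def run_pipeline(data, steps):
    """Apply each step to the output of the previous one, in order."""
    return reduce(lambda current, step: step(current), steps, data)


raw_readings = [" 10", "NA", "20 ", "", "40"]
steps = [strip_whitespace, drop_missing, to_float, min_max_scale]
scaled = run_pipeline(raw_readings, steps)
print(scaled)
```

Because each step is an ordinary function, steps can be added, removed, or reordered by editing the steps list, and the same pipeline can be rerun automatically whenever new data arrives.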


  • Knowledge and Skills in Data Science:

Language Modeling & Deep Learning: Working with deep learning techniques and large language models to understand complex language contexts and patterns.

Data Preprocessing: The ability to clean, restructure, and transform raw data.

Tools: Popular tools such as SQL, Python libraries (Pandas, Scikit-learn), and Google Cloud integration (BigQuery).