Data Science Programming 1

M.Fitrah Aidil Harahap

Major: Data Science
Student ID: 52250031
Lecturer: Bakti Siregar, M. Sc., CSD.
Subject: Basic Statistics

1 Question number one

📚

DATA SCIENCE PROGRAM
What Is the Main Purpose of Studying Data Science Programming?

Data Science is a multi-step process that transforms raw data into valuable insights and actionable solutions. To achieve this, several key components work together, ensuring that data is collected, processed, analyzed, and utilized effectively. Below are the core components of Data Science.

🗄️

Data Collection

Gathering data from various sources including databases, APIs, sensors, and web scraping. The quality and diversity of collected data directly influence the reliability of all downstream analysis and insights.
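Here is a rough illustration of the web-scraping route. The sketch pulls review text out of an HTML snippet using only Python's standard-library `html.parser`. The sample HTML, the `review` class name, and the `ReviewScraper` / `scrape_reviews` names are all hypothetical. A real scraper would first download the page and should respect the site's terms of use.

```python
from html.parser import HTMLParser

# Hypothetical page content; in practice this would be downloaded first.
SAMPLE_HTML = """
<html><body>
  <div class="review">Great product, fast delivery</div>
  <div class="ad">Buy now!</div>
  <div class="review">Packaging was damaged</div>
</body></html>
"""


class ReviewScraper(HTMLParser):
    """Collects the text inside every <div class="review"> (hypothetical markup)."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._inside_review = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "div" and ("class", "review") in attrs:
            self._inside_review = True

    def handle_endtag(self, tag):
        if tag == "div":
            self._inside_review = False

    def handle_data(self, data):
        # Keep non-empty text only while we are inside a review <div>
        if self._inside_review and data.strip():
            self.reviews.append(data.strip())


def scrape_reviews(html):
    """Parse an HTML string and return the list of review texts."""
    parser = ReviewScraper()
    parser.feed(html)
    parser.close()
    return parser.reviews


print(scrape_reviews(SAMPLE_HTML))
# → ['Great product, fast delivery', 'Packaging was damaged']
```

The parser tracks only one level of `<div>` nesting. That is enough for this flat, made-up markup, but real pages usually need a more robust HTML library.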

🧹

Data Cleaning & Preprocessing

Removing noise, handling missing values, and transforming data for analysis. This stage ensures that raw data is structured, consistent, and ready to support accurate modeling and exploration.
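To make the cleaning step concrete, here is a minimal pure-Python sketch on a small, made-up list of records. It trims and lower-cases a text column, drops exact duplicates, and fills missing ages with the median of the observed ages. Median imputation is only one common choice, assumed here for illustration. In practice, libraries such as pandas usually handle these steps.

```python
from statistics import median

# Made-up raw records; None marks a missing value.
raw = [
    {"city": "  Medan ", "age": 21},
    {"city": "JAKARTA", "age": None},
    {"city": "medan", "age": 21},
    {"city": "Bandung", "age": 30},
]


def clean(records):
    # 1. Normalize the text column: strip surrounding whitespace and lower-case.
    normalized = [{"city": r["city"].strip().lower(), "age": r["age"]} for r in records]

    # 2. Drop exact duplicates, keeping the first occurrence.
    seen, unique = set(), []
    for r in normalized:
        key = (r["city"], r["age"])
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 3. Impute missing ages with the median of the observed ages.
    observed = [r["age"] for r in unique if r["age"] is not None]
    fill = median(observed)
    return [{**r, "age": r["age"] if r["age"] is not None else fill} for r in unique]


print(clean(raw))
# → [{'city': 'medan', 'age': 21}, {'city': 'jakarta', 'age': 25.5}, {'city': 'bandung', 'age': 30}]
```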

🔍

Exploratory Data Analysis (EDA)

Identifying trends, patterns, and relationships in the data. EDA enables data scientists to understand the dataset deeply before modeling, guiding better analytical decisions and hypothesis generation.
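A first EDA pass often starts with descriptive statistics and a correlation check between two variables. The sketch below does both with Python's standard `statistics` module and a Pearson coefficient written out from its definition. The ad-spend and sales numbers are invented for illustration.

```python
from statistics import mean, median, stdev

# Made-up sample: weekly ad spend (in millions) vs. units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12, 15, 21, 24, 28]


def summarize(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "mean": mean(values),
        "median": median(values),
        "stdev": round(stdev(values), 3),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


print(summarize(units_sold))
print(round(pearson(ad_spend, units_sold), 3))  # → 0.994
```

A coefficient close to 1 suggests a strong positive linear relationship in this toy sample. That kind of pattern motivates a hypothesis, but on its own it does not prove causation.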

🤖

Machine Learning & AI

Building predictive models and automating decision-making. From regression to deep learning, ML and AI are the engines that allow systems to learn from data and generate intelligent, scalable solutions.
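Regression is the simplest model family mentioned above. The sketch below fits a one-feature linear regression by ordinary least squares and uses it to predict a new value, which is "learning from data" in its most basic form. The study-hours and exam-score data are hypothetical, and `fit_line` / `predict` are illustrative names rather than a library API.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def predict(model, x_new):
    """Apply the fitted line to a new input."""
    intercept, slope = model
    return intercept + slope * x_new


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 57, 61, 68, 72]

model = fit_line(hours, scores)
print(model)                          # (intercept, slope)
print(round(predict(model, 6), 1))    # → 77.3
```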

📊

Data Visualization

Communicating insights using charts, graphs, and dashboards. Effective visualization bridges the gap between complex analytical results and actionable understanding for stakeholders at all levels.

⚑

Big Data Processing

Managing large datasets using distributed computing frameworks such as Hadoop and Spark. Big Data processing ensures that even massive, high-velocity data streams can be handled efficiently and reliably.
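Hadoop's classic MapReduce model splits a job into three steps. A map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The single-machine word-count sketch below only illustrates that programming model in plain Python; it is not Hadoop or Spark code. On a real cluster, each phase runs in parallel across many machines. The function names and sample documents are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Emit (word, 1) for every word in one document (one input split)."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the counts for each word."""
    return {word: sum(values) for word, values in groups.items()}


documents = [
    "big data needs big tools",
    "spark and hadoop process big data",
]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # → 3 2
```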

🚀

Deployment & Decision-Making

Deploying models in real-world applications. This final stage ensures that Data Science delivers tangible impact by integrating trained models into the workflows, products, and systems that drive evidence-based decisions.

Overview diagram: Collection, Cleaning, ML & AI, Visualization, EDA, Big Data, Deployment, Insights

#DataScience #Components #BigData #MachineLearning

2 Question number two

📚

DATA SCIENCE PROGRAM
Why Do We Study Data Science Programming?

Based on the source material, we learn Data Science Programming because programming is the essential foundation of the entire field. Without it, we cannot effectively manage the massive amounts of information available in the digital age.

🧹

Efficient Data Processing

Raw data is often messy and disorganized. Programming allows us to clean and transform large datasets quickly and accurately, making them ready for meaningful analysis.

🔍

Uncovering Patterns (EDA)

It enables us to perform Exploratory Data Analysis to find hidden trends and relationships that aren’t visible to the naked eye, revealing the true story behind the numbers.

🤖

Building Predictive Models

We learn it to create Machine Learning and AI models that can forecast future outcomes and automate complex decision-making, going beyond analysis into intelligent prediction.

📊

Data Storytelling

It provides the tools to build compelling visualizations, such as charts and dashboards, that translate complex numbers into easy-to-understand insights for decision-makers and stakeholders.

🔑

Unlocking Value

In today’s world, data is a top-tier asset. Programming is the “key” that unlocks that value for businesses, researchers, and governments to drive innovation and competitive advantage.

Overview diagram: Processing, EDA, ML & AI, Visualization, Value

💡 In short: We learn Data Science Programming to gain the technical skills to turn raw data into actionable intelligence, driving smarter decisions across business, research, and government.

#DataScience #Programming #MachineLearning #Analytics

3 Question number three

🛠️

DATA SCIENCE PROGRAM
Tools & Technologies in Data Science

💻

Core Programming Languages

The material highlights two primary languages as the foundation: Python, the preferred choice for AI, Machine Learning, and large-scale data computation; and R, the go-to tool for specialized statistical analysis and high-quality data visualization.

📦

Essential Libraries & Frameworks

Mastering specific libraries is crucial for data manipulation and modeling. Python users rely on pandas, numpy, matplotlib, seaborn, and scikit-learn. R users work with tidyverse, dplyr, ggplot2, and caret to achieve the same goals.

⚙️

Specialized Technical Tools

Depending on the project, expertise may be needed in: NLP, using libraries such as spaCy and NLTK and pretrained models such as BERT; Computer Vision, using OpenCV and YOLO models; Deep Learning, using TensorFlow, Keras, and PyTorch; and Big Data, using systems such as Hadoop and Spark.

📊

Analytical Techniques

An expert must be proficient in: Data Collection using APIs, web scraping, and databases; Exploratory Data Analysis (EDA) for identifying trends and patterns; and Deployment, moving models from a local environment into real-world applications.

🎓

Domain Knowledge

Although it is not a software tool, Domain Knowledge is emphasized in the material as a vital component. You must be an expert in the specific field you are analyzing (whether finance, medicine, or retail) to truly make sense of the data and produce meaningful, impactful results.

Overview diagram: Languages, Libraries, Tools, Analytics, Domain

#Python #RLanguage #MachineLearning #DataScience

4 Question number four

📣

DOMAIN KNOWLEDGE
My Domain of Interest:
Marketing & Social Media Analytics

In my view, companies in today’s digital landscape are eager to understand what people are saying about them in real time. This makes Marketing one of the most exciting and impactful domains in which to apply Data Science skills.

💬

Sentiment Analysis

Classifying tweets or reviews as positive, negative, or neutral. This helps brands monitor their reputation in real-time and respond quickly to public perception shifts across social media platforms.
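Sentiment classification is usually done with trained models. The simplest way to show the idea, though, is a lexicon-based scorer that counts positive and negative words and compares the totals. This is a deliberately simplified technique swapped in for illustration. The word lists and tweets are hypothetical, and the scorer ignores negation (e.g. "not good"), which real systems must handle.

```python
import re

# Tiny hypothetical lexicon; real projects use curated lexicons or trained models.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "terrible", "hate", "slow", "broken"}


def classify_sentiment(text):
    """Label text positive/negative/neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # True counts as 1 when summed, so this is (#positive words - #negative words)
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweets = [
    "I love this brand, great service!",
    "Delivery was slow and the box arrived broken",
    "Just ordered a new phone case",
]
for tweet in tweets:
    print(classify_sentiment(tweet), "|", tweet)
```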

🚫

Fake News Detection

Using Machine Learning to filter out misinformation, which is crucial for social media platforms. This helps protect brand integrity and helps audiences receive accurate, trustworthy information.
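The material does not say which algorithm to use, so the sketch below swaps in multinomial Naive Bayes, a common text-classification baseline. It is implemented from scratch with add-one (Laplace) smoothing to show how a model can learn "fake" vs. "real" word patterns from labeled examples. The six headlines and their labels are invented for illustration. A real detector would need a large, carefully labeled corpus and would still make mistakes.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # documents per class, for the prior
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        n_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values())
            # log P(class) + sum of log P(word | class), in log space to avoid underflow
            score = math.log(self.doc_counts[c] / n_docs)
            for word in tokenize(text):
                count = self.word_counts[c][word]
                score += math.log((count + 1) / (total + len(self.vocab)))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Invented training headlines.
texts = [
    "Miracle cure doctors hate revealed",
    "Shocking secret cure they hide from you",
    "You won't believe this miracle trick",
    "Government publishes annual budget report",
    "Central bank keeps interest rate unchanged",
    "University releases new research report",
]
labels = ["fake", "fake", "fake", "real", "real", "real"]

detector = NaiveBayes().fit(texts, labels)
print(detector.predict("Secret miracle cure revealed"))  # → fake
print(detector.predict("Bank releases annual report"))   # → real
```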

🤖

Chatbot using NLP

Building AI assistants to handle customer service on marketing platforms. NLP-powered chatbots can respond 24/7, improving customer experience while reducing operational costs for businesses.
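NLP chatbots typically rely on trained intent classifiers or language models. The core routing idea can still be sketched with simple keyword matching: pick the intent whose keywords overlap most with the user's message, and fall back to a human agent otherwise. The intents, keywords, and replies below are hypothetical. Keyword overlap is a deliberately simplified stand-in for a real NLP model.

```python
import re

# Hypothetical intents for a marketing / customer-service bot.
INTENTS = {
    "greeting": ({"hi", "hello", "hey"}, "Hello! How can I help you today?"),
    "shipping": ({"shipping", "delivery", "track", "order"}, "I can help you track your order."),
    "refund": ({"refund", "return", "money"}, "I can help you start a return or refund."),
}
FALLBACK = "Sorry, I didn't understand. Let me connect you to a human agent."


def reply(message):
    """Pick the intent whose keywords overlap most with the message."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    best_intent, best_overlap = None, 0
    for name, (keywords, _) in INTENTS.items():
        overlap = len(tokens & keywords)
        if overlap > best_overlap:  # ties keep the earlier intent
            best_intent, best_overlap = name, overlap
    return INTENTS[best_intent][1] if best_intent else FALLBACK


print(reply("Where is my order?"))  # → I can help you track your order.
```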

🐍

Programming Language

Python is usually preferred in this domain because of its mature NLP (Natural Language Processing) libraries, which make it a strong choice for analyzing human language at scale.

🗣️

NLP Libraries

spaCy and NLTK are core libraries for processing human language, and BERT is a widely used pretrained language model. Together they enable machines to read, interpret, and respond to text.

📡

Data Collection

rtweet (for R) and tweepy (for Python) are used to collect data directly from the Twitter (X) API, allowing us to gather thousands of real user opinions and conversations in minutes, subject to the API's access level and rate limits.

🔢

Text Transformation

TF-IDF (a term-weighting technique) and tidytext (an R package) are used to turn words into numbers that a computer can understand, transforming raw text into structured data that machine learning models can process.
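To show how TF-IDF turns words into numbers, the sketch below computes it from scratch for a few made-up reviews. It uses the textbook form tf(t, d) × ln(N / df(t)), where tf is the term's share of the words in a document, N is the number of documents, and df(t) is the number of documents containing the term. Libraries such as scikit-learn's `TfidfVectorizer` use a smoothed idf variant by default, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf(t, d) = count of t in d / number of terms in d
    idf(t)   = ln(N / df(t)), where df(t) = number of documents containing t
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: count each term at most once per document.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up product reviews.
reviews = [
    "great phone great battery",
    "battery died fast",
    "great camera",
]
for row in tf_idf(reviews):
    print({term: round(weight, 3) for term, weight in row.items()})
```

Rare, distinctive words such as "camera" get high weights. A word that appears in every document gets ln(1) = 0, which is how TF-IDF down-weights common words.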

Overview diagram: Sentiment, Fake News, Chatbot, NLP

#Marketing #NLP #SentimentAnalysis #DataScience