
Exercise 1
The following table shows sample information for three students. Each
observation represents a single student and includes details such as
their unique student ID, name, age, total credits completed, major field
of study, and year level.
This dataset demonstrates a mixture of variable types:
- Nominal: StudentID, Name, Major
- Numeric: Age (continuous), CreditsCompleted (discrete)
- Ordinal: YearLevel (Freshman → Senior)
| StudentID | Name | Age | CreditsCompleted | Major | YearLevel |
|---|---|---|---|---|---|
| S001 | Alice | 20 | 45 | Data Science | Sophomore |
| S002 | Budi | 21 | 60 | Mathematics | Junior |
| S003 | Citra | 19 | 30 | Statistics | Freshman |

# 1. Create vectors for each variable
StudentID <- c("S001", "S002", "S003")        # Nominal / ID
Name <- c("Alice", "Budi", "Citra")           # Nominal / Name
Age <- c(20, 21, 19)                          # Numeric / Continuous
CreditsCompleted <- c(45, 60, 30)             # Numeric / Discrete
# Nominal
Major <- c("Data Science", "Mathematics", "Statistics")
# Ordinal: the levels argument fixes the order Freshman < ... < Senior
YearLevel <- factor(c("Sophomore", "Junior", "Freshman"),
                    levels = c("Freshman", "Sophomore", "Junior", "Senior"),
                    ordered = TRUE)
# 2. Combine all vectors into a data frame
students <- data.frame(
  StudentID, Name, Age, CreditsCompleted, Major, YearLevel,
  stringsAsFactors = FALSE
)
# 3. Display the data frame
print(students)
##   StudentID  Name Age CreditsCompleted        Major YearLevel
## 1      S001 Alice  20               45 Data Science Sophomore
## 2      S002  Budi  21               60  Mathematics    Junior
## 3      S003 Citra  19               30   Statistics  Freshman
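One way to check that R stores each variable with the intended type is to inspect the column classes. The sketch below rebuilds a reduced copy of the data frame so that it runs on its own; the names students_check, col_classes, and above_freshman are illustrative and not part of the exercise.

```r
# Reduced, illustrative copy of the Exercise 1 data frame
students_check <- data.frame(
  StudentID = c("S001", "S002", "S003"),
  Age = c(20, 21, 19),
  YearLevel = factor(c("Sophomore", "Junior", "Freshman"),
                     levels = c("Freshman", "Sophomore", "Junior", "Senior"),
                     ordered = TRUE),
  stringsAsFactors = FALSE
)

# First class attribute of every column (ordered factors report "ordered")
col_classes <- sapply(students_check, function(col) class(col)[1])
print(col_classes)

# Ordered factors support comparisons that follow the level order
above_freshman <- students_check$YearLevel > "Freshman"
print(students_check$StudentID[above_freshman])
```

With the sample data, the comparison selects S001 and S002, because Sophomore and Junior both rank above Freshman. The same comparison on an unordered factor would return NA with a warning, which is why ordered = TRUE matters for ordinal variables.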
Exercise 2
Identify Data Types: Determine the type of data for
each of the following variables:
# Install knitr package if not already installed
# install.packages("knitr")
library(knitr)
# Create a data frame for Data Types
variables_info <- data.frame(
  No = 1:5,
  Variable = c(
    "Number of vehicles passing through the toll road each day",
    "Student height in cm",
    "Employee gender (Male / Female)",
    "Customer satisfaction level: Low, Medium, High",
    "Respondent's favorite color: Red, Blue, Green"
  ),
  DataType = c(
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer"
  ),
  Subtype = c(
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer"
  ),
  stringsAsFactors = FALSE
)
# Display the data frame as a neat table
kable(variables_info,
      caption = "Table of Variables and Data Types")
Table of Variables and Data Types

| No | Variable | DataType | Subtype |
|---|---|---|---|
| 1 | Number of vehicles passing through the toll road each day | Your Answer | Your Answer |
| 2 | Student height in cm | Your Answer | Your Answer |
| 3 | Employee gender (Male / Female) | Your Answer | Your Answer |
| 4 | Customer satisfaction level: Low, Medium, High | Your Answer | Your Answer |
| 5 | Respondent's favorite color: Red, Blue, Green | Your Answer | Your Answer |
Exercise 3
Classify Data Sources: Determine whether the
following data comes from internal or external
sources, and whether it is structured or
unstructured:
# Install DT package if not already installed
# install.packages("DT")
library(DT)
# Create a data frame for data sources
data_sources <- data.frame(
  No = 1:4,
  DataSource = c(
    "Daily sales transaction data of the company",
    "Weather reports from BMKG",
    "Product reviews on social media",
    "Warehouse inventory reports"
  ),
  Internal_External = c(
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer"
  ),
  Structured_Unstructured = c(
    "Your Answer",
    "Your Answer",
    "Your Answer",
    "Your Answer"
  ),
  stringsAsFactors = FALSE
)
# Display the data frame as a neat table
datatable(data_sources,
          caption = "Table of Data Sources",
          rownames = FALSE)  # hides the index column
Exercise 4
Dataset Structure: Consider the following
transaction table:
| Date | Qty | Price | Product | CustomerTier |
|---|---|---|---|---|
| 2025-10-01 | 2 | 1000 | Laptop | High |
| 2025-10-01 | 5 | 20 | Mouse | Medium |
| 2025-10-02 | 1 | 1000 | Laptop | Low |
| 2025-10-02 | 3 | 30 | Keyboard | Medium |
| 2025-10-03 | 4 | 50 | Mouse | Medium |
| 2025-10-03 | 2 | 1000 | Laptop | High |
| 2025-10-04 | 6 | 25 | Keyboard | Low |
| 2025-10-04 | 1 | 1000 | Laptop | High |
| 2025-10-05 | 3 | 40 | Mouse | Low |
| 2025-10-05 | 5 | 10 | Keyboard | Medium |
Your Assignment Instructions: Creating the Transactions Table Above in R
1. Create a data frame in R called transactions containing the data above.
2. Identify which variables are numeric and which are categorical.
3. Calculate the total revenue for each transaction by multiplying Qty × Price, and add it as a new column called Total.
4. Compute summary statistics:
   - Total quantity sold for each product
   - Total revenue per product
   - Average price per product
5. Visualize the data:
   - Create a barplot showing total quantity sold per product.
   - Create a pie chart showing the proportion of total revenue per customer tier.
6. Optional Challenge:
   - Find which date had the highest total revenue.
   - Create a stacked bar chart showing quantity sold per product by customer tier.
Hints: Use data.frame(), aggregate(), barplot(), pie(), and basic arithmetic operations in R.
Exercise 5
Create Your Own Data Frame:
Objective: Create a data frame in R with 30
rows containing a mix of data types: continuous, discrete,
nominal, and ordinal.
Instructions
1. Open RStudio or the R console.
2. Create a vector for each column in your data frame:
   - Date: 30 dates (can be sequential or random within a month/year)
   - Continuous: numeric values that can take decimal values (e.g., height, weight, temperature)
   - Discrete: numeric values that can only take whole numbers (e.g., number of items, number of vehicles)
   - Nominal: categorical values with no order (e.g., color, gender, city)
   - Ordinal: categorical values with a defined order (e.g., Low, Medium, High; Beginner, Intermediate, Expert)
3. Combine all vectors into a data frame called my_data.
4. Check your data frame using head() or View(), and confirm (for example with nrow()) that it has 30 rows and that the columns are correct.
5. Optional tasks:
   - Summarize each column using summary()
   - Count the frequency of each category for the Nominal and Ordinal columns using table()
Hints
- Use seq.Date() or as.Date() to generate the Date column.
- Use runif() or rnorm() for continuous numeric data.
- Use sample() for discrete, nominal, and ordinal data.
- Ensure the ordinal vector is created with factor(..., levels = c("Low","Medium","High"), ordered = TRUE) (or similar).
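As a starting point, the instructions and hints above might be combined as in the sketch below. The column names (Temperature, ItemsSold, City, Satisfaction), the value ranges, the start date, and the seed are all illustrative assumptions; any variables that fit the five required types would do.

```r
set.seed(123)  # assumed seed, only so the random values are reproducible
n <- 30

my_data <- data.frame(
  # Date: 30 sequential days starting from an arbitrary date
  Date = seq.Date(from = as.Date("2025-01-01"), by = "day", length.out = n),
  # Continuous: decimal values drawn from a normal distribution
  Temperature = round(rnorm(n, mean = 27, sd = 2), 1),
  # Discrete: whole numbers sampled with replacement
  ItemsSold = sample(0:20, size = n, replace = TRUE),
  # Nominal: categories with no inherent order
  City = sample(c("Jakarta", "Bandung", "Surabaya"), size = n, replace = TRUE),
  # Ordinal: categories with an explicit order Low < Medium < High
  Satisfaction = factor(sample(c("Low", "Medium", "High"), size = n, replace = TRUE),
                        levels = c("Low", "Medium", "High"), ordered = TRUE),
  stringsAsFactors = FALSE
)

# Check the result
print(head(my_data))
print(nrow(my_data))

# Optional tasks: column summaries and category frequencies
print(summary(my_data))
print(table(my_data$City))
print(table(my_data$Satisfaction))
```

Because the ordinal column is an ordered factor, table() and summary() list its categories in the order Low, Medium, High rather than alphabetically.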
