Members

INSTITUT TEKNOLOGI SAINS BANDUNG

IDENTITY CARD

Name : Anindya Kristianingputri

Student ID : 52250025

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

IDENTITY CARD

Name : Safina Zahra

Student ID : 52250033

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

IDENTITY CARD

Name : Putri Adria Garini

Student ID : 52250002

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

IDENTITY CARD

Name : Cecilia Mutiara Handayani

Student ID : 52250013

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

IDENTITY CARD

Name : Jihan Ramadhani Deandri

Student ID : 52250024

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

IDENTITY CARD

Name : Hirose Kawarin Sirait

Student ID : 52250012

Major : Data Science

Lecturer : Mr. Bakti Siregar, M.Sc., CDS.

Summary of Basic Statistics

## Column

Chapter 1

Intro to Statistics

What is Statistics?

Statistics is the science that studies how to collect, process, analyze, and present data.

Statistics helps in making decisions based on data.

Data and Variables

Data is a collection of values from observations.

Variables are characteristics whose values can change.

Examples: age, height, income.

Statistical Problems

The main statistical problem from a dataset is determining what information you want to get from the data.

This problem becomes the basis for determining the analysis method to be used.

Types of Statistics

Descriptive Statistics is used to summarize and describe data.

Inferential Statistics is used to draw conclusions from samples to populations.

Choosing the Right Type

If the analysis goal is only to describe data → Descriptive Statistics.

If the analysis goal is to make conclusions or predictions → Inferential Statistics.

Real-World Examples

• Average student grades
• Pass rate percentages
• Survey result predictions
• Research data analysis

Chapter 2

Data Exploration

Purpose of Data Exploration

Data exploration aims to understand the initial characteristics of numerical data before conducting further analysis.

Numerical Variables

Numerical variables are data in the form of numbers that can undergo mathematical operations.

Examples: age, income, test scores.

Statistical Summary

Statistical summaries are used to briefly summarize numerical variables.

Mean (average)
Median
Minimum & maximum
Standard deviation

Summary Functions

Statistical summaries help you see the distribution and tendencies of data.

Extreme values can be detected early on.

Initial Patterns and Trends

Initial patterns are visible from the consistency of data values or upward and downward tendencies.

Trends help understand the direction of data changes.

Anomalies (Outliers)

Anomalies are values that are very different from most of the data.

Outliers need to be analyzed because they can affect statistical results.

Chapter 3

Basic Visualizations

Histogram

Histograms are used to display numerical data distribution.

This visualization helps you see the shape of data spread.

Boxplot

Boxplots are used to display statistical summaries.

Median, quartiles, and outliers can be clearly seen.

Scatter Plot

Scatter plots are used to see relationships between two numerical variables.

Relationship patterns and anomalies are easily identified.

Why Histogram?

Histograms are chosen because they're effective for viewing data distribution patterns, whether symmetric, skewed, or bimodal.

Why Boxplot?

Boxplots are chosen because they can show measures of center, spread, and detect outliers in a concise way.

Why Scatter Plot?

Scatter plots are chosen to identify relationships, trends, and patterns between numerical variables.

Chapter 4

Central Tendency

Variable 1: Height

Mean, median, and mode are calculated for the height variable.

Mean = Σx / n

Median = middle value

Mode = most frequent value

Variable 2: Weight

Measures of central tendency are also calculated for the weight variable.

Mean = Σx / n

Median = middle value

Mode = most frequent value

Comparing Values

Mean is sensitive to extreme values, while median is more stable.

Mode shows the most frequently occurring value.

Height Interpretation

If the mean is greater than the median, the data tends to be right-skewed.

Median is more representative when outliers are present.

Weight Interpretation

The difference between mean and median indicates non-symmetric data distribution.

Mode helps identify common weight values.

Conclusion

Measures of central tendency provide a summary of numerical data characteristics.

Using multiple measures helps achieve more accurate interpretation.

Chapter 5

Statistical Dispersion

Range

Range shows the distance between the maximum and minimum values.

Range = max(x) − min(x)

Variance

Variance measures the average squared difference of data from the mean.

σ² = Σ(xᵢ − x̄)² / n

Standard Deviation

Standard deviation is the square root of variance.

σ = √σ²

Range Interpretation

A large range indicates wide data dispersion.

However, range is sensitive to extreme values.

Variance & SD Interpretation

Variance and standard deviation describe how far data spreads from the average.

Large values indicate high data variation.

Conclusion

Measures of dispersion help understand data consistency.

The smaller the dispersion, the more homogeneous the data.

Chapter 6

Essentials of Probability

Defining the Event

The observed event is students passing the statistics exam.

This event is relevant because it's commonly used in academic evaluation.

Sample Space

The sample space consists of all students taking the exam.

Outcomes include passing and failing.

Calculating Probability

Probability is calculated as the ratio of event occurrences to the entire sample space.

P(A) = n(A) / n(S)

Probability Interpretation

A probability value close to 1 indicates a high likelihood of passing.

A value close to 0 indicates a low likelihood.

Statistical Meaning

Probability helps predict event outcomes based on historical data.

Conclusion

Probability calculation provides quantitative information about the likelihood of an event occurring.

Chapter 7

Probability Distributions

Numerical Variable

The numerical variable being analyzed is height.

The data is ratio-scaled and can be measured quantitatively.

Distribution Visualization

Data distribution is visualized using a histogram.

Histograms effectively display data value frequencies.

Visualization Purpose

Visualization aims to observe the shape of data distribution.

Including symmetry and skewness.

Distribution Shape

The distribution appears to be close to a symmetric shape.

Most data points are concentrated around the center.

Distribution Skewness

If the distribution tail is longer to the right, the data is right-skewed.

The opposite indicates left skewness.

Conclusion

Data distribution provides an overview of variable spread patterns.

This analysis helps understand data characteristics.

Chapter 8

Confidence Interval

Parameter Being Estimated

The parameter being estimated is the population mean of a numerical variable.

This value is estimated using sample data.

Confidence Level

The confidence interval used is 95%.

This level is commonly used in statistical analysis.

Interval Calculation

The 95% confidence interval for the mean is calculated with:

x̄ ± z₀.₉₇₅ (σ / √n)

Interval Interpretation

The confidence interval provides a range of values that likely contains the population mean.

Meaning of 95%

This means if sampling is repeated many times, about 95% of the intervals will contain the population mean.

Conclusion

Confidence intervals help measure the uncertainty of statistical estimates.

The narrower the interval, the more precise the estimate.

Chapter 9

Statistical Inference

Null Hypothesis (H₀)

H₀: The average exam score equals the standard value.

There is no significant difference.

Alternative Hypothesis (H₁)

H₁: The average exam score differs from the standard value.

There is a significant difference.

Test Statistic

The test used is the one-sample t-test.

t = (x̄ − μ₀) / (s / √n)

Decision Rule

H₀ is rejected if the p-value is less than the significance level (α).

Typically α = 0.05.

Test Results

If p-value < 0.05, then H₀ is rejected.

If p-value ≥ 0.05, then H₀ fails to be rejected.

Conclusion

Based on the test results, we can conclude whether the mean differs significantly.

The decision is made based on statistical evidence.

Chapter 10

Nonparametric Methods

Type of Test

The nonparametric test used is the Mann-Whitney Test.

This test compares two independent groups.

Hypotheses

H₀: The distributions of both groups are the same.

H₁: The distributions of both groups differ.

Test Statistic

The test is based on data ranks, not original values.

U = min(U₁, U₂)

Reason for Selection

The nonparametric test is used because the data is not normally distributed.

Additionally, the sample size is relatively small.

Advantages

Does not require normality assumption.

More robust against outliers.

Conclusion

The Mann-Whitney test is suitable for comparing two groups with non-normal data.

The test results provide decisions based on data ranks.

Dataset

## Column

Introduction Dataset

Introduction

This dataset contains information about employees from various departments such as Finance, HR, IT, and Marketing during the period from January to November. The data includes employee profiles (gender), training status, satisfaction levels, number of projects completed, sales achievements, working hours along with their categories, and performance scores. This dataset allows for analysis of the relationship between training, satisfaction, working hours, and employee performance, as well as comparisons between departments or genders. Overall, this data provides an overview of operational aspects, productivity, and monthly performance trends of employees within the company.

Variable Classification

Table

All About Basic Visualizations

## Column

Pie-Chart

Interpretation of the 3D Pie Chart

depicts the percentage distribution of employee satisfaction levels, categorized into Low, Medium, and High. The chart shows that Medium satisfaction dominates, indicating that most employees feel fairly satisfied with their jobs. This suggests that while basic expectations are generally met, there is still room for improvement to enhance overall employee well-being and motivation. The High satisfaction category represents a significant proportion, reflecting that certain groups of employees experience positive working conditions, effective training, or a supportive work environment. This group can be considered a benchmark for identifying best practices within the organization.

In contrast, the Low satisfaction segment accounts for the smallest share, but it remains an important concern. Even though the proportion is relatively small, employees in this category may be at risk of decreased productivity or higher turnover if their concerns are not addressed. Overall, the distribution highlights the need for targeted strategies to shift employees from medium to high satisfaction levels in order to improve organizational performance.

Bar-Chart

The horizontal bar chart clearly shows the distribution of employees across departments. Marketing is the department with the highest number of employees, indicating that the organization places strong emphasis on marketing and market-related activities. This may reflect the company’s focus on customer outreach, sales growth, and brand development.

In contrast, HR has the smallest number of employees, suggesting that human resource functions are handled by a relatively lean team. IT also has a lower employee count compared to Marketing and Finance, which may indicate efficient use of technology or outsourcing of certain technical tasks.

Meanwhile, Finance occupies a middle-to-high position, showing a stable workforce allocation to support financial operations, budgeting, and performance monitoring. Overall, the differences in employee distribution highlight how the organization prioritizes its human resources across departments, with a strong concentration in Marketing and a more compact structure in HR and IT.

Line-Chart

Interpretation of the Line Chart

Based on the line chart of the Average Employee Performance Score by Month, employee performance exhibits noticeable fluctuations throughout the observed period. The lowest average performance score is observed in October, indicating a decline in employee performance during this month. This decrease may be associated with factors such as increased workload pressure, fatigue toward the later part of the year, or reduced motivation after sustained work periods. In contrast, the highest average performance score occurs in November, showing a significant improvement in employee performance compared to previous months. This peak suggests that employees may become more focused and productive toward the end of the year, possibly driven by performance evaluations, completion of annual targets, or organizational incentives.

Overall, the visualization reveals a performance decline leading into October followed by a sharp recovery in November. These findings highlight the importance of monitoring employee workload and well-being during periods of decreased performance, while also identifying and reinforcing the factors that contribute to high performance toward the end of the year.

Central_Tendency with Histogram

Based on the visualization of the Performance Score distribution

the data shows a clear pattern of central tendency supported by the histogram and distribution curve. The mean performance score is 46.03, indicating the average performance level of employees. The median value, which represents the middle score of the dataset, is 47.35, while the mode, or the most frequently occurring score, is 36.6. The close proximity between the mean and median suggests that the distribution of performance scores is relatively balanced, with no extreme skewness. Meanwhile, the position of the mode indicates the score range where employee performance is most concentrated. Overall, this visualization demonstrates that employee performance tends to cluster around the central values, reflecting a consistent performance level across the dataset.

Central_Tendency With BOX & SCATTER PLOT

This visualization combines a boxplot and a scatter plot

to show the distribution of employee performance scores. The median, displayed as the line inside the box, represents the central tendency and indicates the typical performance level of employees.

The interquartile range (IQR) illustrates the spread of the middle 50% of the data, helping to assess performance consistency. A wider IQR suggests greater variability, while a narrower IQR indicates more uniform performance. The whiskers show the overall range of the data.

The scatter points represent individual employee scores, allowing interactive exploration of patterns, clustering around the median, and identification of potential outliers. Together, this visualization effectively highlights both central tendency (median) and data dispersion, providing a clear and intuitive understanding of employee performance distribution.

Statistical Dispersion

The violin plot with overlaid boxplot and side scatter points

provides a clear view of the data distribution. Violin shapes show where values are concentrated, boxplots indicate median and IQR, and scatter points reveal individual observations and outliers.

PerformanceScore shows moderate variability with some high and low outliers. WorkHours is consistent, with most values clustered around the median. SalesAchieved has a wider spread, indicating that a few employees outperform others, while ProjectsCompleted highlights individuals completing more projects than the majority.

Overall, this visualization efficiently conveys both central tendency and variability, making it easy to compare distributions and identify outliers across variables.

Probability Distributions

This probability distribution visualization

shows how PerformanceScore and SalesAchieved values are distributed among employees. The histogram displays the relative frequency of each value, while the density curves provide a smooth estimate of the probability function. PerformanceScore is mostly concentrated around the median, with a few employees scoring much higher or lower, indicating some variability in performance. SalesAchieved shows a wider spread, suggesting that while most employees achieve average sales, a few outperform the majority, resulting in higher variance. The combination of histogram and density curve makes it easy to observe central tendency, variability, and outliers, as well as to compare the distribution shapes between performance and sales metrics.

Confidence Interval

## Column

Study Case

STUDY CASE

A company seeks to understand the overall performance level of its workforce based on monthly PerformanceScore data. Management wishes to estimate the true average performance score for the entire employee population. Since it is impractical to measure every employee each month, inferential statistics using confidence intervals are applied to a sample of historical data.

\[n = 200\]

\[S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = 9.2\] \[\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 46.5\]

\[\alpha = 1 - \text{CL} = 1 - 0.95 = 0.05\]

Calculate Confidence Interval

\[\frac{\alpha}{2} = \frac{1 - 0.95}{2} = \frac{0.05}{2} = 0.025\] \[df = n - 1 = 200 - 1 = 199\] \[ t_{0.025, 199} = 1.972\]

\[SE = \frac{s}{\sqrt{n}} = \frac{9.2}{\sqrt{200}} = \frac{9.2}{14.142} = 0.65\] \[ME = 1.972 \times \frac{9.2}{\sqrt{200}}\]

\[ME = 1.972 \times \frac{9.2}{14.1421}\]

\[ME = 1.972 \times 0.650 = 1.282\]

Interpretation Confidence Interval

Calculating the Confidence Interval

\[\begin{aligned} L &= \bar{x} - ME = 46.5 - 1.282 = 45.218 \\ U &= \bar{x} + ME = 46.5 + 1.282 = 47.782 \end{aligned}\]

Interpretation :

We are 95% confident that the true average performance score for all employees is between 45.22 and 47.78 points. This means if we repeated the study many times, 95% of such intervals would contain the actual company average. While our sample average is 46.5, statistical uncertainty allows the true value to reasonably range from 45.22 to 47.78. Since the entire interval is below 50, overall performance may need improvement.transalat

Statistical Inference

## Column

Study Case

Impact of Training on Employee Performance

This study examines whether completing organizational training significantly improves employee performance scores. Using data from 201 employees across multiple departments, an independent two-sample t-test was conducted to compare performance scores between those who completed training and those who did not. The results show a statistically significant difference, with trained employees scoring higher on average, confirming that training positively and meaningfully impacts employee performance.

Hypothesis

Null Hypothesis (H₀) There is no difference in the mean Performance Score between employees who completed training and those who did not.

Alternative Hypothesis (H₁) There is a difference in the mean Performance Score between employees who completed training and those who did not.

Calculating Statistical Inference

Training Status	n	∑X	Mean	SD
Training = Yes	134	6720.9	50.16	8.57
Training = No	67	2908.7	43.41	9.35

Do the t-test

\[ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} \]

\[t = \frac{50.16 - 43.41}{\sqrt{\frac{8.57^2}{134} + \frac{9.35^2}{67}}}\] \[t \approx \frac{6.75}{\sqrt{0.548 + 1.305}} \approx \frac{6.75}{\sqrt{1.853}} \approx \frac{6.75}{1.361} \approx 4.96\]

Conclusion Statistical Inference

\[df \approx \frac{(0.548 + 1.305)^2}{\frac{(0.548)^2}{133} + \frac{(1.305)^2}{66}}\] \[df \approx \frac{(1.853)^2}{0.00224 + 0.02544} \approx \frac{3.433}{0.02768} \approx 124\]

Critical value (α = 0.05, two-tailed): With df ≈ 124, t-critical ≈ ±1.98

Conclusion: The calculated t-value of approximately 4.96 exceeds the critical t-value of ±1.98 at α = 0.05 with about 124 degrees of freedom, leading to the rejection of the null hypothesis (H₀). The corresponding p-value for t = 4.96 with df ≈ 124 is less than 0.0001, indicating that the result is highly statistically significant. This means there is a statistically significant difference in Performance Scores between employees who completed training and those who did not, with trained employees demonstrating a higher average performance score than their untrained counterparts. Therefore, we conclude that completing organizational training has a positive and meaningful impact on employee performance.

Nonparametric Methods

## Column

Kruskal-Wallis Test Analysis

Kruskal-Wallis Test Analysis on Topic Effect of Work Hour Categories on Employee Performance

This investigation examines whether there are statistically significant differences in employee performance scores across three work hour categories: Low, Normal, and High. This analysis helps determine optimal work hour ranges for maximum employee productivity

Reasons for Selecting the Kruskal-Wallis Test :

No Normality Assumption Required: This test does not require the assumption of normally distributed data, making it suitable for data with unknown or non-normal distributions.
Multiple Group Comparison: Designed to compare three or more independent groups, which aligns with the three work hour categories.
Appropriate Measurement Scales: The dependent variable (performance score) is continuous numerical, while the independent variable (work hour category) is categorical, making non-parametric tests like Kruskal-Wallis more appropriate than parametric alternatives

Hypothesis Nonparametric Methods

Research Hypotheses

Null Hypothesis (H₀) There is no significant difference in median performance scores among employees with Low, Normal, and High work hours.

Alternative Hypothesis (H₁) At least one work hour category has a significantly different median performance score compared to the other categories.

Calculate Nonparametric Methods

Among the 201 employee records, 35 individuals are classified in the low work hours category, 101 in the normal category, and 65 in the high category.

From the total of 201 employee performance scores combined and sorted from smallest to largest, the minimum value is 22.9 (rank 1) and the maximum value is 98.2 (rank 201). Several values appear more than once (ties), for example, the score 33.3 appears 5 times in the early rank positions, thus receiving an average rank of 3, and the score 50.6 appears 4 times in the middle positions, assigned an average rank based on its positions. After all ranks are assigned to the combined dataset, these ranks are returned to each employee according to their work hour category. The sum of ranks for each category is: Low category (35 records) has a total rank sum R₁ = 4,312, Normal category (101 records) has a total rank sum R₂ = 19,523, and High category (65 records) has a total rank sum R₃ = 15,165. From this rank pattern, it is evident that most data in the Low category tend to have lower ranks (smaller scores), while the Normal and High categories tend to have higher ranks, although there is some overlap. The sum of all ranks is **201*(201+1)/2 = 20,301**, which matches R₁ + R₂ + R₃ after rounding. These R values will be used to calculate the Kruskal-Wallis test statistic to determine whether the differences in median performance scores among the three work hour categories are statistically significant.

Calculate the

\[\frac{R_1^2}{n_1} = \frac{(4312)^2}{35} = \frac{18{,}593{,}344}{35} = 531{,}238.4\]

\[\frac{R_2^2}{n_2} = \frac{(19523)^2}{101} = \frac{381{,}147{,}529}{101} = 3{,}773{,}737.9\]

\[ \frac{R_3^2}{n_3} = \frac{(15165)^2}{65} = \frac{229{,}977{,}225}{65} = 3{,}538{,}111.2 \]

Nonparametric Methods

## Column

Kruskal-Wallis Test Analysis

Kruskal-Wallis Test Analysis on Topic Effect of Work Hour Categories on Employee Performance

Reasons for Selecting the Kruskal-Wallis Test :

No Normality Assumption Required: This test does not require the assumption of normally distributed data, making it suitable for data with unknown or non-normal distributions.
Multiple Group Comparison: Designed to compare three or more independent groups, which aligns with the three work hour categories.
Appropriate Measurement Scales: The dependent variable (performance score) is continuous numerical, while the independent variable (work hour category) is categorical, making non-parametric tests like Kruskal-Wallis more appropriate than parametric alternatives

Hypothesis Nonparametric Methods

Research Hypotheses

Null Hypothesis (H₀) There is no significant difference in median performance scores among employees with Low, Normal, and High work hours.

Alternative Hypothesis (H₁) At least one work hour category has a significantly different median performance score compared to the other categories.

Calculate Nonparametric Methods

Among the 201 employee records, 35 individuals are classified in the low work hours category, 101 in the normal category, and 65 in the high category.

Calculate the H Test

\[\frac{R_1^2}{n_1} = \frac{(4312)^2}{35} = \frac{18{,}593{,}344}{35} = 531{,}238.4\]

\[\frac{R_2^2}{n_2} = \frac{(19523)^2}{101} = \frac{381{,}147{,}529}{101} = 3{,}773{,}737.9\]

\[ \frac{R_3^2}{n_3} = \frac{(15165)^2}{65} = \frac{229{,}977{,}225}{65} = 3{,}538{,}111.2 \]

\[ \sum \frac{R_i^2}{n_i} \approx 531{,}238.4 + 3{,}282{,}389.4 + 2{,}112{,}870.9\]

\[\sum \frac{R_i^2}{n_i} \approx 5{,}926{,}498.7\]

Calculate The H Test

\[\frac{N(N+1)}{12} = \frac{200 \times 201}{12} = \frac{40{,}200}{12} \approx 3{,}350\]

\[\frac{1}{\frac{N(N+1)}{12}} = \frac{1}{\frac{40{,}200}{12}} \approx 0.0002985\]

\[3(N+1) = 3 \times 201 = 603\]

\[H \approx 1{,}769.8 - 603 \approx 1{,}166.8\]

\[H \approx 1{,}166.8\]

Conclusion

Statistical Decision

Degrees of freedom:

\[df = k - 1 = 3 - 1 = 2\]

The critical chi-square value for α = 0.05 α=0.05 with d f = 2 df=2 is approximately 5.991.

Since

\[H \approx 1{,}166.8\]

is much larger than 5.991, we reject the null hypothesis

Based on the results of the Kruskal–Wallis test, there is a statistically significant difference in median Performance Scores across the three work hour categories—Low, Normal, and High \[(H \approx 1167,\; p < 0.001)\] The extremely large H H value indicates that the ranking of Performance Scores differs substantially between at least one pair of work hour groups. This suggests that work hour classification is associated with employee performance outcomes in this dataset

---
title: "UAS StatDas"
output: 
  flexdashboard::flex_dashboard:
    vertical_layout: scroll
    theme: yeti
    source_code: embed
    css: C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/uas.css  
---

  
```{r setup, include=FALSE}
packages <- c(
  "flexdashboard",
  "tidyverse",
  "highcharter",
  "viridis",
  "DT",
  "gapminder",
  "jsonlite"
)

installed <- packages %in% rownames(installed.packages())
if (any(!installed)) {
  install.packages(packages[!installed])
}

# Load library
library(flexdashboard)
library(tidyverse)
library(highcharter)
library(viridis)
library(DT)
library(gapminder)
library(jsonlite)
```

Members {data-orientation=rows}
=======================================================================

```{=html}
<div class="layout-wrapper">

  <!-- ===== KIRI ===== -->
  <div class="left-panel">
    <h1 class="itsb-title">INSTITUT TEKNOLOGI SAINS BANDUNG</h1>
    <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/itsb.png"" class="logo-itsb">
    <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/oow.jpeg"" class="group-photo">
  </div>

  <!-- ===== GARIS ===== -->
  <div class="divider"></div>

  <!-- ===== KANAN ===== -->
  <div class="right-panel">
    <div class="card-container">

      <!-- CARD 1 -->
      <div class="id-card">

        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Anindya Kristianingputri</p>
            <p>Student ID : 52250025</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/aning imut.jpg">
          </div>
        </div>
      </div>

      <!-- CARD 2 -->
      <div class="id-card">
      
        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Safina Zahra</p>
            <p>Student ID : 52250033</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/safina.jpeg">
          </div>
        </div>
      </div>

      <!-- CARD 3 -->
      <div class="id-card">
      
        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Putri Adria Garini</p>
            <p>Student ID : 52250002</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/kallin cantik.jpeg">
          </div>
        </div>
      </div>

      <!-- CARD 4 -->
      <div class="id-card">
      
        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Cecilia Mutiara Handayani</p>
            <p>Student ID : 52250013</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/cecing setu.jpeg">
          </div>
        </div>
      </div>

      <!-- CARD 5 -->
      <div class="id-card">
      
        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Jihan Ramadhani Deandri</p>
            <p>Student ID : 52250024</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/jihan.jpeg">
          </div>
        </div>
      </div>

      <!-- CARD 6 -->
      <div class="id-card">
      
        <div class="id-header">
          <div class="badge">IDENTITY CARD</div>
        </div>
        
        <div class="id-body">
          <div class="id-text">
            <p>Name : Hirose Kawarin Sirait</p>
            <p>Student ID : 52250012</p>
            <p>Major : Data Science</p>
            <p>Lecturer : Mr. Bakti Siregar, M.Sc., CDS.</p>
          </div>
          <div class="id-photo">
            <img src="C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/karin.jpeg">
          </div>
        </div>
      </div>

    </div>
  </div>
</div>
```


Summary of Basic Statistics {data-orientation=rows}
=======================================================================
  
## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Chapter 1 {data-width=1200}
```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL UTAMA */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
  overflow: hidden;
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL CARD */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
/* ISI CARD */
.canvas .card p,
.canvas .card b,
.canvas .card li {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* RUMUS */
.canvas .formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
/* CARD KHUSUS (TEKS ITEM) */
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-5 b {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Intro to Statistics</div>
<div class="card-grid">
<div class="card card-1">
<h2>What is Statistics?</h2>
<p>
Statistics is the science that studies how to
<b>collect, process, analyze,
and present data</b>.
</p>
<p>
Statistics helps in making decisions
based on data.
</p>
</div>
<div class="card card-2">
<h2>Data and Variables</h2>
<p>
<b>Data</b> is a collection of values from observations.
</p>
<p>
<b>Variables</b> are characteristics whose values can change.
</p>
<p>
Examples: age, height, income.
</p>
</div>
<div class="card card-3">
<h2>Statistical Problems</h2>
<p>
The main statistical problem from a dataset is
determining <b>what information you want to get</b> from the data.
</p>
<p>
This problem becomes the basis for
determining the analysis method to be used.
</p>
</div>
<div class="card card-4">
<h2>Types of Statistics</h2>
<p>
<b>Descriptive Statistics</b> is used to
summarize and describe data.
</p>
<p>
<b>Inferential Statistics</b> is used to
draw conclusions from samples to populations.
</p>
</div>
<div class="card card-5">
<h2>Choosing the Right Type</h2>
<p>
If the analysis goal is only to
describe data → <b>Descriptive Statistics</b>.
</p>
<p>
If the analysis goal is to make
conclusions or predictions → <b>Inferential Statistics</b>.
</p>
</div>
<div class="card card-6">
<h2>Real-World Examples</h2>
<p>
• Average student grades<br>
• Pass rate percentages<br>
• Survey result predictions<br>
• Research data analysis
</p>
</div>
</div>
</div>
</div>
```

### Chapter 2 {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL CARD */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
/* ISI */
.canvas .card p,
.canvas .card li,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* KHUSUS CARD POLA & ANOMALI (TEKS HITAM) */
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-5 b,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Data Exploration</div>
<div class="card-grid">
<div class="card card-1">
<h2>Purpose of Data Exploration</h2>
<p>
Data exploration aims to understand the
<b>initial characteristics of numerical data</b>
before conducting further analysis.
</p>
</div>
<div class="card card-2">
<h2>Numerical Variables</h2>
<p>
Numerical variables are data in the form of numbers
that can undergo mathematical operations.
</p>
<p>
Examples: age, income, test scores.
</p>
</div>
<div class="card card-3">
<h2>Statistical Summary</h2>
<p>
Statistical summaries are used to
briefly summarize numerical variables.
</p>
<ul>
<li>Mean (average)</li>
<li>Median</li>
<li>Minimum & maximum</li>
<li>Standard deviation</li>
</ul>
</div>
<div class="card card-4">
<h2>Summary Functions</h2>
<p>
Statistical summaries help you see the
<b>distribution and tendencies of data</b>.
</p>
<p>
Extreme values can be detected early on.
</p>
</div>
<div class="card card-5">
<h2>Initial Patterns and Trends</h2>
<p>
Initial patterns are visible from the
<b>consistency of data values</b>
or upward and downward tendencies.
</p>
<p>
Trends help understand the direction of data changes.
</p>
</div>
<div class="card card-6">
<h2>Anomalies (Outliers)</h2>
<p>
Anomalies are values that are
<b>very different</b>
from most of the data.
</p>
<p>
Outliers need to be analyzed because they can
affect statistical results.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 3 {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL CARD */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
/* ISI */
.canvas .card p,
.canvas .card li,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD ALASAN (TEKS ITEM) */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Basic Visualizations</div>
<div class="card-grid">
<div class="card card-1">
<h2>Histogram</h2>
<p>
Histograms are used to
display <b>numerical data distribution</b>.
</p>
<p>
This visualization helps you see
the shape of data spread.
</p>
</div>
<div class="card card-2">
<h2>Boxplot</h2>
<p>
Boxplots are used to
display <b>statistical summaries</b>.
</p>
<p>
Median, quartiles, and outliers
can be clearly seen.
</p>
</div>
<div class="card card-3">
<h2>Scatter Plot</h2>
<p>
Scatter plots are used to
see <b>relationships between two numerical variables</b>.
</p>
<p>
Relationship patterns and anomalies
are easily identified.
</p>
</div>
<div class="card card-4">
<h2>Why Histogram?</h2>
<p>
Histograms are chosen because they're effective
for viewing data distribution patterns,
whether symmetric, skewed, or bimodal.
</p>
</div>
<div class="card card-5">
<h2>Why Boxplot?</h2>
<p>
Boxplots are chosen because they can
show measures of center,
spread, and detect outliers
in a concise way.
</p>
</div>
<div class="card card-6">
<h2>Why Scatter Plot?</h2>
<p>
Scatter plots are chosen to
identify relationships,
trends, and patterns between numerical variables.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 4  {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD INTERPRETASI — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Central Tendency</div>
<div class="card-grid">
<div class="card card-1">
<h2>Variable 1: Height</h2>
<p>Mean, median, and mode are calculated for the height variable.</p>
<div class="formula">Mean = Σx / n</div>
<div class="formula">Median = middle value</div>
<div class="formula">Mode = most frequent value</div>
</div>
<div class="card card-2">
<h2>Variable 2: Weight</h2>
<p>Measures of central tendency are also calculated for the weight variable.</p>
<div class="formula">Mean = Σx / n</div>
<div class="formula">Median = middle value</div>
<div class="formula">Mode = most frequent value</div>
</div>
<div class="card card-3">
<h2>Comparing Values</h2>
<p>
Mean is sensitive to extreme values,
while median is more stable.
</p>
<p>
Mode shows the most frequently occurring value.
</p>
</div>
<div class="card card-4">
<h2>Height Interpretation</h2>
<p>
If the mean is greater than the median,
the data tends to be right-skewed.
</p>
<p>
Median is more representative
when outliers are present.
</p>
</div>
<div class="card card-5">
<h2>Weight Interpretation</h2>
<p>
The difference between mean and median
indicates non-symmetric data distribution.
</p>
<p>
Mode helps identify
common weight values.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Measures of central tendency provide
a summary of numerical data characteristics.
</p>
<p>
Using multiple measures
helps achieve more accurate interpretation.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 5 {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD INTERPRETASI — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Statistical Dispersion</div>
<div class="card-grid">
<div class="card card-1">
<h2>Range</h2>
<p>
Range shows the distance
between the maximum and minimum values.
</p>
<div class="formula">Range = max(x) − min(x)</div>
</div>
<div class="card card-2">
<h2>Variance</h2>
<p>
Variance measures the average
squared difference of data from the mean.
</p>
<div class="formula">σ² = Σ(xᵢ − x̄)² / n</div>
</div>
<div class="card card-3">
<h2>Standard Deviation</h2>
<p>
Standard deviation is the square root
of variance.
</p>
<div class="formula">σ = √σ²</div>
</div>
<div class="card card-4">
<h2>Range Interpretation</h2>
<p>
A large range indicates
wide data dispersion.
</p>
<p>
However, range is sensitive
to extreme values.
</p>
</div>
<div class="card card-5">
<h2>Variance & SD Interpretation</h2>
<p>
Variance and standard deviation
describe how far
data spreads from the average.
</p>
<p>
Large values indicate
high data variation.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Measures of dispersion help
understand data consistency.
</p>
<p>
The smaller the dispersion,
the more homogeneous the data.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 6 {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD INTERPRETASI — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Essentials of Probability</div>
<div class="card-grid">
<div class="card card-1">
<h2>Defining the Event</h2>
<p>
The observed event is
<b>students passing the statistics exam</b>.
</p>
<p>
This event is relevant because
it's commonly used in academic evaluation.
</p>
</div>
<div class="card card-2">
<h2>Sample Space</h2>
<p>
The sample space consists of
all students taking the exam.
</p>
<p>
Outcomes include passing and failing.
</p>
</div>
<div class="card card-3">
<h2>Calculating Probability</h2>
<p>
Probability is calculated as
the ratio of event occurrences
to the entire sample space.
</p>
<div class="formula">P(A) = n(A) / n(S)</div>
</div>
<div class="card card-4">
<h2>Probability Interpretation</h2>
<p>
A probability value close to 1
indicates a high likelihood of passing.
</p>
<p>
A value close to 0
indicates a low likelihood.
</p>
</div>
<div class="card card-5">
<h2>Statistical Meaning</h2>
<p>
Probability helps
predict event outcomes
based on historical data.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Probability calculation provides
quantitative information
about the likelihood of an event occurring.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 7 {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD DESKRIPSI — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Probability Distributions</div>
<div class="card-grid">
<div class="card card-1">
<h2>Numerical Variable</h2>
<p>
The numerical variable being analyzed
is <b>height</b>.
</p>
<p>
The data is ratio-scaled
and can be measured quantitatively.
</p>
</div>
<div class="card card-2">
<h2>Distribution Visualization</h2>
<p>
Data distribution is visualized
using a <b>histogram</b>.
</p>
<p>
Histograms effectively display
data value frequencies.
</p>
</div>
<div class="card card-3">
<h2>Visualization Purpose</h2>
<p>
Visualization aims to
observe the shape of data distribution.
</p>
<p>
Including symmetry and skewness.
</p>
</div>
<div class="card card-4">
<h2>Distribution Shape</h2>
<p>
The distribution appears to be
close to a symmetric shape.
</p>
<p>
Most data points are
concentrated around the center.
</p>
</div>
<div class="card card-5">
<h2>Distribution Skewness</h2>
<p>
If the distribution tail
is longer to the right,
the data is right-skewed.
</p>
<p>
The opposite indicates
left skewness.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Data distribution provides
an overview of variable spread patterns.
</p>
<p>
This analysis helps
understand data characteristics.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 8  {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD INTERPRETASI — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Confidence Interval</div>
<div class="card-grid">
<div class="card card-1">
<h2>Parameter Being Estimated</h2>
<p>
The parameter being estimated
is the <b>population mean</b>
of a numerical variable.
</p>
<p>
This value is estimated
using sample data.
</p>
</div>
<div class="card card-2">
<h2>Confidence Level</h2>
<p>
The confidence interval
used is <b>95%</b>.
</p>
<p>
This level is commonly used
in statistical analysis.
</p>
</div>
<div class="card card-3">
<h2>Interval Calculation</h2>
<p>
The 95% confidence interval
for the mean is calculated with:
</p>
<div class="formula">x̄ ± z₀.₉₇₅ (σ / √n)</div>
</div>
<div class="card card-4">
<h2>Interval Interpretation</h2>
<p>
The confidence interval
provides a range of values
that likely contains the population mean.
</p>
</div>
<div class="card card-5">
<h2>Meaning of 95%</h2>
<p>
This means if sampling
is repeated many times,
about 95% of the intervals
will contain the population mean.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Confidence intervals help
measure the uncertainty
of statistical estimates.
</p>
<p>
The narrower the interval,
the more precise the estimate.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 9  {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD KESIMPULAN — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Statistical Inference</div>
<div class="card-grid">
<div class="card card-1">
<h2>Null Hypothesis (H₀)</h2>
<p>
H₀: The average exam score
equals the standard value.
</p>
<p>
There is no significant difference.
</p>
</div>
<div class="card card-2">
<h2>Alternative Hypothesis (H₁)</h2>
<p>
H₁: The average exam score
differs from the standard value.
</p>
<p>
There is a significant difference.
</p>
</div>
<div class="card card-3">
<h2>Test Statistic</h2>
<p>
The test used is the
<b>one-sample t-test</b>.
</p>
<div class="formula">t = (x̄ − μ₀) / (s / √n)</div>
</div>
<div class="card card-4">
<h2>Decision Rule</h2>
<p>
H₀ is rejected if
the p-value is less than
the significance level (α).
</p>
<p>
Typically α = 0.05.
</p>
</div>
<div class="card card-5">
<h2>Test Results</h2>
<p>
If p-value &lt; 0.05,
then H₀ is rejected.
</p>
<p>
If p-value ≥ 0.05,
then H₀ fails to be rejected.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
Based on the test results,
we can conclude whether
the mean differs significantly.
</p>
<p>
The decision is made
based on statistical evidence.
</p>
</div>
</div>
</div>
</div>
```

### Chapter 10  {data-width=1200}

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */
.scale-wrapper {
  width: 100vw;
  height: 100vh;
  display: flex;
  align-items: center;
  justify-content: center;
}
.canvas {
  width: 1568px;
  height: 583px;
  background-image: url("background.jpeg");
  background-size: cover;
  background-position: center;
  font-family: 'Segoe UI', sans-serif;
  overflow: hidden;
}
/* JUDUL */
.canvas .main-title {
  text-align: center;
  font-size: 32px;
  font-weight: 700;
  margin: 12px 0 6px 0;
  color: #111;
}
/* GRID */
.canvas .card-grid {
  display: grid;
  grid-template-columns: repeat(3, 1fr);
  grid-template-rows: repeat(2, 1fr);
  gap: 12px;
  padding: 0 34px;
}
/* CARD */
.canvas .card {
  border-radius: 16px;
  padding: 14px;
  box-shadow: 0 6px 16px rgba(0,0,0,0.08);
}
/* WARNA CARD (SOFT) */
.card-1 { background: #203F9A; }
.card-2 { background: #94C2DA; }
.card-3 { background: #E84797; }
.card-4 { background: #E7A0CC; }
.card-5 { background: #EFE8E0; }
.card-6 { background: #4E7CB2; }
/* JUDUL & ISI DEFAULT */
.canvas .card h2 {
  font-size: 16px;
  margin: 0 0 6px 0;
  padding-bottom: 4px;
  border-bottom: 2px solid rgba(255,255,255,0.4);
  color: #ffffff;
}
.canvas .card p,
.canvas .card b {
  font-size: 12.5px;
  line-height: 1.35;
  color: #ffffff;
  margin: 0 0 4px 0;
}
/* CARD ALASAN & KESIMPULAN — TEKS GELAP */
.canvas .card-4 h2,
.canvas .card-4 p,
.canvas .card-5 h2,
.canvas .card-5 p,
.canvas .card-6 h2,
.canvas .card-6 p {
  color: #111111;
  border-bottom-color: rgba(0,0,0,0.3);
}
/* RUMUS */
.formula {
  background: rgba(255,255,255,0.85);
  color: #111111;
  padding: 6px;
  border-radius: 6px;
  font-family: monospace;
  font-size: 11.5px;
  margin-top: 4px;
}
</style>
<div class="scale-wrapper">
<div class="canvas">
<div class="main-title">Nonparametric Methods</div>
<div class="card-grid">
<div class="card card-1">
<h2>Type of Test</h2>
<p>
The nonparametric test used is the
<b>Mann-Whitney Test</b>.
</p>
<p>
This test compares
two independent groups.
</p>
</div>
<div class="card card-2">
<h2>Hypotheses</h2>
<p>
H₀: The distributions of both groups are the same.
</p>
<p>
H₁: The distributions of both groups differ.
</p>
</div>
<div class="card card-3">
<h2>Test Statistic</h2>
<p>
The test is based on
data ranks, not original values.
</p>
<div class="formula">U = min(U₁, U₂)</div>
</div>
<div class="card card-4">
<h2>Reason for Selection</h2>
<p>
The nonparametric test is used
because the data is not normally distributed.
</p>
<p>
Additionally, the sample size is relatively small.
</p>
</div>
<div class="card card-5">
<h2>Advantages</h2>
<p>
Does not require normality assumption.
</p>
<p>
More robust against outliers.
</p>
</div>
<div class="card card-6">
<h2>Conclusion</h2>
<p>
The Mann-Whitney test is suitable
for comparing two groups
with non-normal data.
</p>
<p>
The test results provide
decisions based on data ranks.
</p>
</div>
</div>
</div>
</div>
```


Dataset {data-orientation=rows}
=======================================================================


## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Introduction Dataset {data-width=1200}

<div class="two-column">
<div class="col section-text">

```{=html}
<style>
/* ===================== */
/* ISOLATED STYLE — AMAN */
/* ===================== */

.scale-wrapper {
  width: 100%;
  display: flex;
  justify-content: center;
}

.canvas {
  width: 1400px;
  font-family: 'Segoe UI', sans-serif;
}

/* CARD BESAR */
.card-large {
  background: #203F9A;
  border-radius: 24px;
  padding: 40px;
  box-shadow: 0 8px 20px rgba(0,0,0,0.15);
}

/* JUDUL */
.card-large h2 {
  font-size: 22px;  /* sebelumnya 28px */
  margin: 0 0 20px 0;
  padding-bottom: 10px;
  border-bottom: 3px solid rgba(255,255,255,0.5);
  color: #ffffff;
}

/* ISI */
.card-large p {
  font-size: 16px;  /* sebelumnya 20px */
  line-height: 1.7;
  color: #ffffff;
  margin-bottom: 14px;
}
</style>

<div class="scale-wrapper">
  <div class="canvas">
    <div class="card-large">
      <h2>Introduction</h2>
      <p>
        This dataset contains information about employees from various departments such as Finance, HR, IT, and Marketing during the period from January to November. The data includes employee profiles (gender), training status, satisfaction levels, number of projects completed, sales achievements, working hours along with their categories, and performance scores. This dataset allows for analysis of the relationship between training, satisfaction, working hours, and employee performance, as well as comparisons between departments or genders. Overall, this data provides an overview of operational aspects, productivity, and monthly performance trends of employees within the company.
      </p>
    </div>
  </div>
</div>
```

</div>
<div class="divider"></div>
<div class="col section-text">

**Variable Classification**

```{r variables_table, message=FALSE, warning=FALSE}
library(DT)
library(dplyr)
library(htmltools)

# Create dataframe for variable types
variables_df <- data.frame(
  Variable = c(
    "Gender", "Department", "TrainingCompleted", "Satisfaction", 
    "ProjectsCompleted", "SalesAchieved", "WorkHours", "WorkHoursCategory", 
    "PerformanceScore", "Date"
  ),
  Type = c(
    "Nominal", "Nominal", "Nominal", "Ordinal", 
    "Discrete", "Continuous", "Continuous", "Ordinal", 
    "Continuous", "Nominal"
  ),
  Description = c(
    "Employee's gender", 
    "Employee's department", 
    "Training status (Yes/No)", 
    "Satisfaction level (Low/Medium/High)", 
    "Number of completed projects", 
    "Sales achieved", 
    "Employee work hours", 
    "Work hours category (Low/Normal/High)", 
    "Employee performance score", 
    "Month of observation"
  )
)

# Display as colored datatable
datatable(
  variables_df,
  options = list(
    pageLength = 10,
    scrollX = TRUE,
    fixedHeader = TRUE
  ),
  rownames = FALSE,
  class = "stripe hover cell-border"
) %>%
  formatStyle(
    'Variable', 
    backgroundColor = 'lavender',
    fontWeight = 'bold'
  ) %>%
  formatStyle(
    'Type', 
    backgroundColor = styleEqual(
      c("Nominal", "Ordinal", "Discrete", "Continuous"),
      c("#FFD700", "#87CEEB", "#98FB98", "#FFB6C1")
    ),
    fontWeight = 'bold'
  ) %>%
  formatStyle(
    'Description', 
    backgroundColor = 'aliceblue'
  ) %>%
  tagList(
    tags$style(HTML("
      table.dataTable thead th {
        background: linear-gradient(to right, #4e79a7, #9c5cba);
        color: white !important;
        text-align: center;
        border: 1px solid black !important;
      }
      table.dataTable tbody td {
        vertical-align: middle;
        text-align: center;
        border: 1px solid black !important;
      }
      table.dataTable tbody tr:hover td {
        background-color: #ffe4e1 !important;
      }
    "))
  )
```

</div>
</div>

### Table {data-height=520}

```{r datatable, message=FALSE, warning=FALSE}
library(DT)
library(dplyr)
library(readxl)
library(janitor)
library(htmltools)

# Load data
df <- read_excel("C:/Users/LENOVO/OneDrive/Desktop/assignment ststc/data new.xlsx") %>%
  clean_names() %>%
  mutate(
    performance_score = as.numeric(gsub(",", ".", performance_score)),
    projects_completed = as.numeric(projects_completed),
    sales_achieved = as.numeric(gsub(",", ".", sales_achieved)),
    work_hours = as.numeric(gsub(",", ".", work_hours))
  )

# Datatable modern
datatable(
  df,
  options = list(
    pageLength = 15,
    scrollX = TRUE,
    fixedHeader = TRUE
  ),
  class = "stripe hover cell-border"
) %>%
  formatStyle(
    columns = names(df),
    backgroundColor = "lavenderblush",
    color = "black"
  ) %>%
  tagList(
    tags$style(htmltools::HTML("
      /* Header gradient biru → ungu & border hitam */
      table.dataTable thead th {
        background: linear-gradient(to right, #4e79a7, #9c5cba);
        color: white !important;
        font-weight: bold;
        text-align: center;
        border: 1px solid black !important;
      }

      /* Hover effect */
      table.dataTable tbody tr:hover td {
        background-color: #ffe4e1 !important;
      }

      /* Baris bergantian warna pastel */
      table.dataTable.stripe tbody tr:nth-child(odd) td {
        background-color: #f8f0ff !important;
      }
      table.dataTable.stripe tbody tr:nth-child(even) td {
        background-color: #f0f0ff !important;
      }

      /* Sel tabel rata tengah & border hitam */
      table.dataTable tbody td {
        vertical-align: middle;
        text-align: center;
        border: 1px solid black !important;
      }
    "))
  )
```

All About Basic Visualizations {data-orientation=rows}
=======================================================================

## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------


### Pie-Chart {.tab}

<div class="two-column">
  <div class="col section-text">

```{r pie-3d-satisfaction, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(highcharter)

# Hitung persentase tiap tingkat kepuasan
satisfaction_df <- df %>%
  count(satisfaction) %>%
  mutate(
    percentage = round(n / sum(n) * 100, 1)
  )

# Buat pie chart 3D interaktif
highchart() %>%
  hc_chart(
    type = "pie",
    options3d = list(enabled = TRUE, alpha = 45),
    height = 480
  ) %>%
  hc_title(
    text = "Percentage Distribution of Employee Satisfaction"
  ) %>%
  hc_add_series(
    name = "Percentage",
    data = list_parse(
      transform(satisfaction_df, name = satisfaction, y = percentage)
    )
  ) %>%
  hc_plotOptions(
    pie = list(
      depth = 35,
      size = "75%",
      allowPointSelect = TRUE,  # efek "terangkat" saat klik/hover
      cursor = "pointer",
      dataLabels = list(
        enabled = TRUE,
        distance = 30,
        format = "<b>{point.name}</b><br>{point.y}%"
      )
    )
  ) %>%
  hc_colors(c(
    "#4E79A7", # Low
    "#F28E2B", # Medium
    "#59A14F"  # High
  ))

```

</div>
<div class="divider"></div>
<div class="col section-text">

**Interpretation of the 3D Pie Chart** 

depicts the percentage distribution of employee satisfaction levels, categorized into Low, Medium, and High. The chart shows that **Medium satisfaction dominates**, indicating that most employees feel **fairly satisfied** with their jobs. This suggests that while basic expectations are generally met, there is still room for improvement to enhance overall employee well-being and motivation. **The High satisfaction category represents a significant proportion**, reflecting that certain groups of employees experience positive working conditions, effective training, or a supportive work environment. This group can be considered a benchmark for identifying best practices within the organization.

In contrast, the **Low satisfaction segment accounts for the smallest share**, but it remains an important concern. Even though the proportion is relatively small, employees in this category may be at risk of decreased productivity or higher turnover if their concerns are not addressed. Overall, the distribution highlights the need for targeted strategies to shift employees from medium to high satisfaction levels in order to improve organizational performance.
</div>
</div>

### Bar-Chart {data-width=600 data-height=510}
  
<div class="two-column">
  <div class="col section-text">

```{r bar-horizontal-department-softbg, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(highcharter)

# Count employees per department
department_df <- df %>%
  count(department) %>%
  arrange(n)

# Add different colors for each bar
bar_data <- department_df %>%
  mutate(
    color = c(
      "#4E79A7", # Finance
      "#F28E2B", # HR
      "#59A14F", # IT
      "#E15759"  # Marketing
    )[seq_len(n())]
  )

highchart() %>%
  hc_chart(
    type = "bar",
    height = 380,
    backgroundColor = "#F5F7FB"  # 🌸 soft background
  ) %>%
  hc_title(
    text = "Employee Distribution by Department",
    style = list(
      fontSize = "16px",
      color = "#2C3E50"
    )
  ) %>%
  hc_xAxis(
    categories = bar_data$department,
    title = list(text = NULL),
    labels = list(style = list(color = "#34495E", fontSize = "12px")),
    gridLineColor = "#E1E8ED"
  ) %>%
  hc_yAxis(
    title = list(
      text = "Number of Employees",
      style = list(color = "#34495E")
    ),
    labels = list(style = list(color = "#34495E", fontSize = "12px")),
    gridLineColor = "#E1E8ED"
  ) %>%
  hc_add_series(
    name = "Employees",
    data = list_parse(
      transform(bar_data, y = n, color = color)
    )
  ) %>%
  hc_plotOptions(
    bar = list(
      borderRadius = 6,
      dataLabels = list(
        enabled = TRUE,
        style = list(
          fontSize = "12px",
          color = "#2C3E50"
        )
      )
    )
  ) %>%
  hc_tooltip(
    backgroundColor = "rgba(255,255,255,0.95)",
    borderColor = "#BDC3C7",
    style = list(color = "#2C3E50"),
    pointFormat = "<b>{point.y}</b> employees"
  )

```

</div>
<div class="divider"></div>
<div class="col section-text">

The horizontal bar chart clearly shows the distribution of employees across departments. **Marketing is the department with the highest number of employees**, indicating that the organization places strong emphasis on marketing and market-related activities. This may reflect the company’s focus on customer outreach, sales growth, and brand development.

In contrast, **HR has the smallest number of employees**, suggesting that human resource functions are handled by a relatively lean team. **IT also has a lower employee count compared to Marketing and Finance**, which may indicate efficient use of technology or outsourcing of certain technical tasks.

Meanwhile, **Finance occupies a middle-to-high position**, showing a stable workforce allocation to support financial operations, budgeting, and performance monitoring. Overall, the differences in employee distribution highlight how the organization prioritizes its human resources across departments, with a strong concentration in Marketing and a more compact structure in HR and IT.
</div>
</div>

### Line-Chart {data-width=600 data-height=510}

<div class="two-column">
  <div class="col section-text">
  
```{r line-chart-performance-month, echo=FALSE, message=FALSE, warning=FALSE}
library(dplyr)
library(highcharter)

df$date_clean <- trimws(as.character(df$date))

month_order <- c(
  "Januari", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November"
)

df$date_clean <- factor(df$date_clean, levels = month_order)

performance_month <- df %>%
  group_by(date_clean) %>%
  summarise(
    avg_performance = round(mean(performance_score, na.rm = TRUE), 2),
    .groups = "drop"
  ) %>%
  arrange(date_clean)

highchart() %>%
  hc_chart(
    type = "line",
    backgroundColor = "#EAF5EA",
    height = 420
  ) %>%
  hc_title(text = "Average Employee Performance Score by Month") %>%
  hc_xAxis(
    categories = performance_month$date_clean,
    title = list(text = "Month")
  ) %>%
  hc_yAxis(
    title = list(text = "Average Performance Score")
  ) %>%
  hc_add_series(
    name = "Average Performance",
    data = performance_month$avg_performance,
    marker = list(enabled = TRUE, radius = 5)
  ) %>%
  hc_tooltip(pointFormat = "<b>{point.y}</b>") %>%
  hc_colors("#2E7D32")

```

</div>
<div class="divider"></div>
<div class="col section-text">
**Interpretation of the Line Chart**

Based on the line chart of the **Average Employee Performance Score by Month**, employee performance exhibits noticeable fluctuations throughout the observed period. The **lowest average performance score is observed in October**, indicating a decline in employee performance during this month. This decrease may be associated with factors such as increased workload pressure, fatigue toward the later part of the year, or reduced motivation after sustained work periods. In contrast, the **highest average performance score occurs in November**, showing a significant improvement in employee performance compared to previous months. This peak suggests that employees may become more focused and productive toward the end of the year, possibly driven by performance evaluations, completion of annual targets, or organizational incentives.

Overall, the visualization reveals a **performance decline leading into October followed by a sharp recovery in November**. These findings highlight the importance of monitoring employee workload and well-being during periods of decreased performance, while also identifying and reinforcing the factors that contribute to high performance toward the end of the year.
</div>
</div>

### Central_Tendency with Histogram

<div class="two-column">
  <div class="col section-text">

```{r interactive-central-tendency-brown-legend, echo=FALSE, message=FALSE, warning=FALSE}
library(ggplot2)
library(dplyr)
library(plotly)

# Central tendency values
mean_val <- mean(df$performance_score, na.rm = TRUE)
median_val <- median(df$performance_score, na.rm = TRUE)
mode_val <- as.numeric(
  names(sort(table(df$performance_score), decreasing = TRUE)[1])
)

# Plot
p <- ggplot(df, aes(x = performance_score)) +
  geom_histogram(
    aes(y = ..density.., fill = ..count..),
    binwidth = 5,
    color = "white",
    alpha = 0.9
  ) +
  scale_fill_gradient(
    low = "#EFE5DA",   # coklat sangat muda
    high = "#8B5E3C",  # coklat tua
    name = "Frequency"
  ) +
  geom_density(
    aes(color = "Distribution"),
    linewidth = 1.3
  ) +
  geom_vline(
    aes(xintercept = mean_val, color = "Mean"),
    linewidth = 1.3
  ) +
  geom_vline(
    aes(xintercept = median_val, color = "Median"),
    linewidth = 1.3,
    linetype = "dashed"
  ) +
  geom_vline(
    aes(xintercept = mode_val, color = "Mode"),
    linewidth = 1.3,
    linetype = "dotted"
  ) +
  scale_color_manual(
    name = "Central Tendency",
    values = c(
      "Mean" = "#1F77B4",        # biru
      "Median" = "#FF7F0E",     # oranye
      "Mode" = "#2CA02C",       # hijau
      "Distribution" = "#9467BD" # ungu
    )
  ) +
  labs(
    title = "Distribution of Performance Score with Central Tendency",
    x = "Performance Score",
    y = "Density"
  ) +
  theme_minimal() +
  theme(
    legend.position = "bottom",
    plot.title = element_text(size = 14, face = "bold")
  )

ggplotly(p)

```

</div>
<div class="divider"></div>
<div class="col section-text">
**Based on the visualization of the Performance Score distribution**


the data shows a clear pattern of central tendency supported by the histogram and distribution curve. The **mean performance score** is **`r round(mean(df$performance_score, na.rm = TRUE), 2)`**, indicating the average performance level of employees. The **median value**, which represents the middle score of the dataset, is **`r round(median(df$performance_score, na.rm = TRUE), 2)`**, while the **mode**, or the most frequently occurring score, is **`r as.numeric(names(sort(table(df$performance_score), decreasing = TRUE)[1]))`**. The close proximity between the mean and median suggests that the distribution of performance scores is relatively balanced, with no extreme skewness. Meanwhile, the position of the mode indicates the score range where employee performance is most concentrated. Overall, this visualization demonstrates that employee performance tends to cluster around the central values, reflecting a consistent performance level across the dataset.
</div>
</div>


### Central_Tendency With BOX & SCATTER PLOT

<div class="two-column">
  <div class="col section-text">

```{r box-scatter-performance-department, echo=FALSE, message=FALSE, warning=FALSE}
library(ggplot2)
library(plotly)
library(dplyr)

p <- ggplot(df, aes(x = department, y = performance_score)) +
  
  # Boxplot
  geom_boxplot(
    fill = "#DDB892",
    color = "#7F5539",
    alpha = 0.6,
    outlier.shape = NA
  ) +
  
  # Scatter (jitter)
  geom_jitter(
    aes(color = department),
    width = 0.2,
    size = 2,
    alpha = 0.8
  ) +
  
  labs(
    title = "Distribution of Performance Score by Department",
    x = "Department",
    y = "Performance Score"
  ) +
  
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(size = 14, face = "bold")
  )

ggplotly(p)

```

</div>
<div class="divider"></div>
<div class="col section-text">
**This visualization combines a boxplot and a scatter plot**

to show the distribution of employee performance scores. The median, displayed as the line inside the box, represents the central tendency and indicates the typical performance level of employees.

**The interquartile range (IQR)** illustrates the spread of the middle 50% of the data, helping to assess performance consistency. A wider IQR suggests greater variability, while a narrower IQR indicates more uniform performance. The whiskers show the overall range of the data.

**The scatter points** represent individual employee scores, allowing interactive exploration of patterns, clustering around the median, and identification of potential outliers. Together, this visualization effectively highlights both central tendency (median) and data dispersion, providing a clear and intuitive understanding of employee performance distribution.

</div>
</div>


### Statistical Dispersion

<div class="two-column">
  <div class="col section-text">

```{r violin-box-scatter-plotly, echo=FALSE, message=FALSE, warning=FALSE}
library(plotly)
library(dplyr)
library(tidyr)

# Ambil variabel numerik
numeric_df <- df %>% select(where(is.numeric))

# Pivot ke format long
long_df <- numeric_df %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Value")

# Plot interaktif
fig1 <- long_df %>%
  plot_ly(x = ~Variable, y = ~Value, color = ~Variable, type = "violin",
          box = list(visible = TRUE),
          points = "all",  # menampilkan scatter
          jitter = 0.3,    # geser scatter sedikit
          meanline = list(visible = TRUE),
          scalemode = "count",
          hoverinfo = "x+y") %>%
  layout(title = "Violin + Boxplot + Scatter (Interaktif)")

fig1
```

</div>
<div class="divider"></div>
<div class="col section-text">
**The violin plot with overlaid boxplot and side scatter points**

provides a clear view of the data distribution. Violin shapes show where values are concentrated, boxplots indicate median and IQR, and scatter points reveal individual observations and outliers.

PerformanceScore shows moderate variability with some high and low outliers. WorkHours is consistent, with most values clustered around the median. SalesAchieved has a wider spread, indicating that a few employees outperform others, while ProjectsCompleted highlights individuals completing more projects than the majority.

Overall, this visualization efficiently conveys both central tendency and variability, making it easy to compare distributions and identify outliers across variables.
</div>
</div>


### Probability Distributions

<div class="two-column">
  <div class="col section-text">

```{r probability-distribution-2vars-final, echo=FALSE, message=FALSE, warning=FALSE}
library(plotly)
library(dplyr)

# Pilih 2 variabel numerik
vars <- c("performance_score", "sales_achieved")
df2 <- df %>% select(all_of(vars))

# Warna untuk masing-masing variabel
colors <- c("orange", "green")

# Buat plot interaktif dengan background biru
fig <- plot_ly(layout = list(plot_bgcolor = 'lightblue', paper_bgcolor = 'lightblue'))

for (i in seq_along(vars)) {
  var <- vars[i]
  data_var <- df2[[var]]
  
  # Histogram (probability)
  fig <- fig %>%
    add_histogram(x = data_var, histnorm = "probability",
                  name = paste(var, "hist"),
                  opacity = 0.5, marker = list(color = colors[i]))
  
  # Density curve
  dens <- density(data_var)
  fig <- fig %>%
    add_trace(x = dens$x, y = dens$y, type = 'scatter', mode = 'lines',
              name = paste(var, "density"), line = list(color = colors[i], width = 2))
}

# Layout final dengan judul dan label sumbu
fig <- fig %>%
  layout(title = "Probability Distribution of PerformanceScore and SalesAchieved",
         xaxis = list(title = "Value"),
         yaxis = list(title = "Probability / Density"),
         barmode = "overlay")

fig
```

</div>
<div class="divider"></div>
<div class="col section-text">
**This probability distribution visualization**

shows how **PerformanceScore** and **SalesAchieved** values are distributed among employees. The histogram displays the relative frequency of each value, while the density curves provide a smooth estimate of the probability function. **PerformanceScore** is mostly concentrated around the median, with a few employees scoring much higher or lower, indicating some variability in performance. **SalesAchieved** shows a wider spread, suggesting that while most employees achieve average sales, a few outperform the majority, resulting in higher variance. The combination of histogram and density curve makes it easy to observe **central tendency, variability, and outliers**, as well as to compare the distribution shapes between performance and sales metrics.
</div>
</div>


Confidence Interval  {data-orientation=rows}
=======================================================================

## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Study Case

::: {style="background-color:#fffaf0; border-left:6px solid #DAA520; padding:12px; border-radius:8px; margin:20px 0;"}

***STUDY CASE***
<p style="text-align: justify; text-justify: inter-word;">
A company seeks to understand the overall performance level of its workforce based on monthly PerformanceScore data. Management wishes to estimate the true average performance score for the entire employee population. Since it is impractical to measure every employee each month, inferential statistics using confidence intervals are applied to a sample of historical data.
</p>

:::

$$n = 200$$

$$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} = 9.2$$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 46.5$$

$$\alpha = 1 - \text{CL} = 1 - 0.95 = 0.05$$


### Calculate Confidence Interval  {data-orientation=rows}

$$\frac{\alpha}{2} = \frac{1 - 0.95}{2} = \frac{0.05}{2} = 0.025$$
$$df = n - 1 = 200 - 1 = 199$$
$$ t_{0.025, 199} = 1.972$$

$$SE = \frac{s}{\sqrt{n}} = \frac{9.2}{\sqrt{200}} = \frac{9.2}{14.142} = 0.65$$
$$ME = 1.972 \times \frac{9.2}{\sqrt{200}}$$

$$ME = 1.972 \times \frac{9.2}{14.1421}$$

$$ME = 1.972 \times 0.650 = 1.282$$

### Interpretation Confidence Interval 

::: {style="background-color:#f0fff4; border-left:6px solid #2e8b57; padding:12px; border-radius:8px; margin:20px 0;"}

Calculating the Confidence Interval

\begin{aligned}
L &= \bar{x} - ME = 46.5 - 1.282 = 45.218 \\
U &= \bar{x} + ME = 46.5 + 1.282 = 47.782
\end{aligned}

:::

::: {style="background-color:#fff4e6; border-left:6px solid #ff8c00; padding:12px; border-radius:8px; margin:20px 0;"}

***Interpretation*** : 
<p style="text-align: justify; text-justify: inter-word;">
We are 95% confident that the true average performance score for all employees is between 45.22 and 47.78 points. This means if we repeated the study many times, 95% of such intervals would contain the actual company average. While our sample average is 46.5, statistical uncertainty allows the true value to reasonably range from 45.22 to 47.78. Since the entire interval is below 50, overall performance may need improvement.transalat
</p>

:::

Statistical Inference {data-orientation=rows}
=======================================================================

## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Study Case
::: {style="background-color:#f8eef1; border-left:6px solid #800020; padding:12px; border-radius:8px; margin:20px 0;"}

***Impact of Training on Employee Performance***
<p style="text-align: justify; text-justify: inter-word;">
This study examines whether completing organizational training significantly improves employee performance scores. Using data from 201 employees across multiple departments, an independent two-sample t-test was conducted to compare performance scores between those who completed training and those who did not. The results show a statistically significant difference, with trained employees scoring higher on average, confirming that training positively and meaningfully impacts employee performance.
</p>
:::

::: {style="background-color:#f2f2f2; border-left:6px solid #808080; padding:12px; border-radius:8px; margin:20px 0;"}

**Hypothesis**

***Null Hypothesis (H₀)***
There is no difference in the mean Performance Score between employees who completed training and those who did not.

***Alternative Hypothesis (H₁)***
There is a difference in the mean Performance Score between employees who completed training and those who did not.

:::


### Calculating Statistical Inference

```{r}
library(formattable)

summary_data <- data.frame(
  `Training Status` = c("Training = Yes", "Training = No"),
  n = c(134, 67),
  `∑X` = c(6720.9, 2908.7),
  Mean = c(50.16, 43.41),
  SD = c(8.57, 9.35),
  check.names = FALSE
)

formattable(summary_data, 
  align = c("l", "c", "c", "c", "c"),
  list(
    `Training Status` = formatter("span", 
      style = x ~ style(display = "block",
                       padding = "0 4px",
                       `border-radius` = "4px",
                       `background-color` = ifelse(x == "Training = Yes", "#4CAF50", "#FF9800"),
                       color = "white",
                       `font-weight` = "bold")),
    n = color_tile("#E1F5FE", "#0277BD"),
    `∑X` = color_tile("#F1F8E9", "#558B2F"),
    Mean = color_tile("#FFF3E0", "#FF9800"),
    SD = color_tile("#FCE4EC", "#AD1457")
  ))
```

***Do the t-test***

$$
t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}
$$

$$t = \frac{50.16 - 43.41}{\sqrt{\frac{8.57^2}{134} + \frac{9.35^2}{67}}}$$
$$t \approx \frac{6.75}{\sqrt{0.548 + 1.305}} 
\approx \frac{6.75}{\sqrt{1.853}} 
\approx \frac{6.75}{1.361} 
\approx 4.96$$


### Conclusion Statistical Inference 

$$df \approx \frac{(0.548 + 1.305)^2}{\frac{(0.548)^2}{133} + \frac{(1.305)^2}{66}}$$
$$df \approx \frac{(1.853)^2}{0.00224 + 0.02544}
\approx \frac{3.433}{0.02768}
\approx 124$$

**Critical value (α = 0.05, two-tailed): With df ≈ 124, t-critical ≈ ±1.98**

:::{style="background-color:#f2f2f2; border-left:6px solid #808080; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">
***Conclusion:***
The calculated t-value of approximately 4.96 exceeds the critical t-value of ±1.98 at α = 0.05 with about 124 degrees of freedom, leading to the rejection of the null hypothesis (H₀). The corresponding p-value for t = 4.96 with df ≈ 124 is less than 0.0001, indicating that the result is highly statistically significant. This means there is a statistically significant difference in Performance Scores between employees who completed training and those who did not, with trained employees demonstrating a higher average performance score than their untrained counterparts. Therefore, we conclude that completing organizational training has a positive and meaningful impact on employee performance.
</p>
:::

Nonparametric Methods {data-orientation=rows}
=======================================================================

## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Kruskal-Wallis Test Analysis 
::: {style="background-color:#fffaf0; border-left:6px solid #DAA520; padding:12px; border-radius:8px; margin:20px 0;"}

*Kruskal-Wallis Test Analysis on Topic Effect of Work Hour Categories on Employee Performance* 

This investigation examines whether there are statistically significant differences in employee performance scores across three work hour categories: Low, Normal, and High. This analysis helps determine optimal work hour ranges for maximum employee productivity

:::

::: {style="background-color:#fff1f2; border-left:6px solid #FCA5A5; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">

*Reasons for Selecting the Kruskal-Wallis Test :*

1. No Normality Assumption Required: This test does not require the assumption of normally distributed data, making it suitable for data with unknown or non-normal distributions.

2. Multiple Group Comparison: Designed to compare three or more independent groups, which aligns with the three work hour categories.

3. Appropriate Measurement Scales: The dependent variable (performance score) is continuous numerical, while the independent variable (work hour category) is categorical, making non-parametric tests like Kruskal-Wallis more appropriate than parametric alternatives

</p>
:::

### Hypothesis Nonparametric Methods {data-orientation=rows}

::: {style="background-color:#f0f9ff; border-left:6px solid #93C5FD; padding:12px; border-radius:8px; margin:20px 0;"}

***Research Hypotheses***

***Null Hypothesis (H₀)***
There is no significant difference in median performance scores among employees with Low, Normal, and High work hours.

***Alternative Hypothesis (H₁)***
At least one work hour category has a significantly different median performance score compared to the other categories.

:::

### Calculate Nonparametric Methods 

::: {style="background-color:#e8f5e9; border-left:6px solid #1b5e20; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">
Among the 201 employee records, 35 individuals are classified in the low work hours category, 101 in the normal category, and 65 in the high category.

From the total of 201 employee performance scores combined and sorted from smallest to largest, **the minimum value is 22.9** (rank 1) and **the maximum value is 98.2** (rank 201). Several values appear more than once (ties), for example, the score **33.3** appears 5 times in the early rank positions, thus receiving an **average rank of 3**, and the score **50.6** appears 4 times in the middle positions, assigned an average rank based on its positions. After all ranks are assigned to the combined dataset, these ranks are returned to each employee according to their work hour category. The sum of ranks for each category is: **Low category (35 records)** has a total rank sum R₁ = **4,312**, **Normal category (101 records)** has a total rank sum R₂ = **19,523**, and **High category (65 records)** has a total rank sum R₃ = **15,165**. From this rank pattern, it is evident that most data in the Low category tend to have lower ranks (smaller scores), while the Normal and High categories tend to have higher ranks, although there is some overlap. The sum of all ranks is **201*(201+1)/2 = 20,301**, which matches R₁ + R₂ + R₃ after rounding. These R values will be used to calculate the Kruskal-Wallis test statistic to determine whether the differences in median performance scores among the three work hour categories are statistically significant.
</p> 
:::

### Calculate the 

$$\frac{R_1^2}{n_1}
=
\frac{(4312)^2}{35}
=
\frac{18{,}593{,}344}{35}
=
531{,}238.4$$


$$\frac{R_2^2}{n_2}
=
\frac{(19523)^2}{101}
=
\frac{381{,}147{,}529}{101}
=
3{,}773{,}737.9$$

$$
\frac{R_3^2}{n_3}
=
\frac{(15165)^2}{65}
=
\frac{229{,}977{,}225}{65}
=
3{,}538{,}111.2
$$ 

Nonparametric Methods {data-orientation=rows}
=======================================================================

## Column {.tabset .tabset-fade data-height=520}
-----------------------------------------------------------------------

### Kruskal-Wallis Test Analysis 
::: {style="background-color:#fffaf0; border-left:6px solid #DAA520; padding:12px; border-radius:8px; margin:20px 0;"}

*Kruskal-Wallis Test Analysis on Topic Effect of Work Hour Categories on Employee Performance* 

This investigation examines whether there are statistically significant differences in employee performance scores across three work hour categories: Low, Normal, and High. This analysis helps determine optimal work hour ranges for maximum employee productivity

:::

::: {style="background-color:#fff1f2; border-left:6px solid #FCA5A5; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">

*Reasons for Selecting the Kruskal-Wallis Test :*

1. No Normality Assumption Required: This test does not require the assumption of normally distributed data, making it suitable for data with unknown or non-normal distributions.

2. Multiple Group Comparison: Designed to compare three or more independent groups, which aligns with the three work hour categories.

3. Appropriate Measurement Scales: The dependent variable (performance score) is continuous numerical, while the independent variable (work hour category) is categorical, making non-parametric tests like Kruskal-Wallis more appropriate than parametric alternatives

</p>
:::

### Hypothesis Nonparametric Methods {data-orientation=rows}

::: {style="background-color:#f0f9ff; border-left:6px solid #93C5FD; padding:12px; border-radius:8px; margin:20px 0;"}

***Research Hypotheses***

***Null Hypothesis (H₀)***
There is no significant difference in median performance scores among employees with Low, Normal, and High work hours.

***Alternative Hypothesis (H₁)***
At least one work hour category has a significantly different median performance score compared to the other categories.

:::

### Calculate Nonparametric Methods 

::: {style="background-color:#e8f5e9; border-left:6px solid #1b5e20; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">
Among the 201 employee records, 35 individuals are classified in the low work hours category, 101 in the normal category, and 65 in the high category.

From the total of 201 employee performance scores combined and sorted from smallest to largest, **the minimum value is 22.9** (rank 1) and **the maximum value is 98.2** (rank 201). Several values appear more than once (ties), for example, the score **33.3** appears 5 times in the early rank positions, thus receiving an **average rank of 3**, and the score **50.6** appears 4 times in the middle positions, assigned an average rank based on its positions. After all ranks are assigned to the combined dataset, these ranks are returned to each employee according to their work hour category. The sum of ranks for each category is: **Low category (35 records)** has a total rank sum R₁ = **4,312**, **Normal category (101 records)** has a total rank sum R₂ = **19,523**, and **High category (65 records)** has a total rank sum R₃ = **15,165**. From this rank pattern, it is evident that most data in the Low category tend to have lower ranks (smaller scores), while the Normal and High categories tend to have higher ranks, although there is some overlap. The sum of all ranks is **201*(201+1)/2 = 20,301**, which matches R₁ + R₂ + R₃ after rounding. These R values will be used to calculate the Kruskal-Wallis test statistic to determine whether the differences in median performance scores among the three work hour categories are statistically significant.
</p> 
:::

### Calculate the H Test 

::: {style="background-color:#f5efe6; border-left:6px solid #8b5a2b; padding:12px; border-radius:8px; margin:20px 0;"}


$$\frac{R_1^2}{n_1}
=
\frac{(4312)^2}{35}
=
\frac{18{,}593{,}344}{35}
=
531{,}238.4$$

$$\frac{R_2^2}{n_2}
=
\frac{(19523)^2}{101}
=
\frac{381{,}147{,}529}{101}
=
3{,}773{,}737.9$$

$$
\frac{R_3^2}{n_3}
=
\frac{(15165)^2}{65}
=
\frac{229{,}977{,}225}{65}
=
3{,}538{,}111.2
$$ 

$$
\sum \frac{R_i^2}{n_i}
\approx
531{,}238.4
+
3{,}282{,}389.4
+
2{,}112{,}870.9$$

$$\sum \frac{R_i^2}{n_i}
\approx
5{,}926{,}498.7$$
:::

### Calculate The H Test 
::: {style="background-color:#f5efe6; border-left:6px solid #8b5a2b; padding:12px; border-radius:8px; margin:20px 0;"}

$$\frac{N(N+1)}{12} 
= \frac{200 \times 201}{12} 
= \frac{40{,}200}{12} 
\approx 3{,}350$$

$$\frac{1}{\frac{N(N+1)}{12}} 
= \frac{1}{\frac{40{,}200}{12}} 
\approx 0.0002985$$

$$3(N+1) = 3 \times 201 = 603$$

$$H \approx 1{,}769.8 - 603 \approx 1{,}166.8$$

$$H \approx 1{,}166.8$$

:::

### Conclusion


***Statistical Decision***

Degrees of freedom:

$$df = k - 1 = 3 - 1 = 2$$

The critical chi-square value for 
α
=
0.05
α=0.05 with 
d
f
=
2
df=2 is approximately 5.991.

Since

$$H \approx 1{,}166.8$$

is much larger than 5.991, ***we reject the null hypothesis***
 
::: {style="background-color:#fff4e6; border-left:6px solid #ff8c00; padding:12px; border-radius:8px; margin:20px 0;"}

<p style="text-align: justify; text-justify: inter-word;">
Based on the results of the Kruskal–Wallis test, there is a statistically significant difference in median Performance Scores across the three work hour categories—Low, Normal, and High $$(H \approx 1167,\; p < 0.001)$$
The extremely large 
H
H value indicates that the ranking of Performance Scores differs substantially between at least one pair of work hour groups. This suggests that work hour classification is associated with employee performance outcomes in this dataset
</p> 
:::