Members

Our Member Group

Discover the impact behind the role.

Chandra - Quality Control Specialist
Andra - UI/UX Designer
Refan - Data Analyst
Cahaya - Project Leader
Cloise - Documentation Specialist
Maiza - Technical Writer

Summary of Basic Statistics


Chapter 1

Intro to Statistics

Statistics is a fundamental foundation for data analysis and evidence-based decision making. Its main purpose is to convert raw data into reliable information for reasoning and decision-making. Statistics helps simplify data so people can draw conclusions without examining every detail. For example, instead of looking at each student’s exam score individually, a teacher can calculate the average score to get an overall view. Mastery of statistics enables better understanding of patterns, prediction of outcomes, and objective conclusions from data.

| Aspect | Explanation |
|---|---|
| Who | Researchers, data analysts, policymakers, scientists, healthcare professionals, and anyone who needs to make decisions based on data. |
| What | Statistics is the science of collecting, organizing, analyzing, and interpreting data. It includes descriptive and inferential statistics. |
| Why | It reduces uncertainty, supports data-driven decisions, identifies patterns, and enables predictions. |
| When | Used whenever data exists and decisions must be made; developed formally in the 18th–19th centuries. |
| Where | Applied in business, healthcare, social sciences, engineering, and all data-driven fields. |
| How | By defining problems, collecting data, cleaning, exploring, analyzing, and interpreting results. |

Chapter 2

Data Exploration

Data forms the foundation of any analysis; without a clear understanding of its types and structure, organizing, interpreting, and making accurate decisions can be challenging. Data is classified as numeric (quantitative) or categorical (qualitative). Numeric or quantitative data provide information about how much or how many of something, expressed as numbers that represent counts or measurements. Categorical or qualitative data are expressed in labels, names, or categories rather than numbers; they describe qualities, attributes, or classifications.

| Type of Data | Sub-Type | Explanation | Example |
|---|---|---|---|
| Numeric | Discrete | Consists of countable whole numbers | Number of students, number of cars |
| Numeric | Continuous | Consists of measurable values that can take on decimals | Height, weight, temperature |
| Categorical | Nominal | Categories without any natural order or ranking | Gender, blood type, car brand |
| Categorical | Ordinal | Categories with a meaningful order or ranking, but without fixed differences between ranks | Education level, satisfaction rating |
| Internal Sources | - | Data coming from within the organization | Sales transactions, employee data |
| External Sources | - | Data obtained from outside the organization | Public datasets, social media |
| Structured | - | Data organized in tables or databases and easy to analyze | Sales data containing transaction date, quantity, price, ID |
| Unstructured | - | Data in the form of text, images, videos, or log files that require processing | JPEG, PNG, PDF, blog posts |

Chapter 3

Basic Visualizations

Data visualization is an important process for presenting raw data in a clearer, more meaningful form. Visualizations not only help us understand the distribution, comparison, and relationships between variables in a simple yet informative way, but also provide a basis for deeper analysis, finding hidden patterns, and making data-driven decisions with greater confidence.

| Type | Type of Data | Function | Pros | Cons |
|---|---|---|---|---|
| Line Chart | Continuous, time-series | Displays trends or changes in continuous data over a specific period. | Highly effective for time-series data. | Not suitable for discrete categories and can become confusing if there are too many lines. |
| Bar Chart | Categorical, discrete | Compares values across different categories or groups. | Easy to understand for comparing categorical data. | Less effective for data with too many categories. |
| Histogram | Continuous | Shows the frequency distribution of a single numerical variable. | Clearly shows the shape, center, and spread of a numerical distribution. | Difficult to compare multiple groups because overlaying several histograms often results in cluttered and confusing visuals. |
| Boxplot | Continuous | Displays a five-number statistical summary (minimum, first quartile, median, third quartile, and maximum) and identifies outliers. | Excellent for comparing distributions between groups. | Individual data points are invisible; not suitable when exact values matter. |
| Scatter Plot | Continuous, numeric | Shows the relationship or correlation between two numerical variables. | Uses dots on the X and Y axes to indicate the direction of the relationship (positive, negative, or no correlation). | Limited to few variables; typically shows only two variables clearly. |
| Pie Chart | Categorical | Displays proportions or parts of a whole. | Effective for displaying 2–5 categories recorded as percentages or proportions. | Ineffective with many categories; too many slices create confusion. |
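To make these chart types concrete, here is a minimal base R sketch on a small hypothetical dataset; the `branch` and `total` columns are invented for illustration and are not the report's data.

```r
# Hypothetical transaction data for illustration only
set.seed(42)
df <- data.frame(
  branch = sample(c("A", "B", "C"), 300, replace = TRUE),
  total  = round(runif(300, 10, 250), 2)
)

plot(df$total[1:50], type = "l",
     main = "Line chart: totals of the first 50 transactions")          # sequence over time
barplot(table(df$branch), main = "Bar chart: transactions per branch")  # compare categories
hist(df$total, main = "Histogram: distribution of totals")              # one numeric variable
boxplot(total ~ branch, data = df, main = "Boxplot: totals by branch")  # five-number summary per group
pie(table(df$branch), main = "Pie chart: share of transactions")        # parts of a whole
```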

Chapter 4

Central Tendency

Central tendency is a single value that attempts to describe a set of data by identifying the central point within that dataset. These statistical tools offer a concise summary of complex information, making it easy to detect patterns, describe distributions, and lay the groundwork for more in-depth analysis. There are three primary measures.

| | Mean | Median | Mode |
|---|---|---|---|
| Definition | The arithmetic average of all scores in the data. | The middle value in a dataset that has been ordered from smallest to largest. | The value that appears most frequently in a dataset. Data can have one mode (unimodal), two modes (bimodal), or more (multimodal). |
| Formula | \(\bar{X} = \frac{\sum X_i}{n}\), where \(X_i\) is each data value and \(n\) is the number of observations | If the number of data points \(n\) is odd, the median is the value at position \(\frac{n+1}{2}\); if \(n\) is even, the median is the average of the two middle values | No formula; identified as the most frequently occurring value |
| Type of Data | Interval/ratio; numeric data with meaningful intervals | Ordinal or numeric data | Nominal or ordinal |
| Data Conditions | Data without outliers and symmetrical | Data with outliers or a skewed distribution | Categorical data and multimodal data |
| Visualization | Histogram + density | Histogram + density | Boxplot and histogram + density |
| Explanation | When the data are symmetrical with no outliers, the mean gives a good overall representation; in this condition the mean, median, and mode are equal or nearly identical, reflecting perfect equilibrium in the data. | A skewed distribution indicates that the dataset is asymmetrical, meaning the data do not fall evenly around the central point. Outliers cause this imbalance, pulling one side of the distribution’s tail farther than the other. In this condition the median is more reliable because it is not affected by those extremes. | For categorical variables, the mode identifies the most frequent category; in datasets with multiple peaks there can be more than one mode, indicating several dominant values or groups. Boxplots are valuable for detecting differences in distributions among groups, and when there is more than one mode, the histogram reveals the overall shape and multiple modes while the boxplot emphasizes the spread and skewness of the data. |
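As a quick illustration, the three measures can be computed in R on a small hypothetical ratings vector. Note that base R's `mode()` reports the storage type of an object, so a small helper is sketched here for the statistical mode.

```r
# Hypothetical ratings vector
x <- c(7, 8, 8, 9, 10, 8, 7, 6, 9, 8)

mean(x)     # arithmetic average
median(x)   # middle value of the sorted data

# Helper for the statistical mode: the most frequent value(s)
stat_mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[tab == max(tab)])
}
stat_mode(x)   # 8 appears most often
```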

Chapter 5

Statistical Dispersion

Dispersion, or variability, is a measure that shows how far the values in a data distribution spread from the central point (central tendency). It is important because two datasets can have identical means yet completely different characteristics: one may cluster tightly around the middle, while the other is widespread. Understanding dispersion provides an idea of the consistency and reliability of the data. The main measures of dispersion are the range, variance, and standard deviation.

| | Range | Variance | Standard Deviation |
|---|---|---|---|
| Definition | The difference between the maximum and minimum values | Measures the average of the squared deviations from the mean | The square root of the variance |
| Characteristics | Easiest to calculate, but very sensitive to outliers and does not describe how the data are distributed between the extreme values | Quantifies how much each data point differs from the mean, capturing the degree of spread in the dataset | Measures the average distance of each data point from the mean and is expressed in the same unit as the original data |
| Formula | \(Range = X_{max} - X_{min}\) | Population: \(\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}\); sample: \(s^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}\) | Population: \(\sigma = \sqrt{\frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}}\); sample: \(s = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}}\) |
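A short R sketch of these measures on a hypothetical sample; note that `var()` and `sd()` in R use the sample formulas (dividing by n - 1), so the population versions are derived explicitly.

```r
# Hypothetical sample of ratings
x <- c(7, 8, 8, 9, 10, 8, 7, 6, 9, 8)

diff(range(x))   # range = max - min
var(x)           # sample variance (divides by n - 1)
sd(x)            # sample standard deviation = sqrt(var(x))

# Population versions, if x is treated as the entire population
n <- length(x)
var(x) * (n - 1) / n
sqrt(var(x) * (n - 1) / n)
```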

This material concludes that in statistical analysis, the average can be misleading. Without looking at the size of the dispersion, we cannot assess the risk or consistency of a phenomenon. In a medical or scientific context, low dispersion is often more desirable because the results are more predictable for each individual.

Chapter 6

Essentials of Probability

Probability describes how likely events are to occur. It is a numerical measure used to express the chance or likelihood of an event occurring. Formally, probability is defined as the ratio between the number of desired outcomes and the total number of possible outcomes. Probability is always between 0 and 1, where 0 indicates the event is impossible and 1 indicates the event is certain. Probability is not just intuition, but a systematic framework for measuring the likelihood of an event occurring. This understanding is important for making evidence-based decisions. The basic rules of probability are summarized below.

| | Definition | Formula |
|---|---|---|
| Probability | The chance that an event will occur. | \(P(A) = \frac{n(A)}{n(S)}\), where \(n(A)\) is the number of favorable outcomes and \(n(S)\) is the total number of outcomes |
| Complement Rule | The probability that an event does not occur. | \(P(A^c) = 1 - P(A)\) |
| Independent Events | The occurrence of one event does not affect the chance of another event occurring. | \(P(A \cap B) = P(A) \times P(B)\) |
| Dependent Events | The occurrence of one event affects the probability of another. | \(P(A \text{ and } B) = P(A) \times P(B \text{ after } A \text{ happened})\) |
| Union of Events | The probability that at least one of several events occurs. | \(P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)\) |
| Mutually Exclusive | Two events that cannot occur at the same time. | \(P(A \cap B) = 0\) |
| Exhaustive Events | A set of events that covers all possibilities in the sample space. | \(P(A \cup B) = 1\) |
| Binomial Distribution | The probability of getting a certain number of successes in a fixed number of independent trials. | \(P(X) = \binom{n}{x} p^{x} (1 - p)^{n - x}\) |
| Binomial Experiment | An experiment in which each trial has exactly two outcomes, "success" or "failure". | |
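The rules above can be checked numerically in R; the probabilities below are hypothetical values chosen only for illustration.

```r
# Hypothetical probabilities of two independent events
p_A <- 0.3
p_B <- 0.5

1 - p_A                   # complement rule: P(not A)
p_A * p_B                 # independent events: P(A and B)
p_A + p_B - p_A * p_B     # union: P(A or B)

# Binomial distribution: P(X = 3 successes) in n = 10 trials with p = 0.3
choose(10, 3) * 0.3^3 * 0.7^7      # direct formula
dbinom(3, size = 10, prob = 0.3)   # built-in equivalent
```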

Chapter 7

Probability Distributions

A probability distribution is a mathematical function that assigns probabilities to the possible values of a random variable. It provides a way of modeling the likelihood of each outcome in a random experiment. While a frequency distribution shows how often outcomes occur in a sample or dataset, a probability distribution assigns probabilities to outcomes theoretically, regardless of any specific dataset. These probabilities represent the likelihood of each outcome occurring.

| Main Topic | Definition | Key Concept | Essential Formula/Rules |
|---|---|---|---|
| Continuous Random Variable | A random variable that can take on an infinite number of possible values within a given range (e.g., time, weight). | Probability is the area under the curve for a range, not a single point (\(P(X=x)=0\)). | Total area under the PDF = 1. The CDF is used to calculate cumulative probability. |
| Sampling Distribution | The probability distribution of a statistic (such as the mean or standard deviation) calculated from multiple samples of a population. | Distinguishes between the population, the sample, and the distribution of sample means. | Standard error (SE) measures the variability of sample means: \(SE = \sigma/\sqrt{n}\). |
| Central Limit Theorem (CLT) | Predicts the shape of a sampling distribution based on the sample size. | If \(n \geq 30\), the sampling distribution of the mean is approximately normal, regardless of the population's shape. | Allows the use of Z or T distributions for large samples even if the population is skewed. |
| Sample Proportion | Probability distribution for qualitative data (success/failure). | Estimating population percentages or ratios (\(\hat{p}\)). | Normality condition: \(np \geq 10\) and \(n(1-p) \geq 10\). |
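A small simulation sketch illustrates the sampling distribution and the CLT: means of samples drawn from a skewed population look approximately normal, and their spread is close to \(\sigma/\sqrt{n}\). The population here is simulated, not taken from the report's data.

```r
# Simulated right-skewed population (exponential), for illustration only
set.seed(1)
population <- rexp(100000, rate = 1)   # mean = 1, sd = 1

n <- 30
sample_means <- replicate(5000, mean(sample(population, n)))

hist(sample_means, main = "Sampling distribution of the mean (n = 30)")
sd(sample_means)            # empirical standard error
sd(population) / sqrt(n)    # theoretical SE = sigma / sqrt(n)
```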

Chapter 8

Confidence Interval Methods

Confidence interval is a range of values used to estimate a population parameter (such as a mean) more reliably than a single point estimate. It is constructed by taking a sample mean and adding or subtracting a margin of error. Essentially, this method provides a way to quantify the uncertainty and precision of an estimate derived from sample data.

Comparison of Confidence Interval Methods

| | Z-Distribution | T-Distribution | Sample Size | Proportion | One-Sided |
|---|---|---|---|---|---|
| When to Use | When the population standard deviation (\(\sigma\)) is known; when the sample size is large (\(n>30\)) | When the population standard deviation (\(\sigma\)) is unknown; when the sample size is small (\(n<30\)) | To ensure that the sample is large enough to provide accurate and reliable estimates | To estimate the true population proportion based on sample data | When research focuses on minimum or maximum performance, not a full range |
| Formula | \(\bar{x} \pm z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)\) | \(\bar{x} \pm t_{\alpha/2,\,df}\left(\frac{s}{\sqrt{n}}\right)\), where \(df = n-1\) | \(n = \left(\frac{t_{\alpha/2} \times \sigma}{E}\right)^2\) | \(\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) | \(\hat{p} \pm z_{1-\alpha}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) |
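A brief R sketch of the z-, t-, and proportion intervals on hypothetical data; the 25 ratings and the 60-out-of-100 proportion are invented for illustration.

```r
# Hypothetical sample of 25 ratings
set.seed(7)
x <- rnorm(25, mean = 7, sd = 1.5)
n <- length(x); xbar <- mean(x); s <- sd(x)

# t-interval (sigma unknown): xbar +/- t * s / sqrt(n)
xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)

# z-interval (treating sigma = 1.5 as known)
xbar + c(-1, 1) * qnorm(0.975) * 1.5 / sqrt(n)

# Proportion interval: e.g. 60 "successes" out of 100 trials
p_hat <- 60 / 100
p_hat + c(-1, 1) * qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / 100)
```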

Critical Relationships in Confidence Intervals

Understanding how different factors affect confidence interval width is crucial for proper interpretation and sample size planning.

| Critical Relationship | Effect on Confidence Interval |
|---|---|
| Sample size (\(n\)) \(\uparrow\) | As the sample size increases, the standard error decreases, making the interval narrower (more precise) |
| Confidence level \(\uparrow\) | Moving from 90% to 99% makes the interval wider (more conservative) |
| Standard deviation \(\uparrow\) | More "noise" or variation in the data results in a wider interval (less precise) |
Key Insight:
There is always a trade-off between precision (narrow interval) and confidence (wider interval). Increasing sample size is the most effective way to improve both precision and confidence.
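These relationships are easy to verify numerically; the sketch below computes the width of a z-based interval, \(2\,z_{\alpha/2}\,s/\sqrt{n}\), for a hypothetical s = 1.5.

```r
# Width of a z-based interval: 2 * z * s / sqrt(n), with a hypothetical s = 1.5
s <- 1.5
ci_width <- function(n, conf) 2 * qnorm(1 - (1 - conf) / 2) * s / sqrt(n)

ci_width(30, 0.95)    # baseline
ci_width(120, 0.95)   # larger n -> narrower interval
ci_width(30, 0.99)    # higher confidence -> wider interval
```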

Chapter 9

Basic Concepts of Statistical Inference

Statistical inference is the process of drawing conclusions about a population based on information obtained from a sample. It allows researchers and analysts to make generalizations, predictions, and decisions under uncertainty. This chapter highlights its three main components: statistical hypotheses, hypothesis testing methods, and decision-making.

Hypothesis Fundamentals

| Term | Symbol | Definition |
|---|---|---|
| Null Hypothesis | \(H_0\) | Represents the assumption that there is no effect, no difference, or no relationship in the population |
| Alternative Hypothesis | \(H_1\) or \(H_a\) | The statement we want to prove; indicates a difference, relationship, or effect |
| P-value | \(p\) | The probability of obtaining results at least as extreme as those observed, assuming \(H_0\) is true |
| Significance Level | \(\alpha\) | The "threshold" for evidence (usually 0.05) |

Hypothesis Testing Methods

Hypothesis testing methods are used to determine whether the evidence from a sample is strong enough to reject the null hypothesis in favor of the alternative hypothesis. The choice of test depends on the type of data, the sample size, and population characteristics.

Comparison of Statistical Tests

| | Z-Test | T-Test | Chi-Square Test |
|---|---|---|---|
| Function | Compare means with known \(\sigma\) or large \(n\) | Compare means with unknown \(\sigma\) and small \(n\) | Test categorical data distributions |
| Formula | \(z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\) | \(t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\) | \(\chi^2 = \sum\frac{(O-E)^2}{E}\) |
| Example | Test whether a new teaching method changes scores | Test whether the average score differs from 80 | Test whether gender is independent of learning preference |
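For illustration, the t-test and chi-square test can be run in R with built-in functions; the scores and contingency table below are hypothetical.

```r
# One-sample t-test: does the average score differ from 80? (hypothetical scores)
scores <- c(78, 85, 82, 79, 88, 81, 77, 84, 80, 83)
t.test(scores, mu = 80)

# Chi-square test of independence on a hypothetical 2x2 contingency table
tab <- matrix(c(30, 20, 25, 25), nrow = 2,
              dimnames = list(gender = c("Male", "Female"),
                              preference = c("Visual", "Verbal")))
chisq.test(tab)
```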

Statistical Decision Making

| Error Type | Definition | Probability |
|---|---|---|
| Type I Error | Rejecting \(H_0\) when it is true (false positive) | \(\alpha\), commonly 0.05 |
| Type II Error | Failing to reject \(H_0\) when it is false (false negative) | \(\beta\); power \(= 1 - \beta\) |

Decision Rules Based on P-value:

  • If p-value < α → Reject H₀ (statistically significant)
  • If p-value ≥ α → Fail to reject H₀ (not significant)

Chapter 10

Nonparametric Methods

Nonparametric statistics are statistical methods that do not require the assumption of a normal distribution in the population. They serve as alternatives when the basic assumptions of parametric statistics are not met, such as for categorical data. Their primary role is to provide flexibility when analyzing data measured on a low scale (ordinal/nominal) or data with extreme outliers.

| Nonparametric Test | Parametric Equivalent | Main Use | Advantages | Disadvantages |
|---|---|---|---|---|
| Sign Test | One-sample t-test | Tests the difference in medians based on the sign (+ or -). | Fewer assumptions; extremely robust to outliers | Very low statistical power |
| Wilcoxon Signed-Rank | Paired-sample t-test | Compares two paired samples (before vs. after). | More powerful than the sign test | Requires a symmetric distribution of differences |
| Mann-Whitney U | Independent t-test | Compares two independent groups (two different populations). | Suitable for ordinal data | Tests distributional shift rather than mean difference |
| Kruskal-Wallis | One-Way ANOVA | Compares more than two independent groups. | Extends Mann-Whitney to multiple groups | Does not indicate which groups differ without post-hoc tests |
| Friedman Test | Repeated-Measures ANOVA | Compares more than two paired groups. | Suitable for repeated measures | Less powerful than parametric repeated-measures ANOVA |
| Chi-Square Test | - | Tests the relationship (independence) between two categorical variables. | Ideal for categorical data | Sensitive to small expected frequencies |
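In R, these tests map onto built-in functions; the ratings below are hypothetical and serve only to show the calls (ties in the data will make R switch to a normal approximation with a warning).

```r
# Hypothetical satisfaction ratings for two independent groups
member     <- c(7, 8, 6, 9, 7, 8, 10, 6, 7, 9)
non_member <- c(6, 7, 7, 8, 5, 6, 9, 7, 6, 8)

wilcox.test(member, non_member)                  # Mann-Whitney U test (independent groups)
wilcox.test(member, non_member, paired = TRUE)   # Wilcoxon signed-rank (treating them as paired)
kruskal.test(list(member, non_member, c(5, 6, 7, 6, 8)))  # Kruskal-Wallis for 3+ groups
```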

Dataset

All About Basic Visualizations


Pie-Chart

Pie Chart Interpretation

This visualization shows the composition of supermarket sales transactions based on six product lines. The "Food and beverages" category is likely to dominate with the largest share of transactions, reflecting its nature as a frequently purchased staple, followed by routine categories such as "Fashion accessories" and "Home and lifestyle." The donut-shaped pie chart was chosen because it is visually effective in showing the proportion of each category to the total transaction and makes it easier to identify the product categories that contribute most to sales volume. In conclusion, this distribution pattern suggests that inventory strategies and store layout should continue to prioritize product categories with high transaction turnover while maintaining availability in other categories to meet all customer needs.

Bar-Chart

Bar Chart Interpretation

This visualization compares the performance of three supermarket branches (A, B, and C) based on their total revenue. The branch with the highest total sales is displayed as the longest bar on the left, indicating its greatest contribution to the company's revenue. Bar charts were chosen because they are highly effective for comparing numerical values across different categories, allowing for immediate identification of top-performing branches and those requiring improvement. In conclusion, the visible variations in bar height reflect significant differences in sales performance between locations, which can be caused by factors such as geographic location, branch size, or the effectiveness of local marketing strategies.

Line-Chart

Line Chart Interpretation

This visualization displays the fluctuations in total daily sales from all three supermarket chains combined over a three-month period. The line on the chart will show the pattern of revenue fluctuations over time, with peaks that might indicate weekends, holidays, or specific promotional periods. Line charts were chosen because they are the best method for uncovering temporal patterns, long-term trends, and seasonal cycles in time-series data, and they also facilitate the identification of days with the highest and lowest sales. Ultimately, the observed fluctuation patterns can reveal weekly or monthly business rhythms, providing valuable insights for inventory planning, staff scheduling, and marketing campaign timing.

Central Tendency

Central Tendency Interpretation

Histogram with Density Curve of Transaction Value Distribution: This visualization reveals the core revenue structure of the supermarket. The pronounced, narrow peak in the $10-$50 range is the engine of customer traffic, representing high-frequency, low-margin purchases like daily essentials and impulse buys. The long right tail extending to $200+ represents the engine of significant revenue, coming from a smaller cohort of customers making large weekly shops, bulk purchases, or buying premium items.

Key Business Insight & Strategy Implications: The supermarket successfully serves two distinct customer missions. To maximize profitability, strategies should differ: for the high-traffic, low-value segment, focus on optimizing checkout speed, promoting high-margin impulse items at point-of-sale, and volume-driven loyalty programs. For the high-value segment in the distribution's tail, strategies should include targeted promotions for bulk items, personalized offers, and enhanced customer service to increase their basket size and retention.

Statistical Dispersion

Statistical Dispersion

This integrated analysis combines the visual pattern from the violin plot with precise statistical metrics from the summary table to evaluate branch performance holistically. The table provides the exact numerical evidence for observations made from the plot: a branch with a high Variance and SD in the table will correspond to the wider, shorter violin shape, indicating inconsistent customer spending. Conversely, a branch with low variance will have a tall, narrow violin, signifying stable transaction values.

The most critical metric for stability is the coefficient of variation (CV = SD/Mean), which the table currently lacks. A branch with a high mean but also a high SD (resulting in a high CV) has volatile revenue despite high sales, which is a different risk profile than a branch with lower mean and lower SD. Comparing the Range in the table with the outliers in the violin plot also shows if extreme values are skewing the statistics.

Probability Distributions

Probability Distributions

This visualization stacks the smoothed transaction value distributions of all three branches (A, B, C) into a single view, using semi-transparent filled areas. The shared vertical lines for the global Mean and Median across all data provide a crucial common reference point. This overlay method is chosen because it allows for direct, precise visual comparison of the central tendency, spread, and shape of each branch's sales distribution simultaneously, highlighting which branch's "peak" aligns with the overall average and which has heavier tails.

The branch whose density curve peaks closest to the global mean line is the one whose most common transaction size aligns with the company's overall average, suggesting it represents the "typical" store performance. Branches with density curves shifted to the left of the mean have more frequent low-value transactions, while those shifted to the right see more frequent high-value sales. The relative width and height of the peaks show which branch has more consistent sales (taller, narrower peak) versus more variable spending (shorter, wider spread). This directly informs resource allocation, identifying which branches might need strategies to increase average transaction size versus those that excel at it.

Confidence Interval


Introduction

In the modern retail industry, competition between supermarkets is increasing as consumers have more choices. This condition demands that companies focus not only on increasing sales but also on customer satisfaction as a key indicator of service success and consumer loyalty, because satisfied customers tend to make repeat purchases, recommend the store to others, and create a positive image for the company.

Supermarkets as providers of daily necessities generally set minimum customer satisfaction standards as a benchmark for service quality. However, the standard needs to be evaluated periodically using empirical data to ensure that the perceived level of customer satisfaction actually meets or even exceeds the set targets. For example, supermarkets often implement membership programs with the aim of increasing customer loyalty. This program is expected to be able to provide a better shopping experience for customers than non-member customers.

Based on these conditions, it is important to find out whether there is a difference in the level of satisfaction between member and non-member customers. This study aims to analyze the level of customer satisfaction through several statistical approaches: a confidence interval to estimate average customer satisfaction, statistical inference to test whether the level of satisfaction exceeds the company’s standard, and a nonparametric method to compare customer satisfaction between member and non-member customers without relying on the assumption of normality.

Study Case

Confidence Interval

At this stage, management wants to estimate the range in which the average customer satisfaction rating lies and to evaluate, using a 95% confidence interval based on the sample data, whether average customer satisfaction meets or exceeds the company’s minimum standard of 7.

Formula

Sample \[9.1, \ 9.6, \ 7.4, \ 8.4, \ 5.3, \ ..., \ 6.6\]

Sample Size \[n=1000\]

Sample Mean \[\begin{array}{rl} \bar{x} &= \frac{1}{n} \sum_{i=1}^{n} x_i \\[2mm] &= \frac{1}{1000} (x_1 + x_2 + \cdots + x_{1000}) \\[1mm] &= \frac{1}{1000} (9.1+9.6+7.4+8.4+5.3+...+6.6) \\[1mm] &= 6.973 \end{array}\]

Sample Standard Deviation \[\begin{array}{rl} s & = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}} \\[2mm] & = \sqrt{\frac{(9.1-6.973)^2 + (9.6-6.973)^2 + \dots + (6.6-6.973)^2}{1000-1}} \\[2mm] & \approx 1.7190 \end{array}\]

Study Case

Formula

Degrees of Freedom \[df= n-1=999\]

Standard Error \[\begin{array}{rl} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{1.7190}{\sqrt{1000}} \\ & \approx 0.0543 \end{array}\]

Margin of Error \[\begin{array}{rl} ME & = t_{\alpha/2,\,df} \times SE \\ & \approx 1.962 \times 0.0543 \\ & \approx 0.1065 \end{array}\]

Confidence Interval \[\begin{array}{rl} CI_{95\%} & = \bar{x} \pm ME \\ & = 6.973 \pm 0.1065 \\ & \approx (6.866, \ 7.079) \end{array}\]

Summary Using R Code
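The report shows only the resulting summary table; a minimal R sketch that would reproduce numbers of this form is given below. The `rating` vector here is a simulated placeholder standing in for the actual 1000 customer ratings, so its exact output will differ from the table.

```r
# Placeholder ratings standing in for the real 1000-customer sample
set.seed(123)
rating <- round(pmin(10, pmax(1, rnorm(1000, mean = 7, sd = 1.7))), 1)

n    <- length(rating)
xbar <- mean(rating)
s    <- sd(rating)
se   <- s / sqrt(n)

t_crit <- qt(0.975, df = n - 1)        # two-sided 95% critical value (~1.962)
ci <- xbar + c(-1, 1) * t_crit * se

data.frame(
  Statistic = c("Number of Samples (n)", "Mean Rating", "Standard Deviation",
                "Standard Error", "Lower Bound of 95% CI", "Upper Bound of 95% CI"),
  Value = round(c(n, xbar, s, se, ci), 4)
)
```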

95% Confidence Interval of the Average Supermarket Customer Rating

| Statistic | Value |
|---|---|
| Number of Samples (n) | 1000.0000 |
| Mean Rating | 6.9730 |
| Standard Deviation | 1.7190 |
| Standard Error | 0.0543 |
| Lower Bound of 95% CI | 6.8660 |
| Upper Bound of 95% CI | 7.0790 |
  • Statistical interpretations

Statistically, the 95% confidence interval indicates that, with 95% confidence, the average customer satisfaction of the population is estimated to be between 6.87 and 7.08. This means that if the sampling process were repeated many times, about 95% of the intervals formed would contain the true population average of customer satisfaction. Because the company’s standard satisfaction value (μ = 7) lies within the confidence interval, there is not enough statistical evidence to state that average customer satisfaction is significantly higher or lower than that standard. In other words, average customer satisfaction does not differ significantly from the company’s standard value at a 95% confidence level.

  • Interpretation in business

From a business perspective, the results of this confidence interval show that the level of customer satisfaction is right around the company’s target. However, because the lower bound of the confidence interval is below the standard value (6.87 < 7), there is a risk that customer satisfaction is not yet fully consistent with the company’s target.

Statistical Inference


Case

Once the company knows the statistical range of average customer satisfaction, it needs to formally test whether the average customer satisfaction level in the supermarket is significantly higher than the company’s minimum satisfaction standard. The company also wants to determine whether the observed difference is statistically significant or merely due to sampling variability.

Null Hypothesis (H₀)
\[\mu = 7\] Average satisfaction equals company standard

Alternative Hypothesis (H₁)
\[\mu > 7\] Average satisfaction exceeds company standard

Significance Level
\[\alpha = 0.05\]
Sample Mean

\[\bar{x} = 6.973\]

Sample SD

\[s \approx 1.7190\]

Standard Error

\[SE \approx 0.0543\]

T-statistic Calculation

\[\begin{aligned} t &= \frac{\bar{x} - \mu_0}{SE} = \frac{6.973 - 7}{0.0543} = \frac{-0.027}{0.0543} \approx -0.502 \end{aligned}\]

Interpretation

Since \(t = -0.502\) (negative value), the sample mean is below the hypothesized value of 7.

Summary

One-Sample t-Test Results for Customer Satisfaction Level

| Component | Result |
|---|---|
| Null Hypothesis (H₀) | μ = 7 |
| Alternative Hypothesis (H₁) | μ > 7 |
| t-value | -0.502 |
| Degrees of Freedom (df) | 999 |
| p-value | 0.6922 |
| Decision | Fail to Reject H₀ |
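The source reports only the test results; a hedged sketch of the corresponding one-sided one-sample t-test in R (again using a placeholder `rating` vector in place of the real data, so its output will differ from the table) would be:

```r
# Placeholder ratings standing in for the real data
set.seed(123)
rating <- round(pmin(10, pmax(1, rnorm(1000, mean = 7, sd = 1.7))), 1)

# One-sided test of H0: mu = 7 against H1: mu > 7
res <- t.test(rating, mu = 7, alternative = "greater")
res$statistic   # t-value
res$parameter   # degrees of freedom
res$p.value     # p-value

# Equivalent manual calculation
t_stat <- (mean(rating) - 7) / (sd(rating) / sqrt(length(rating)))
p_val  <- pt(t_stat, df = length(rating) - 1, lower.tail = FALSE)
```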
Interpretation

Based on the results of the one-sample t-test, the p-value of 0.6922 is greater than the significance level of 0.05. Therefore, the null hypothesis cannot be rejected. This means there is not enough statistical evidence to state that the average customer satisfaction rating is significantly higher than the company’s minimum standard of 7. This result is consistent with the earlier confidence interval, in which the company’s standard value was still within the interval.

Nonparametric Methods


Case

Next, the company wanted to test whether there was a difference in customer satisfaction levels between member and non-member customers without relying on normal distribution assumptions. The method used is the Mann–Whitney U Test.

Null Hypothesis (H₀)

\[H_0: \text{Member Rating Distribution} = \text{Non-Member Rating Distribution}\]

No difference in rating distributions

Alternative Hypothesis (H₁)

\[H_1: \text{Distribution Ratings differ between Members and Non-Members}\]

Rating distributions are different

  • Step 1: Combine and Rank Data


  • Step 2: Rank Sum and Compute the U Statistic

The formula for U-statistic is: \[U_1 = n_1 n_2 + \frac{n_1 (n_1 + 1)}{2} - R_1\] \[U_2 = n_1 n_2 + \frac{n_2 (n_2 + 1)}{2} - R_2\]

Calculation of the Mann–Whitney U Statistic

| Group | n | Rank Sum | U Statistic |
|---|---|---|---|
| Member | 501 | 248055 | 122304 |
| Normal | 499 | 252445 | 127695 |
| Minimum | NA | NA | 122304 |


  • Step 3: Test Statistic
Z-Statistic Calculation (Normal Approximation for Mann–Whitney)

| Statistic | Value |
|---|---|
| Minimum U | 122304.000 |
| Mean of U | 124999.500 |
| SD of U | 4566.627 |
| Z Statistic | -0.590 |

Summary

Summary of the Mann–Whitney Test: U Statistic, Z, and p-value

| Component | Value |
|---|---|
| Minimum U | 122304.0000 |
| Z Statistic | -0.5903 |
| p-value | 0.5550 |
| Alpha (α) | 0.0500 |
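The steps above can be sketched in R both manually (rank sums, U, and the normal approximation without a tie correction) and with the built-in `wilcox.test()`. The `member` and `normal` vectors below are simulated placeholders with the same group sizes as the report, so the numbers will differ from the tables.

```r
# Simulated placeholders with the same group sizes as the report
set.seed(321)
member <- round(runif(501, 4, 10), 1)   # hypothetical Member ratings (n1 = 501)
normal <- round(runif(499, 4, 10), 1)   # hypothetical Non-Member ratings (n2 = 499)

n1 <- length(member); n2 <- length(normal)

# Steps 1-2: combine, rank, and compute U from the rank sums
ranks <- rank(c(member, normal))
R1 <- sum(ranks[1:n1])
U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 <- n1 * n2 - U1                       # since U1 + U2 = n1 * n2
U  <- min(U1, U2)

# Step 3: normal approximation (no tie correction)
mu_U <- n1 * n2 / 2
sd_U <- sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z <- (U - mu_U) / sd_U
p_value <- 2 * pnorm(-abs(z))

# Built-in equivalent (applies tie/continuity corrections, so p may differ slightly)
wilcox.test(member, normal)
```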

Interpretation:
Since the p-value > α (0.05), we fail to reject H₀. This means there is not enough evidence to state that the distribution of ratings differs significantly between Member and Non-Member customers; according to the Mann–Whitney U test, both groups likely come from the same population in terms of the location of their rating distributions. Thus, statistically, the membership program has not been shown to affect customer satisfaction levels.

Business Interpretation:
From a business point of view, the results of this test show that customer satisfaction ratings do not differ statistically between Member and Non-Member customers. In other words, the membership program may not produce a different increase in satisfaction according to the customer data, and the company should evaluate other factors that differentiate the Member vs. Non-Member experience, since both groups appear to have similar satisfaction distributions.


FINAL CONCLUSION

This study evaluates supermarket membership programs’ effect on customer satisfaction (measured by Ratings) using three statistical approaches to compare Member vs. Non-Member satisfaction.

  • CI (Confidence Interval): No significant difference found in average ratings
  • HT (Hypothesis Test): Fail to reject H₀; no difference
  • NP (Nonparametric): Consistent results without assumptions

CONCLUSION & RECOMMENDATION

All three statistical methods consistently show no significant difference in customer satisfaction between members and non-members. The membership program has not proven effective in enhancing satisfaction. The supermarket should redirect focus to improving service quality, transaction speed, and overall customer experience, as these factors likely have greater impact on satisfaction than membership status alone.