Introduction
This report analyzes the red wine dataset from the UCI Machine
Learning Repository. It examines data quality, outliers, summary
statistics, and the shape of each variable's distribution,
including skewness.
1. Sample Size
The Red Wine dataset has 1599 samples with 11 physicochemical
variables and 1 output variable (quality).
2. Data Quality Concerns
- The dataset does not have any missing values.
- All variables are numeric.
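Both checks above can be reproduced with a short script. The sketch
below is one possible way to do it using only the Python standard
library; the function name `audit_columns`, the three-column sample
text, and the choice to treat empty cells and "NA" as missing are
assumptions made for illustration, not part of the original analysis.
The semicolon delimiter matches the CSV file distributed by UCI.

```python
import csv
import io


def audit_columns(text, delimiter=";"):
    """Return (row count, missing cells per column, non-numeric cells per column)."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    names = reader.fieldnames
    n_rows = 0
    missing = {name: 0 for name in names}
    non_numeric = {name: 0 for name in names}
    for row in reader:
        n_rows += 1
        for name in names:
            # Short rows yield None for absent fields, so normalise to "".
            cell = (row[name] or "").strip()
            if cell == "" or cell.upper() == "NA":
                missing[name] += 1
                continue
            try:
                float(cell)
            except ValueError:
                non_numeric[name] += 1
    return n_rows, missing, non_numeric


# Toy input for illustration only: it deliberately contains one empty
# cell and one non-numeric cell, unlike the real dataset.
sample = "fixed acidity;pH;quality\n7.4;3.51;5\n7.8;;5\n11.2;abc;6\n"
n_rows, missing, non_numeric = audit_columns(sample)
print(n_rows, missing, non_numeric)
```

On the full file, a clean result would be a row count of 1599 and
zero in every entry of both dictionaries.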
3. Outliers
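- Residual sugar and total sulfur dioxide have extreme values far
from the bulk of the data: residual sugar reaches 15.5 against a
median of 2.2, and total sulfur dioxide reaches 289 against a
median of 38.
- Other variables, such as alcohol, pH, and citric acid, have fewer
extreme points.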
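The report does not state how extreme values were identified. One
common convention is Tukey's rule, which flags any point lying more
than 1.5 times the interquartile range beyond the first or third
quartile. The sketch below assumes that rule; the function name
`iqr_outliers` and the toy values are illustrative and are not taken
from the dataset.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" treats the data as the whole population of interest;
    # other quartile definitions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Toy values for illustration only: a tight cluster plus one large value.
toy_sugar = [1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.6, 15.5]
print(iqr_outliers(toy_sugar))
```

Applying the same function to each column and counting the flagged
values gives a rough ranking of which variables have the most
extreme points.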


4. Summary Statistics
- The table below reports the mean, standard deviation, minimum,
maximum, and median of every variable. Together these summarize the
central tendency, spread, and range of the data.
Summary Statistics for Red Wine
| Variable | Mean | SD | Min | Max | Median |
|---|---|---|---|---|---|
| fixed.acidity | 8.3196373 | 1.7410963 | 4.60000 | 15.90000 | 7.90000 |
| volatile.acidity | 0.5278205 | 0.1790597 | 0.12000 | 1.58000 | 0.52000 |
| citric.acid | 0.2709756 | 0.1948011 | 0.00000 | 1.00000 | 0.26000 |
| residual.sugar | 2.5388055 | 1.4099281 | 0.90000 | 15.50000 | 2.20000 |
| chlorides | 0.0874665 | 0.0470653 | 0.01200 | 0.61100 | 0.07900 |
| free.sulfur.dioxide | 15.8749218 | 10.4601570 | 1.00000 | 72.00000 | 14.00000 |
| total.sulfur.dioxide | 46.4677924 | 32.8953245 | 6.00000 | 289.00000 | 38.00000 |
| density | 0.9967467 | 0.0018873 | 0.99007 | 1.00369 | 0.99675 |
| pH | 3.3111132 | 0.1543865 | 2.74000 | 4.01000 | 3.31000 |
| sulphates | 0.6581488 | 0.1695070 | 0.33000 | 2.00000 | 0.62000 |
| alcohol | 10.4229831 | 1.0656676 | 8.40000 | 14.90000 | 10.20000 |
| quality | 5.6360225 | 0.8075694 | 3.00000 | 8.00000 | 6.00000 |
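The five statistics reported for each variable can be computed with
the Python standard library, as in the sketch below. It assumes the
standard deviation is the sample standard deviation (n - 1 in the
denominator), which the report does not state explicitly. The
function name `summarize` and the toy values are illustrative and
are not drawn from the dataset.

```python
import statistics


def summarize(values):
    """Return mean, sample SD, min, max, and median for one variable."""
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "min": min(values),
        "max": max(values),
        "median": statistics.median(values),
    }


# Toy values for illustration only.
toy_ph = [3.0, 3.2, 3.3, 3.4, 3.6]
summary = summarize(toy_ph)
for name, value in summary.items():
    print(f"{name:>6}: {value:.4f}")
```

Calling `summarize` once per column and stacking the results
produces one row per variable, in the same column order as the
statistics listed above.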
5. Distribution and Skewness
- Fixed Acidity: Most red wines have fixed acidity around
7–8, with a few higher values. The distribution is slightly
right-skewed.
- Volatile Acidity: Most values are around 0.3–0.6,
with a few higher values, indicating a right-skewed distribution.
- Citric Acid: Values are concentrated near 0, and the
distribution is slightly right-skewed.
- Residual Sugar: Most wines have low sugar (1–3
g/L), but some have much more, creating a long tail on the higher
end.
- Chlorides: Mostly low values, with a few high
points. The distribution is right-skewed.
- Free Sulfur Dioxide: Most wines have low values,
but some have higher amounts. The distribution is right-skewed.
- Total Sulfur Dioxide: Similar to free sulfur
dioxide but with a wider range. The distribution is right-skewed.
- Density: Most values are close to 0.995–0.998. The
distribution is roughly symmetric.
- pH: Most wines have a pH around 3.2–3.5. The
distribution is nearly symmetric.
- Sulphates: Most wines have values of 0.5–0.7, with a few
higher values.
- Alcohol: Most wines have 9–11% alcohol. The
distribution is fairly symmetric, with a few higher values.
- Quality: Most wines are rated 5 or 6. The distribution
is slightly skewed toward lower ratings.
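The descriptions above are qualitative. A quick numerical check is
to compare mean and median: for residual sugar the mean (2.54) sits
above the median (2.20), which is consistent with a long right tail.
A more direct measure is the moment coefficient of skewness, the
third central moment divided by the 1.5 power of the second. The
sketch below assumes that population form of the coefficient; the
function name `skewness` and the toy values are illustrative only.

```python
import statistics


def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population form)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    if m2 == 0:
        raise ValueError("skewness is undefined for constant data")
    return m3 / m2 ** 1.5


# Toy values for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 2, 2, 10]
print(skewness(symmetric))     # zero for a symmetric sample
print(skewness(right_tailed))  # positive for a long right tail
```

Values near zero suggest a roughly symmetric distribution, positive
values a longer right tail, and negative values a longer left tail.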

