Introduction

This report analyzes the red wine dataset from the UCI Machine Learning Repository. It covers sample size, data quality concerns, outliers, summary statistics, and the distribution and skewness of the variables.

1. Sample Size

The Red Wine dataset has 1599 samples with 11 physicochemical variables and 1 output variable (quality).
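
The sample and variable counts can be checked directly from the raw file. The following is a minimal sketch, assuming the data is available as semicolon-delimited text with a header row; the function name dataset_shape and the three-column sample string are illustrative and are not part of the original analysis.

```python
import csv
import io


def dataset_shape(csv_text, delimiter=";"):
    """Return (n_samples, n_columns) for delimited text with a header row."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)  # first row holds the variable names
    n_samples = sum(1 for row in reader if row)  # skip blank lines
    return n_samples, len(header)


# Tiny illustrative sample, not the real dataset.
sample = (
    '"fixed acidity";"pH";"quality"\n'
    "7.4;3.51;5\n"
    "7.8;3.20;5\n"
)
print(dataset_shape(sample))
```

Applied to the full file, the same check would be expected to report 1599 samples and 12 columns (11 inputs plus quality), matching the counts stated above.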

2. Data Quality Concerns

3. Outliers

4. Summary Statistics

Summary Statistics for Red Wine

Variable                    Mean          SD      Min        Max    Median
fixed.acidity          8.3196373   1.7410963  4.60000   15.90000   7.90000
volatile.acidity       0.5278205   0.1790597  0.12000    1.58000   0.52000
citric.acid            0.2709756   0.1948011  0.00000    1.00000   0.26000
residual.sugar         2.5388055   1.4099281  0.90000   15.50000   2.20000
chlorides              0.0874665   0.0470653  0.01200    0.61100   0.07900
free.sulfur.dioxide   15.8749218  10.4601570  1.00000   72.00000  14.00000
total.sulfur.dioxide  46.4677924  32.8953245  6.00000  289.00000  38.00000
density                0.9967467   0.0018873  0.99007    1.00369   0.99675
pH                     3.3111132   0.1543865  2.74000    4.01000   3.31000
sulphates              0.6581488   0.1695070  0.33000    2.00000   0.62000
alcohol               10.4229831   1.0656676  8.40000   14.90000  10.20000
quality                5.6360225   0.8075694  3.00000    8.00000   6.00000
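
For readers who want to reproduce the columns of this table, the sketch below computes the same five statistics for a single variable. It is a minimal illustration, not the code used to produce the report: the function name summarize is hypothetical, the toy values are made up, and it assumes SD means the sample standard deviation (divisor n - 1), which the report does not state.

```python
import statistics


def summarize(values):
    """Return mean, SD, min, max, and median for a list of numbers."""
    return {
        "Mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "SD": statistics.stdev(values),
        "Min": min(values),
        "Max": max(values),
        "Median": statistics.median(values),
    }


# Toy values for illustration only; not taken from the wine dataset.
toy = [3.0, 5.0, 5.0, 6.0, 6.0, 7.0]
summary = summarize(toy)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Running summarize once per column would give one table row per variable. Comparing the Mean and Median entries is also a quick first hint of skew: a mean well above the median, as for total.sulfur.dioxide above, is consistent with a longer right tail.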

5. Distribution and Skewness