#Load all the libraries needed for the analysis

Load the data and view its characteristics

## 'data.frame':    10886 obs. of  12 variables:
##  $ datetime  : chr  "2011-01-01 00:00:00" "2011-01-01 01:00:00" "2011-01-01 02:00:00" "2011-01-01 03:00:00" ...
##  $ season    : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ holiday   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ workingday: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ weather   : int  1 1 1 1 1 2 1 1 1 1 ...
##  $ temp      : num  9.84 9.02 9.02 9.84 9.84 ...
##  $ atemp     : num  14.4 13.6 13.6 14.4 14.4 ...
##  $ humidity  : int  81 80 80 75 75 75 80 86 75 76 ...
##  $ windspeed : num  0 0 0 0 0 ...
##  $ casual    : int  3 8 5 3 0 0 2 1 1 8 ...
##  $ registered: int  13 32 27 10 1 1 0 2 7 6 ...
##  $ count     : int  16 40 32 13 1 1 2 3 8 14 ...
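
The code that produced the listing above is not shown. A minimal sketch, assuming the data comes from a CSV file read with `read.csv()`: the four inline rows are reconstructed from the first values in the `str()` output (with `atemp` rounded as displayed there), and a real run would pass a file path instead of `text =`. The name `bike` is a placeholder.

```r
# First four rows, taken from the str() listing; atemp is rounded as displayed.
csv_text <- "datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,9.84,14.4,81,0,3,13,16
2011-01-01 01:00:00,1,0,0,1,9.02,13.6,80,0,8,32,40
2011-01-01 02:00:00,1,0,0,1,9.02,13.6,80,0,5,27,32
2011-01-01 03:00:00,1,0,0,1,9.84,14.4,75,0,3,10,13"

# For the full data set, replace `text = csv_text` with the path to the CSV file.
bike <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Compact overview: one line per column with its type and first values.
str(bike)
```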

Correlation coefficients of all numerical variables

## # A tibble: 110 × 3
##    var1       var2   coef_corr
##    <fct>      <fct>      <dbl>
##  1 holiday    season   0.0294 
##  2 workingday season  -0.00813
##  3 weather    season   0.00888
##  4 temp       season   0.259  
##  5 atemp      season   0.265  
##  6 humidity   season   0.191  
##  7 windspeed  season  -0.147  
##  8 casual     season   0.0968 
##  9 registered season   0.164  
## 10 count      season   0.163  
## # … with 100 more rows

#Correlation coefficients for each unique pair of numerical variables

## # A tibble: 55 × 3
##    var1       var2   coef_corr
##    <fct>      <fct>      <dbl>
##  1 holiday    season   0.0294 
##  2 workingday season  -0.00813
##  3 weather    season   0.00888
##  4 temp       season   0.259  
##  5 atemp      season   0.265  
##  6 humidity   season   0.191  
##  7 windspeed  season  -0.147  
##  8 casual     season   0.0968 
##  9 registered season   0.164  
## 10 count      season   0.163  
## # … with 45 more rows
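
The two correlation tables above have 110 and 55 rows, which matches 11 x 10 ordered pairs and 11 x 10 / 2 unordered pairs of the 11 numerical variables. A base R sketch of both shapes follows, using the built-in `mtcars` data (which also happens to have 11 numeric columns) as a stand-in for the bike data. The column names mirror the output above; the use of Pearson correlation is an assumption.

```r
# Stand-in data: mtcars has 11 numeric columns, like the tables above.
cm <- cor(mtcars)  # Pearson correlation matrix, 11 x 11

# Long format: one row per ordered pair, diagonal removed.
all_pairs <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(all_pairs) <- c("var1", "var2", "coef_corr")
all_pairs <- all_pairs[all_pairs$var1 != all_pairs$var2, ]

# Keep each pair once: var1 must come after var2 in column order.
pos1 <- match(all_pairs$var1, colnames(cm))
pos2 <- match(all_pairs$var2, colnames(cm))
unique_pairs <- all_pairs[pos1 > pos2, ]

nrow(all_pairs)     # 110
nrow(unique_pairs)  # 55
head(unique_pairs)
```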

#Univariate

#The following is a list of the EDA functions included in the dlookr package.

#Provides descriptive statistics for numerical data.

## # A tibble: 11 × 26
##    described_…¹     n    na    mean      sd se_mean   IQR skewness kurto…²   p00
##    <chr>        <int> <int>   <dbl>   <dbl>   <dbl> <dbl>    <dbl>   <dbl> <dbl>
##  1 season       10886     0 2.51e+0   1.12  0.0107    2   -0.00708  -1.36   1   
##  2 holiday      10886     0 2.86e-2   0.167 0.00160   0    5.66     30.0    0   
##  3 workingday   10886     0 6.81e-1   0.466 0.00447   1   -0.776    -1.40   0   
##  4 weather      10886     0 1.42e+0   0.634 0.00607   1    1.24      0.396  1   
##  5 temp         10886     0 2.02e+1   7.79  0.0747   12.3  0.00369  -0.915  0.82
##  6 atemp        10886     0 2.37e+1   8.47  0.0812   14.4 -0.103    -0.850  0.76
##  7 humidity     10886     0 6.19e+1  19.2   0.184    30   -0.0863   -0.760  0   
##  8 windspeed    10886     0 1.28e+1   8.16  0.0783   10.0  0.589     0.630  0   
##  9 casual       10886     0 3.60e+1  50.0   0.479    45    2.50      7.55   0   
## 10 registered   10886     0 1.56e+2 151.    1.45    186    1.52      2.63   0   
## 11 count        10886     0 1.92e+2 181.    1.74    242    1.24      1.30   1   
## # … with 16 more variables: p01 <dbl>, p05 <dbl>, p10 <dbl>, p20 <dbl>,
## #   p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>, p60 <dbl>, p70 <dbl>,
## #   p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>, p99 <dbl>, p100 <dbl>, and
## #   abbreviated variable names ¹​described_variables, ²​kurtosis
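
Several columns in this table are simple functions of others. For example, `se_mean` agrees with `sd / sqrt(n)`: for `season`, 1.12 / sqrt(10886) is about 0.0107. The sketch below computes a subset of these statistics in base R on the built-in `mtcars` data as a stand-in. `describe_one` is a hypothetical helper, and the moment-based skewness and excess kurtosis used here are an assumption that may differ slightly from the estimator behind the table.

```r
describe_one <- function(x) {
  x_ok <- x[!is.na(x)]
  n <- length(x_ok)
  m <- mean(x_ok)
  s <- sd(x_ok)
  # Standardize with the population standard deviation for the moment estimators.
  z <- (x_ok - m) / sqrt(mean((x_ok - m)^2))
  c(n        = n,
    na       = sum(is.na(x)),
    mean     = m,
    sd       = s,
    se_mean  = s / sqrt(n),      # standard error of the mean
    IQR      = IQR(x_ok),
    skewness = mean(z^3),        # assumption: simple moment estimator
    kurtosis = mean(z^4) - 3)    # assumption: excess kurtosis
}

# One row per variable, one column per statistic.
stats <- t(sapply(mtcars, describe_one))
round(stats, 3)
```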

#Performs a normality test on numerical data and visualizes the result.

## # A tibble: 11 × 4
##    vars       statistic  p_value sample
##    <chr>          <dbl>    <dbl>  <dbl>
##  1 season         0.857 2.32e-55   5000
##  2 holiday        0.160 8.97e-92   5000
##  3 workingday     0.590 6.16e-76   5000
##  4 weather        0.659 4.15e-72   5000
##  5 temp           0.982 5.74e-25   5000
##  6 atemp          0.982 6.88e-25   5000
##  7 humidity       0.982 1.89e-24   5000
##  8 windspeed      0.956 2.34e-36   5000
##  9 casual         0.702 2.43e-69   5000
## 10 registered     0.855 1.05e-55   5000
## 11 count          0.878 1.10e-52   5000
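
The `statistic`, `p_value` and `sample` columns look like the result of a Shapiro-Wilk test run on at most 5000 observations per variable, which is the upper limit that base R's `shapiro.test()` accepts. A minimal sketch under that assumption, with `mtcars` as a stand-in and `normality_one` as a hypothetical helper:

```r
set.seed(42)  # subsampling is random, so fix the seed for reproducibility

normality_one <- function(x, name, max_n = 5000) {
  x <- x[!is.na(x)]
  # shapiro.test() accepts 3 to 5000 values, so subsample longer vectors.
  if (length(x) > max_n) x <- sample(x, max_n)
  res <- shapiro.test(x)
  data.frame(vars = name,
             statistic = unname(res$statistic),
             p_value = res$p.value,
             sample = length(x),
             stringsAsFactors = FALSE)
}

# Apply the test to every column and stack the one-row results.
result <- do.call(rbind, Map(normality_one, mtcars, names(mtcars)))
rownames(result) <- NULL
result
```

Very small p-values, as in every row of the table above, indicate a departure from normality. With thousands of observations the test tends to reject even for mild departures, so the result is usually read alongside a histogram or Q-Q plot.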

#Bivariate

#Select the variable to compute

#Diagnoses the outliers of the numeric (continuous and discrete) variables.

##     variables outliers_cnt outliers_ratio outliers_mean   with_mean without_mean
## 1      season            0    0.000000000           NaN   2.5066140    2.5066140
## 2     holiday          311    2.856880397       1.00000   0.0285688    0.0000000
## 3  workingday            0    0.000000000           NaN   0.6808745    0.6808745
## 4     weather            1    0.009186111       4.00000   1.4184273    1.4181902
## 5        temp            0    0.000000000           NaN  20.2308598   20.2308598
## 6       atemp            0    0.000000000           NaN  23.6550841   23.6550841
## 7    humidity           22    0.202094433       0.00000  61.8864597   62.0117820
## 8   windspeed          227    2.085247106      36.58932  12.7993954   12.2927519
## 9      casual          749    6.880396840     181.91856  36.0219548   25.2419848
## 10 registered          423    3.885724784     631.32624 155.5521771  136.3174998
## 11      count          300    2.755833180     751.11667 191.5741319  175.7170792
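
The figures in this table are consistent with the usual boxplot rule, where a value counts as an outlier if it lies more than 1.5 x IQR beyond the hinges (roughly the quartiles), and with `outliers_ratio` being a percentage (for `holiday`, 311 / 10886 x 100 is about 2.857). A base R sketch under that assumption, using `boxplot.stats()`; `outlier_summary` is a hypothetical helper, not a dlookr function:

```r
outlier_summary <- function(x) {
  x <- x[!is.na(x)]
  out <- boxplot.stats(x)$out       # values beyond 1.5 * IQR from the hinges
  c(outliers_cnt   = length(out),
    outliers_ratio = 100 * length(out) / length(x),  # percent
    outliers_mean  = mean(out),     # NaN when there are no outliers
    with_mean      = mean(x),       # mean including outliers
    without_mean   = mean(x[!(x %in% out)]))  # mean excluding outliers
}

# Toy vector: 100 is the only outlier.
outlier_summary(c(1, 2, 3, 4, 100))

# One row per variable, using mtcars as a stand-in for the bike data.
round(t(sapply(mtcars, outlier_summary)), 3)
```

Comparing `with_mean` and `without_mean` shows how much the outliers pull the mean: in the table above, `casual` drops from about 36.0 to 25.2 once its 749 outliers are excluded.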

#Calculates the correlation coefficient between pairs of numerical variables and provides visualization.

## # A tibble: 110 × 3
##    var1       var2   coef_corr
##    <fct>      <fct>      <dbl>
##  1 holiday    season   0.0294 
##  2 workingday season  -0.00813
##  3 weather    season   0.00888
##  4 temp       season   0.259  
##  5 atemp      season   0.265  
##  6 humidity   season   0.191  
##  7 windspeed  season  -0.147  
##  8 casual     season   0.0968 
##  9 registered season   0.164  
## 10 count      season   0.163  
## # … with 100 more rows

#Defines the target variable

#Describes the relationship between the target variable and a variable of interest.

## 
## Call:
## lm(formula = formula_str, data = data)
## 
## Coefficients:
## (Intercept)    windspeed  
##     2.76405     -0.02011
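
The printed call `lm(formula = formula_str, data = data)` suggests the model was fitted inside a helper that builds the formula from variable names. A sketch of such a helper in base R follows, with `mtcars` as a stand-in; `relate_numeric` and its arguments are hypothetical. The sketch also checks the identity slope = r * sd(y) / sd(x) for simple regression. Plugging in the rounded values reported earlier for `season` and `windspeed` (-0.147 * 1.12 / 8.16, about -0.0202) gives a value close to the slope shown above, which is consistent with `season` being the target here, although the source does not state it.

```r
relate_numeric <- function(data, target, predictor) {
  # Build "target ~ predictor" from strings, then fit a simple linear model.
  formula_str <- as.formula(paste(target, "~", predictor))
  lm(formula = formula_str, data = data)
}

fit <- relate_numeric(mtcars, target = "mpg", predictor = "wt")
fit  # the printed call reads lm(formula = formula_str, data = data)

# With one predictor, the coefficients follow from the correlation and moments.
slope <- cor(mtcars$mpg, mtcars$wt) * sd(mtcars$mpg) / sd(mtcars$wt)
intercept <- mean(mtcars$mpg) - slope * mean(mtcars$wt)
all.equal(unname(coef(fit)), c(intercept, slope))
```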

#Visualizes the relationship between the target variable and the variable of interest.

#dlookr provides two automated EDA reports: