This is the course catalogue for the R Programming and Data Science course offered by Iskulghar. This webpage was created with R, and what it shows covers only about 25% of the course, so imagine how much you will learn from this single course. For more information, visit: https://www.facebook.com/iskulghar

Data Frame (Example)

Dataset summary

       ID        Name                Age         Score   
 Min.   :1   Length:5           Min.   :22   Min.   :78  
 1st Qu.:2   Class :character   1st Qu.:25   1st Qu.:81  
 Median :3   Mode  :character   Median :28   Median :85  
 Mean   :3                      Mean   :28   Mean   :85  
 3rd Qu.:4                      3rd Qu.:30   3rd Qu.:89  
 Max.   :5                      Max.   :35   Max.   :92  
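A summary like the one above could come from a small data frame such as the following. This is only a sketch: the page does not show the underlying rows, so the names are invented and the Age and Score values are hypothetical, chosen so that they agree with the printed minimum, quartiles, mean and maximum.

```r
# Hypothetical data: the names are made up, and the Age and Score values are
# picked only to be consistent with the summary printed on this page.
students <- data.frame(
  ID    = 1:5,
  Name  = c("Asha", "Bimal", "Chitra", "Dipak", "Esha"),
  Age   = c(22, 25, 28, 30, 35),
  Score = c(78, 81, 85, 89, 92),
  stringsAsFactors = FALSE   # keep Name as character, as in the summary
)

# summary() reports quartiles for numeric columns and length/class for text.
print(summary(students))
```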

Basic plot of the dataset

Real world dataset - IRIS

Summary of iris dataset

  Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
 Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
 Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
 Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
 Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  

Histogram - Distribution

Scatter plot

Box plot

Violin plot

Correlation Matrix

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
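The matrix above is what base R's `cor()` returns for the four numeric columns of `iris`. A minimal sketch, assuming the default Pearson method was used:

```r
# Keep only the four numeric measurements; Species is a factor.
measurements <- iris[, 1:4]

# cor() computes pairwise Pearson correlations by default.
cor_mat <- cor(measurements)
print(cor_mat)
```

The strongest relationship is between petal length and petal width, while sepal width is negatively correlated with the other three measurements.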

Heat map of correlation matrix

Heat map of correlation matrix (version 2)

Heat map of correlation matrix (version 3)

Pair plot

Regression

Linear Regression

Linear Regression Analysis


Call:
lm(formula = Sepal.Length ~ Petal.Length, data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-1.24675 -0.29657 -0.01515  0.27676  1.00269 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   4.30660    0.07839   54.94   <2e-16 ***
Petal.Length  0.40892    0.01889   21.65   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.4071 on 148 degrees of freedom
Multiple R-squared:   0.76, Adjusted R-squared:  0.7583 
F-statistic: 468.6 on 1 and 148 DF,  p-value: < 2.2e-16
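The `Call:` line above gives the model formula, so the fit can be sketched as follows. The prediction step at the end is an added illustration, and the petal length of 4 cm is a hypothetical input.

```r
# Fit sepal length as a straight-line function of petal length.
fit_lin <- lm(Sepal.Length ~ Petal.Length, data = iris)
print(summary(fit_lin))

# Illustration only: predict sepal length for a hypothetical 4 cm petal.
new_flower <- data.frame(Petal.Length = 4)
pred <- predict(fit_lin, newdata = new_flower)
print(pred)
```

With the coefficients reported above, the prediction works out to roughly 4.307 + 0.409 × 4, or about 5.94 cm.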

Polynomial Regression

Polynomial Regression analysis


Call:
lm(formula = y ~ x + I(x^2) + I(x^3))

Residuals:
     Min       1Q   Median       3Q      Max 
-1.06434 -0.24523  0.00707  0.19869  0.92755 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.64817    0.45873  10.133   <2e-16 ***
x            0.27811    0.48046   0.579    0.564    
I(x^2)      -0.04428    0.13454  -0.329    0.743    
I(x^3)       0.01055    0.01123   0.939    0.349    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.364 on 146 degrees of freedom
Multiple R-squared:  0.8106,    Adjusted R-squared:  0.8067 
F-statistic: 208.3 on 3 and 146 DF,  p-value: < 2.2e-16
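The cubic formula in the `Call:` line above can be fitted as sketched below. The page does not say which variables `x` and `y` hold, so taking them to be petal length and sepal length is an assumption, and the numbers this sketch prints may differ from the output above.

```r
# Assumption: x is petal length and y is sepal length (not stated in the source).
x <- iris$Petal.Length
y <- iris$Sepal.Length

# I() protects ^ so it means arithmetic power rather than formula crossing.
fit_cubic <- lm(y ~ x + I(x^2) + I(x^3))
print(summary(fit_cubic))
```

Raw powers of `x` are strongly correlated with each other, which is one common reason individual terms can look non-significant even when the overall fit is good.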

Multivariate Polynomial Regression


Call:
lm(formula = Sepal.Length ~ poly(Sepal.Width, 2) + poly(Petal.Length, 
    2) + poly(Petal.Width, 2), data = iris)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.85830 -0.21065  0.00061  0.19278  0.77325 

Coefficients:
                       Estimate Std. Error t value Pr(>|t|)    
(Intercept)             5.84333    0.02509 232.877  < 2e-16 ***
poly(Sepal.Width, 2)1   2.99803    0.40359   7.428 9.12e-12 ***
poly(Sepal.Width, 2)2   0.34547    0.31951   1.081  0.28141    
poly(Petal.Length, 2)1 12.74168    1.78665   7.132 4.54e-11 ***
poly(Petal.Length, 2)2  1.59442    0.58991   2.703  0.00771 ** 
poly(Petal.Width, 2)1  -2.82015    1.72498  -1.635  0.10427    
poly(Petal.Width, 2)2  -0.95176    0.67450  -1.411  0.16040    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.3073 on 143 degrees of freedom
Multiple R-squared:  0.8678,    Adjusted R-squared:  0.8623 
F-statistic: 156.5 on 6 and 143 DF,  p-value: < 2.2e-16
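The formula from the `Call:` line above can be run directly. A minimal sketch:

```r
# Second-degree polynomial terms for each of the three predictors.
# poly() builds orthogonal polynomials by default (raw = FALSE).
fit_multi <- lm(
  Sepal.Length ~ poly(Sepal.Width, 2) + poly(Petal.Length, 2) + poly(Petal.Width, 2),
  data = iris
)
print(summary(fit_multi))
```

Because the polynomial terms are orthogonal and centred, the intercept equals the mean sepal length, which is why it reads 5.84333 in the output above.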

Clustering

Clustering result analysis

K-means clustering with 3 clusters of sizes 62, 38, 50

Cluster means:
  Sepal.Length Sepal.Width Petal.Length Petal.Width
1     5.901613    2.748387     4.393548    1.433871
2     6.850000    3.073684     5.742105    2.071053
3     5.006000    3.428000     1.462000    0.246000

Clustering vector:
  [1] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 1 1 2 1 1 1
 [57] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2 2
[113] 2 1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 1 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 1

Within cluster sum of squares by cluster:
[1] 39.82097 23.87947 15.15100
 (between_SS / total_SS =  88.4 %)

Available components:

[1] "cluster"      "centers"      "totss"        "withinss"     "tot.withinss" "betweenss"    "size"        
[8] "iter"         "ifault"      
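A result like the one above can be produced with base R's `kmeans()`. This is a sketch under assumptions: the seed and the `nstart` value are not shown on the page and are chosen here only to make the run repeatable and stable.

```r
set.seed(42)  # arbitrary seed, an assumption; k-means starts from random centres

# Cluster on the four measurements only, leaving the species label out.
features <- iris[, 1:4]
km <- kmeans(features, centers = 3, nstart = 25)  # nstart = 25 is an assumption

print(km$size)

# Cross-tabulate clusters against the true species to judge the grouping.
print(table(Cluster = km$cluster, Species = iris$Species))
```

Cluster numbers are arbitrary labels, so cluster 1 in one run may be cluster 3 in another even when the groups themselves are the same.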

Classification

SVM Model

[1] "Confusion Matrix:"
Confusion Matrix and Statistics

            Reference
Prediction   setosa versicolor virginica
  setosa         10          0         0
  versicolor      0         10         1
  virginica       0          0         9

Overall Statistics
                                          
               Accuracy : 0.9667          
                 95% CI : (0.8278, 0.9992)
    No Information Rate : 0.3333          
    P-Value [Acc > NIR] : 2.963e-13       
                                          
                  Kappa : 0.95            
                                          
 Mcnemar's Test P-Value : NA              

Statistics by Class:

                     Class: setosa Class: versicolor Class: virginica
Sensitivity                 1.0000            1.0000           0.9000
Specificity                 1.0000            0.9500           1.0000
Pos Pred Value              1.0000            0.9091           1.0000
Neg Pred Value              1.0000            1.0000           0.9524
Prevalence                  0.3333            0.3333           0.3333
Detection Rate              0.3333            0.3333           0.3000
Detection Prevalence        0.3333            0.3667           0.3000
Balanced Accuracy           1.0000            0.9750           0.9500
[1] "Accuracy: 0.966666666666667"
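The evaluation above can be approximated with the sketch below. Several things are assumptions: the seed, the split (10 test flowers per species, matching the 30 test cases in the matrix), and the use of `e1071::svm()` with its default radial kernel. The statistics block above looks like the output of `caret::confusionMatrix()`; this sketch uses a plain `table()` instead to keep dependencies small, so its accuracy may not be exactly 0.9667.

```r
set.seed(123)  # arbitrary seed; the split used on this page is not shown

# Hold out 10 flowers per species (30 in total) as a test set.
rows_by_species <- split(seq_len(nrow(iris)), iris$Species)
test_idx  <- unlist(lapply(rows_by_species, function(i) sample(i, 10)))
train_set <- iris[-test_idx, ]
test_set  <- iris[test_idx, ]

if (requireNamespace("e1071", quietly = TRUE)) {
  # Assumption: e1071's svm() with default settings (radial kernel).
  model <- e1071::svm(Species ~ ., data = train_set)
  pred  <- predict(model, newdata = test_set)

  # Rows are predictions, columns are the true species.
  conf_mat <- table(Prediction = pred, Reference = test_set$Species)
  accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
  print(conf_mat)
  print(paste("Accuracy:", accuracy))
} else {
  message("Package 'e1071' is not installed; skipping the SVM fit.")
}
```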

SVM Classification confusion matrix

Statistical Analysis

Data Distribution

T-test


    Welch Two Sample t-test

data:  setosa and virginica
t = -15.386, df = 76.516, p-value < 2.2e-16
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.78676 -1.37724
sample estimates:
mean of x mean of y 
    5.006     6.588 
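The sample means above (5.006 and 6.588) match the mean sepal lengths of the setosa and virginica flowers, so the test can be sketched as follows, assuming sepal length is the variable being compared:

```r
# Sepal lengths for the two species being compared.
setosa    <- iris$Sepal.Length[iris$Species == "setosa"]
virginica <- iris$Sepal.Length[iris$Species == "virginica"]

# t.test() uses var.equal = FALSE by default, which gives the Welch test.
tt <- t.test(setosa, virginica)
print(tt)
```

The Welch version does not assume equal variances in the two groups, which is why the degrees of freedom (76.516) are not a whole number.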

ANOVA


Tukey’s posthoc test

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = Sepal.Length ~ Species, data = iris)

$Species
                      diff       lwr       upr p adj
versicolor-setosa    0.930 0.6862273 1.1737727     0
virginica-setosa     1.582 1.3382273 1.8257727     0
virginica-versicolor 0.652 0.4082273 0.8957727     0
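The `Fit:` line above names the ANOVA model behind the Tukey comparison, so both steps can be sketched together:

```r
# One-way ANOVA: does mean sepal length differ between species?
fit_aov <- aov(Sepal.Length ~ Species, data = iris)
print(summary(fit_aov))

# Tukey's HSD then shows which pairs of species differ,
# with confidence intervals adjusted for multiple comparisons.
tukey <- TukeyHSD(fit_aov)
print(tukey)
```

Every interval in the table above excludes zero, so each pair of species differs in mean sepal length at the 95% family-wise level.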

Chi-square test

Warning: Chi-squared approximation may be incorrect

    Pearson's Chi-squared test

data:  table(iris$Species, cut(iris$Petal.Length, breaks = c(1, 2, 3,     4, 5)))
X-squared = 114.49, df = 6, p-value < 2.2e-16
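The `data:` line above shows the expression that was tested. A minimal sketch that splits it into steps; the variable names are illustrative, and `suppressWarnings()` is added here only to keep the output quiet.

```r
# Bin petal length into (1,2], (2,3], (3,4], (4,5]; values outside become NA
# and are dropped by table().
petal_bins <- cut(iris$Petal.Length, breaks = c(1, 2, 3, 4, 5))
counts <- table(iris$Species, petal_bins)
print(counts)

# Some expected cell counts are below 5, which triggers the
# "approximation may be incorrect" warning seen above.
chi <- suppressWarnings(chisq.test(counts))
print(chi)
```

When that warning appears, `fisher.test()` or `chisq.test(..., simulate.p.value = TRUE)` are common alternatives.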

Principal component analysis

Standard deviations (1, .., p=4):
[1] 1.7083611 0.9560494 0.3830886 0.1439265

Rotation (n x k) = (4 x 4):
                    PC1         PC2        PC3        PC4
sepal.length  0.5210659 -0.37741762  0.7195664  0.2612863
sepal.width  -0.2693474 -0.92329566 -0.2443818 -0.1235096
petal.length  0.5804131 -0.02449161 -0.1421264 -0.8014492
petal.width   0.5648565 -0.06694199 -0.6342727  0.5235971
Importance of components:
                          PC1    PC2     PC3     PC4
Standard deviation     1.7084 0.9560 0.38309 0.14393
Proportion of Variance 0.7296 0.2285 0.03669 0.00518
Cumulative Proportion  0.7296 0.9581 0.99482 1.00000
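The standard deviations above are those of a PCA on the standardised iris measurements, so the analysis can be sketched with base R's `prcomp()`. The lowercase row names in the output suggest the columns were renamed first; this sketch keeps the default `iris` column names, which is an assumption.

```r
# Centre and scale each measurement so all four contribute on the same scale.
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)
print(pca)
print(summary(pca))

# Share of total variance carried by each component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 4))
```

The sign of each component is arbitrary, so a column of loadings may come out with all signs flipped without changing the interpretation. The first two components together hold about 96% of the variance, which is why a 2D plot is a reasonable view of the data.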

PCA Dimension Contribution

PCA Dimension Contribution (Heat map)

PCA Dimension Contribution (Vector map)

PCA 2D plot (scatter with marked data point)

PCA Clustering

Interactive plots

Bonus!

3D sine curve

Support Vector Regression Surface Curve

Our next course

Certificate

We provide a certificate upon course completion.
