library(datasets)

621 - Blog 1 - Statistical terminologies

As part of the blog assignments for 621, I would like to start with the basic elements of statistics and then move on to posts that cover the major topics in more depth.

So let's begin with basic definitions from the foundations of statistical language as well as modelling techniques.

Statistical basic terms

Mean - The mean is the average of a list of values: the sum of the values divided by how many there are.

data("iris")
mean(iris$Sepal.Length)
## [1] 5.843333

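Variance - The variance describes how far the numbers are spread out from their average or mean. It is the average squared distance from the mean, and it is central to describing the spread of the data points. Note that R's var() computes the sample variance, which divides by n - 1 rather than n.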

var(iris$Sepal.Length)
## [1] 0.6856935

Standard deviation - The standard deviation is simply the square root of the variance. It describes the typical distance between a data point and the mean of the data points, in the same units as the data.

sd(iris$Sepal.Length)
## [1] 0.8280661

Z-score - A z-score expresses a value in terms of standard deviations, i.e., how many standard deviations the value lies from the mean.
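As a minimal sketch of the definition above, a z-score can be computed by hand as (value - mean) / standard deviation. The variable names `x` and `z` are only for illustration.

```r
# z-scores for the iris sepal lengths: (value - mean) / standard deviation
x <- iris$Sepal.Length
z <- (x - mean(x)) / sd(x)

# By construction, the z-scores have mean 0 and standard deviation 1
round(mean(z), 10)
sd(z)

# The first flower (sepal length 5.1) is below the mean, so its z-score is negative
z[1]
```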

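p value - A p value is the probability of seeing a test result at least as extreme as the one observed, assuming the null hypothesis is true, i.e., assuming the result is due to chance alone. A threshold of .05 is the usual default significance level in most statistical tests: a p value below it is taken as evidence against the null hypothesis.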
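To make this concrete, here is a small sketch using a two-sample t-test on the iris data. The choice of test and of the two species is an assumption made for illustration; the null hypothesis is that both species have the same mean sepal length.

```r
# Welch two-sample t-test: do setosa and versicolor differ in mean sepal length?
setosa <- iris$Sepal.Length[iris$Species == "setosa"]
versicolor <- iris$Sepal.Length[iris$Species == "versicolor"]

result <- t.test(setosa, versicolor)

# The p value of the test
result$p.value

# Compare it against the conventional 0.05 significance level
result$p.value < 0.05
```

A p value below 0.05 here would suggest that a difference this large is unlikely if the two species really had the same mean.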

Modelling basic terms

RMSE - RMSE (Root Mean Squared Error) measures how far the model's predictions deviate from the actual data points. It is the square root of the mean of the squared prediction errors.
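A rough sketch of the calculation, assuming a simple linear model on iris chosen only for illustration:

```r
# Fit a simple linear model: petal length predicted from sepal length
fit <- lm(Petal.Length ~ Sepal.Length, data = iris)

# RMSE: square root of the mean squared difference between actual and predicted values
predicted <- predict(fit)
actual <- iris$Petal.Length
rmse <- sqrt(mean((actual - predicted)^2))
rmse
```

Because RMSE is in the same units as the response (here, centimetres of petal length), it can be read as a typical prediction error.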

Adjusted RMSE - Adjusted RMSE is similar to RMSE, but it penalizes the model for adding a high number of parameters. This makes it a fairer metric than plain RMSE when comparing models of different sizes.
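No formula for adjusted RMSE is given here. One common reading, assumed in this sketch, is to divide the sum of squared residuals by the residual degrees of freedom (n - p) instead of n. Under that assumption it matches the residual standard error that R reports for lm models.

```r
# Illustrative model with two predictors
fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

res <- residuals(fit)
n <- length(res)
p <- length(coef(fit))  # estimated coefficients, including the intercept

rmse <- sqrt(sum(res^2) / n)
adj_rmse <- sqrt(sum(res^2) / (n - p))  # assumed definition: divide by n - p

# Under this definition the adjusted value equals lm's residual standard error
c(rmse = rmse, adjusted = adj_rmse, lm_sigma = summary(fit)$sigma)
```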

Null deviance - The null deviance is a metric usually reported by glm or classification models. It tells us the deviance of a model that uses no predictors and simply predicts the mean of all the values (an intercept-only model).

Residual deviance - A metric reported by glm or classification models that tells us how well our model, with its predictors, fits the actual values. The residual deviance should be lower than the null deviance; the bigger the drop, the more the predictors contribute.
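Both deviances can be read directly off a fitted glm object. The sketch below assumes a hypothetical binary target, `is_virginica`, built from iris purely for illustration.

```r
# Binary outcome: is the flower virginica? (hypothetical target for illustration)
iris_bin <- transform(iris, is_virginica = as.integer(Species == "virginica"))

fit <- glm(is_virginica ~ Sepal.Length, data = iris_bin, family = binomial)

fit$null.deviance  # deviance of the intercept-only model
fit$deviance       # residual deviance of the fitted model

# The null deviance can be reproduced by fitting the intercept-only model directly
null_fit <- glm(is_virginica ~ 1, data = iris_bin, family = binomial)
deviance(null_fit)
```

The same two numbers appear at the bottom of `summary(fit)`.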

Bias - In modelling, bias describes how closely the model is able to reproduce the results for the training data. It is a measure of whether our model is able to learn from the training dataset or not: a high-bias model is too simple and fits even the training data poorly.

Variance - In modelling, variance describes how closely the model is able to reproduce the results during testing, i.e., on unknown data. It shows whether our model is generalizable or not: a high-variance model fits the training data well but performs worse on new data.
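One rough way to see both ideas is to compare training and test error for a simple and a more flexible model. Everything in this sketch is an assumption made for illustration: the 70/30 split, the seed, the helper function `rmse`, and the choice of a degree-6 polynomial as the flexible model.

```r
set.seed(621)  # arbitrary seed so the split is reproducible

# Hold out 30% of iris as unseen test data
test_idx <- sample(nrow(iris), size = 0.3 * nrow(iris))
train <- iris[-test_idx, ]
test <- iris[test_idx, ]

# Helper: RMSE of a fitted model on a given dataset
rmse <- function(fit, data) {
  sqrt(mean((data$Petal.Length - predict(fit, newdata = data))^2))
}

# A simple model (straight line) and a more flexible one (degree-6 polynomial)
simple <- lm(Petal.Length ~ Sepal.Length, data = train)
flexible <- lm(Petal.Length ~ poly(Sepal.Length, 6), data = train)

data.frame(
  model = c("simple", "flexible"),
  train_rmse = c(rmse(simple, train), rmse(flexible, train)),
  test_rmse = c(rmse(simple, test), rmse(flexible, test))
)
```

The flexible model can never fit the training data worse than the simple one. A high training error points to bias, while a large gap between training and test error points to variance; how the two models compare on the test set depends on the particular split.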