Logical: Logical values are TRUE and FALSE; NA (a missing value) can also appear in a logical vector.
Integer: Integers are whole numbers (…, -1, 0, 1, 2, 3, …). In R they are written with an L suffix, e.g. 5L.
Numeric: Numeric covers both integers and doubles (floating-point numbers). By default R stores numbers such as 2 or 3.5 as doubles.
Character: Character values are strings, i.e. arbitrary sequences of characters such as "Data types".
Vector: The vector is the most common data structure in R. There are two kinds: atomic vectors and lists.
Atomic vector: All elements share one type, e.g. character, logical, integer, or numeric.
List: A list can hold elements of different types, for example numbers and strings together.
Factor: Factors are special vectors that represent categorical data. They can only contain pre-defined values (levels) and can be unordered (nominal) or ordered (ordinal).
Matrix: A matrix is a two-dimensional rectangular data set in which every element has the same type.
Data Frame: Data frames are tabular data objects. Unlike a matrix, each column can have a different type (numeric, character, factor, …), but all columns must have the same length.
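The types and structures above can be checked with typeof(), class() and a few helper predicates. This is a minimal sketch; all the variable names (x_lgl, lst, f, o, m, df, …) are hypothetical examples, not part of the original analysis.

```r
# Atomic types: every element of each vector has the same type
x_lgl <- c(TRUE, FALSE, NA)       # logical (NA is allowed)
x_int <- c(1L, 2L, 3L)            # integer (note the L suffix)
x_dbl <- c(1.5, 2, 3)             # double, the default numeric type
x_chr <- c("Data types", "R")     # character
print(c(typeof(x_lgl), typeof(x_int), typeof(x_dbl), typeof(x_chr)))
print(is.numeric(x_int))          # integers also count as numeric

# List: elements may have different types
lst <- list(1, "a", TRUE)
print(typeof(lst))

# Factors: unordered (nominal) and ordered (ordinal)
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
o <- factor(c("low", "high"), levels = c("low", "high"), ordered = TRUE)
print(levels(f))
print(is.ordered(o))

# Matrix: 2-D, one type for all elements
m <- matrix(1:6, nrow = 2)
print(dim(m))

# Data frame: columns may differ in type
df <- data.frame(name = c("Tom", "Jerry"), age = c(30, 25),
                 stringsAsFactors = FALSE)
print(sapply(df, class))
```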
A = c("Ganesh", "KumaR", "Tom", "Jerry")
class(A)
## [1] "character"
data()
typeof(data())
## [1] "list"
Pick a vector with 7 elements:
vector = c(2, 3, 4, 5, 6, 7, 8)
Apply the built-in sd() function:
R_StandardDeviation_InBuilt = sd(vector)
print(R_StandardDeviation_InBuilt)
## [1] 2.160247
Calculate the standard deviation by hand:
sqrt(sum((vector-mean(vector))^2/(length(vector)-1)))
## [1] 2.160247
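The manual calculation is the sample standard deviation, which divides by n - 1. Worked through for the vector 2, 3, …, 8 (n = 7, mean 5):

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad \bar{x} = \frac{2+3+4+5+6+7+8}{7} = 5

\sum_{i=1}^{7}\left(x_i - \bar{x}\right)^2 = 9+4+1+0+1+4+9 = 28,
\qquad s = \sqrt{\frac{28}{6}} = \sqrt{4.\overline{6}} \approx 2.160247
```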
sd
## function (x, na.rm = FALSE)
## sqrt(var(if (is.vector(x) || is.factor(x)) x else as.double(x),
## na.rm = na.rm))
## <bytecode: 0x000001d664bdaf00>
## <environment: namespace:stats>
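Typing sd without parentheses prints the function's source code. It shows that sd() computes the variance with var() and returns its square root; with na.rm = TRUE, missing values are dropped first.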
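A quick check of the two points visible in the source: sd() equals the square root of var(), and the na.rm argument controls how NA values are handled. The vectors x and y are illustrative.

```r
x <- c(2, 3, 4, 5, 6, 7, 8)
print(all.equal(sd(x), sqrt(var(x))))   # sd() is sqrt(var())

y <- c(x, NA)                           # same data plus one missing value
print(sd(y))                            # NA, because na.rm = FALSE by default
print(sd(y, na.rm = TRUE))              # NA dropped, same as sd(x)
```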
My own function that adds 3 to each element of a vector:
add = function(x){
  # `+` is vectorized, so 3 is added to every element of x
  y = x + 3
  return(y)
}
add(vector)
## [1] 5 6 7 8 9 10 11
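add() works without a loop because arithmetic in R is vectorized. The sketch below compares it with an explicit loop. add_loop and its n argument are hypothetical names used only for this comparison. It also shows that appending three elements to a vector is a different operation, done with c().

```r
x <- c(2, 3, 4, 5, 6, 7, 8)

# Vectorized: `+` is applied element by element
vec_result <- x + 3

# Hypothetical loop-based equivalent, for comparison only
add_loop <- function(x, n = 3) {
  out <- numeric(length(x))            # preallocate the result
  for (i in seq_along(x)) out[i] <- x[i] + n
  out
}
loop_result <- add_loop(x)
print(identical(vec_result, loop_result))

# Appending three new elements is done with c(), not with +
appended <- c(x, 9, 10, 11)
print(length(appended))
```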
library(ggplot2)
library(psych)
##
## Attaching package: 'psych'
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
data()
arrests = USArrests
print(arrests)
## Murder Assault UrbanPop Rape
## Alabama 13.2 236 58 21.2
## Alaska 10.0 263 48 44.5
## Arizona 8.1 294 80 31.0
## Arkansas 8.8 190 50 19.5
## California 9.0 276 91 40.6
## Colorado 7.9 204 78 38.7
## Connecticut 3.3 110 77 11.1
## Delaware 5.9 238 72 15.8
## Florida 15.4 335 80 31.9
## Georgia 17.4 211 60 25.8
## Hawaii 5.3 46 83 20.2
## Idaho 2.6 120 54 14.2
## Illinois 10.4 249 83 24.0
## Indiana 7.2 113 65 21.0
## Iowa 2.2 56 57 11.3
## Kansas 6.0 115 66 18.0
## Kentucky 9.7 109 52 16.3
## Louisiana 15.4 249 66 22.2
## Maine 2.1 83 51 7.8
## Maryland 11.3 300 67 27.8
## Massachusetts 4.4 149 85 16.3
## Michigan 12.1 255 74 35.1
## Minnesota 2.7 72 66 14.9
## Mississippi 16.1 259 44 17.1
## Missouri 9.0 178 70 28.2
## Montana 6.0 109 53 16.4
## Nebraska 4.3 102 62 16.5
## Nevada 12.2 252 81 46.0
## New Hampshire 2.1 57 56 9.5
## New Jersey 7.4 159 89 18.8
## New Mexico 11.4 285 70 32.1
## New York 11.1 254 86 26.1
## North Carolina 13.0 337 45 16.1
## North Dakota 0.8 45 44 7.3
## Ohio 7.3 120 75 21.4
## Oklahoma 6.6 151 68 20.0
## Oregon 4.9 159 67 29.3
## Pennsylvania 6.3 106 72 14.9
## Rhode Island 3.4 174 87 8.3
## South Carolina 14.4 279 48 22.5
## South Dakota 3.8 86 45 12.8
## Tennessee 13.2 188 59 26.9
## Texas 12.7 201 80 25.5
## Utah 3.2 120 80 22.9
## Vermont 2.2 48 32 11.2
## Virginia 8.5 156 63 20.7
## Washington 4.0 145 73 26.2
## West Virginia 5.7 81 39 9.3
## Wisconsin 2.6 53 66 10.8
## Wyoming 6.8 161 60 15.6
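Printing all 50 rows is hard to scan. A common alternative is to inspect the structure and a summary instead. This sketch uses only base R functions.

```r
str(USArrests)           # column names, types and first values
print(head(USArrests, 3))  # first three states only
print(summary(USArrests))  # min, quartiles, mean and max per column
print(dim(USArrests))      # rows and columns
```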
p <- ggplot(USArrests, aes(x = Rape)) +
  geom_density()
print(p)
Based on the density plot, the distribution of Rape is positively skewed: it has a longer tail on the right-hand side.
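A rough numeric check of what the plot suggests: in a right-skewed distribution the mean is usually pulled above the median by the long right tail. This is a heuristic, not a formal test.

```r
rape <- USArrests$Rape
print(mean(rape))
print(median(rape))
print(mean(rape) > median(rape))   # TRUE is consistent with right skew
```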
library(moments)
skewness(USArrests$Rape)
## [1] 0.7769613
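The sample skewness is about 0.78 (0.7769613). A common rule of thumb treats absolute values between 0.5 and 1 as moderate skewness, so Rape is moderately right-skewed.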
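The same value can be reproduced by hand. The sketch assumes moments::skewness() uses the moment coefficient m3 / m2^(3/2), where both central moments divide by n rather than n - 1. If that assumption holds, the result should match the 0.7769613 above. skew_by_hand is a hypothetical helper name.

```r
# Moment coefficient of skewness, assuming the population-moment formula
skew_by_hand <- function(x) {
  x <- x[!is.na(x)]          # drop missing values
  n <- length(x)
  d <- x - mean(x)           # deviations from the mean
  m2 <- sum(d^2) / n         # second central moment
  m3 <- sum(d^3) / n         # third central moment
  m3 / m2^(3 / 2)
}

print(skew_by_hand(USArrests$Rape))
print(skew_by_hand(c(1, 2, 3)))   # symmetric data gives 0
```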