The built-in attitude dataset is the one chosen for this EDA.
attitude
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
## 11 64 53 53 58 58 67 34
## 12 67 60 47 39 59 74 41
## 13 69 62 57 42 55 63 25
## 14 68 83 83 45 59 77 35
## 15 77 77 54 72 79 77 46
## 16 81 90 50 72 60 54 36
## 17 74 85 64 69 79 79 63
## 18 65 60 65 75 55 80 60
## 19 65 70 46 57 75 85 46
## 20 50 58 68 54 64 78 52
## 21 50 40 33 34 43 64 33
## 22 64 61 52 62 66 80 41
## 23 53 66 52 50 63 80 37
## 24 40 37 42 58 50 57 49
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
## 30 82 82 39 59 64 78 39
1. See basic descriptive statistics
dim(attitude) #Dimensions of Dataframe
## [1] 30 7
str(attitude) #Structure of Dataframe
## 'data.frame': 30 obs. of 7 variables:
## $ rating : num 43 63 71 61 81 43 58 71 72 67 ...
## $ complaints: num 51 64 70 63 78 55 67 75 82 61 ...
## $ privileges: num 30 51 68 45 56 49 42 50 72 45 ...
## $ learning : num 39 54 69 47 66 44 56 55 67 47 ...
## $ raises : num 61 63 76 54 71 54 66 70 71 62 ...
## $ critical : num 92 73 86 84 83 49 68 66 83 80 ...
## $ advance : num 45 47 48 35 47 34 35 41 31 41 ...
summary(attitude) #Summary of Dataframe
## rating complaints privileges learning raises
## Min. :40.00 Min. :37.0 Min. :30.00 Min. :34.00 Min. :43.00
## 1st Qu.:58.75 1st Qu.:58.5 1st Qu.:45.00 1st Qu.:47.00 1st Qu.:58.25
## Median :65.50 Median :65.0 Median :51.50 Median :56.50 Median :63.50
## Mean :64.63 Mean :66.6 Mean :53.13 Mean :56.37 Mean :64.63
## 3rd Qu.:71.75 3rd Qu.:77.0 3rd Qu.:62.50 3rd Qu.:66.75 3rd Qu.:71.00
## Max. :85.00 Max. :90.0 Max. :83.00 Max. :75.00 Max. :88.00
## critical advance
## Min. :49.00 Min. :25.00
## 1st Qu.:69.25 1st Qu.:35.00
## Median :77.50 Median :41.00
## Mean :74.77 Mean :42.93
## 3rd Qu.:80.00 3rd Qu.:47.75
## Max. :92.00 Max. :72.00
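summary() reports location and quartiles but not the standard deviation. As a small complement, the sketch below computes the standard deviation and the range of every column with sapply(). The object names col_sd and col_range are illustrative and are not part of the original exercise.

```r
# All seven columns are numeric (see str(attitude)), so sd() and range()
# can be applied to each column directly.
col_sd <- sapply(attitude, sd)

# range() returns c(min, max), so this gives a 2 x 7 matrix:
# row 1 holds the minima, row 2 the maxima.
col_range <- sapply(attitude, range)

round(col_sd, 2)
col_range
```

The minima and maxima in col_range should agree with the Min. and Max. rows printed by summary(attitude).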
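1 - a) What is the difference between attitude[3] and attitude$learning?
Ans.
Below, we observe that attitude[3] returns a data frame consisting of a single column (the third column, privileges). attitude$learning, on the other hand, returns a numeric vector. Note that the two expressions also refer to different columns (privileges is column 3, learning is column 4); the point here is the difference in the type of object returned.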
attitude[3]
## privileges
## 1 30
## 2 51
## 3 68
## 4 45
## 5 56
## 6 49
## 7 42
## 8 50
## 9 72
## 10 45
## 11 53
## 12 47
## 13 57
## 14 83
## 15 54
## 16 50
## 17 64
## 18 65
## 19 46
## 20 68
## 21 33
## 22 52
## 23 52
## 24 42
## 25 42
## 26 66
## 27 58
## 28 44
## 29 71
## 30 39
attitude$learning
## [1] 39 54 69 47 66 44 56 55 67 47 58 39 42 45 72 72 69 75 57 54 34 62 50 58 48
## [26] 63 74 45 71 59
class(attitude[3])
## [1] "data.frame"
class(attitude$learning)
## [1] "numeric"
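To separate the effect of the operator from the choice of column, the following sketch applies single brackets, double brackets, and $ to the same column, learning. The variable names are illustrative only.

```r
# Single brackets keep the data frame wrapper around the column.
one_col_df <- attitude["learning"]

# Double brackets and $ extract the column itself as a vector.
vec_a <- attitude[["learning"]]
vec_b <- attitude$learning

class(one_col_df)        # a data frame with one column
class(vec_a)             # a numeric vector
identical(vec_a, vec_b)  # both extract the same vector

# With matrix-style indexing, drop = FALSE keeps the data frame.
class(attitude[, "learning"])
class(attitude[, "learning", drop = FALSE])
```

In short, [ ] subsets a data frame and returns a data frame, while [[ ]] and $ pull out a single column.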
2. Lists name of variables in a dataset
colnames(attitude)
## [1] "rating" "complaints" "privileges" "learning" "raises"
## [6] "critical" "advance"
3. Calculate number of rows in a dataset
nrow(attitude)
## [1] 30
4. Calculate number of columns in a dataset
ncol(attitude)
## [1] 7
5. List structure of a dataset
str(attitude)
## 'data.frame': 30 obs. of 7 variables:
## $ rating : num 43 63 71 61 81 43 58 71 72 67 ...
## $ complaints: num 51 64 70 63 78 55 67 75 82 61 ...
## $ privileges: num 30 51 68 45 56 49 42 50 72 45 ...
## $ learning : num 39 54 69 47 66 44 56 55 67 47 ...
## $ raises : num 61 63 76 54 71 54 66 70 71 62 ...
## $ critical : num 92 73 86 84 83 49 68 66 83 80 ...
## $ advance : num 45 47 48 35 47 34 35 41 31 41 ...
6. See first 6 rows (by default) of dataset
head(attitude)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
7. See first n rows of dataset. Select to see the first 15 rows of dataset
head(attitude,15)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
## 11 64 53 53 58 58 67 34
## 12 67 60 47 39 59 74 41
## 13 69 62 57 42 55 63 25
## 14 68 83 83 45 59 77 35
## 15 77 77 54 72 79 77 46
8. See all rows but the last row
head(attitude,nrow(attitude)-1)
## rating complaints privileges learning raises critical advance
## 1 43 51 30 39 61 92 45
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
## 11 64 53 53 58 58 67 34
## 12 67 60 47 39 59 74 41
## 13 69 62 57 42 55 63 25
## 14 68 83 83 45 59 77 35
## 15 77 77 54 72 79 77 46
## 16 81 90 50 72 60 54 36
## 17 74 85 64 69 79 79 63
## 18 65 60 65 75 55 80 60
## 19 65 70 46 57 75 85 46
## 20 50 58 68 54 64 78 52
## 21 50 40 33 34 43 64 33
## 22 64 61 52 62 66 80 41
## 23 53 66 52 50 63 80 37
## 24 40 37 42 58 50 57 49
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
9. See last 6 rows (by default) of a dataset
tail(attitude)
## rating complaints privileges learning raises critical advance
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
## 30 82 82 39 59 64 78 39
10. See last n rows of dataset. Select to see the last 12 rows of dataset.
tail(attitude,12)
## rating complaints privileges learning raises critical advance
## 19 65 70 46 57 75 85 46
## 20 50 58 68 54 64 78 52
## 21 50 40 33 34 43 64 33
## 22 64 61 52 62 66 80 41
## 23 53 66 52 50 63 80 37
## 24 40 37 42 58 50 57 49
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
## 30 82 82 39 59 64 78 39
11. See all rows but the first row
tail(attitude,nrow(attitude)-1) # See all rows but first row
## rating complaints privileges learning raises critical advance
## 2 63 64 51 54 63 73 47
## 3 71 70 68 69 76 86 48
## 4 61 63 45 47 54 84 35
## 5 81 78 56 66 71 83 47
## 6 43 55 49 44 54 49 34
## 7 58 67 42 56 66 68 35
## 8 71 75 50 55 70 66 41
## 9 72 82 72 67 71 83 31
## 10 67 61 45 47 62 80 41
## 11 64 53 53 58 58 67 34
## 12 67 60 47 39 59 74 41
## 13 69 62 57 42 55 63 25
## 14 68 83 83 45 59 77 35
## 15 77 77 54 72 79 77 46
## 16 81 90 50 72 60 54 36
## 17 74 85 64 69 79 79 63
## 18 65 60 65 75 55 80 60
## 19 65 70 46 57 75 85 46
## 20 50 58 68 54 64 78 52
## 21 50 40 33 34 43 64 33
## 22 64 61 52 62 66 80 41
## 23 53 66 52 50 63 80 37
## 24 40 37 42 58 50 57 49
## 25 63 54 42 48 66 75 33
## 26 66 77 66 63 88 76 72
## 27 78 75 58 74 80 78 49
## 28 48 57 44 45 51 83 38
## 29 85 85 71 71 77 74 55
## 30 82 82 39 59 64 78 39
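As an alternative to computing nrow(attitude) - 1 by hand in items 8 and 11, head() and tail() accept a negative n, which means "all rows except that many". The sketch below checks that both spellings give the same result; the object names are illustrative.

```r
# All rows but the last one: a negative n drops rows from the end.
all_but_last <- head(attitude, -1)

# All rows but the first one: a negative n drops rows from the start.
all_but_first <- tail(attitude, -1)

# Both forms should match the explicit nrow() - 1 versions used above.
identical(all_but_last, head(attitude, nrow(attitude) - 1))
identical(all_but_first, tail(attitude, nrow(attitude) - 1))
```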
12. Number of missing values. Which function will return number of missing values in each variable of a dataset?
is.na(attitude)
## rating complaints privileges learning raises critical advance
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [11,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [14,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [15,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [16,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [18,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [19,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [20,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [21,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [22,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [23,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [24,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [26,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [27,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [28,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [29,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [30,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
cat("The total number of missing values in the dataset is" , sum(is.na(attitude)))
## The total number of missing values in the dataset is 0
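is.na() on its own returns a logical matrix, and sum() collapses it to a single total. One way to get the count for each variable, as the question asks, is to sum that matrix column by column with colSums(). This is a minimal sketch; na_per_col is an illustrative name.

```r
# is.na(attitude) is a 30 x 7 logical matrix; colSums() counts the TRUE
# values in each column, giving one missing-value count per variable.
na_per_col <- colSums(is.na(attitude))
na_per_col
```

The result is a named vector with one entry per column, so the counts can be read off by variable name rather than by position.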
13. Number of missing values in a single variable
for (i in seq_len(ncol(attitude))) {
  x <- sum(is.na(attitude[i]))
  cat("Number of missing values in col ", i, " = ", x, "\n")
}
## Number of missing values in col 1 = 0
## Number of missing values in col 2 = 0
## Number of missing values in col 3 = 0
## Number of missing values in col 4 = 0
## Number of missing values in col 5 = 0
## Number of missing values in col 6 = 0
## Number of missing values in col 7 = 0
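For one specific variable, the loop is not needed: the column can be selected by name and counted directly. Because attitude has no missing values, the sketch below also uses a modified copy (attitude_na, a hypothetical object created only for illustration) to show a non-zero count.

```r
# Missing values in a single, named variable of the original data.
n_missing_learning <- sum(is.na(attitude$learning))
n_missing_learning

# Illustration only: a copy with two values deliberately set to NA.
attitude_na <- attitude
attitude_na$learning[c(2, 5)] <- NA

sum(is.na(attitude_na$learning))  # counts the two injected NAs
anyNA(attitude_na$learning)       # quick yes/no check for any NA
```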
14. Plot a simple graph, which will appear on a screen device.
plot(attitude) #Using only the basic command

15. Plot the graph shown below, and make it appear on a file device (a pdf file)
plot(x = attitude$privileges,
     y = attitude$learning,
     xlab = "Privileges",
     ylab = "Learning",
     main = "Learning Attitude",
     xlim = c(30, 85),
     ylim = c(40, 75))

Saving the plot as a PDF on my desktop
pdf(file = "/Users/salahkaf/Desktop/My Plot.pdf", # Path of the PDF file to write
    width = 8,   # The width of the plot in inches
    height = 8)  # The height of the plot in inches
plot(x = attitude$privileges,
     y = attitude$learning,
     xlab = "Privileges",
     ylab = "Learning",
     main = "Learning Attitude",
     xlim = c(30, 85),
     ylim = c(40, 75))
dev.off() # Close the PDF device so the file is written to disk
## quartz_off_screen
## 2
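The path above is specific to one machine. As a portable variant, the sketch below writes the same plot to R's session temporary directory and then checks that the file was created. The file name learning_vs_privileges.pdf and the variable out_file are assumptions made for this example.

```r
# Hypothetical output location: a file inside the session's temp directory.
out_file <- file.path(tempdir(), "learning_vs_privileges.pdf")

pdf(file = out_file, width = 8, height = 8)  # open the PDF file device
plot(x = attitude$privileges,
     y = attitude$learning,
     xlab = "Privileges",
     ylab = "Learning",
     main = "Learning Attitude",
     xlim = c(30, 85),
     ylim = c(40, 75))
invisible(dev.off())  # close the device; nothing is written until this runs

file.exists(out_file)  # confirm that the PDF was written
```

Note that summary(attitude) reports a minimum of 34 for learning, so ylim = c(40, 75) leaves a few points outside the plotting region; range(attitude$learning) would give limits that include every observation.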