Results

Introduction

A data frame is the most common way of storing data in R and, generally, is the data structure most often used for data analysis. Each column of a data frame can hold a different class of object (e.g. numeric, character, factor).

The easiest way to think of a data frame is as an Excel worksheet: each column can hold a different type of data, but all columns have the same number of rows.
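As a minimal sketch of that idea, the snippet below builds a small data frame by hand. The `employees` object and its values are invented for illustration and are not part of the attitude data.

```r
# Hypothetical example data, not taken from the attitude dataset
employees <- data.frame(
  name  = c("Ana", "Ben", "Chloe"),   # character column
  score = c(71.5, 64.0, 82.25),       # numeric column
  grade = factor(c("B", "C", "A")),   # factor column
  stringsAsFactors = FALSE            # keep name as character on any R version
)

# Each column keeps its own class, but all columns have the same length
column_classes <- sapply(employees, class)
print(column_classes)
print(dim(employees))  # number of rows, then number of columns
```

Trying to build a data frame from columns of unequal length (say 3 names and 2 scores) raises an error, which is the "equal length" rule in practice.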

We will use the built-in “attitude” dataset in R.

It is a data frame with 30 observations on 7 variables.

The variables are rating, complaints, privileges, learning, raises, critical and advance.

They all represent ratings, which are collected from a survey of the clerical employees of a large financial organization.

Rating - Overall rating.

Complaints - Handling of employee complaints.

Privileges - Does not allow special privileges.

Learning - Opportunity to learn.

Raises - Raises based on performance.

Critical - Too critical.

Advance - Advancement.
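The variable list can be checked against the dataset itself. This is only a quick sanity check; `variable_names` is a name chosen here for illustration.

```r
# attitude ships with base R (datasets package), so nothing needs to be loaded
variable_names <- names(attitude)
print(variable_names)

# Confirm that every variable described above is present
described <- c("rating", "complaints", "privileges", "learning",
               "raises", "critical", "advance")
print(all(described %in% variable_names))
```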

Packages Info

The packages used are tidyverse and ggplot2.

Tidyverse is a collection of essential R packages for data science. The packages under the tidyverse umbrella help us work and interact with data, including subsetting, transforming and visualizing it.

Tidyverse was created to provide all these utilities to clean and work with data.

ggplot2, included in the tidyverse package, is a plotting package that makes it simple to create complex plots from data in a data frame. It provides a more programmatic interface for specifying what variables to plot, how they are displayed, and general visual properties.

In short, ggplot2 allows us to use the grammar of graphics to build layered, customizable plots.
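A rough sketch of what "layered" means: a plot is built by adding one piece at a time with `+`. The choice of `complaints` against `rating` and the straight-line fit are assumptions made for this illustration only.

```r
library(ggplot2)

# Layer 0: the data and the aesthetic mapping (which variable goes on which axis)
p <- ggplot(attitude, aes(x = complaints, y = rating))

# Layer 1: draw one point per observation
p <- p + geom_point()

# Layer 2: overlay a straight-line fit (method = "lm" is our choice here)
p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

print(p)
```

Because each layer is added separately, any one of them can be removed or restyled without touching the others.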

Data Preparation

We copy the built-in attitude dataset into an object named data and use that copy for the analysis.

data <- attitude

Data Analysis

Basic descriptive statistics

1. The summary function provides the summary of our data and variables. We can see that for all the numeric attributes, it displays min, 1st quartile, median, mean, 3rd quartile and max values.

summary(data) 
##      rating        complaints     privileges       learning         raises     
##  Min.   :40.00   Min.   :37.0   Min.   :30.00   Min.   :34.00   Min.   :43.00  
##  1st Qu.:58.75   1st Qu.:58.5   1st Qu.:45.00   1st Qu.:47.00   1st Qu.:58.25  
##  Median :65.50   Median :65.0   Median :51.50   Median :56.50   Median :63.50  
##  Mean   :64.63   Mean   :66.6   Mean   :53.13   Mean   :56.37   Mean   :64.63  
##  3rd Qu.:71.75   3rd Qu.:77.0   3rd Qu.:62.50   3rd Qu.:66.75   3rd Qu.:71.00  
##  Max.   :85.00   Max.   :90.0   Max.   :83.00   Max.   :75.00   Max.   :88.00  
##     critical        advance     
##  Min.   :49.00   Min.   :25.00  
##  1st Qu.:69.25   1st Qu.:35.00  
##  Median :77.50   Median :41.00  
##  Mean   :74.77   Mean   :42.93  
##  3rd Qu.:80.00   3rd Qu.:47.75  
##  Max.   :92.00   Max.   :72.00
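summary() does not report the spread of each variable. As a small supplementary sketch, individual statistics can be computed column by column; `column_sd` and `rating_mean` are names introduced here.

```r
data <- attitude  # same copy as in the Data Preparation step

# Standard deviation of every column; sapply() applies sd() to each one
column_sd <- sapply(data, sd)
print(round(column_sd, 2))

# A single statistic for one column, corresponding to the Mean row of summary()
rating_mean <- mean(data$rating)
print(round(rating_mean, 2))
```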

2. The glimpse function provides a transposed preview of the dataset: each variable is shown on its own row, together with its data type and its first few values.

library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.1.1     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## Warning: package 'readr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
glimpse(data)
## Rows: 30
## Columns: 7
## $ rating     <dbl> 43, 63, 71, 61, 81, 43, 58, 71, 72, 67, 64, 67, 69, 68, 77,~
## $ complaints <dbl> 51, 64, 70, 63, 78, 55, 67, 75, 82, 61, 53, 60, 62, 83, 77,~
## $ privileges <dbl> 30, 51, 68, 45, 56, 49, 42, 50, 72, 45, 53, 47, 57, 83, 54,~
## $ learning   <dbl> 39, 54, 69, 47, 66, 44, 56, 55, 67, 47, 58, 39, 42, 45, 72,~
## $ raises     <dbl> 61, 63, 76, 54, 71, 54, 66, 70, 71, 62, 58, 59, 55, 59, 79,~
## $ critical   <dbl> 92, 73, 86, 84, 83, 49, 68, 66, 83, 80, 67, 74, 63, 77, 77,~
## $ advance    <dbl> 45, 47, 48, 35, 47, 34, 35, 41, 31, 41, 34, 41, 25, 35, 46,~

3. Calculate the number of rows in a dataset.

nrow(data)
## [1] 30

4. Calculate the number of columns in a dataset.

ncol(data)
## [1] 7
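Both counts can also be obtained in one call. A short sketch, assuming the same `data` object as above:

```r
data <- attitude

# dim() returns the number of rows followed by the number of columns
dimensions <- dim(data)
print(dimensions)
```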

5. List the structure of a dataset

str(data)
## 'data.frame':    30 obs. of  7 variables:
##  $ rating    : num  43 63 71 61 81 43 58 71 72 67 ...
##  $ complaints: num  51 64 70 63 78 55 67 75 82 61 ...
##  $ privileges: num  30 51 68 45 56 49 42 50 72 45 ...
##  $ learning  : num  39 54 69 47 66 44 56 55 67 47 ...
##  $ raises    : num  61 63 76 54 71 54 66 70 71 62 ...
##  $ critical  : num  92 73 86 84 83 49 68 66 83 80 ...
##  $ advance   : num  45 47 48 35 47 34 35 41 31 41 ...

6. See first 6 rows (by default) of a dataset

head(data)
##   rating complaints privileges learning raises critical advance
## 1     43         51         30       39     61       92      45
## 2     63         64         51       54     63       73      47
## 3     71         70         68       69     76       86      48
## 4     61         63         45       47     54       84      35
## 5     81         78         56       66     71       83      47
## 6     43         55         49       44     54       49      34

7. See first 15 rows of dataset

head(data,15)
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
## 11     64         53         53       58     58       67      34
## 12     67         60         47       39     59       74      41
## 13     69         62         57       42     55       63      25
## 14     68         83         83       45     59       77      35
## 15     77         77         54       72     79       77      46

8. See all rows but the last row

head(data,nrow(data)-1)
##    rating complaints privileges learning raises critical advance
## 1      43         51         30       39     61       92      45
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
## 11     64         53         53       58     58       67      34
## 12     67         60         47       39     59       74      41
## 13     69         62         57       42     55       63      25
## 14     68         83         83       45     59       77      35
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 17     74         85         64       69     79       79      63
## 18     65         60         65       75     55       80      60
## 19     65         70         46       57     75       85      46
## 20     50         58         68       54     64       78      52
## 21     50         40         33       34     43       64      33
## 22     64         61         52       62     66       80      41
## 23     53         66         52       50     63       80      37
## 24     40         37         42       58     50       57      49
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55

9. See last 6 rows (by default) of a dataset

tail(data)
##    rating complaints privileges learning raises critical advance
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39

10. See last 12 rows of dataset

tail(data,12)
##    rating complaints privileges learning raises critical advance
## 19     65         70         46       57     75       85      46
## 20     50         58         68       54     64       78      52
## 21     50         40         33       34     43       64      33
## 22     64         61         52       62     66       80      41
## 23     53         66         52       50     63       80      37
## 24     40         37         42       58     50       57      49
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39

11. See all rows but the first row

tail(data,nrow(data)-1)
##    rating complaints privileges learning raises critical advance
## 2      63         64         51       54     63       73      47
## 3      71         70         68       69     76       86      48
## 4      61         63         45       47     54       84      35
## 5      81         78         56       66     71       83      47
## 6      43         55         49       44     54       49      34
## 7      58         67         42       56     66       68      35
## 8      71         75         50       55     70       66      41
## 9      72         82         72       67     71       83      31
## 10     67         61         45       47     62       80      41
## 11     64         53         53       58     58       67      34
## 12     67         60         47       39     59       74      41
## 13     69         62         57       42     55       63      25
## 14     68         83         83       45     59       77      35
## 15     77         77         54       72     79       77      46
## 16     81         90         50       72     60       54      36
## 17     74         85         64       69     79       79      63
## 18     65         60         65       75     55       80      60
## 19     65         70         46       57     75       85      46
## 20     50         58         68       54     64       78      52
## 21     50         40         33       34     43       64      33
## 22     64         61         52       62     66       80      41
## 23     53         66         52       50     63       80      37
## 24     40         37         42       58     50       57      49
## 25     63         54         42       48     66       75      33
## 26     66         77         66       63     88       76      72
## 27     78         75         58       74     80       78      49
## 28     48         57         44       45     51       83      38
## 29     85         85         71       71     77       74      55
## 30     82         82         39       59     64       78      39
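The "all but one row" calls in steps 8 and 11 can be written more compactly, because head() and tail() accept a negative n. The object names below are chosen for illustration.

```r
data <- attitude

# A negative n means "all rows except that many"
all_but_last  <- head(data, -1)  # drops the last row
all_but_first <- tail(data, -1)  # drops the first row

print(nrow(all_but_last))
print(nrow(all_but_first))
print(rownames(all_but_first)[1])  # label of the first remaining row
```

This avoids computing nrow(data) - 1 by hand and keeps working if the number of rows changes.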

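12. Which function returns the number of missing values in each variable of a dataset?

The is.na() function flags each missing value: it returns a logical matrix in which TRUE marks a missing entry. Summing those flags by column, for example with colSums(is.na(data)), gives the number of missing values in each variable. The matrix below is all FALSE, so the dataset has no missing values.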

is.na(data)
##       rating complaints privileges learning raises critical advance
##  [1,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [2,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [3,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [4,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [5,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [6,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [7,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [8,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
##  [9,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [10,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [11,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [12,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [13,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [14,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [15,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [16,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [17,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [18,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [19,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [20,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [21,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [22,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [23,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [24,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [25,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [26,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [27,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [28,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [29,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
## [30,]  FALSE      FALSE      FALSE    FALSE  FALSE    FALSE   FALSE
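Reading a 30-row matrix of FALSE values is tedious, so a per-variable count is usually more practical. A minimal sketch, with `missing_per_variable` and `total_missing` as names introduced here:

```r
data <- attitude

# is.na() gives a TRUE/FALSE matrix; colSums() counts the TRUEs in each column
missing_per_variable <- colSums(is.na(data))
print(missing_per_variable)

# Total number of missing values in the whole data frame
total_missing <- sum(is.na(data))
print(total_missing)
```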

13. Number of missing values in a single variable: sum(is.na(df$col))

sum(is.na(data$rating))
## [1] 0
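Since attitude has no missing values, the count is always 0 here. To see the functions react, the sketch below blanks out two values in a copy of the data; `data_with_gaps` and the chosen rows are purely hypothetical.

```r
# Copy of attitude with two ratings set to NA, for illustration only
data_with_gaps <- attitude
data_with_gaps$rating[c(3, 10)] <- NA

n_missing <- sum(is.na(data_with_gaps$rating))  # count in a single variable
print(n_missing)

print(anyNA(data_with_gaps))                      # is anything missing at all?
print(mean(data_with_gaps$rating, na.rm = TRUE))  # skip NAs when summarising
```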

Graph Plotting

14. Plot a simple graph, which will appear on a screen device.

library(ggplot2)
ggplot(data, aes(x=privileges,y=learning)) + 
  geom_point()

15. Plotting the graph with more details, such as title and axis labelling.

library(ggplot2)
ggplot(data, aes(x = privileges, y = learning)) +
  geom_point(size = 3, shape = 1) +  # change the point size and shape
  labs(
    title = "Learning Attitude",     # provide a title
    x = "Privileges",                # rename the x-axis label
    y = "Learning"                   # rename the y-axis label
  ) +
  theme(
    # center the title and set its size and weight
    plot.title = element_text(hjust = 0.5, size = 14, face = "bold")
  )

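16. We can save the plotted graph as a PDF file with the dev.copy2pdf() function. Called without arguments, it copies the contents of the current graphics device to a file named Rplot.pdf in the working directory.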

dev.copy2pdf()
## png 
##   2
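For plots built with ggplot2, ggsave() is a common alternative that does not depend on which graphics device is currently open. The following is a sketch only: the file name and the 6 by 4 inch size are assumptions made for this example.

```r
library(ggplot2)

p <- ggplot(attitude, aes(x = privileges, y = learning)) +
  geom_point()

# Hypothetical output path; tempdir() keeps the example out of the project folder
out_file <- file.path(tempdir(), "learning_vs_privileges.pdf")

# The file extension selects the PDF device; width and height are in inches
ggsave(out_file, plot = p, width = 6, height = 4)
print(file.exists(out_file))
```

Passing the plot object explicitly makes it clear which plot is written, rather than relying on whichever plot was displayed last.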