Tidyverse

One of the most popular tools among data scientists is the tidyverse, a collection of R packages that makes data scientists' lives easier when doing calculations and analysis. It covers preparing, wrangling, and visualizing data, and it was created by Hadley Wickham and his team.

Some of its most popular packages are:

  1. dplyr: a very powerful and popular package for data manipulation. With the pipe operator %>%, you can chain functions such as select(), filter(), group_by(), summarise(), and the join functions (left_join(), inner_join(), etc.) so that each step reads from left to right, which keeps the code concise and easy to read.
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
head(women)
##   height weight
## 1     58    115
## 2     59    117
## 3     60    120
## 4     61    123
## 5     62    126
## 6     63    129
# Average weight (lbs) of the 15 women in the built-in women dataset
women %>%
  select(weight) %>%
  summarise(avg_weight = mean(weight))
##   avg_weight
## 1   136.7333
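To show the other dplyr verbs working together, here is a minimal sketch on the built-in mtcars dataset (which is not used elsewhere in this post). The cyl_labels lookup table is made up purely to demonstrate left_join().

```r
library(dplyr)

# Average fuel economy per cylinder count, keeping only cars with wt > 2.5
# (wt is measured in 1000 lbs)
mpg_by_cyl <- mtcars %>%
  filter(wt > 2.5) %>%                  # keep rows that match a condition
  group_by(cyl) %>%                     # split the rows into groups by cylinder count
  summarise(avg_mpg = mean(mpg),        # one summary row per group
            n_cars  = n())              # number of cars in each group

# Hypothetical lookup table, joined on the shared `cyl` column
cyl_labels <- data.frame(cyl   = c(4, 6, 8),
                         label = c("four", "six", "eight"))

result <- mpg_by_cyl %>%
  left_join(cyl_labels, by = "cyl") %>% # add the label column
  arrange(desc(avg_mpg))                # highest average mpg first

print(result)
```

Each step takes the previous result as its first argument, so the pipeline reads as a sequence of actions: filter, group, summarise, join, sort.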
  2. readr: another great package, which solves the problem of parsing flat files (CSV, TSV, and other delimited text). It is typically faster than base R's read.csv() and returns a tibble. Syntax: read_delim('filename.csv', delim = ',')
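A small self-contained sketch: it writes a tiny CSV file to a temporary path (the file contents are made up for the example) and reads it back twice, once with the general read_delim() and once with read_csv(), readr's shortcut for comma-separated files. The show_col_types argument assumes readr 2.0 or later; it only silences the column-type message.

```r
library(readr)

# Write a tiny comma-separated file so the example does not depend on any real file
path <- tempfile(fileext = ".csv")
writeLines(c("height,weight",
             "58,115",
             "59,117",
             "60,120"), path)

# General form: read_delim() with an explicit delimiter
# (show_col_types = FALSE assumes readr >= 2.0)
df1 <- read_delim(path, delim = ",", show_col_types = FALSE)

# Shortcut for comma-separated files
df2 <- read_csv(path, show_col_types = FALSE)

print(df2)  # a tibble with 3 rows and 2 numeric columns
```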

  3. ggplot2: a package for building plots layer by layer, based on the grammar of graphics.

library(ggplot2)
# Scatter plot of height against weight for the women dataset
ggplot(data = women, aes(x = weight, y = height)) +
  geom_point(colour = 'blue', size = 2) +
  theme_minimal()

There are many other packages, such as tidyr, purrr, forcats, and tibble.

I haven't used these much, but dplyr, readr, and ggplot2 come in handy. They helped my team members and me a lot with our weekly homework and the projects for DATA 621.
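For a rough idea of what tidyr, purrr, forcats, and tibble do, here is a small illustrative sketch that runs one typical task from each on the first three rows of the women dataset. The object names (wide, long, col_means) are made up for this example, and pivot_longer() assumes tidyr 1.0 or later.

```r
library(tibble)
library(tidyr)
library(purrr)
library(forcats)

# tibble: a modern data frame, built here from the first rows of `women`
wide <- tibble(id = 1:3, height = c(58, 59, 60), weight = c(115, 117, 120))

# tidyr: reshape wide -> long, one row per (id, measurement) pair
# (pivot_longer() assumes tidyr >= 1.0)
long <- pivot_longer(wide, cols = c(height, weight),
                     names_to = "measure", values_to = "value")

# purrr: apply mean() to each column and get a named numeric vector back
col_means <- map_dbl(wide[, c("height", "weight")], mean)

# forcats: reorder factor levels so "weight" comes before "height"
long$measure <- fct_relevel(factor(long$measure), "weight")

print(long)
print(col_means)
```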

Truly, the tidyverse is a very helpful collection of packages for data scientists.

library(knitr)
knitr::include_graphics('https://raw.githubusercontent.com/maharjansudhan/DATA621/master/0001.jpg')

References:

https://www.analyticsvidhya.com/blog/2019/05/beginner-guide-tidyverse-most-powerful-collection-r-packages-data-science/
https://www.tidyverse.org/packages/