1 - Data import

readr

The key problem that readr solves is parsing a flat file into a tibble. Parsing is the process of taking a text file and turning it into a rectangular tibble where each column has the appropriate type. Parsing takes place in three basic stages:

  • The flat file is parsed into a rectangular matrix of strings.

  • The type of each column is determined.

  • Each column of strings is parsed into a vector of a more specific type.

It’s easiest to learn how this works in the opposite order. Below, you’ll learn how the:

  • Vector parsers turn a character vector into a more specific type.

  • Column specification describes the type of each column and the strategy readr uses to guess types so you don’t need to supply them all.

  • Rectangular parsers turn a flat file into a matrix of rows and columns.
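The three pieces above can be sketched in a few lines. This is a minimal illustration, not the vignette's own example: the inline CSV text and its column names (`id`, `when`, `score`) are made up, and wrapping the text in `I()` to mark it as literal data assumes readr 2.0 or later.

```r
library(readr)

# Vector parsers: turn a character vector into a more specific type.
parse_integer(c("1", "2", "3"))
parse_date("2010-01-31")

# Hypothetical inline data; I() marks it as literal text (assumes readr >= 2.0).
csv_text <- "id,when,score\n1,2020-01-01,3.5\n2,2020-02-01,4.0\n"

# Column specification: inspect the types readr would guess on its own.
spec_csv(I(csv_text))

# Rectangular parser: read the flat text, supplying the column types explicitly.
df <- read_csv(
  I(csv_text),
  col_types = cols(
    id    = col_integer(),
    when  = col_date(),
    score = col_double()
  )
)
df
```

Supplying `col_types` explicitly is optional; leaving it out lets readr fall back on its guessing strategy.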

Vignette

readxl

The readxl package makes it easy to get data out of Excel and into R. Compared to many of the existing packages (e.g. gdata, xlsx, xlsReadWrite) readxl has no external dependencies, so it’s easy to install and use on all operating systems.

It is designed to work with tabular data.

readxl supports both the legacy .xls format and the modern xml-based .xlsx format.
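As a rough sketch of typical usage, the snippet below reads from the example workbooks that ship with readxl, so no external file is assumed. The sheet names and the cell range are choices made for illustration.

```r
library(readxl)

# Path to an example .xlsx workbook bundled with the package.
xlsx_path <- readxl_example("datasets.xlsx")

# List the sheets, then read one of them by name.
excel_sheets(xlsx_path)
iris_xlsx <- read_excel(xlsx_path, sheet = "iris")

# The legacy .xls format is read with the same function.
xls_path <- readxl_example("datasets.xls")

# Read only a rectangular cell range: one header row plus five data rows.
mtcars_xls <- read_excel(xls_path, sheet = "mtcars", range = "A1:D6")
mtcars_xls
```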

Vignette

rio

rio supports a variety of different file formats for import and export. To keep the package slim, all non-essential formats are supported via “Suggests” packages, which are not installed (or loaded) by default. To ensure rio is fully functional, install these packages the first time you use rio via install_formats().
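A minimal sketch of the import/export round trip, assuming only formats that work without the optional packages (CSV and RDS). The temporary file paths are placeholders for illustration.

```r
library(rio)

# Export a data frame; the format is inferred from the file extension.
csv_path <- tempfile(fileext = ".csv")
export(head(mtcars), csv_path)

# Import it back; again the extension selects the reader.
cars <- import(csv_path)

# Convert directly from one file format to another.
rds_path <- tempfile(fileext = ".rds")
convert(csv_path, rds_path)
```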

Vignette

haven

haven enables R to read and write various data formats used by other statistical packages.
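For instance, a Stata file can be written and read back as sketched below. The temporary path is a placeholder; `read_sav()` and `read_sas()` play the same role for SPSS and SAS files.

```r
library(haven)

# Write a small data frame to a Stata .dta file (row names are not stored).
dta_path <- tempfile(fileext = ".dta")
write_dta(head(mtcars), dta_path)

# Read it back as a tibble.
cars_dta <- read_dta(dta_path)
cars_dta
```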

Vignette

jsonlite

The jsonlite package is a JSON parser/generator optimized for the web.

Its main strength is that it implements a bidirectional mapping between JSON data and the most important R data types. Thereby we can convert between R objects and JSON without loss of type or information, and without the need for any manual data munging.

This is ideal for interacting with web APIs, or to build pipelines where data structures seamlessly flow in and out of R using JSON.
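The bidirectional mapping can be illustrated with a small round trip. This is a sketch with a made-up data frame; real API responses are usually nested and may need more care.

```r
library(jsonlite)

# A small, hypothetical data frame.
df <- data.frame(
  name  = c("a", "b"),
  value = c(1.5, 2),
  stringsAsFactors = FALSE
)

# R -> JSON: a data frame becomes an array of records.
json <- toJSON(df)
json

# JSON -> R: the array of records is simplified back into a data frame.
back <- fromJSON(json)

# Compare the round-tripped object with the original.
all.equal(df, back)
```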

Vignette

googlesheets

Vignette

# Summary
install.packages("readr")
install.packages("readxl")
install.packages("rio")
install.packages("haven")
install.packages("jsonlite")
install.packages("googlesheets")

2 - EDA

DataExplorer

There are 3 main goals for DataExplorer:

  • Exploratory Data Analysis (EDA)
  • Feature Engineering
  • Data Reporting

Vignette

dlookr

A collection of tools that support data diagnosis, exploration, and transformation. Data diagnostics provides information on, and visualization of, missing values, outliers, and unique and negative values to help you understand the distribution and quality of your data.

Data exploration provides information and visualization of the descriptive statistics of univariate variables, normality tests and outliers, correlation of two variables, and relationship between target variable and predictor.

Data transformation supports binning to categorize continuous variables, imputing missing values and outliers, and resolving skewness. It also creates automated reports that support these three tasks.

# Summary
install.packages("dlookr")

Vignette

3 - Data wrangling

dplyr

Vignette

tidyr

Vignette

forcats

Vignette

plyr

Vignette

reshape2

Vignette

data.table

Vignette

questionr

Vignette

Hmisc

Vignette

janitor

Vignette

# Summary
install.packages("dplyr")
install.packages("tidyr")
install.packages("forcats")
install.packages("plyr")
install.packages("reshape2")
install.packages("DT")
install.packages("data.table")
install.packages("questionr")
install.packages("Hmisc")
install.packages("janitor")

4 - Text data

CRAN Task View: Natural Language Processing

stringr

Vignette

stringi

Vignette

tidytext

Vignette

# Summary
install.packages("stringr")
install.packages("stringi")
install.packages("tidytext")

5 - Anomalies / outliers / missing values

CRAN Task View: Missing Data

missMDA

Vignette

VIM

Vignette

mice

Vignette

simputation

Vignette

Amelia

Vignette

stranger

stranger is a framework for unsupervised anomaly detection that simplifies the user experience because one does not need to be concerned with the many packages and functions that are required.

It acts as a wrapper around existing packages (“à la Caret”) and provides a clean and uniform toolkit for evaluation, explanation, and reporting routines. Hence the name stranger, which stands for Simple Toolkit in R for Anomalies Get Explain and Report.

Vignette

# Summary

install.packages("missMDA")
install.packages("VIM")
install.packages("mice")
install.packages("simputation")
install.packages("Amelia")

6 - Machine learning

CRAN Task View: Machine Learning

# Summary
install.packages("caret")
install.packages("mlr")
install.packages("FactoMineR")
install.packages("factoextra")
install.packages("tensorflow")
install.packages("keras")
install.packages("NbClust")
install.packages("ada")
install.packages("randomForest")
install.packages("rpart")
install.packages("CHAID")
install.packages("caTools")
install.packages("ranger")
install.packages("earth")
install.packages("elasticnet")
install.packages("gbm")
install.packages("kernlab")
install.packages("klaR")
install.packages("kknn")
install.packages("MASS")
install.packages("kohonen")
install.packages("neuralnet")
install.packages("nnet")
install.packages("party")
install.packages("xgboost")

7 - Databases

CRAN Task View: Databases with R

# Summary
install.packages("dbplyr")
install.packages("odbc")
install.packages("DBI")
install.packages("RODBC")
install.packages("RJDBC")
install.packages("RMySQL")
install.packages("RSQLite")

8 - Time series

CRAN Task View: Time Series Analysis

# Summary
install.packages("zoo")
install.packages("xts")
install.packages("prophet")
install.packages("forecast")

9 - Shiny

ShowMeShiny

# Summary
install.packages("shiny")
install.packages("shinydashboard")
install.packages("flexdashboard")
install.packages("htmlwidgets")

10 - Maps & Spatial Data

CRAN Task View: Analysis of Spatial Data

# Summary
install.packages("sp")
install.packages("sf")
install.packages("leaflet")
install.packages("rgeos")
install.packages("maps")
install.packages("maptools")
install.packages("osmr")
install.packages("osmdata")

11 - Plots

The R Graph Gallery

# Summary
install.packages("ggplot2")
install.packages("plotly")
install.packages("highcharter")
install.packages("corrplot")

12 - Utilities

# Summary
install.packages("devtools")
install.packages("magrittr")
install.packages("reticulate")
install.packages("knitr")
install.packages("rmarkdown")
install.packages("rsconnect")
install.packages("prettydoc")
install.packages("RColorBrewer")
install.packages("clipr")
install.packages("tictoc")
install.packages("RCurl")
install.packages("rvest")
install.packages("httr")