Data analysis with R

Andreu Vall - R Users Group Barcelona
July 23, 2013

What is R?

  • R is a language and environment for statistical computing and graphics
  • R provides a wide variety of statistical and graphical techniques
  • R is programmable and highly extensible
  • R is available as free software (multiplatform)
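
To give a flavour of the points above, here is a minimal sketch using only base R and the built-in mtcars data set. The helper name rmse is made up for this illustration; it is not part of R.

```r
# Statistical computing: fit a linear model of fuel economy on car weight,
# using the mtcars data set that ships with R.
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))

# Programmable and extensible: a user-defined function (hypothetical name)
# behaves just like a built-in one.
rmse <- function(model) sqrt(mean(residuals(model)^2))
print(rmse(fit))

# Graphics: a scatter plot with the fitted regression line.
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(fit)
```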

The R community

  • R has 4702 contributed packages (2013/07/22)
  • R has a large and increasing community of users

Poll data tools

Source: 2012 KDnuggets Software Poll on languages for data mining.

Big Data and Data Analysis

When data is not so big

  • data.table: enhanced table-like structure
  • ff: stores large objects on disk instead of in RAM
  • foreach: looping construct that supports parallel execution
  • RODBC: ODBC database connectivity
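
As a rough sketch of two of the packages listed above, the following assumes data.table and foreach are installed from CRAN. The variable names are illustrative, and the loop runs sequentially with %do%; parallel execution would use %dopar% after registering a backend (for example with the doParallel package).

```r
library(data.table)
library(foreach)

# data.table: grouped aggregation with the dt[i, j, by] syntax.
dt <- as.data.table(mtcars)
avg <- dt[, .(mean_mpg = mean(mpg)), by = cyl]
print(avg)

# foreach: a loop that returns its results, combined here into a vector.
# %do% evaluates sequentially; %dopar% needs a registered parallel backend.
squares <- foreach(i = 1:5, .combine = c) %do% {
  i^2
}
print(squares)
```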

When data is really big

Check the High Performance Computing task view

Data Visualization

ggplot2

ggplot2 example
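
The plot on this slide is not reproduced here. As a stand-in, a minimal ggplot2 sketch on the built-in mtcars data (the choice of data and aesthetics is an assumption, not the original example) could look like this:

```r
library(ggplot2)

# Map variables to aesthetics, then add layers with `+`.
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

# Printing the object draws the plot on the active graphics device.
print(p)
```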

ggmap

googleVis

treemap

treemap example

Source: flowingdata

wordcloud

wordcloud example

Source: Fells Stats.

Open Data and Reproducible Research

R provides tools for automatic report generation combining text and R code

This makes it very easy to share the results together with the detailed methodology
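
One way to sketch this idea with base R alone is Sweave, which weaves R chunks into a LaTeX document. The file name and report text below are made up for illustration; packages such as knitr follow the same principle with other input formats.

```r
# Write a tiny Sweave source file that mixes LaTeX text with an R chunk.
rnw <- file.path(tempdir(), "report.Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Summary of the built-in cars data set:",
  "<<>>=",
  "summary(cars)",
  "@",
  "\\end{document}"
), rnw)

# Weave the file: the chunk is executed and its code and output are
# embedded in report.tex, written to the current working directory.
old_wd <- setwd(tempdir())
Sweave(rnw)
setwd(old_wd)

tex_path <- file.path(tempdir(), "report.tex")
print(file.exists(tex_path))
```

The resulting .tex file can then be compiled to PDF with a LaTeX installation, so the text, the code, and its output always stay in sync.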

The R Users Group in Barcelona

Thank you! … Questions?