- Introductory stuff
- Connecting to SQL Server
- Basic analysis with data.table and ggplot2
- Building interactive reports
- Dynamically built documents
- Q&A & Conclusion
Saturday, July 19, 2014
SQLRelay.co.uk & your user groups
R is a programming language for analysing data. Fundamentally, R allows you to directly code analysis, reports, and statistical models.
Like any technology there are some current quirks or limitations

First things first - get a SQL Server driver
Secondly - install the RODBC package for R and activate it
install.packages("RODBC",repos="http://cran.ma.imperial.ac.uk/")
## Installing package into 'C:/Users/OzandSteph/Documents/R/win-library/3.1' ## (as 'lib' is unspecified)
## package 'RODBC' successfully unpacked and MD5 sums checked ## ## The downloaded binary packages are in ## C:\Users\OzandSteph\AppData\Local\Temp\RtmpsjOV8U\downloaded_packages
# Activate the ODBC package for use
library("RODBC")
Thirdly - connect and grab some data!
# Connection courtesy of Redgate bit.ly/1jrl7V6
# Build a connection to the DB for reuse
azure <- odbcDriverConnect(
"Driver={SQL Server Native Client 11.0};
Server=mhknbn2kdz.database.windows.net;
Database=AdventureWorks2012;
Uid=sqlfamily;
Pwd=sqlf@m1ly;")
# Store simple SELECT statement results
person <- sqlQuery(azure,
"SELECT TOP 5 * FROM [Person].[Person]")
Inline example of dynamic results: 5 records returned on previous page and details below
#Show results person
## BusinessEntityID PersonType NameStyle Title FirstName MiddleName ## 1 1 EM 0 <NA> Ken J ## 2 2 EM 0 <NA> Terri Lee ## 3 3 EM 0 <NA> Roberto <NA> ## 4 4 EM 0 <NA> Rob <NA> ## 5 5 EM 0 Ms. Gail A ## LastName Suffix EmailPromotion AdditionalContactInfo ## 1 Sánchez NA 0 NA ## 2 Duffy NA 1 NA ## 3 Tamburello NA 0 NA ## 4 Walters NA 0 NA ## 5 Erickson NA 0 NA ## Demographics ## 1 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 2 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 3 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 4 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 5 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## rowguid ModifiedDate ## 1 92C4279F-1207-48A3-8448-4636514EB7E2 2003-02-08 ## 2 D8763459-8AA8-47CC-AFF7-C9079AF79033 2002-02-24 ## 3 E1A2555E-0828-434B-A33B-6F38136A37DE 2001-12-05 ## 4 F2D7CE06-38B3-4357-805B-F4B6B71C01FF 2001-12-29 ## 5 F3A3F6B4-AE3B-430C-A754-9F2231BA6FEF 2002-01-30
TrustedConnection property for AD authenticating loginsdata.table and ggplot2 are probably the two most useful packages in R!
# SQL statement
sales <- sqlQuery(azure, "
SELECT
soh.OrderDate,
t.Name as Territory,
cr.countryregioncode as Code,
cr.Name as Region,
sum(TotalDue) as Value,
count(*) as Volume
FROM [Sales].[SalesOrderHeader] soh
INNER JOIN [Sales].[SalesTerritory] t
on t.territoryid=soh.territoryid
INNER JOIN [Person].[CountryRegion] cr
on cr.countryregioncode=t.countryregioncode
WHERE soh.OrderDate < '2008-07-01'
GROUP BY soh.OrderDate,
t.Name ,
cr.Name,
cr.countryregioncode")
# Convert to data.table sales<-data.table(sales) # Basic GROUP BY aggregation - returns count by region sales[,.N,by=Region]
## Region N ## 1: United States 1822 ## 2: Canada 607 ## 3: Germany 710 ## 4: France 695 ## 5: Australia 1027 ## 6: United Kingdom 758
# .SD returns the subset of rows and is filtered by which.max kable(sales[,.SD[which.max(Value)],by=Region])
| Region | OrderDate | Territory | Code | Value | Volume |
|---|---|---|---|---|---|
| United States | 2006-09-01 | Southwest | US | 1158625 | 36 |
| Canada | 2007-12-01 | Canada | CA | 803033 | 37 |
| Germany | 2007-08-01 | Germany | DE | 297738 | 16 |
| France | 2007-08-01 | France | FR | 803118 | 25 |
| Australia | 2007-09-01 | Australia | AU | 311014 | 21 |
| United Kingdom | 2007-09-01 | United Kingdom | GB | 561617 | 15 |
# ORDER, multiple aggregations, and aliasing
kable(sales[order(OrderDate),list(Value=sum(Value),Volume=sum(Volume))
,by=list(Year=year(OrderDate))])
| Year | Value | Volume |
|---|---|---|
| 2005 | 12693251 | 1379 |
| 2006 | 34463848 | 3692 |
| 2007 | 47171490 | 12443 |
| 2008 | 28832019 | 12975 |
salesbyyear<-sales[,list(AnnualValue=sum(Value),AnnualVolume=sum(Volume))
,keyby=list(Year=year(OrderDate))]
salesbyregion<-sales[,list(Value=sum(Value),Volume=sum(Volume))
,by=list(Region,Year=year(OrderDate))]
#PK assignment
setkey(salesbyregion,Year)
kable(salesbyyear[salesbyregion[Region=="United Kingdom"],
list(Proportion=percent(Volume/AnnualVolume))])
| Year | Proportion |
|---|---|
| 2005 | 6.96% |
| 2006 | 8.15% |
| 2007 | 10.9% |
| 2008 | 10.7% |
# Activate ggplot2
library(ggplot2)
# Quick plot (qplot)
qplot(year(OrderDate), Value,fill=Region,
data=sales[order(Region)],geom="bar",stat="identity")
# Chart backbone ggplot(data=sales[order(Region)],aes(x=year(OrderDate),y=Value,fill=Region))+ #bar chart type, identity uses contents not count geom_bar(stat="identity")+ #themeing scale_fill_brewer(type="qual")+ theme_minimal()
ggplot(data=sales,aes(x=year(OrderDate),y=Value,fill=Region))+ geom_bar(stat="identity")+ # Trellis command facet_grid(.~Region)+ theme(legend.position="none",axis.text.x=element_text(angle=45))
library(scales) ggplot(data=sales,aes(x=year(OrderDate),y=Value/1000000,fill=Region))+ geom_bar(stat="identity")+ facet_grid(.~Region)+ theme_minimal(base_size = 10) + theme(legend.position="none",axis.text.x=element_text(angle=45,hjust=1))+ scale_y_continuous(label=dollar)+ labs(x="Year",y="Value ($m)")
A web application framework for R Turn your analyses into interactive web applications No HTML, CSS, or JavaScript knowledge required
Uses magic and pixie dust to transform your R code into a webpage that provides end-users with the ability to analyse data interactively
It's call and response!
You need a Linux server for this and your applications can be made internally or externally available.
Shiny is by default open, use either a password system or utilise AD integration in the pro edition
library(shiny)
system.file("examples", package="shiny")
runExample("03_reactivity") # a reactive expression
runExample("05_sliders") # sliders
CONCEPT: modular documents that contain text and analysis that update
This is pretty much impossible in the Office paradigm.
R integrates with LaTeX, markdown, and pandoc to enable the blending of code and content to enable documents to be produced from scratch each time.
R is a great language that has a wide array of uses.
This presentation is available at http://rpubs.com/stephlocke/sqlbits