- Introductory stuff
- Connecting to SQL Server
- Basic analysis with data.table and ggplot2
- Building interactive reports
- Dynamically built documents
- Q&A & Conclusion
SQLRelay
R is a programming language for analysing data. Fundamentally, R allows you to directly code analysis, reports, and statistical models. Oh, and it can IO to your database.
Like any technology there are some current quirks or limitations
dplyr)
First things first - get a SQL Server driver
Secondly - install the RODBC package for R and activate it
install.packages("RODBC",repos="http://cran.ma.imperial.ac.uk/")
## Installing package into 'C:/Users/OzandSteph/Documents/R/win-library/3.1' ## (as 'lib' is unspecified)
## package 'RODBC' successfully unpacked and MD5 sums checked ## ## The downloaded binary packages are in ## C:\Users\OzandSteph\AppData\Local\Temp\Rtmpqyemkj\downloaded_packages
# Activate the ODBC package for use
library("RODBC")
Thirdly - connect and grab some data!
# Connection courtesy of Redgate bit.ly/1jrl7V6
# Build a connection to the DB for reuse
azure <- odbcDriverConnect(
"Driver={SQL Server Native Client 11.0};
Server=mhknbn2kdz.database.windows.net;
Database=AdventureWorks2012;
Uid=sqlfamily;
Pwd=sqlf@m1ly;")
# Store simple SELECT statement results
person <- sqlQuery(azure,
"SELECT TOP 5 * FROM [Person].[Person]")
Inline example of dynamic results: 5 records returned on previous page and details below
#Show results person
## BusinessEntityID PersonType NameStyle Title FirstName MiddleName ## 1 1 EM 0 <NA> Ken J ## 2 2 EM 0 <NA> Terri Lee ## 3 3 EM 0 <NA> Roberto <NA> ## 4 4 EM 0 <NA> Rob <NA> ## 5 5 EM 0 Ms. Gail A ## LastName Suffix EmailPromotion AdditionalContactInfo ## 1 Sánchez NA 0 NA ## 2 Duffy NA 1 NA ## 3 Tamburello NA 0 NA ## 4 Walters NA 0 NA ## 5 Erickson NA 0 NA ## Demographics ## 1 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 2 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 3 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 4 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## 5 <IndividualSurvey xmlns="http://schemas.microsoft.com/sqlserver/2004/07/adventure-works/IndividualSurvey"><TotalPurchaseYTD>0</TotalPurchaseYTD></IndividualSurvey> ## rowguid ModifiedDate ## 1 92C4279F-1207-48A3-8448-4636514EB7E2 2003-02-08 ## 2 D8763459-8AA8-47CC-AFF7-C9079AF79033 2002-02-24 ## 3 E1A2555E-0828-434B-A33B-6F38136A37DE 2001-12-05 ## 4 F2D7CE06-38B3-4357-805B-F4B6B71C01FF 2001-12-29 ## 5 F3A3F6B4-AE3B-430C-A754-9F2231BA6FEF 2002-01-30
TrustedConnection property for AD authenticating loginsdata.table and ggplot2 are probably the two most useful packages in R!
# SQL statement
sales <- sqlQuery(azure, "
SELECT
soh.OrderDate,
t.Name as Territory,
cr.countryregioncode as Code,
cr.Name as Region,
sum(TotalDue) as Value,
count(*) as Volume
FROM [Sales].[SalesOrderHeader] soh
INNER JOIN [Sales].[SalesTerritory] t
on t.territoryid=soh.territoryid
INNER JOIN [Person].[CountryRegion] cr
on cr.countryregioncode=t.countryregioncode
WHERE soh.OrderDate < '2008-07-01'
GROUP BY soh.OrderDate,
t.Name ,
cr.Name,
cr.countryregioncode")
# Convert to data.table sales<-data.table(sales) # Basic GROUP BY aggregation - returns count by region sales[,.N,by=Region]
## Region N ## 1: United States 1822 ## 2: Canada 607 ## 3: Germany 710 ## 4: France 695 ## 5: Australia 1027 ## 6: United Kingdom 758
# .SD returns the subset of rows and is filtered by which.max kable(sales[,.SD[which.max(Value)],by=Region])
| Region | OrderDate | Territory | Code | Value | Volume |
|---|---|---|---|---|---|
| United States | 2006-09-01 | Southwest | US | 1158625 | 36 |
| Canada | 2007-12-01 | Canada | CA | 803033 | 37 |
| Germany | 2007-08-01 | Germany | DE | 297738 | 16 |
| France | 2007-08-01 | France | FR | 803118 | 25 |
| Australia | 2007-09-01 | Australia | AU | 311014 | 21 |
| United Kingdom | 2007-09-01 | United Kingdom | GB | 561617 | 15 |
# ORDER, multiple aggregations, and aliasing
kable(sales[order(OrderDate),.(Value=sum(Value),Volume=sum(Volume))
,by=.(Year=year(OrderDate))])
| Year | Value | Volume |
|---|---|---|
| 2005 | 12693251 | 1379 |
| 2006 | 34463848 | 3692 |
| 2007 | 47171490 | 12443 |
| 2008 | 28832019 | 12975 |
salesbyyear<-sales[,.(AnnualValue=sum(Value),AnnualVolume=sum(Volume))
,keyby=.(Year=year(OrderDate))]
salesbyregion<-sales[,.(Value=sum(Value),Volume=sum(Volume))
,by=.(Region,Year=year(OrderDate))]
setkey(salesbyregion,Year) #PK assignment for non-unique column
kable(salesbyyear[salesbyregion[Region=="United Kingdom"],
.(Proportion=percent(Volume/AnnualVolume)),by=.EACHI])
| Year | Proportion |
|---|---|
| 2005 | 6.96% |
| 2006 | 8.15% |
| 2007 | 10.9% |
| 2008 | 10.7% |
library(ggplot2) # Activate ggplot2
# Quick plot (qplot)
qplot(year(OrderDate), Value,fill=Region,
data=sales[order(Region)],geom="bar",stat="identity")
# Chart backbone ggplot(data=sales[order(Region)],aes(x=year(OrderDate),y=Value,fill=Region))+ geom_bar(stat="identity")+#bar chart type, identity uses contents not count scale_fill_brewer(type="qual")+ #themeing theme_minimal()
ggplot(data=sales,aes(x=year(OrderDate),y=Value,fill=Region))+ geom_bar(stat="identity")+ facet_grid(.~Region)+ # Trellis command theme(legend.position="none",axis.text.x=element_text(angle=45))
ggplot(data=sales,aes(x=year(OrderDate),y=Value/1000000,fill=Region))+ geom_bar(stat="identity")+ facet_grid(.~Region)+ theme_minimal(base_size = 10) + theme(legend.position="none",axis.text.x=element_text(angle=45,hjust=1))+ scale_y_continuous(label=dollar)+ labs(x="Year",y="Value ($m)")
A web application framework for R Turn your analyses into interactive web applications No HTML, CSS, or JavaScript knowledge required
Uses magic and pixie dust to transform your R code into a webpage that provides end-users with the ability to analyse data interactively
It's call and response!
You need a Linux server for this and your applications can be made internally or externally available.
Shiny is by default open, use either a password system or utilise AD integration in the pro edition
library(shiny)
system.file("examples", package="shiny")
runExample("03_reactivity") # a reactive expression
runExample("05_sliders") # sliders
CONCEPT: modular documents that contain text and analysis that update
This is pretty much impossible in the Office paradigm.
R integrates with LaTeX, markdown, and pandoc to enable the blending of code and content to enable documents to be produced from scratch each time.
R is a great language that has a wide array of uses. This session covered some of the basics of data manipulation to get you "walking" quickly.
This presentation is available at http://rpubs.com/stephlocke/sqlrelay and the source code is on https://github.com/stephlocke/Rintro