Sometimes it makes sense to develop an internal R package at your company. These internal packages typically contain functions that make it easier to connect to your companies data warehouse or they may contain common data cleaning/wrangling functions specific to your company’s data. You can also create a custom ggplot2 theme with your company’s colors! The deaR package (dea stands for the data engineering and analytics team) is Clayton’s internal R package and it currently contains functions that make it easier to connect to and execute queries against our SQL Server data warehouse.
Please reach out to Austin Solomon for instructions on installing the deaR package.
init_sql_server
function)When querying a database in R, you first need to establish a connection to the data warehouse that holds that database. This can be pretty tricky sometimes, depending on what type of data warehouse you are trying to connect to. In our case at Clayton, we want to connect to SQL Server and we can do this using the init_sql_server
function from the deaR package. This function has two arguments: server
and database
. We use the server function to tell R that we want to connect to PROD (EDW), DEV (EDWDEV), or QUA (EDWQUA), and we use the database function to tell R which database to connect to (i.e. EDWRPT, CMHMFG_MFGDATALIB, CommodityForecast, etc.). We want to save our connection as an object in R so that we can reference the connection in queries (see below).
library(deaR)
conn <- init_sql_server(server = "EDW", database = "EDWRPT")
Once you’ve established a connection with SQL Server, you can now execute queries against any database in the data warehouse. Even though you specified a data warehouse in the init_sql_server
function, you can query any database in the warehouse as long as you specify the database name in your FROM
statement in the SQL query. The next step is to save your query as a .sql
file and then read the file into R using the read_file
function from the readr package. The read_file
function has an argument named file
which is just the file path to your SQL query. We will pretend that the query is in a queries folder under your Documents folder. Use the getwd()
fucntion to get your current working directory
getwd()
## [1] "C:/Users/SolomonA/Documents"
You can see from the output above that I’m currently in my Documents folder so I can read in the example query by running the line below.
library(readr)
my_query <- read_file("Documents/queries/my_query.sql")
Now that I have read in my query, I can now execute the query to get my table results into R using the query_sql_server
function. This function has two main arguments: channel
and query
. The channel
argument takes in the name of the connection to the data warehouse (we saved this as “conn” earlier in this post) and the query
argument takes in the name of the query that you read into R (we saved this as “my_query”). We can execute our query and save it as my_table using the code below.
my_table <- query_sql_server(channel = conn, query = my_query)
You should now have everything you need to query SQL Server at Clayton! Please don’t hesitate to reach out to Austin Solomon if you experience any issues or need help!