In this document, we will simulate a typical exercise of pulling data from a sqlite database and manipulating them in R. We will first create 3 tables in a sqlite database and then query them. We will follow the below steps:
To facilitate this exercise, we will use three .csv datasets saved on github. Both of these three datasets are based on fictitious data.
The overall goals of this exercise are:
We will first load a few helper libraries and the three datasets from github: customers, calls, and sales. The customers dataset has a list of all of our customers with their phone numbers and state of birth. The calls dataset has the number of calls by phone number and duration of calls. The sales dataset has our company’s sales to each customer.
library(DBI)
library(RSQLite)
library(tidyverse)
customers = read.csv("https://raw.githubusercontent.com/mugishajean/merge-data/master/Customers.csv")
head(customers)
## Phone Fname Lname State
## 1 5553334444 FirstName1 LastName1 CA
## 2 5553334445 FirstName2 LastName2 NY
## 3 5553334446 FirstName3 LastName3 NJ
## 4 5553334447 FirstName4 LastName4 CA
## 5 5553334448 FirstName5 LastName5 NC
## 6 5553334449 FirstName6 LastName6 FL
calls = read.csv("https://raw.githubusercontent.com/mugishajean/merge-data/master/calls.csv")
head(calls)
## Phone Calls Minutes
## 1 555-333-4444 3 20
## 2 (555) 333 4445 4 30
## 3 5553334446-xxx 5 40
## 4 (555)3334447 6 50
## 5 555)3334448 7 60
## 6 (555)3334449 8 70
sales = read.csv("https://raw.githubusercontent.com/mugishajean/merge-data/master/sales.csv")
head(sales)
## Product Category Name Sales
## 1 apple Fruit LastName1, FirstName1 1000
## 2 banana Fruit LastName6, FirstName6 400
## 3 strawberry Fruit LastName4, FirstName4 100
## 4 TV Equipment LastName5, FirstName5 10000
## 5 Radio Equipment LastName6, FirstName6 6000
## 6 Table Furniture LastName7, FirstName7 2100
Notice above the extra characters in calls phone numbers. We will need to transform them in order to merge calls with other datasets.
We will now create a sqlite database called ourcompany and save the three datasets.
createDB <- dbConnect(RSQLite::SQLite(), "ourcompany.sqlite")
dbWriteTable(createDB, "customers", customers)
dbWriteTable(createDB, "calls", calls)
dbWriteTable(createDB, "sales", sales)
dbListTables(createDB)
## [1] "calls" "customers" "sales"
Now that we have created the sqlite database, we can move on to querying the tables. We will load all observations in R.
mysales = dbGetQuery(createDB, 'SELECT * FROM sales')
mycustomers = dbGetQuery(createDB, 'SELECT * FROM customers')
mycalls = dbGetQuery(createDB, 'SELECT * FROM calls')
We will now proceed to transform the data and get the top 3 customers based on total sales.
mysales = mysales %>%
group_by(Category, Name) %>%
summarise(Totalsales = sum(Sales)) %>%
top_n(n=-1, wt = Name)
mysales
## # A tibble: 3 x 3
## # Groups: Category [3]
## Category Name Totalsales
## <chr> <chr> <int>
## 1 Equipment LastName1, FirstName1 31642
## 2 Fruit LastName1, FirstName1 1000
## 3 Furniture LastName5, FirstName5 7654
In order to merge with customers, we need to format the names so that they match sales. We will create a new column Name
mycustomers = mycustomers %>%
mutate(Name = paste(Lname, Fname, sep = ", "))
Next, we will remove the extra characters in calls:
mycalls = mycalls %>%
mutate(Phone = as.numeric(gsub("[-()x\\[:space:]]", "", Phone)))
head(mycalls)
## Phone Calls Minutes
## 1 5553334444 3 20
## 2 5553334445 4 30
## 3 5553334446 5 40
## 4 5553334447 6 50
## 5 5553334448 7 60
## 6 5553334449 8 70
Finally, we can merge sales and customers by Name and calls by Phone
mytop = left_join(mysales, mycustomers[, c(1,4,5)], by = "Name") %>%
left_join(mycalls, by="Phone")
mytop
## # A tibble: 3 x 7
## # Groups: Category [?]
## Category Name Totalsales Phone State Calls Minutes
## <chr> <chr> <int> <dbl> <chr> <int> <int>
## 1 Equipment LastName1, FirstName1 31642 5553334444 CA 3 20
## 2 Fruit LastName1, FirstName1 1000 5553334444 CA 3 20
## 3 Furniture LastName5, FirstName5 7654 5553334448 NC 7 60
In this document, we sought to demonstrate how to create a simple sqlite database and query data from it. We also show how to clean phone numbers and to get the top 3 customers.