In this document, we will analyze employee and department data stored in two tables: employee_01 and departments_01. The data is analyzed using SQL queries executed in an R environment.
First, we create an in-memory SQLite database and populate it with the employee and department data.
# Create SQLite database connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
# Create tables
dbExecute(con, "
CREATE TABLE employee_01 (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(50) NOT NULL,
department_id INT,
salary DECIMAL(10, 2),
hire_date DATE
);")
## [1] 0
dbExecute(con, "
CREATE TABLE departments_01 (
department_id INT PRIMARY KEY,
department_name VARCHAR(50) NOT NULL
);")
## [1] 0
# Insert data into departments_01
dbExecute(con, "INSERT INTO departments_01 VALUES (1, 'HR'), (2, 'IT'), (3, 'Finance'), (4, 'Marketing');")
## [1] 4
# Insert data into employee_01
dbExecute(con, "
INSERT INTO employee_01 VALUES
(101, 'Alice', 1, 60000, '2020-01-15'),
(102, 'Bob', 2, 75000, '2021-03-10'),
(103, 'Charlie', 3, 50000, '2019-07-22'),
(104, 'Diana', 4, 65000, '2022-06-01'),
(105, 'Eve', 2, 80000, '2020-11-11');
")
## [1] 5
Let’s verify that the data has been inserted correctly.
# Query to view departments_01 table
departments <- dbGetQuery(con, "SELECT * FROM departments_01")
print(departments)
## department_id department_name
## 1 1 HR
## 2 2 IT
## 3 3 Finance
## 4 4 Marketing
# Query to view employee_01 table
employees <- dbGetQuery(con, "SELECT * FROM employee_01")
print(employees)
## employee_id employee_name department_id salary hire_date
## 1 101 Alice 1 60000 2020-01-15
## 2 102 Bob 2 75000 2021-03-10
## 3 103 Charlie 3 50000 2019-07-22
## 4 104 Diana 4 65000 2022-06-01
## 5 105 Eve 2 80000 2020-11-11
We retrieve employee names, department names, and salaries by joining the employee_01 and departments_01 tables.
query1 <- dbGetQuery(con, "
SELECT e.employee_name, d.department_name, e.salary
FROM employee_01 e
JOIN departments_01 d
ON e.department_id = d.department_id;
")
print(query1)
## employee_name department_name salary
## 1 Alice HR 60000
## 2 Bob IT 75000
## 3 Charlie Finance 50000
## 4 Diana Marketing 65000
## 5 Eve IT 80000
We extract the names and salaries of employees earning more than 60,000.
query2 <- dbGetQuery(con, "
SELECT employee_name, salary
FROM employee_01
WHERE salary > 60000;
")
print(query2)
## employee_name salary
## 1 Bob 75000
## 2 Diana 65000
## 3 Eve 80000
We calculate the total salary paid in each department by grouping the data by department name.
query3 <- dbGetQuery(con, "
SELECT d.department_name, SUM(e.salary) AS total_salary
FROM employee_01 e
JOIN departments_01 d
ON e.department_id = d.department_id
GROUP BY d.department_name;
")
print(query3)
## department_name total_salary
## 1 Finance 50000
## 2 HR 60000
## 3 IT 155000
## 4 Marketing 65000
We retrieve the names and hire dates of employees hired after January 1, 2020.
query4 <- dbGetQuery(con, "
SELECT employee_name, hire_date
FROM employee_01
WHERE hire_date > '2020-01-01';
")
print(query4)
## employee_name hire_date
## 1 Alice 2020-01-15
## 2 Bob 2021-03-10
## 3 Diana 2022-06-01
## 4 Eve 2020-11-11
This analysis demonstrates how SQL queries can be executed in R using SQLite for database management. Each query provides insights into the employee and department data, such as salary distribution and recent hires.
# Disconnect from the database
dbDisconnect(con)