1.Introduction

In this document, we will analyze employee and department data stored in two tables: employee_01 and departments_01. The data is analyzed using SQL queries executed in an R environment.

2.Setting Up the Database

First, we create an in-memory SQLite database and populate it with the employee and department data.

# Create SQLite database connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Create tables
dbExecute(con, "
CREATE TABLE employee_01 (
  employee_id INT PRIMARY KEY,
  employee_name VARCHAR(50) NOT NULL,
  department_id INT,
  salary DECIMAL(10, 2),
  hire_date DATE
);")
## [1] 0
dbExecute(con, "
CREATE TABLE departments_01 (
  department_id INT PRIMARY KEY,
  department_name VARCHAR(50) NOT NULL
);")
## [1] 0
# Insert data into departments_01
dbExecute(con, "INSERT INTO departments_01 VALUES (1, 'HR'), (2, 'IT'), (3, 'Finance'), (4, 'Marketing');")
## [1] 4
# Insert data into employee_01
dbExecute(con, "
INSERT INTO employee_01 VALUES 
  (101, 'Alice', 1, 60000, '2020-01-15'),
  (102, 'Bob', 2, 75000, '2021-03-10'),
  (103, 'Charlie', 3, 50000, '2019-07-22'),
  (104, 'Diana', 4, 65000, '2022-06-01'),
  (105, 'Eve', 2, 80000, '2020-11-11');
")
## [1] 5

3.Viewing the Data

Let’s verify that the data has been inserted correctly.

# Query to view departments_01 table
departments <- dbGetQuery(con, "SELECT * FROM departments_01")
print(departments)
##   department_id department_name
## 1             1              HR
## 2             2              IT
## 3             3         Finance
## 4             4       Marketing
# Query to view employee_01 table
employees <- dbGetQuery(con, "SELECT * FROM employee_01")
print(employees)
##   employee_id employee_name department_id salary  hire_date
## 1         101         Alice             1  60000 2020-01-15
## 2         102           Bob             2  75000 2021-03-10
## 3         103       Charlie             3  50000 2019-07-22
## 4         104         Diana             4  65000 2022-06-01
## 5         105           Eve             2  80000 2020-11-11

Query 1: Join Employees and Departments

We retrieve employee names, department names, and salaries by joining the employee_01 and departments_01 tables.

query1 <- dbGetQuery(con, "
SELECT e.employee_name, d.department_name, e.salary
FROM employee_01 e
JOIN departments_01 d
ON e.department_id = d.department_id;
")
print(query1)
##   employee_name department_name salary
## 1         Alice              HR  60000
## 2           Bob              IT  75000
## 3       Charlie         Finance  50000
## 4         Diana       Marketing  65000
## 5           Eve              IT  80000

Query 2: Employees with Salary Greater Than 60,000

We extract the names and salaries of employees earning more than 60,000.

query2 <- dbGetQuery(con, "
SELECT employee_name, salary 
FROM employee_01 
WHERE salary > 60000;
")
print(query2)
##   employee_name salary
## 1           Bob  75000
## 2         Diana  65000
## 3           Eve  80000

Query 3: Total Salary by Department

We calculate the total salary paid in each department by grouping the data by department name.

query3 <- dbGetQuery(con, "
SELECT d.department_name, SUM(e.salary) AS total_salary
FROM employee_01 e
JOIN departments_01 d
ON e.department_id = d.department_id
GROUP BY d.department_name;
")
print(query3)
##   department_name total_salary
## 1         Finance        50000
## 2              HR        60000
## 3              IT       155000
## 4       Marketing        65000

Employees Hired After January 1, 2020

We retrieve the names and hire dates of employees hired after January 1, 2020.

query4 <- dbGetQuery(con, "
SELECT employee_name, hire_date
FROM employee_01
WHERE hire_date > '2020-01-01';
")
print(query4)
##   employee_name  hire_date
## 1         Alice 2020-01-15
## 2           Bob 2021-03-10
## 3         Diana 2022-06-01
## 4           Eve 2020-11-11

4.Conclusion

This analysis demonstrates how SQL queries can be executed in R using SQLite for database management. Each query provides insights into the employee and department data, such as salary distribution and recent hires.

# Disconnect from the database
dbDisconnect(con)