SQL | Yueyin Ji

SQL Project:
Employee Value Evaluation System

Introduction

The provided database contains the 11 tables (the database schema is below), with transactional data related to some company operations.

In this system, I have divided it into five criteria to comprehensively assess the value of an employee. These five criteria are the productivity index, quality index, market expansion and diversification index(diversity index), cost index, and customer relationship index.

Data Processing Design and create a data warehouse for the provided database.

In this part, I combined data from eight tables

CODE:

USE cis467_final_project;

SET sql_mode = (SELECT REPLACE(@@SQL_MODE, "ONLY_FULL_GROUP_BY", ""));

CREATE OR REPLACE VIEW employee_data AS

SELECT

e.employeeID,

CONCAT(e.FirstName, ' ', e.LastName) AS employee_name,

e.title,

e.reportsTo,

o.orderID,

ROUND(SUM(od.UnitPrice * od.Quantity * (1 - od.discount)), 2) AS order_amount,

COUNT(DISTINCT o.orderID) AS order_number,

STR_TO_DATE(o.shippedDate, '%m/%d/%Y') AS shippedDate,

STR_TO_DATE(o.requireddate, '%m/%d/%Y') AS requiredDate,

STR_TO_DATE(o.orderDate, '%m/%d/%Y') AS orderDate,

CASE

WHEN DATEDIFF(STR_TO_DATE(o.requiredDate, '%m/%d/%Y'), STR_TO_DATE(o.shippedDate, '%m/%d/%Y')) > 0 THEN 0

ELSE DATEDIFF(STR_TO_DATE(o.shippedDate, '%m/%d/%Y'), STR_TO_DATE(o.requiredDate, '%m/%d/%Y'))

END AS delay_day,

SUM(p.discontinued) AS missend,

p.reorderlevel,

p.unitsinstock,

p.unitsonorder,

SUM(o.freight) AS total_freight,

o.shipcountry,

COUNT(DISTINCT c.CategoryID) AS num_category,

ROUND(AVG(od.discount), 2) AS discount,

o.customerID,

r.regionID,

r.regiondescription,

p.productID,

c.categoryname,

t.TerritoryID,

t.TerritoryDescription

FROM

employees e

JOIN orders o ON e.EmployeeID = o.employeeID

JOIN order_details od ON o.orderID = od.orderID

JOIN products p ON p.ProductID = od.productID

JOIN categories c ON p.CategoryID = c.CategoryID

LEFT JOIN employeeterritories et ON e.employeeID = et.employeeID

LEFT JOIN territories t ON et.TerritoryID = t.TerritoryID

LEFT JOIN region r ON t.regionID = r.regionID

GROUP BY

e.employeeID,

o.orderID

ORDER BY e.employeeID;

select * from employee_data LIMIT 25;

Result

Data Analysis Create queries on the data warehouse.

I used complex queries including Joins, a Group By, UNION elements, a subquery, and some aggregate functions and summary calculations.

Main question: who is the most valuable employee?

Solution1: cost index

CODE:

SELECT

employeeID,

round(AVG(discount) * 100,2) AS average_discount_percentage,

NTILE(4) OVER(ORDER BY AVG(discount) ASC) AS cost_group

FROM

employee_data

GROUP BY

employeeID;

Description:

By calculating the average discount rate of the orders sold by each employee, I can derive the cost indicator. (A lower average discount rate is better.)

I found that employee 2 is the most valuable one in terms of the cost index.

Solution2: quality index

CODE:

SELECT

employeeID,

error_rate,

stock_management_efficiency,

NTILE(4) OVER (ORDER BY rank_error_rate + rank_efficiency ASC) AS quality_group

FROM (

SELECT

employeeID,

ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID), 2) AS error_rate,

ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) AS stock_management_efficiency,

RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID),2) ASC) AS rank_error_rate,

RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) DESC) AS rank_efficiency

FROM employee_data

GROUP BY employeeID

) AS subquery

ORDER BY quality_group;

Description:

The quality index is derived from two aspects. The first aspect is the order processing error rate, and the second aspect is the efficiency of product inventory management. A lower error rate is better, a higher stock management efficiency is better.

I found that employee 8 is the most valuable one in terms of the quality index.

Solution3: diversity index

CODE:

SELECT
employeeID,
Total_Distinct_Categories,
AverageFreight,
NTILE(4) OVER (ORDER BY Total_Distinct_Categories+AverageFreight DESC) AS diversity_group
FROM (
SELECT
employeeID,
COUNT(DISTINCT num_category) AS Total_Distinct_Categories,
round(AVG(total_freight),2) AS AverageFreight
FROM
employee_data
GROUP BY
employeeID
) AS subquery
ORDER BY diversity_group;

Description:

The diversity index is divided into two aspects: The first aspect is customer diversity. The higher the average freight of each employee's orders, the broader the region in which their customers are located, indicating greater customer diversity. The second aspect is product diversity, measured by the total number of product categories. The higher the total, the more diverse the products sold by the employee.

I found that employee 7 is the most valuable employee in terms of this index.

Solution4: Customer relationship index

CODE:

SELECT

employeeID,

retained_customers,

total_customers,

retention_rate,

NTILE(4) OVER (ORDER BY retention_rate DESC) AS relationship_group

FROM (

SELECT

employeeID,

COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) AS retained_customers,

COUNT(DISTINCT customerID) AS total_customers,

ROUND(100 * COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) / COUNT(DISTINCT customerID), 2) AS retention_rate

FROM (

SELECT

employeeID,

customerID,

MIN(orderDate) AS min_order_date,

MAX(orderDate) AS max_order_date

FROM

employee_data

GROUP BY

employeeID, customerID

) AS subquery

GROUP BY

employeeID

) AS final_query

ORDER BY relationship_group;

Description:

The customer relationship index is measured by the customer retention rate.

Customer retention rate = number of retained customers / number of total customers

If a customer places orders more than once, they are categorized as a retained customer.

I found that employee 4 is the most valuable employee in terms of this index.

Solution5: productivity index

CODE:

SELECT
employeeID,
total_sales,
total_orders,
productivity_per_day,
NTILE(4) OVER (ORDER BY rank_sales + rank_orders +rank_productivity_per_day ASC) AS productivity_group
FROM (
SELECT
employeeID,
ROUND(SUM(order_amount),2) AS total_sales,
COUNT(DISTINCT orderID) AS total_orders,
ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) AS productivity_per_day,
RANK() OVER (ORDER BY ROUND(SUM(order_amount),2) DESC) AS rank_sales,
RANK() OVER (ORDER BY COUNT(DISTINCT orderID) DESC) AS rank_orders,
RANK() OVER (ORDER BY ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) DESC) AS rank_productivity_per_day
FROM
employee_data
GROUP BY
employeeID
) AS DerivedTable
ORDER BY
productivity_group;

Description:

The productivity index is measured by total sales, number of completed orders, and productivity rate.
We found that employee7 is the most valuable employee in terms of this index.

Solution of the main problem

CODE:

USE cis467_final_project;
SET sql_mode = (SELECT REPLACE(@@SQL_MODE, "ONLY_FULL_GROUP_BY", ""));
CREATE OR REPLACE VIEW score_view AS
SELECT cost_index.employeeID, cost_index.employee_name, cost_group, quality_group, diversity_group, relationship_group, productivity_group,
NTILE(9) OVER(ORDER BY cost_group, quality_group, diversity_group, relationship_group, productivity_group DESC) AS final_score
FROM
(SELECT
employeeID, employee_name,
round(AVG(discount) * 100,2) AS average_discount_percentage,
NTILE(4) OVER(ORDER BY AVG(discount) ASC) AS cost_group
FROM
employee_data
GROUP BY
employeeID) AS cost_index
JOIN
(SELECT
employeeID,
error_rate,
stock_management_efficiency,
NTILE(4) OVER (ORDER BY rank_error_rate + rank_efficiency ASC) AS quality_group
FROM (
SELECT
employeeID,
ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID), 2) AS error_rate,
ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) AS stock_management_efficiency,
RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID),2) ASC) AS rank_error_rate,
RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) DESC) AS rank_efficiency
FROM employee_data
GROUP BY employeeID
) AS subquery
ORDER BY quality_group) AS quality_index ON cost_index.employeeID = quality_index.employeeID
JOIN
(SELECT
employeeID,
Total_Distinct_Categories,
AverageFreight,
NTILE(4) OVER (ORDER BY Total_Distinct_Categories+AverageFreight DESC) AS diversity_group
FROM (
SELECT
employeeID,
COUNT(DISTINCT num_category) AS Total_Distinct_Categories,
round(AVG(total_freight),2) AS AverageFreight
FROM
employee_data
GROUP BY
employeeID
) AS subquery
ORDER BY diversity_group) AS diversity_index ON quality_index.employeeID = diversity_index.employeeID
JOIN
(SELECT
employeeID,
retained_customers,
total_customers,
retention_rate,
NTILE(4) OVER (ORDER BY retention_rate DESC) AS relationship_group
FROM (
SELECT
employeeID,
COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) AS retained_customers,
COUNT(DISTINCT customerID) AS total_customers,
ROUND(100 * COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) / COUNT(DISTINCT customerID), 2) AS retention_rate
FROM (
SELECT
employeeID,
customerID,
MIN(orderDate) AS min_order_date,
MAX(orderDate) AS max_order_date
FROM
employee_data
GROUP BY
employeeID, customerID
) AS subquery
GROUP BY
employeeID
) AS final_query
ORDER BY relationship_group) AS costomer_relationship_index ON costomer_relationship_index.employeeID = diversity_index.employeeID
JOIN
(SELECT
employeeID,
total_sales,
total_orders,
productivity_per_day,
NTILE(4) OVER (ORDER BY rank_sales + rank_orders +rank_productivity_per_day ASC) AS productivity_group
FROM (
SELECT
employeeID,
ROUND(SUM(order_amount),2) AS total_sales,
COUNT(DISTINCT orderID) AS total_orders,
ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) AS productivity_per_day,
RANK() OVER (ORDER BY ROUND(SUM(order_amount),2) DESC) AS rank_sales,
RANK() OVER (ORDER BY COUNT(DISTINCT orderID) DESC) AS rank_orders,
RANK() OVER (ORDER BY ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) DESC) AS rank_productivity_per_day
FROM
employee_data
GROUP BY
employeeID
) AS DerivedTable
ORDER BY
productivity_group) AS productivity_index ON productivity_index.employeeID = costomer_relationship_index.employeeID;

SELECT * FROM score_view;

Description:

I combined these five indices and used NTILE to generate a final score to comprehensively evaluate employees. The lower the final score, the better the employee's performance, and thus, they are more valuable.

At the same time, I turned these queries into a view table to facilitate answers to subsequent questions.

I found that in this evaluation system, employee 2 is the most valuable employee in terms of all these five indexes.

Data Visualization

Create Tableau individual visualizations (graphs) on data warehouse with valuable information to present findings.

I used a heat map to represent the employee's score in five indexes.

The color scale on the right side, labeled as "Measure Values", ranges from 1.000 to 4.000. It suggests that the scores for each index can range between these values, the darker the color is, the lower the employee’s index score, thus the better performance he has.

The company can assign tasks to employees based on their performance in each index. For instance, employees with IDs 3 and 4 excel in maintaining customer relationships, so they can be tasked with expanding to new customers and nurturing relationships with existing clients.

I used a bar chart to represent the employee’s final score.

The color scale on the right side, labeled as "final score", ranges from 1 to 9. It suggests that the final score range. The darker the color is, the lower the employee’s final score, thus the better performance he has.

We can tell from the result that the employee Andrew Fuller has the best performance, and Rober King has the worst performance.

The company can reward the best-performing employees and issue warnings to the least-performing ones.

SQL Project: Employee Value Evaluation System

Introduction

Data Processing Design and create a data warehouse for the provided database.

Result

Data Analysis Create queries on the data warehouse.

Main question: who is the most valuable employee?

Solution1: cost index

Solution2: quality index

Solution3: diversity index

Solution4: Customer relationship index

Solution5: productivity index

Solution of the main problem

Data Visualization

SQL Project:
Employee Value Evaluation System