top of page

SQL Project:
Employee Value Evaluation System

Introduction

Introduction

The provided database contains the 11 tables (the database schema is below), with transactional data related to some company operations.

image.png

In this system,  I have divided it into five criteria to comprehensively assess the value of an employee. These five criteria are the productivity index, quality index, market expansion and diversification index(diversity index), cost index, and customer relationship index.

Data Processing Design and create a data warehouse for the provided database.

Data Processing

In this part, I combined data from eight tables

CODE:

USE cis467_final_project;

SET sql_mode = (SELECT REPLACE(@@SQL_MODE, "ONLY_FULL_GROUP_BY", ""));

 

CREATE OR REPLACE VIEW employee_data AS

SELECT

    e.employeeID,

    CONCAT(e.FirstName, ' ', e.LastName) AS employee_name,

    e.title,

    e.reportsTo,

    o.orderID,

    ROUND(SUM(od.UnitPrice * od.Quantity * (1 - od.discount)), 2) AS order_amount,

    COUNT(DISTINCT o.orderID) AS order_number,

            STR_TO_DATE(o.shippedDate, '%m/%d/%Y') AS shippedDate,

            STR_TO_DATE(o.requireddate, '%m/%d/%Y') AS requiredDate,

            STR_TO_DATE(o.orderDate, '%m/%d/%Y') AS orderDate,

    CASE

        WHEN DATEDIFF(STR_TO_DATE(o.requiredDate, '%m/%d/%Y'), STR_TO_DATE(o.shippedDate, '%m/%d/%Y')) > 0 THEN 0

        ELSE DATEDIFF(STR_TO_DATE(o.shippedDate, '%m/%d/%Y'), STR_TO_DATE(o.requiredDate, '%m/%d/%Y'))

    END AS delay_day,

    SUM(p.discontinued) AS missend,

    p.reorderlevel,

    p.unitsinstock,

    p.unitsonorder,

    SUM(o.freight) AS total_freight,

    o.shipcountry,

    COUNT(DISTINCT c.CategoryID) AS num_category,

    ROUND(AVG(od.discount), 2) AS discount,

    o.customerID,

    r.regionID,

    r.regiondescription,

    p.productID,

    c.categoryname,

    t.TerritoryID,

    t.TerritoryDescription

FROM

    employees e

JOIN orders o ON e.EmployeeID = o.employeeID

JOIN order_details od ON o.orderID = od.orderID

JOIN products p ON p.ProductID = od.productID

JOIN categories c ON p.CategoryID = c.CategoryID

LEFT JOIN employeeterritories et ON e.employeeID = et.employeeID

LEFT JOIN territories t ON et.TerritoryID = t.TerritoryID

LEFT JOIN region r ON t.regionID = r.regionID

GROUP BY

    e.employeeID,

    o.orderID

ORDER BY e.employeeID;

select * from employee_data LIMIT 25;

Result
image.png
image.png

Data Analysis Create queries on the data warehouse.

Data Analysis

I used complex queries including Joins, a Group By, UNION elements, a subquery, and some aggregate functions and summary calculations.

Main question: who is the most valuable employee?
Solution1: cost index
Solution1: cost index

CODE:

SELECT

    employeeID,

    round(AVG(discount) * 100,2) AS average_discount_percentage,

    NTILE(4) OVER(ORDER BY AVG(discount) ASC) AS cost_group

FROM

    employee_data

GROUP BY

employeeID;

image.png

Description:

By calculating the average discount rate of the orders sold by each employee, I can derive the cost indicator. (A lower average discount rate is better.)

I found that employee 2 is the most valuable one in terms of the cost index.

Solution2: quality index
Solution2: quality index

CODE:

SELECT

    employeeID,

    error_rate,

    stock_management_efficiency,

    NTILE(4) OVER (ORDER BY rank_error_rate + rank_efficiency ASC) AS quality_group

FROM (

    SELECT

        employeeID,

        ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID), 2) AS error_rate,

        ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) AS stock_management_efficiency,

                        RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID),2) ASC) AS rank_error_rate,

        RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) DESC) AS rank_efficiency

    FROM employee_data

    GROUP BY employeeID

) AS subquery

ORDER BY quality_group;

image.png

Description:

The quality index is derived from two aspects. The first aspect is the order processing error rate, and the second aspect is the efficiency of product inventory management. A lower error rate is better, a higher stock management efficiency is better.

I found that employee 8 is the most valuable one in terms of the quality index.

Solution3: diversity index
Solution3: diversity index

CODE:

SELECT 
    employeeID,
    Total_
Distinct_Categories,
    AverageFreight,
    NTILE(4) OVER (ORDER BY Total_Distinct_Categories+AverageFreight DESC) AS diversity_group
FROM (
    SELECT 
        employeeID,
        COUNT(DISTINCT num_category) AS Total_Distinct_Categories,
        round(AVG(total_freight),2) AS AverageFreight
    FROM 
        employee_data
    GROUP BY 
        employeeID
) AS subquery
ORDER BY diversity_group;

image.png

Description:

The diversity index is divided into two aspects: The first aspect is customer diversity. The higher the average freight of each employee's orders, the broader the region in which their customers are located, indicating greater customer diversity. The second aspect is product diversity, measured by the total number of product categories. The higher the total, the more diverse the products sold by the employee.

I found that employee 7 is the most valuable employee in terms of this index.

Solution4: Customer relationship index
Customer relationship index

CODE:

SELECT

    employeeID,

    retained_customers,

    total_customers,

    retention_rate,

    NTILE(4) OVER (ORDER BY retention_rate DESC) AS relationship_group

FROM (

    SELECT

        employeeID,

        COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) AS retained_customers,

        COUNT(DISTINCT customerID) AS total_customers,

        ROUND(100 * COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) / COUNT(DISTINCT customerID), 2) AS retention_rate

    FROM (

        SELECT

            employeeID,

            customerID,

            MIN(orderDate) AS min_order_date,

            MAX(orderDate) AS max_order_date

        FROM

            employee_data

        GROUP BY

            employeeID, customerID

    ) AS subquery

    GROUP BY

        employeeID

) AS final_query

ORDER BY relationship_group;

image.png

Description:

The customer relationship index is measured by the customer retention rate.

Customer retention rate = number of retained customers / number of total customers

If a customer places orders more than once, they are categorized as a retained customer.

I found that employee 4 is the most valuable employee in terms of this index.

Solution5: productivity index
Solution5: productivity index

CODE:

SELECT
    employeeID,
    total_sales,
    total_orders,
    productivity_per_day,
    NTILE(4) OVER (ORDER BY rank_sales + rank_orders +rank_productivity_per_day ASC) AS productivity_group
FROM (
    SELECT 
        employeeID,
        ROUND(SUM(order_amount),2) AS total_sales,  
        COUNT(DISTINCT orderID) AS total_orders,
        ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) AS productivity_per_day,
        RANK() OVER (ORDER BY ROUND(SUM(order_amount),2) DESC) AS rank_sales,
        RANK() OVER (ORDER BY COUNT(DISTINCT orderID) DESC) AS rank_orders,
        RANK() OVER (ORDER BY ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) DESC) AS rank_productivity_per_day
    FROM 
        employee_data
    GROUP BY 
        employeeID
) AS DerivedTable
ORDER BY 
    productivity_group;

image.png

Description:

The productivity index is measured by total sales, number of completed orders, and productivity rate.
We found that employee7 is the most valuable employee in terms of this index.

Solution of the main problem
Solution of the main problem

CODE:

USE cis467_final_project;
SET sql_mode = (SELECT REPLACE(@@SQL_MODE, "ONLY_FULL_GROUP_BY", ""));
CREATE OR REPLACE VIEW score_view AS
SELECT cost_index.employeeID, cost_index.employee_name, cost_group, quality_group, diversity_group, relationship_group, productivity_group,
NTILE(9) OVER(ORDER BY cost_group, quality_group, diversity_group, relationship_group, productivity_group DESC) AS final_score
FROM
(SELECT 
    employeeID, employee_name,
    round(AVG(discount) * 100,2) AS average_discount_percentage,
    NTILE(4) OVER(ORDER BY AVG(discount) ASC) AS cost_group
FROM 
    employee_data
GROUP BY 
    employeeID) AS cost_index 
JOIN 
(SELECT 
    employeeID,
    error_rate,
    stock_management_efficiency,
    NTILE(4) OVER (ORDER BY rank_error_rate + rank_efficiency ASC) AS quality_group
FROM (
    SELECT 
        employeeID,
        ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID), 2) AS error_rate,
        ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) AS stock_management_efficiency,
       RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN shippedDate > requiredDate THEN 1 ELSE 0 END) / COUNT(DISTINCT orderID),2) ASC) AS rank_error_rate,
        RANK() OVER (ORDER BY ROUND(100 * SUM(CASE WHEN (reorderlevel - unitsinstock - unitsonorder) > 0 THEN 1 ELSE 0 END) / COUNT(DISTINCT ProductID), 2) DESC) AS rank_efficiency
    FROM employee_data
    GROUP BY employeeID
) AS subquery
ORDER BY quality_group) AS quality_index ON cost_index.employeeID = quality_index.employeeID
JOIN
(SELECT 
    employeeID,
    Total_Distinct_Categories,
    AverageFreight,
    NTILE(4) OVER (ORDER BY Total_Distinct_Categories+AverageFreight DESC) AS diversity_group
FROM (
    SELECT 
        employeeID,
        COUNT(DISTINCT num_category) AS Total_Distinct_Categories,
        round(AVG(total_freight),2) AS AverageFreight
    FROM 
        employee_data
    GROUP BY 
        employeeID
) AS subquery
ORDER BY diversity_group) AS diversity_index ON quality_index.employeeID = diversity_index.employeeID
JOIN
(SELECT 
    employeeID,
    retained_customers,
    total_customers,
    retention_rate,
    NTILE(4) OVER (ORDER BY retention_rate DESC) AS relationship_group
FROM (
    SELECT 
        employeeID,
        COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) AS retained_customers,
        COUNT(DISTINCT customerID) AS total_customers,
        ROUND(100 * COUNT(DISTINCT CASE WHEN DATEDIFF(max_order_date, min_order_date) > 0 THEN customerID END) / COUNT(DISTINCT customerID), 2) AS retention_rate
    FROM (
        SELECT 
            employeeID,
            customerID,
            MIN(orderDate) AS min_order_date,
            MAX(orderDate) AS max_order_date
        FROM 
            employee_data
        GROUP BY 
            employeeID, customerID
    ) AS subquery
    GROUP BY 
        employeeID
) AS final_query
ORDER BY relationship_group) AS costomer_relationship_index ON costomer_relationship_index.employeeID = diversity_index.employeeID
JOIN
(SELECT
    employeeID,
    total_sales,
    total_orders,
    productivity_per_day,
    NTILE(4) OVER (ORDER BY rank_sales + rank_orders +rank_productivity_per_day ASC) AS productivity_group
FROM (
    SELECT 
        employeeID,
        ROUND(SUM(order_amount),2) AS total_sales,  
        COUNT(DISTINCT orderID) AS total_orders,
        ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) AS productivity_per_day,
        RANK() OVER (ORDER BY ROUND(SUM(order_amount),2) DESC) AS rank_sales,
        RANK() OVER (ORDER BY COUNT(DISTINCT orderID) DESC) AS rank_orders,
        RANK() OVER (ORDER BY ROUND((SUM(order_amount) / DATEDIFF(MAX(orderdate), MIN(orderdate))),2) DESC) AS rank_productivity_per_day
    FROM 
        employee_data
    GROUP BY 
        employeeID
) AS DerivedTable
ORDER BY 
    productivity_group) AS productivity_index ON productivity_index.employeeID = costomer_relationship_index.employeeID;
    
SELECT * FROM score_view;

image.png

Description:

I combined these five indices and used NTILE to generate a final score to comprehensively evaluate employees. The lower the final score, the better the employee's performance, and thus, they are more valuable.

At the same time, I turned these queries into a view table to facilitate answers to subsequent questions.

I found that in this evaluation system, employee 2 is the most valuable employee in terms of all these five indexes.

Data Visualization

​Create Tableau individual visualizations (graphs) on data warehouse with valuable information to present findings.

Data Visualization(Tableau)
image.png

I used a heat map to represent the employee's score in five indexes.

The color scale on the right side, labeled as "Measure Values", ranges from 1.000 to 4.000. It suggests that the scores for each index can range between these values, the darker the color is, the lower the employee’s index score, thus the better performance he has.

The company can assign tasks to employees based on their performance in each index. For instance, employees with IDs 3 and 4 excel in maintaining customer relationships, so they can be tasked with expanding to new customers and nurturing relationships with existing clients.

image.png

I used a bar chart to represent the employee’s final score.

The color scale on the right side, labeled as "final score", ranges from 1 to 9. It suggests that the final score range. The darker the color is, the lower the employee’s final score, thus the better performance he has.

We can tell from the result that the employee Andrew Fuller has the best performance, and Rober King has the worst performance.

The company can reward the best-performing employees and issue warnings to the least-performing ones.

bottom of page