Ramgopal Prajapat:

Learnings and Views

SQL Scenarios for Practice and Learning

By: Ram on Jan 28, 2021

In the previous blog, we have covered the MYSQL database design for the retail scenario, create a table structure, and import CSV data into these tables.

Also, we have covered 10 SQL scenarios to learn the basics of SQL programming for Data Analytics. In continuation of that, we are adding 11 more scenarios for further learning.

SQL Concepts for these scenarios are related below topics

  • CASE WHEN and Aggregation Functions together
  • GROUP BY, count function and CASE WHEN
  • LIKE Operator
  • UPPER function and LIKE Operator
  • Full-Text Search and Match/Against
  • Joins in SQL – Inner Join, Left Join, and Right Join
  • Mathematical Functions – log() and sqrt() functions
  • String Manipulations – Length() and Substr() functions

 

 

Scenario 11: Summarize (count or total) information based on a condition

For example, we want to calculate the number of items sold on an offer and the total discount offer.

SQL Code:

SELECT

SUM(CASE WHEN (mrp-selling_price )>0 THEN 1 ELSE 0 END) as products_on_offer,

SUM(CASE WHEN (mrp-selling_price )>0 THEN (mrp-selling_price ) else 0 end) as total_discount_amount

FROM order_items

;

OUTPUT

 

Explanation

Aggregate function SUM helps in getting the total for input values. In the about, we want to calculate the number of products is on offer. First, we assign value 1 if a product is on an offer otherwise 0, then we are summing the 1’s to find the total count of products on offer.

In the second part of the aggregation, we want to calculate the total discount amount. Based on the condition on MRP and Selling price, we calculate the discount amount for each product, and then we are summing up using the SUM function.

 

Scenario 12: Create a variable with multiple values based on conditions and aggregate values for each level

For example, there are different value for payment mode, we want to assign a label to each of the mode value.

SQL Code:

SELECT

CASE WHEN payment_mode=1 THEN 'Cash'

           WHEN payment_mode=2 THEN 'Card'

           WHEN payment_mode=3 THEN  'Sodexo'

           WHEN payment_mode=4 THEN  'UPI'

           ELSE 'Other'

         END as payment_mode,

      COUNT(*) as payment_count

from order_payments

GROUP BY 1;

OUTPUT

 

Explanation

The case when statement can help us assign values to variable values based on a condition. In this example, the input variable payment_mode takes values 1 to 4, and the meaning of these values are 1 – Cash, 2- Card, 3-Sodexo, and 4-UPI. We need to assign these values and then count occurrences. We can group into identical values and then the count function gives the use count of rows for each of the payment_mode values.

 

Scenario 13: Find values/rows between two dates or two numeric values

For example, find orders that are of values between 100 and 200 amount.

SQL Code:

SELECT *

FROM orders

WHERE amount BETWEEN 50 AND 100;

 

OUTPUT

Explanation

BETWEEN operator selects values within a given range.  In this example, we want to pull orders which has a value greater than and equal to 50 and less than equal to 100. This can be given using BETWEEN /AND

This can also be done using

SQL CODE

SELECT *

FROM orders

WHERE amount >=50

                AND AMOUNT<= 100;

OUTPUT

Scenario 14: Selecting rows based on string value start with a particular character

For example, find employees where employee names are starting with M.

SQL Code:

SELECT *

FROM employees

WHERE name LIKE "M%";

 

OUTPUT

Explanation

Character string value comparison is case sensitive. The LIKE operator in the WHERE clause can be used to search for a specified pattern.  In the above example, we are filtering employee names starting with “M”. % symbol indicates anything after M will be selected.

We can also find all the employees starting with any of the two characters and below is the code.

SQL Code

select *

from employees

where (name like "M%") or (name like "S%");

;

OUTPUT

 

Scenario 15: Finding all values where field containing a word

For example, we want to find all items (products) where the product name contains the “women” word.

We can do multiple ways.

SQL Code:

SELECT * FROM items

WHERE UPPER(name) LIKE "%WOMEN%";

OUTPUT

Explanation

We can use % before and after a word to do a comparison of the string values.  But women can come with capital and small case. To not miss any of the relevant transactions, we can change the case to upper case using the UPPER () function before comparing the string values.

The LIKE the operator generally doesn't use any index and for the large size of the data, the query can be costly. If the text column is made as Full-Text Search.

CREATE FULLTEXT INDEX `idx_items_name`  ON `dmart`.`items` (name) COMMENT '' ALGORITHM DEFAULT LOCK DEFAULT;

 

When a column is defined as Full-Text Search and if we are using the below method of search a text string, it will be faster/more efficient than using LIKE with wildcarding.

SELECT * FROM items WHERE MATCH(name)

AGAINST('women' IN NATURAL LANGUAGE MODE)

;

OUTPUT

Joins in MYSQL or SQL

So, all the scenarios discussed are related to manipulations with a single table. In a real project, we need to get data from multiple tables. So, for example in the current retail scenario, we want to see the number of distinct orders for each of the products. We need to join, items, and order tables to get the information required.

A JOIN clause can be used to combine two or more tables, based on a related column or columns between input tables.   By default, the combining of rows from input tables are done based on Cartesian products – each row of one table is linked to the other table. If any joining condition is applied, it filters out the rows that are not meeting the conditions.

 

Input Table: Customers

  select * from retail.customers;

 

Input Table: Products

 

Joining without any condition

select * from retail.customers join  retail.products;

 

Matching Tables based on a column

select * from retail.customers c join

    retail.products p on c.custid=p.custid;

OUTPUT

The matching row condition is given using the ON clause. In both input tables, the custid column is available and can be used for matching the rows. If we do that 5 rows have the same custid values.

This join is called inner join or natural join. The above code is the same as

SELECT  * FROM retail.customers A  inner join retail.Products B

                       ON A.custid =B.custid

                       ;

 

What should we do if we want to keep all the custid value of table customer's table?

SQL Code:

 SELECT  * FROM retail.customers A  left join retail.Products B

                       ON A.custid =B.custid

                       ;

OUTPUT

This table for custid=1002 is not available in the products table but is still kept. This is called left join as it keeps all the rows of the table in the left even if not available in the other table.

What should we do if we want to keep all the custid value of table products table?

SQL Code

 SELECT  * FROM retail.customers A  right join retail.Products B

                       ON A.custid =B.custid

                       ;

OUTPUT

 

Scenario 16: Finding item name, quantity, selling price, and the order number of all the items on a particular order id (say order id =5)

For example, in our scenario, the item name is available in the items table, and the order number is available in the order_items table, so we need to combine these tables and get all the information.

SQL Code:

select b.order_id as 'Order NO',

        c.name as 'Product',

       b.qty as 'Quantity',

       b.selling_price as 'Selling Price'     

from order_items b,

     items c

where b.item_id = c.id

and      b.order_id=5;

 

OUTPUT

 

Scenario 17: Retrieve data from multiple tables and aggregate data

For example, get Cashier Name when showing the count of orders by Cashiers

The cashier's name is available on the employee's table. We need to join the orders and employees table and then aggregate the details for each of the cashiers. Or we can aggregate or count orders for each of the cashiers and then get the names by joining with employees table.

SQL Code:

select a.*, b.name as cashier_name

from

   (select cashier_id, count(*) as orders

    from orders

    group by 1

    )  a

    left join

    employees b

    on a.cashier_id = b.id

;

OUTPUT

 

Scenario 18: Group by multiple columns and get the summary statistics

For example, get order summary statistics by store and cashier.

we can aggregate the data by multiple columns. In this scenario, we need to aggregate first by store and then by cashiers. Also, we need to calculate the number of orders and the total value of these orders as well.

SQL Code:

select s.name as store,

       c.name as Cashier,

       sum(o.amount) as amount,

       sum(o.qty) as qnty,

       count(*) as orders

from orders o inner join

     employees c    on o.cashier_id=c.id

     inner join

     stores s      on o.store_id=s.id

group by 1,2    

;

OUTPUT

Scenario 19: Create new aggregate tables

For example, we want to get the aggregated data stored in a table and apply variable transformations to the order quantity columns.

SQL Code:

create temporary table tm_store_cashier_orders as

select s.name as store,

       c.name as Cashier,

       sum(o.amount) as amount,

       sum(o.qty) as qnty,

       count(*) as orders

from orders o inner join

     employees c    on o.cashier_id=c.id

     inner join

     stores s      on o.store_id=s.id

group by 1,2    

;

 

select * from tm_store_cashier_orders;

OUTPUT

Explanation

We can create a new table using CREATE clause. In the above code, we are creating a TEMPORARY table. A temporary table is a special type of table that allows you to store a temporary result. So, we can use queries to manipulate the data.

 

Scenario 20: Mathematical Transformations using SQL

For example, we may want to find log and square root transformations of order quantity.

These transformations can help in improving the accuracy of the machine learning models.

SQL Code:

select a.*,

      log(qnty) as log_qnty,

      sqrt(qnty) as sqrt_qnty

from tm_store_cashier_orders a;

OUTPUT

 

We can round off the values and using the round() function.

select a.*,

      round(log(qnty),2) as log_qnty,

      round(sqrt(qnty),2) as sqrt_qnty

from tm_store_cashier_orders a;

 

Scenario 21: String Manipulations – Extracting Part of the String

For example, we want to extract the location of the store from the name.

SQL Code:

select a.*,

       substr(store, 8, length(store)) as location

from tm_store_cashier_orders a;

OUTPUT

Explanation

The length function helps us get the number of characters of the string.  And substr() helps us extract part of the input string based on starting position and the last string position.

 

Leave a comment