GitHub - ManuelGuerra1987/sql-exercises: SQL exercises from DataLemur

SQL Exercises

Highest Grossing Items

Assume you're given a table containing data on Amazon customers and their spending on products in different category, write a query to identify the top two highest-grossing products within each category in the year 2022. The output should include the category, product, and total spend.

product_spend Example Input:

Category	Product	User ID	Spend	Transaction Date
Appliance	Refrigerator	165	246.00	12/26/2021 12:00:00
Appliance	Refrigerator	123	299.99	03/02/2022 12:00:00
Appliance	Washing Machine	123	219.80	03/02/2022 12:00:00
Electronics	Vacuum	178	152.00	04/05/2022 12:00:00
Electronics	Wireless Headset	156	249.90	07/08/2022 12:00:00
Electronics	Vacuum	145	189.00	07/15/2022 12:00:00

Example Output:

Category	Product	Total Spend
Appliance	Refrigerator	299.99
Appliance	Washing Machine	219.80
Electronics	Vacuum	341.00
Electronics	Wireless Headset	249.90

Query:

WITH cte AS (

  SELECT * FROM product_spend
  WHERE EXTRACT(YEAR FROM transaction_date) = 2022
),

cte2 AS(

  SELECT
    category,
    product,
    SUM(spend) as total_spend
  
  FROM cte 
  GROUP BY category,product

),

cte3 AS(

  SELECT 
  *,
  RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) AS rank
  
  FROM cte2

)

SELECT category,product,total_spend FROM cte3
WHERE rank = 1 OR rank = 2;

step by step

cte2 output:

WITH cte AS (

  SELECT * FROM product_spend
  WHERE EXTRACT(YEAR FROM transaction_date) = 2022
),

cte2 AS(

  SELECT
    category,
    product,
    SUM(spend) as total_spend
  
  FROM cte 
  GROUP BY category,product
)

Category	Product	Total Spend
Appliance	Refrigerator	299.99
Electronics	3.5mm Headphone Jack	7.99
Appliance	Washing Machine	439.80
Electronics	Computer Mouse	45.00
Electronics	Vacuum	486.66
Appliance	Microwave	49.99
Electronics	Wireless Headset	467.89

cte3 output:

cte3 AS(

  SELECT 
  *,
  RANK() OVER (PARTITION BY category ORDER BY total_spend DESC) AS rank
  
  FROM cte2

)

Category	Product	Total Spend	Rank
Appliance	Washing Machine	439.80	1
Appliance	Refrigerator	299.99	2
Appliance	Microwave	49.99	3
Electronics	Vacuum	486.66	1
Electronics	Wireless Headset	467.89	2
Electronics	Computer Mouse	45.00	3
Electronics	3.5mm Headphone Jack	7.99	4

final query output:

SELECT category,product,total_spend FROM cte3
WHERE rank = 1 OR rank = 2;

Category	Product	Total Spend
Appliance	Washing Machine	439.80
Appliance	Refrigerator	299.99
Electronics	Vacuum	486.66
Electronics	Wireless Headset	467.89

Top 5 Artists

Assume there are three Spotify tables: artists, songs, and global_song_rank, which contain information about the artists, songs, and music charts, respectively.

Write a query to find the top 5 artists whose songs appear most frequently in the Top 10 of the global_song_rank table. Display the top 5 artist names in ascending order, along with their song appearance ranking.

If two or more artists have the same number of song appearances, they should be assigned the same ranking, and the rank numbers should be continuous (i.e. 1, 2, 2, 3, 4, 5)

artists Example Input:

Artist ID	Artist Name	Label Owner
101	Ed Sheeran	Warner Music Group
120	Drake	Warner Music Group
125	Bad Bunny	Rimas Entertainment

songs Example Input:

Song ID	Artist ID	Name
55511	101	Perfect
45202	101	Shape of You
22222	120	One Dance
19960	120	Hotline Bling

global_song_rank Example Input:

Day	Song ID	Rank
1	45202	5
3	45202	2
1	19960	3
9	19960	15

Query:

WITH cte AS (

SELECT 
  artists.artist_name,
  songs.name,
  global_song_rank.rank
FROM global_song_rank
JOIN songs ON global_song_rank.song_id = songs.song_id
JOIN artists ON artists.artist_id = songs.artist_id


),

cte2 AS (

SELECT * FROM cte
WHERE rank <= 10


),

cte3 AS (

SELECT artist_name,count(rank) AS counter 
FROM cte2
GROUP BY artist_name

),

cte4 AS (

SELECT 
  *,
  DENSE_RANK() OVER (ORDER BY counter DESC) as artist_rank

  FROM cte3
)

SELECT artist_name,artist_rank 
FROM cte4
WHERE artist_rank <= 5

step by step

cte2 output:

WITH cte AS (

SELECT 
  artists.artist_name,
  songs.name,
  global_song_rank.rank
FROM global_song_rank
JOIN songs ON global_song_rank.song_id = songs.song_id
JOIN artists ON artists.artist_id = songs.artist_id


),

cte2 AS (

SELECT * FROM cte
WHERE rank <= 10


)

Artist Name	Song Name	Rank
Ed Sheeran	Shape of You	2
Ed Sheeran	Shape of You	2
Ed Sheeran	Shape of You	6
Ed Sheeran	Perfect	2
Drake	Hotline Bling	3
Bad Bunny	Mia	9
Bad Bunny	Mia	7
Bad Bunny	Mia	7

cte3 output:

cte3 AS (

SELECT artist_name,count(rank) AS counter 
FROM cte2
GROUP BY artist_name

)

Artist Name	Counter
Chris Brown	1
Lady Gaga	3
Katy Perry	2
Adele	5
Taylor Swift	8
Drake	7
Bad Bunny	7
Ariana Grande	1
Ed Sheeran	5

cte4 output:

cte4 AS (

SELECT 
  *,
  DENSE_RANK() OVER (ORDER BY counter DESC) as artist_rank

  FROM cte3
)

Artist Name	Counter	Artist Rank
Taylor Swift	8	1
Bad Bunny	7	2
Drake	7	2
Ed Sheeran	5	3
Adele	5	3
Lady Gaga	3	4
Katy Perry	2	5
Ariana Grande	1	6
Chris Brown	1	6

final query output:

SELECT artist_name,artist_rank 
FROM cte4
WHERE artist_rank <= 5

Artist Name	Artist Rank
Taylor Swift	1
Bad Bunny	2
Drake	2
Ed Sheeran	3
Adele	3
Lady Gaga	4
Katy Perry	5

Card Launch Success

Link

Your team at JPMorgan Chase is soon launching a new credit card. You are asked to estimate how many cards you'll issue in the first month.

Before you can answer this question, you want to first get some perspective on how well new credit card launches typically do in their first month.

Write a query that outputs the name of the credit card, and how many cards were issued in its launch month. The launch month is the earliest record in the monthly_cards_issued table for a given card. Order the results starting from the biggest issued amount.

monthly_cards_issued Example Input:

Issue Month	Issue Year	Card Name	Issued Amount
1	2021	Chase Sapphire Reserve	170000
2	2021	Chase Sapphire Reserve	175000
3	2021	Chase Sapphire Reserve	180000
3	2021	Chase Freedom Flex	65000
4	2021	Chase Freedom Flex	70000

Query:

WITH cte AS (
  SELECT 
    card_name,
    issued_amount,
    issue_month,
    issue_year,
    MAKE_DATE(issue_year, issue_month, 1) as date
  FROM monthly_cards_issued
),

cte2 AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY card_name ORDER BY date ASC) as row_num
  FROM cte
)

SELECT 
  card_name,
  issued_amount
FROM cte2
WHERE row_num = 1
ORDER BY issued_amount DESC;

step by step

cte output:

WITH cte AS (

SELECT 

  card_name,
  issued_amount,
  issue_month,
  issue_year,
  MAKE_DATE(issue_year, issue_month, 1) as date

  
FROM monthly_cards_issued

)

Card Name	Issued Amount	Issue Month	Issue Year	Date
Chase Sapphire Reserve	160000	12	2020	12/01/2020 00:00:00
Chase Sapphire Reserve	170000	1	2021	01/01/2021 00:00:00
Chase Sapphire Reserve	175000	2	2021	02/01/2021 00:00:00
Chase Sapphire Reserve	180000	3	2021	03/01/2021 00:00:00
Chase Freedom Flex	55000	1	2021	01/01/2021 00:00:00
Chase Freedom Flex	60000	2	2021	02/01/2021 00:00:00
Chase Freedom Flex	65000	3	2021	03/01/2021 00:00:00
Chase Freedom Flex	70000	4	2021	04/01/2021 00:00:00
Chase Sapphire Reserve	150000	11	2020	11/01/2020 00:00:00

cte2 output:

cte2 AS (
  SELECT 
    *,
    ROW_NUMBER() OVER (PARTITION BY card_name ORDER BY date ASC) as row_num
  FROM cte
)

Card Name	Issued Amount	Issue Month	Issue Year	Date	Row Num
Chase Freedom Flex	55000	1	2021	01/01/2021 00:00:00	1
Chase Freedom Flex	60000	2	2021	02/01/2021 00:00:00	2
Chase Freedom Flex	65000	3	2021	03/01/2021 00:00:00	3
Chase Freedom Flex	70000	4	2021	04/01/2021 00:00:00	4
Chase Sapphire Reserve	150000	11	2020	11/01/2020 00:00:00	1
Chase Sapphire Reserve	160000	12	2020	12/01/2020 00:00:00	2
Chase Sapphire Reserve	170000	1	2021	01/01/2021 00:00:00	3
Chase Sapphire Reserve	175000	2	2021	02/01/2021 00:00:00	4

final query output:

SELECT 
  card_name,
  issued_amount
FROM cte2
WHERE row_num = 1
ORDER BY issued_amount DESC

Card Name	Issued Amount
Chase Sapphire Reserve	150000
Chase Freedom Flex	55000

International Call Percentage

Link

A phone call is considered an international call when the person calling is in a different country than the person receiving the call.

What percentage of phone calls are international? Round the result to 1 decimal.

phone_calls Example Input:

Caller ID	Receiver ID	Call Time
1	2	2022-07-04 10:13:49
1	5	2022-08-21 23:54:56
5	1	2022-05-13 17:24:06
5	6	2022-03-18 12:11:49

phone_info Example Input:

Caller ID	Country ID	Network	Phone Number
1	US	Verizon	+1-212-897-1964
2	US	Verizon	+1-703-346-9529
3	US	Verizon	+1-650-828-4774
4	US	Verizon	+1-415-224-6663
5	IN	Vodafone	+91 7503-907302
6	IN	Vodafone	+91 2287-664895

Query:

WITH cte AS (

SELECT 

info1.country_id as caller,
info2.country_id as receiver

FROM phone_calls 
JOIN phone_info info1 ON phone_calls.caller_id = info1.caller_id
JOIN phone_info info2 ON phone_calls.receiver_id = info2.caller_id

),

cte2 AS (

SELECT
  *,
  CASE WHEN cte.caller != cte.receiver THEN 1 ELSE 0 END AS inter
  
  FROM cte

)

SELECT ROUND(CAST(SUM(inter) * 100.0 / COUNT(*) AS NUMERIC), 1) AS international_calls_pct 
FROM cte2

step by step

cte output:

WITH cte AS (

SELECT 

info1.country_id as caller,
info2.country_id as receiver

FROM phone_calls 
JOIN phone_info info1 ON phone_calls.caller_id = info1.caller_id
JOIN phone_info info2 ON phone_calls.receiver_id = info2.caller_id

)

Caller	Receiver
UK	UK
IN	UK
US	UK
US	UK
DE	DE
DE	DE

cte2 output:

cte2 AS (

SELECT
  *,
  CASE WHEN cte.caller != cte.receiver THEN 1 ELSE 0 END AS inter
  
  FROM cte

)

Caller	Receiver	Inter
UK	UK	0
IN	UK	1
US	UK	1
US	UK	1
DE	DE	0
DE	DE	0
DE	DE	0
DE	DE	0
IN	DE	1
US	US	1
UK	US	1

final query output:

SELECT ROUND(CAST(SUM(inter) * 100.0 / COUNT(*) AS NUMERIC), 1) as international_calls_pct 
FROM cte2;

International Calls Percentage
54.5

Top Three Salaries

Link

As part of an ongoing analysis of salary distribution within the company, your manager has requested a report identifying high earners in each department. A 'high earner' within a department is defined as an employee with a salary ranking among the top three salaries within that department.

You're tasked with identifying these high earners across all departments. Write a query to display the employee's name along with their department name and salary. In case of duplicates, sort the results of department name in ascending order, then by salary in descending order. If multiple employees have the same salary, then order them alphabetically.

Note: Ensure to utilize the appropriate ranking window function to handle duplicate salaries effectively.

employee Example Input:

employee_id	name	salary	department_id	manager_id
1	Emma Thompson	3800	1	6
2	Daniel Rodriguez	2230	1	7
3	Olivia Smith	2000	1	8
4	Noah Johnson	6800	2	9
5	Sophia Martinez	1750	1	11
6	Liam Brown	13000	3	NULL
7	Ava Garcia	12500	3	NULL
8	William Davis	6800	2	NULL
9	Isabella Wilson	11000	3	NULL
10	James Anderson	4000	1	11

Query:

WITH cte AS(

  SELECT 
    department_name,
    name,
    salary
    
  FROM employee JOIN department 
  ON employee.department_id = department.department_id

),

cte2 AS(

  SELECT
  *,
  DENSE_RANK() OVER (PARTITION BY department_name ORDER BY salary DESC) AS rank
  
  FROM cte
)

SELECT department_name,name,salary FROM cte2
WHERE rank <= 3
ORDER BY department_name ASC;

step by step

cte output:

WITH cte AS(

  SELECT 
    department_name,
    name,
    salary
    
  FROM employee JOIN department 
  ON employee.department_id = department.department_id

)

department_name	name	salary
Data Analytics	Amelia Lee	4000
Data Analytics	James Anderson	4000
Data Analytics	Sophia Martinez	1750
Data Analytics	Olivia Smith	7000
Data Analytics	Daniel Rodriguez	2230
Data Analytics	Emma Thompson	3800
Data Science	Logan Moore	8000
Data Science	Charlotte Miller	7000

cte2 output:

cte2 AS(

  SELECT
  *,
  DENSE_RANK() OVER (PARTITION BY department_name ORDER BY salary DESC) AS rank
  
  FROM cte
)

department_name	name	salary	rank
Data Analytics	Olivia Smith	7000	1
Data Analytics	Amelia Lee	4000	2
Data Analytics	James Anderson	4000	2
Data Analytics	Emma Thompson	3800	3
Data Analytics	Daniel Rodriguez	2230	4
Data Analytics	Sophia Martinez	1750	5
Data Engineering	Liam Brown	13000	1
Data Engineering	Ava Garcia	12500	2
Data Engineering	Isabella Wilson	11000	3
Data Engineering	Mia Taylor	10800	4
Data Engineering	Benjamin Hernandez	9500	5
Data Science	Logan Moore	8000	1
Data Science	Charlotte Miller	7000	2
Data Science	Noah Johnson	6800	3
Data Science	William Davis	6800	3

final query output:

SELECT department_name,name,salary FROM cte2
WHERE rank <= 3
ORDER BY department_name ASC;

department_name	name	salary
Data Analytics	Olivia Smith	7000
Data Analytics	Amelia Lee	4000
Data Analytics	James Anderson	4000
Data Analytics	Emma Thompson	3800
Data Engineering	Liam Brown	13000
Data Engineering	Ava Garcia	12500
Data Engineering	Isabella Wilson	11000
Data Science	Logan Moore	8000
Data Science	Charlotte Miller	7000
Data Science	Noah Johnson	6800
Data Science	William Davis	6800

Active User Retention

Link

Assume you're given a table containing information on Facebook user actions. Write a query to obtain number of monthly active users (MAUs) in July 2022, including the month in numerical format "1, 2, 3".

An active user is defined as a user who has performed actions such as 'sign-in', 'like', or 'comment' in both the current month and the previous month.

user_actionsExample Input:

User ID	Event ID	Event Type	Event Date
445	7765	sign-in	05/31/2022 12:00:00
742	6458	sign-in	06/03/2022 12:00:00
445	3634	like	06/05/2022 12:00:00
742	1374	comment	06/05/2022 12:00:00
648	3124	like	06/18/2022 12:00:00

Query:

WITH cte AS (

SELECT 

*,
EXTRACT (MONTH FROM event_date) AS month,
EXTRACT (YEAR FROM event_date) AS year

FROM user_actions
)

select 
current.month,
count(DISTINCT current.user_id) as monthly_active_users
FROM cte current JOIN cte prev 
ON current.user_id = prev.user_id
AND current.year = prev.year
AND current.month = prev.month +1
WHERE current.month = 7 AND current.year = 2022
GROUP BY current.month

step by step

cte output:

WITH cte AS (

SELECT 

*,
EXTRACT (MONTH FROM event_date) AS month,
EXTRACT (YEAR FROM event_date) AS year

FROM user_actions
)

User ID	Event ID	Event Type	Event Date	Month	Year
445	7765	sign-in	05/31/2022 12:00:00	5	2022
445	3634	like	06/05/2022 12:00:00	6	2022
648	3124	like	06/18/2022 12:00:00	6	2022
648	2725	sign-in	06/22/2022 12:00:00	6	2022
648	8568	comment	07/03/2022 12:00:00	7	2022
445	4363	sign-in	07/05/2022 12:00:00	7	2022
445	2425	like	07/06/2022 12:00:00	7	2022
445	2484	like	07/22/2022 12:00:00	7	2022

Self join output:

SELECT
  current.user_id,
  current.event_type,
  current.month,
  current.year,
  prev.event_type,
  prev.month,
  prev.year
  
FROM cte current JOIN cte prev 
ON current.user_id = prev.user_id
AND current.year = prev.year
AND current.month = prev.month +1

User ID	Event Type	Month	Year	Event Type	Month	Year
445	like	6	2022	sign-in	5	2022
648	comment	7	2022	sign-in	6	2022
648	comment	7	2022	like	6	2022
445	sign-in	7	2022	like	6	2022
445	like	7	2022	like	6	2022
445	like	7	2022	like	6	2022
648	sign-in	7	2022	sign-in	6	2022
648	sign-in	7	2022	like	6	2022
445	comment	7	2022	like	6	2022

The self-join is used to find users who were active in consecutive months.

final query output:

select 
current.month,
count(DISTINCT current.user_id) as monthly_active_users
FROM cte current JOIN cte prev 
ON current.user_id = prev.user_id
AND current.year = prev.year
AND current.month = prev.month +1
WHERE current.month = 7 AND current.year = 2022
GROUP BY current.month

Month	Monthly Active Users
7	2

3 Topping Pizzas

Link

You’re a consultant for a major pizza chain that will be running a promotion where all 3-topping pizzas will be sold for a fixed price, and are trying to understand the costs involved.

Given a list of pizza toppings, consider all the possible 3-topping pizzas, and print out the total cost of those 3 toppings. Sort the results with the highest total cost on the top followed by pizza toppings in ascending order.

Break ties by listing the ingredients in alphabetical order, starting from the first ingredient, followed by the second and third.

Do not display pizzas where a topping is repeated. For example, ‘Pepperoni,Pepperoni,Onion Pizza’.

Ingredients must be listed in alphabetical order. For example, 'Chicken,Onions,Sausage'. 'Onion,Sausage,Chicken' is not acceptable.

pizza_toppings Example Input:

Topping Name	Ingredient Cost
Pepperoni	0.50
Sausage	0.70
Chicken	0.55
Extra Cheese	0.40

Query:

WITH cte AS (
  
  SELECT 
  
    pizza1.topping_name AS topping1,
    pizza1.ingredient_cost AS cost1,
    pizza2.topping_name AS topping2,
    pizza2.ingredient_cost AS cost2,
    pizza3.topping_name AS topping3,
    pizza3.ingredient_cost AS cost3
  
  FROM pizza_toppings pizza1
  CROSS JOIN pizza_toppings pizza2
  CROSS JOIN pizza_toppings pizza3
  WHERE pizza1.topping_name < pizza2.topping_name
  AND pizza2.topping_name < pizza3.topping_name

)
SELECT 

  CONCAT(topping1,',',topping2,',',topping3) AS pizza_concat, 
  cost1+cost2+cost3 AS total_cost

FROM cte
ORDER BY total_cost DESC, pizza_concat ASC

It uses CROSS JOIN to combine every row with every other row three times, generating all possible three-topping combinations.

The WHERE condition ensures that each topping appears only once per combination and enforces an order (topping1 < topping2 < topping3) to prevent duplicates

final query output:

Pizza Combination	Total Cost
Chicken, Pepperoni, Sausage	1.75
Chicken, Extra Cheese, Sausage	1.65
Extra Cheese, Pepperoni, Sausage	1.60
Chicken, Sausage, Spinach	1.55
Chicken, Mushrooms, Sausage	1.50
Chicken, Pineapple, Sausage	1.50
Pepperoni, Sausage, Spinach	1.50

Y on Y Growth Rate

Link

Assume you're given a table containing information about Wayfair user transactions for different products. Write a query to calculate the year-on-year growth rate for the total spend of each product, grouping the results by product ID.

The output should include the year in ascending order, product ID, current year's spend, previous year's spend and year-on-year growth percentage, rounded to 2 decimal places.

user_transactions Example Input:

transaction_id	product_id	spend	transaction_date
1341	123424	1500.60	12/31/2019 12:00:00
1423	123424	1000.20	12/31/2020 12:00:00
1623	123424	1246.44	12/31/2021 12:00:00
1322	123424	2145.32	12/31/2022 12:00:00

Query:

WITH cte AS (

SELECT

  EXTRACT(YEAR FROM transaction_date) AS year,
  product_id,
  SUM(spend) AS curr_year_spend

FROM user_transactions
GROUP BY year,product_id
ORDER BY year
),

cte2 AS (

  SELECT
    *,
    LAG(curr_year_spend) OVER (PARTITION BY product_id ORDER BY year) AS prev_year_spend
    
  FROM cte    

),

cte3 AS (

  SELECT
    *,
    ROUND((curr_year_spend / prev_year_spend - 1)*100,2) AS yoy_rate
  
  FROM cte2

)

SELECT * FROM cte3

step by step

cte output:

WITH cte AS (

SELECT

  EXTRACT(YEAR FROM transaction_date) AS year,
  product_id,
  SUM(spend) AS curr_year_spend

FROM user_transactions
GROUP BY year,product_id
ORDER BY year
)

year	product_id	curr_year_spend
2019	123424	1500.60
2019	234412	1800.00
2019	543623	6450.00
2020	123424	1000.20
2020	234412	1234.00
2020	543623	5348.12
2021	123424	1246.44
2021	234412	889.50

cte2 output:

cte2 AS (

  SELECT
    *,
    LAG(curr_year_spend) OVER (PARTITION BY product_id ORDER BY year) AS prev_year_spend
    
  FROM cte    

)

year	product_id	curr_year_spend	prev_year_spend
2019	123424	1500.60	NULL
2020	123424	1000.20	1500.60
2021	123424	1246.44	1000.20
2022	123424	2145.32	1246.44
2019	234412	1800.00	NULL
2020	234412	1234.00	1800.00
2021	234412	889.50	1234.00
2022	234412	2900.00	889.50
2019	543623	6450.00	NULL
2020	543623	5348.12	6450.00
2021	543623	2345.00	5348.12
2022	543623	5680.00	2345.00

cte3 output:

cte3 AS (

  SELECT
    *,
    ROUND((curr_year_spend / prev_year_spend - 1)*100,2) AS yoy_rate
  
  FROM cte2

)

SELECT * FROM cte3

year	product_id	curr_year_spend	prev_year_spend	yoy_rate
2019	123424	1500.60	NULL	NULL
2020	123424	1000.20	1500.60	-33.35
2021	123424	1246.44	1000.20	24.62
2022	123424	2145.32	1246.44	72.12
2019	234412	1800.00	NULL	NULL
2020	234412	1234.00	1800.00	-31.44
2021	234412	889.50	1234.00	-27.92
2022	234412	2900.00	889.50	226.03
2019	543623	6450.00	NULL	NULL
2020	543623	5348.12	6450.00	-17.08
2021	543623	2345.00	5348.12	-56.15
2022	543623	5680.00	2345.00	142.22

Repeated Payments

Link

Sometimes, payment transactions are repeated by accident; it could be due to user error, API failure or a retry error that causes a credit card to be charged twice.

Using the transactions table, identify any payments made at the same merchant with the same credit card for the same amount within 10 minutes of each other. Count such repeated payments.

The first transaction of such payments should not be counted as a repeated payment. This means, if there are two transactions performed by a merchant with the same credit card and for the same amount within 10 minutes, there will only be 1 repeated payment.

transactions Example Input:

transaction_id	merchant_id	credit_card_id	amount	transaction_timestamp
1	101	1	100	09/25/2022 12:00:00
2	101	1	100	09/25/2022 12:08:00
3	101	1	100	09/25/2022 12:28:00
4	102	2	300	09/25/2022 12:00:00
6	102	2	400	09/25/2022 14:00:00

Within 10 minutes after Transaction 1, Transaction 2 is conducted at Merchant 1 using the same credit card for the same amount. This is the only instance of repeated payment in the given sample data.

Query:

WITH cte AS (

  SELECT
    *,
    LAG(transaction_timestamp) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_timestamp,
    LAG(merchant_id) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_merch,
    LAG(credit_card_id) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_cred,
    LAG(amount) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_amount
    
  FROM transactions 

),
cte2 AS (

  SELECT
    *,
    EXTRACT(EPOCH FROM (transaction_timestamp - prev_timestamp)) / 60 AS diff_minutes
    
    FROM cte

),


cte3 AS (

  SELECT
  *,
  CASE 
  
    WHEN merchant_id = prev_merch AND credit_card_id = prev_cred AND amount = prev_amount 
    AND diff_minutes <= 10 THEN 1 ELSE 0 END AS counter
  
  FROM cte2

)

select SUM(counter) AS payment_count from cte3

step by step

cte output:

WITH cte AS (

  SELECT
    *,
    LAG(transaction_timestamp) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_timestamp,
    LAG(merchant_id) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_merch,
    LAG(credit_card_id) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_cred,
    LAG(amount) OVER (PARTITION BY merchant_id, credit_card_id ORDER BY transaction_timestamp) AS prev_amount
    
  FROM transactions 

)

This Common Table Expression (CTE) retrieves all transactions and uses the LAG() window function to get details of the previous transaction for each (merchant_id, credit_card_id) pair, ordered by transaction_timestamp. This allows us to compare each transaction with its previous one

tran_id	mer_id	card_id	timestamp	amount	prev_tt	prev_mer	prev_card	prev_amount
1	101	1	09/25/2022 12:00:00	100	NULL	NULL	NULL	NULL
2	101	1	09/25/2022 12:08:00	100	09/25/2022 12:00:00	101	1	100
3	101	1	09/25/2022 12:28:00	100	09/25/2022 12:08:00	101	1	100
5	101	1	09/25/2022 13:37:00	100	09/25/2022 12:28:00	101	1	100
4	101	2	09/25/2022 12:20:00	300	NULL	NULL	NULL	NULL
6	102	2	09/25/2022 14:00:00	400	NULL	NULL	NULL	NULL
7	102	3	09/26/2022 10:00:00	300	NULL	NULL	NULL	NULL
8	102	3	09/26/2022 10:10:00	300	09/26/2022 10:00:00	102	3	300
9	102	3	09/26/2022 10:14:00	300	09/26/2022 10:10:00	102	3	300
10	103	4	09/27/2022 12:00:00	50	NULL	NULL	NULL	NULL
11	103	4	09/27/2022 12:09:00	50	09/27/2022 12:00:00	103	4	50
12	103	4	09/27/2022 22:00:00	50	09/27/2022 12:09:00	103	4	50
13	105	6	09/27/2022 12:00:00	200	NULL	NULL	NULL	NULL
14	105	6	09/27/2022 12:10:00	100	09/27/2022 12:00:00	105	6	200

cte2 output:

cte2 AS (

  SELECT
    *,
    EXTRACT(EPOCH FROM (transaction_timestamp - prev_timestamp)) / 60 AS diff_minutes
    
    FROM cte

)

This CTE calculates the time difference (in minutes) between each transaction and its previous one.

t_id	m_id	c_id	tt	amount	prev_tt	prev_m	prev_c	prev_amo	diff
1	101	1	09/25/2022 12:00:00	100	NULL	NULL	NULL	NULL	NULL
2	101	1	09/25/2022 12:08:00	100	09/25/2022 12:00:00	101	1	100	8.0
3	101	1	09/25/2022 12:28:00	100	09/25/2022 12:08:00	101	1	100	20.0
5	101	1	09/25/2022 13:37:00	100	09/25/2022 12:28:00	101	1	100	69.0
4	101	2	09/25/2022 12:20:00	300	NULL	NULL	NULL	NULL	NULL
6	102	2	09/25/2022 14:00:00	400	NULL	NULL	NULL	NULL	NULL
7	102	3	09/26/2022 10:00:00	300	NULL	NULL	NULL	NULL	NULL
8	102	3	09/26/2022 10:10:00	300	09/26/2022 10:00:00	102	3	300	10.0
9	102	3	09/26/2022 10:14:00	300	09/26/2022 10:10:00	102	3	300	4.0
10	103	4	09/27/2022 12:00:00	50	NULL	NULL	NULL	NULL	NULL
11	103	4	09/27/2022 12:09:00	50	09/27/2022 12:00:00	103	4	50	9.0
12	103	4	09/27/2022 22:00:00	50	09/27/2022 12:09:00	103	4	50	591.0
13	105	6	09/27/2022 12:00:00	200	NULL	NULL	NULL	NULL	NULL
14	105	6	09/27/2022 12:10:00	100	09/27/2022 12:00:00	105	6	200	10.0

cte3 output:

cte3 AS (

  SELECT
  *,
  CASE 
  
    WHEN merchant_id = prev_merch AND credit_card_id = prev_cred AND amount = prev_amount 
    AND diff_minutes <= 10 THEN 1 ELSE 0 END AS counter
  
  FROM cte2

)

This CTE determines if a transaction should be counted as potentially fraudulent.

This CTE checks if the current transaction has the same merchant_id, credit_card_id, and amount as the previous transaction and the time difference (diff_minutes) is 10 minutes or less

t_id	m_id	c_id	tt	amount	prev_tt	prev_me	prev_c	prev_am	diff	counter
1	101	1	09/25/2022 12:00:00	100	NULL	NULL	NULL	NULL	NULL	0
2	101	1	09/25/2022 12:08:00	100	09/25/2022 12:00:00	101	1	100	8.0	1
3	101	1	09/25/2022 12:28:00	100	09/25/2022 12:08:00	101	1	100	20.0	0
5	101	1	09/25/2022 13:37:00	100	09/25/2022 12:28:00	101	1	100	69.0	0
4	101	2	09/25/2022 12:20:00	300	NULL	NULL	NULL	NULL	NULL	0
6	102	2	09/25/2022 14:00:00	400	NULL	NULL	NULL	NULL	NULL	0
7	102	3	09/26/2022 10:00:00	300	NULL	NULL	NULL	NULL	NULL	0
8	102	3	09/26/2022 10:10:00	300	09/26/2022 10:00:00	102	3	300	10.0	1
9	102	3	09/26/2022 10:14:00	300	09/26/2022 10:10:00	102	3	300	4.0	1
10	103	4	09/27/2022 12:00:00	50	NULL	NULL	NULL	NULL	NULL	0
11	103	4	09/27/2022 12:09:00	50	09/27/2022 12:00:00	103	4	50	9.0	1
12	103	4	09/27/2022 22:00:00	50	09/27/2022 12:09:00	103	4	50	591.0	0
13	105	6	09/27/2022 12:00:00	200	NULL	NULL	NULL	NULL	NULL	0
14	105	6	09/27/2022 12:10:00	100	09/27/2022 12:00:00	105	6	200	10.0	0

final query output:

select SUM(counter) AS payment_count from cte3

Finally, the query sums up the counter column to get the total number of suspicious transactions

payment_count
4

Server Utilization Time

Link

Amazon Web Services (AWS) is powered by fleets of servers. Senior management has requested data-driven solutions to optimize server usage.

Write a query that calculates the total time that the fleet of servers was running. The output should be in units of full days.

Assumptions:

Each server might start and stop several times.
The total time in which the server fleet is running can be calculated as the sum of each server's uptime.

server_utilization Example Input:

server_id	session_status	status_time
1	start	08/02/2022 10:00:00
1	stop	08/04/2022 10:00:00
1	stop	08/13/2022 19:00:00
1	start	08/13/2022 10:00:00
3	stop	08/19/2022 10:00:00
3	start	08/18/2022 10:00:00
5	stop	08/19/2022 10:00:00
4	stop	08/19/2022 14:00:00
4	start	08/16/2022 10:00:00
3	stop	08/14/2022 10:00:00
3	start	08/06/2022 10:00:00
2	stop	08/24/2022 10:00:00
2	start	08/17/2022 10:00:00
5	start	08/14/2022 21:00:00

Query:

WITH cte AS (

  SELECT
    *,
    LAG(session_status) OVER (PARTITION BY server_id ORDER BY status_time) AS prev_status,
    LAG(status_time) OVER (PARTITION BY server_id ORDER BY status_time) AS prev_time
    
    FROM server_utilization

),

cte2 AS (

  SELECT
    *,
    CASE 
      WHEN session_status = 'stop' AND prev_status = 'start'
      THEN EXTRACT(EPOCH FROM (status_time - prev_time)) / 86400 END AS delta

  FROM cte

)

SELECT FLOOR(SUM(delta)) AS total_uptime_days FROM cte2

step by step

cte output:

WITH cte AS (

  SELECT
    *,
    LAG(session_status) OVER (PARTITION BY server_id ORDER BY status_time) AS prev_status,
    LAG(status_time) OVER (PARTITION BY server_id ORDER BY status_time) AS prev_time
    
    FROM server_utilization

)

This CTE adds the previous row’s session_status for the same server_id and the previous row’s status_time for the same server_id

server_id	session_status	status_time	prev_status	prev_time
1	start	08/02/2022 10:00:00	NULL	NULL
1	stop	08/04/2022 10:00:00	start	08/02/2022 10:00:00
1	start	08/13/2022 10:00:00	stop	08/04/2022 10:00:00
1	stop	08/13/2022 19:00:00	start	08/13/2022 10:00:00
2	start	08/17/2022 10:00:00	NULL	NULL
2	stop	08/24/2022 10:00:00	start	08/17/2022 10:00:00
3	start	08/06/2022 10:00:00	NULL	NULL
3	stop	08/14/2022 10:00:00	start	08/06/2022 10:00:00
3	start	08/18/2022 10:00:00	stop	08/14/2022 10:00:00
3	stop	08/19/2022 10:00:00	start	08/18/2022 10:00:00
4	start	08/16/2022 10:00:00	NULL	NULL
4	stop	08/19/2022 14:00:00	start	08/16/2022 10:00:00
5	start	08/14/2022 21:00:00	NULL	NULL
5	stop	08/19/2022 10:00:00	start	08/14/2022 21:00:00

cte2 output:

cte2 AS (

  SELECT
    *,
    CASE 
      WHEN session_status = 'stop' AND prev_status = 'start'
      THEN EXTRACT(EPOCH FROM (status_time - prev_time)) / 86400 END AS delta

  FROM cte

)

This CTE filters the rows where the session_status is "stop" and the prev_status was "start" (indicating a transition from running to stopped).Then it calculates the downtime in days.

server_id	session_status	status_time	prev_status	prev_time	delta
1	start	08/02/2022 10:00:00	NULL	NULL	NULL
1	stop	08/04/2022 10:00:00	start	08/02/2022 10:00:00	2.0000000000000000
1	start	08/13/2022 10:00:00	stop	08/04/2022 10:00:00	NULL
1	stop	08/13/2022 19:00:00	start	08/13/2022 10:00:00	0.37500000000000000000
2	start	08/17/2022 10:00:00	NULL	NULL	NULL
2	stop	08/24/2022 10:00:00	start	08/17/2022 10:00:00	7.0000000000000000
3	start	08/06/2022 10:00:00	NULL	NULL	NULL
3	stop	08/14/2022 10:00:00	start	08/06/2022 10:00:00	8.0000000000000000
3	start	08/18/2022 10:00:00	stop	08/14/2022 10:00:00	NULL
3	stop	08/19/2022 10:00:00	start	08/18/2022 10:00:00	1.00000000000000000000
4	start	08/16/2022 10:00:00	NULL	NULL	NULL
4	stop	08/19/2022 14:00:00	start	08/16/2022 10:00:00	3.1666666666666667
5	start	08/14/2022 21:00:00	NULL	NULL	NULL
5	stop	08/19/2022 10:00:00	start	08/14/2022 21:00:00	4.5416666666666667

final query output:

SELECT FLOOR(SUM(delta)) AS total_uptime_days FROM cte2

total_uptime_days
26

Department vs Company Salary

Link

The company wants to understand how the average salary in each department compares to the company's overall average salary each month.

Write a query to compare the average salary of employees in each department to the company's average salary for March 2024. Return the comparison result as 'higher', 'lower', or 'same' for each department. Display the department ID, payment month (in MM-YYYY format), and the comparison result.

employee Example Input:

employee_id	name	salary	department_id	manager_id
1	Emma Thompson	3800	1	6
2	Daniel Rodriguez	2230	1	7
3	Olivia Smith	7000	1	8
5	Sophia Martinez	1750	1	11

salary Example Input:

salary_id	employee_id	amount	payment_date
1	1	3800	01/31/2024 00:00:00
2	2	2230	01/31/2024 00:00:00
3	3	7000	01/31/2024 00:00:00
4	4	6800	01/31/2024 00:00:00
5	5	1750	01/31/2024 00:00:00

Query:

WITH cte AS (

  SELECT
    
    salary.amount,
    employee.department_id,
    TO_CHAR(payment_date, 'MM-YYYY') AS payment_date
    
  FROM salary 
  JOIN employee ON salary.employee_id = employee.employee_id

),

cte2 AS (

  SELECT
    *,
    AVG(amount) OVER (PARTITION BY department_id) AS avg_dep,
    AVG(amount) OVER () AS avg_global
    
  FROM cte  
),

cte3 AS (

  SELECT
    *,
    CASE WHEN avg_dep > avg_global THEN 'higher'
         WHEN avg_dep < avg_global THEN 'lower'
         ELSE 'same' END AS comparison
         
  FROM cte2       

)

SELECT DISTINCT 

  department_id,
  payment_date,
  comparison
  
FROM cte3
WHERE payment_date = '03-2024'

step by step

cte output:

WITH cte AS (

  SELECT
    
    salary.amount,
    employee.department_id,
    TO_CHAR(payment_date, 'MM-YYYY') AS payment_date
    
  FROM salary 
  JOIN employee ON salary.employee_id = employee.employee_id

)

amount	department_id	payment_date
3800	1	03-2024
3800	1	02-2024
3800	1	01-2024
7000	1	01-2024
6800	2	03-2024
6800	2	02-2024
6800	2	01-2024

cte2 output:

cte2 AS (

  SELECT
    *,
    AVG(amount) OVER (PARTITION BY department_id) AS avg_dep,
    AVG(amount) OVER () AS avg_global
    
  FROM cte  
)

amount	department_id	payment_date	avg_dep	avg_global
3800	1	03-2024	3796.6666666667	7212.0000000000
4000	1	01-2024	3796.6666666667	7212.0000000000
4000	1	02-2024	3796.6666666667	7212.0000000000
2230	1	02-2024	3796.6666666667	7212.0000000000
2230	1	01-2024	3796.6666666667	7212.0000000000
8000	2	03-2024	7150.0000000000	7212.0000000000
6800	2	03-2024	7150.0000000000	7212.0000000000
6800	2	02-2024	7150.0000000000	7212.0000000000
6800	2	01-2024	7150.0000000000	7212.0000000000
10800	3	03-2024	11360.0000000000	7212.0000000000
10800	3	02-2024	11360.0000000000	7212.0000000000
10800	3	01-2024	11360.0000000000	7212.0000000000
9500	3	03-2024	11360.0000000000	7212.0000000000
9500	3	02-2024	11360.0000000000	7212.0000000000
9500	3	01-2024	11360.0000000000	7212.0000000000

cte3 output:

cte3 AS (

  SELECT
    *,
    CASE WHEN avg_dep > avg_global THEN 'higher'
         WHEN avg_dep < avg_global THEN 'lower'
         ELSE 'same' END AS comparison
         
  FROM cte2       

)

amount	department_id	payment_date	avg_dep	avg_global	comparison
3800	1	03-2024	3796.67	7212.00	lower
4000	1	01-2024	3796.67	7212.00	lower
4000	1	02-2024	3796.67	7212.00	lower
2230	1	02-2024	3796.67	7212.00	lower
2230	1	01-2024	3796.67	7212.00	lower
8000	2	03-2024	7150.00	7212.00	lower
6800	2	03-2024	7150.00	7212.00	lower
6800	2	02-2024	7150.00	7212.00	lower
6800	2	01-2024	7150.00	7212.00	lower
10800	3	03-2024	11360.00	7212.00	higher
10800	3	02-2024	11360.00	7212.00	higher
10800	3	01-2024	11360.00	7212.00	higher
9500	3	03-2024	11360.00	7212.00	higher
9500	3	02-2024	11360.00	7212.00	higher
9500	3	01-2024	11360.00	7212.00	higher

final query output:

SELECT DISTINCT 

  department_id,
  payment_date,
  comparison
  
FROM cte3
WHERE payment_date = '03-2024'

department_id	payment_date	comparison
1	03-2024	lower
2	03-2024	lower
3	03-2024	higher

Name		Name	Last commit message	Last commit date
Latest commit History 31 Commits
readme.MD		readme.MD

ManuelGuerra1987/sql-exercises

Folders and files

Latest commit

History

Repository files navigation

SQL Exercises

Table of contents

Highest Grossing Items

Query:

step by step

Top 5 Artists

Query:

step by step

Card Launch Success

Query:

step by step

International Call Percentage

Query:

step by step

Top Three Salaries

Query:

step by step

Active User Retention

Query:

step by step

3 Topping Pizzas

Query:

Y on Y Growth Rate

Query:

step by step

Repeated Payments

Query:

step by step

Server Utilization Time

Query:

step by step

Department vs Company Salary

Query:

step by step

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages