Airline Ticket Trend and Sales Analysis Using SQL

Hello and welcome to my portfolio blog. In this article, I will analyze an airline sales dataset using SQL, including data cleaning and answering several business-related questions.

As we know, transportation plays an important role in global communication, especially for international travel. One of the most widely used modes of transportation is air travel, operated by airlines that serve millions of customers and generate large amounts of data. Therefore, analyzing airline data to understand customer behavior is the main objective of this article. However, this analysis is limited to using SQL as the only tool. I aim to explore how far SQL alone can answer analytical questions.

Dataset Overview

First of all, let me introduce the dataset. I got this dataset on Kaggle.com which type of file is CSV, here is the link: https://www.kaggle.com/datasets/syedaeman2212/airline-ticket-prices-dataset. The dataset contains ticket data from eight airlines.

  • Dataset Name: Airline Ticket Prices Dataset
  • Rows (Records): 250 rows
  • Columns (Features): 8 columns
  • Data Quality: clean with no missing values (0%)
  • Data Dictionary
  • Ticket_ID: Numeric, unique ID for every transaction.
  • Airline:Text (String), name of 8 airlines
  • Origin: Text(String), origin city of departure.
  • Destination: Text(String), Destination city of flight.
  • Distance_km: Numeric, distance of flight route in kilometer unit
  • Class: Text(String), class of flight cabin (Economy, Business, First).
  • Days_Before_Departure: Numeric, the distance of time between ticket booking and the days of departure.
  • Price_USD: Numeric, flight ticket price in USD.

Short Statistical Summary

  • Average ticket price: $2,006.06
  • The Lowest ticket price: $106.07
  • The highest price: $8,852.69
  • Average of flight distances: 5,999 km
  • Average of booking time: 59 days before depart (H-59).

Analytical Questions

Using this dataset, I will attempt to answer ten questions, ordered from the simplest to the most complex:

  1. Qatar Airways is one of the most well-known airlines in the Middle East. Retrieve all flight tickets from “Qatar Airways” that depart from “Doha”, and sort them by ticket price from the cheapest to the most expensive.
  2. Filter all flight data where the booking is made 30 days before departure and the ticket price is less than $1,000.
  3. Retrieve all flights where both the origin and destination start with the letter “M”.
  4. What are the average, minimum, and maximum ticket prices for each cabin class?
  5. Identify the top three airlines with the highest number of ticket sales, along with their total revenue.
  6. Group tickets based on booking time into three categories:
    • Last Minute: 0–14 days
    • Normal: 15–60 days
    • Early Bird: more than 60 days
    • Also, calculate the average ticket price for each category.
  7. Retrieve detailed ticket information (Ticket_ID, Airline, Origin, Destination, Price_USD) for Economy class tickets that are priced higher than the average price of Business class tickets.
  8. Identify flight routes where ticket prices are above the overall dataset average.
  9. Rank airline tickets by price (from highest to lowest) within each airline, and return only the top three tickets per airline.
  10. For each airline, calculate the difference between each ticket price and the average ticket price of that airline within the same cabin class.

Solution Approach

After understanding the problem, in this section I will solve it using SQL syntax and show the resulting output.

Question 1
To solve the first question, I use “WHERE” and “ORDER BY” clauses.

SQL
select * from airlines
where airline = "Qatar Airways"
order by price_USD;

Question 2
To solve this question, I use the combination of “WHERE” and “AND” clauses.

SQL
select * from airlines
where 
  Class = "Business" and 
  days_before_departure <= 30 and 
  price_usd <= 1000;

Question 3
Here, I use the “LIKE” operator with the pattern “%” after “M” to retrieve all words that start with “M”.

SQL
select * from airlines
where 
  origin like "M%" or 
  destination like "M%";

Question 4
I use the aggregate function in this section to get the average, minimum, and maximum.

SQL
select
class,
avg(price_usd),
min(price_usd),
max(price_usd)
from airlines
group by class;

Question 5
To solve this question, I use “GROUP BY” and “ORDER BY” clauses to get the airline with the highest number of ticket sales, and “LIMIT” to retrieve only 3 of them.

SQL
select 
airline,
count(ticket_id) as tiket_terjual,
sum(round(ticket_id * price_usd)) as total_pendapatan
from airlines
group by airline
order by count(ticket_id) desc
limit 3;

Question 6
I use a Common Table Expression (CTE) in this section to group tickets based on booking time.

SQL
with waktu_pesan as (
select price_usd,
case
  when days_before_departure <=14 then "last_Minute"
  when days_before_departure between 15 and 60 then "Normal"
  when days_before_departure > 60 then "Early Bird"
end as waktu_pesan
from airlines)
select
round(avg(price_usd),2) as average_price,
waktu_pesan
from waktu_pesan
group by waktu_pesan;

Question 7
To retrieve only Economy class tickets that are priced higher than the average price of Business class tickets, I use a Subquery.

SQL
select 
Ticket_ID, 
Airline, 
Origin, 
Destination, 
Price_USD
from airlines
where 
  class = "Economy" and 
  price_usd > (
  select 
  avg(price_usd)
  from airlines
  where class = "Business");

Question 8
To solve this question, I use a Window Function to retrieve flight routes where ticket prices are above the overall dataset average.

SQL
select
Origin, 
Destination, 
Price_USD,
row_number() over () as peringkat
from airlines
where price_usd > (
  select avg(price_usd) from airlines)
  order by price_usd desc);

Question 9
Similar to question 8, I use a Window Function to rank airline tickets by price (from highest to lowest) within each airline.

SQL
with peringkat as (
select
airline,
Origin, 
Destination, 
Price_USD,
dense_rank() over (
partition by airline
order by price_usd desc) as peringkat
from airlines)
select * from peringkat
where peringkat <= 3;

Question 10
I use a Window Function to solve this question to get the difference between each ticket price and the average ticket price.

SQL
select
Ticket_ID, 
Airline, 
Class, 
Price_USD, 
round(avg(price_usd) over(partition by airline, class),2) as Avg_Price_Per_Airline_Class,
round(price_usd - avg(price_usd) over(partition by airline, class),2) as Price_Difference
from airlines;

Conclusion

Through these exercises, I applied a variety of SQL techniques, ranging from basic filtering and sorting to more advanced concepts such as subqueries, aggregate functions, Common Table Expressions (CTEs), and window functions. This approach demonstrates my ability to analyze data, extract meaningful insights, and structure queries efficiently. It also reflects my understanding of how different SQL features can be combined to solve real-world data problems.

Leave a Comment