Airline Support Report
This notebook analyzes a large number of airline support tickets and produces a report summarizing the complaints. This notebook mirrors the airline support tutorial in DocWrangler / DocETL (Shankar et al., 2024), allowing for direct comparison between DocETL and Semlib.
The processing pipeline consists of two phases:
- (map) In one LLM call per complaint, extract the chief complaint and other information as structured data.
- (apply) In a single LLM call, process all the structured data and produce a report summarizing the complaints. In DocETL's terminology, this is a "reduce" operation. Semlib's reduce, by contrast, mirrors the reduce higher-order function from functional programming, which combines the elements of a list two at a time; Semlib's apply method processes its input in a single LLM call.
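To make the reduce/apply distinction concrete, here is a plain-Python sketch that makes no LLM calls. `functools.reduce` is the standard-library function; `combine_pair` and `summarize_all` are hypothetical stand-ins for LLM-backed operations, not Semlib APIs.

```python
from functools import reduce

# Hypothetical stand-ins for LLM-backed operations (not Semlib APIs).
def combine_pair(a: str, b: str) -> str:
    """Pairwise combine: what a reduce-style operation does at each step."""
    return f"({a} + {b})"

def summarize_all(items: list[str]) -> str:
    """One call over the whole collection: what an apply-style operation does."""
    return "summary of [" + ", ".join(items) + "]"

complaints = ["c1", "c2", "c3", "c4"]

# reduce: n - 1 pairwise combinations, two elements at a time
reduced = reduce(combine_pair, complaints)

# apply: a single call that sees everything at once
applied = summarize_all(complaints)

print(reduced)
print(applied)
```

With real LLM calls, the reduce-style version would issue three requests for four items, while the apply-style version issues one, which is why this notebook uses apply for the final report.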
This notebook uses the OpenAI API and costs about $0.05 to run.
Install and configure Semlib
If you don't already have Semlib installed, run:
%pip install semlib
We start by initializing a Semlib Session. A session provides a context for performing Semlib operations. We configure the session to cache LLM responses on disk in cache.db, and we set the default model to OpenAI's gpt-4o-mini.
If OPENAI_API_KEY is not already set in your environment, uncomment the lines at the bottom of the next cell and set your API key there.
If you wish to use a different LLM (e.g., anthropic/claude-3-5-sonnet-20240620), change the model= kwarg below. Make sure that the appropriate environment variable (e.g., ANTHROPIC_API_KEY) is set in your environment. You can use any LLM supported by LiteLLM.
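Before running the notebook, it can help to confirm that the expected key is present. The following is a minimal sketch, assuming the model string has the form provider/model; the helper names and the `PROVIDER_ENV_VARS` mapping are hypothetical and cover only the two providers mentioned above.

```python
import os

# Assumed mapping from the provider prefix of a model string to the
# environment variable holding its API key. Only the two providers
# mentioned in this notebook are listed.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}

def required_env_var(model: str) -> str:
    """Return the environment variable expected for a 'provider/model' string."""
    provider = model.split("/", 1)[0]
    return PROVIDER_ENV_VARS[provider]

def api_key_is_set(model: str) -> bool:
    """Check whether the expected API key is present and non-empty."""
    return bool(os.environ.get(required_env_var(model)))

model = "openai/gpt-4o-mini"
print(f"{model} needs {required_env_var(model)}; set: {api_key_is_set(model)}")
```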
from semlib import OnDiskCache, Session
session = Session(cache=OnDiskCache("cache.db"), model="openai/gpt-4o-mini")
# Uncomment the following lines and set your OpenAI API key if not already set in your environment
# import os
# os.environ["OPENAI_API_KEY"] = "..."
Download and preview dataset
import json
import urllib.request  # importing the submodule explicitly; a bare "import urllib" does not guarantee urllib.request is loaded
tickets = json.loads(
    urllib.request.urlopen(
        "https://gist.githubusercontent.com/anishathalye/9d13b58d7ea820b11bcbe8c7b5704649/raw/7f0ae3f64f1107553a2f1f425473899104b3d4f8/airline_support_chats_kaggle.json"
    ).read()
)
print(f"Total number of tickets: {len(tickets)}")
print()
print(f"Example ticket text: {tickets[0]['text'][:400]}...")
Total number of tickets: 250

Example ticket text: Here is the description of my terrible experience with FlightNetwork: I needed to book round trip tickets to Copenhagen for 3 people. We searched Google flights for the best deal and found a round trip flight for $843 from Flight Network. Since we needed to book using 3 different credit cards, I called FlightNetwork for the booking to ensure that we all get the same price and availability. First, ...
from pydantic import BaseModel
class Complaint(BaseModel):
    chief_complaint: str
    complaint_category: str
    frustration_level: str
Next, we use the map method to prompt an LLM once per ticket to extract the data we're looking for. Semlib supports structured data extraction via the return_type= kwarg, which accepts both Pydantic models and bare types.
We take this prompt template from the DocETL tutorial and translate it from the Jinja2 syntax to the Python lambda / f-string supported by Semlib.
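As a rough illustration of that translation, a Jinja2 placeholder becomes an f-string expression inside a lambda. The Jinja2 snippet in the comment below is illustrative only and is not quoted from the DocETL tutorial; `example_ticket` is made-up data.

```python
# Jinja2-style template, shown only for comparison (illustrative, not
# quoted from the DocETL tutorial):
#   "Describe the chief complaint.\n{{ input.text }}"

# Equivalent Python callable: a lambda that returns an f-string.
template = lambda ticket: f"""
Describe the chief complaint.
{ticket['text']}
""".strip()

example_ticket = {"text": "My flight was cancelled without notice."}
prompt = template(example_ticket)
print(prompt)
```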
The following cell takes about 45 seconds at the default concurrency level to process 250 tickets. If you have high OpenAI rate limits, you can experiment with higher concurrency by passing max_concurrency=<number> to the Session constructor above.
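To illustrate what a concurrency limit means in practice, here is a generic asyncio sketch that caps the number of in-flight requests with a semaphore. This is not Semlib's implementation; `bounded_map` and `fake_llm_call` are hypothetical names, and the fake call sleeps instead of contacting an API.

```python
import asyncio

async def bounded_map(items, worker, max_concurrency: int):
    """Run worker(item) for every item, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(item):
        async with semaphore:
            return await worker(item)

    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(run_one(item) for item in items))

in_flight = 0
peak_in_flight = 0

async def fake_llm_call(ticket_id: int) -> str:
    """Hypothetical stand-in for one LLM request; sleeps instead of calling an API."""
    global in_flight, peak_in_flight
    in_flight += 1
    peak_in_flight = max(peak_in_flight, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    return f"result-{ticket_id}"

results = asyncio.run(bounded_map(range(10), fake_llm_call, max_concurrency=3))
print(results[:3], "peak concurrency:", peak_in_flight)
```

Inside a notebook, where an event loop is already running, replace the `asyncio.run(...)` call with `await bounded_map(...)`. Raising the limit shortens wall-clock time only until the provider's rate limit becomes the bottleneck.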
complaints: list[Complaint] = await session.map(
    tickets,
    lambda ticket: f"""
Describe the chief complaint from the user, including direct quotes from the user to capture their exact words.
{ ticket['text'] }
Additionally, categorize the complaint into one of the following categories: pricing, tech issue, user error, customer service, or animal/pet assistance. Ensure that the category is in lowercase.
Determine the level of frustration expressed by the user, using the values: high, medium, or low (all in lowercase).
Format the output as follows:
- Chief Complaint: [User's quoted complaint]
- Complaint Category: [Selected category]
- Frustration Level: [high/medium/low]
Example output:
- Chief Complaint: "I was charged extra fees that were not disclosed during booking."
- Complaint Category: pricing
- Frustration Level: high
""".strip(),
    return_type=Complaint,
)
We can preview what one of the complaints looks like:
complaints[0]
Complaint(chief_complaint='"Overall absolutely terrible terrible nightmarish experience with FlightNetwork. Not worth the headache and lack of transparency."', complaint_category='customer service', frustration_level='high')
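Because the extracted fields are plain strings, a quick tally by category and frustration level is a useful sanity check before generating the report. The sketch below uses a dataclass stand-in and made-up records so that it runs without pydantic or an API key; in the notebook, the real `complaints` list would take the place of `sample`.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class ComplaintRecord:
    """Stand-in for the Complaint model above, so this sketch runs without pydantic."""
    chief_complaint: str
    complaint_category: str
    frustration_level: str

# Made-up sample records; in the notebook, use the real `complaints` list instead.
sample = [
    ComplaintRecord("Hidden fees at checkout", "pricing", "high"),
    ComplaintRecord("Agent hung up on me", "customer service", "high"),
    ComplaintRecord("App crashed while booking", "tech issue", "medium"),
    ComplaintRecord("Nobody answered the phone", "customer service", "medium"),
]

by_category = Counter(c.complaint_category for c in sample)
by_frustration = Counter(c.frustration_level for c in sample)

print(by_category.most_common())
print(by_frustration.most_common())
```

A tally like this also surfaces values outside the categories requested in the prompt, since the model fields are unconstrained strings.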
Combine individual complaints into a single report
Next, we use an LLM to analyze all the complaints together and generate a report. Here, we use the apply method, which is itself a thin wrapper around a simple LLM prompt, to apply a prompt to a single value.
We take this prompt template from the DocETL tutorial as well; we translate it from the Jinja2 syntax as follows. First, we define a function for formatting a single complaint:
def format_complaint(x: tuple[int, Complaint]) -> str:
    return f"""
Ticket #{x[0]}:
- Complaint Category: { x[1].complaint_category }
- Frustration Level: { x[1].frustration_level }
- Chief Complaint: { x[1].chief_complaint }
""".strip()
Next, we invoke the apply method, passing it:
- enumerate(complaints), which adds ticket numbers
- A callable prompt template, which takes as an argument the list of (ticket number, complaint) tuples and returns a formatted string, which will be passed to the LLM
Here, we do not specify a return_type, so we get back a string containing the report.
# Note: the backslashes inside the f-string expression below require Python 3.12 or later.
report = await session.apply(
    enumerate(complaints),
    lambda c: f"""
Here are some complaints found in the dataset:
{'\n\n'.join(map(format_complaint, c))}
Summarize the common complaints across all tickets, and highlight how they differ across frustration levels.
""".strip(),
)
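To inspect the prompt such a template produces without calling an LLM, the formatting logic can be run on its own. This is a sketch with a dataclass stand-in and made-up records; `ComplaintRecord`, `format_record`, and `build_prompt` are hypothetical names. The join is done outside the f-string, so the sketch also runs on Python versions before 3.12.

```python
from dataclasses import dataclass

@dataclass
class ComplaintRecord:
    """Stand-in for the Complaint model, so this sketch runs without pydantic."""
    chief_complaint: str
    complaint_category: str
    frustration_level: str

def format_record(x: tuple[int, ComplaintRecord]) -> str:
    """Format one (ticket number, complaint) pair as a short text block."""
    number, record = x
    return (
        f"Ticket #{number}:\n"
        f"- Complaint Category: {record.complaint_category}\n"
        f"- Frustration Level: {record.frustration_level}\n"
        f"- Chief Complaint: {record.chief_complaint}"
    )

def build_prompt(numbered) -> str:
    """Build the full report prompt from an iterable of numbered complaints."""
    # Join outside the f-string so this also runs on Python versions before 3.12.
    body = "\n\n".join(map(format_record, numbered))
    return (
        "Here are some complaints found in the dataset:\n"
        f"{body}\n"
        "Summarize the common complaints across all tickets, "
        "and highlight how they differ across frustration levels."
    )

# Made-up sample records for illustration only.
sample = [
    ComplaintRecord("Hidden fees at checkout", "pricing", "high"),
    ComplaintRecord("App crashed while booking", "tech issue", "medium"),
]

prompt = build_prompt(enumerate(sample))
print(prompt)
```

Note that `enumerate` returns a one-shot iterator, so the numbered complaints can be consumed only once; call `enumerate` again to build a second prompt.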
Cost analysis
We used GPT-4o-mini, a low-cost LLM. We can check the total cost for the 251 LLM operations:
f"${session.total_cost():.3f}"
'$0.037'
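As a back-of-the-envelope check, the reported total can be turned into an average cost per operation. The figures below are copied from this notebook's output; the average is only indicative, since the single apply call processes far more input than any individual map call.

```python
total_cost_usd = 0.037  # value reported by session.total_cost() above
num_operations = 251    # 250 map calls + 1 apply call

cost_per_operation = total_cost_usd / num_operations
cost_per_thousand = cost_per_operation * 1000

print(f"${cost_per_operation:.6f} per operation")
print(f"~${cost_per_thousand:.2f} per 1,000 operations")
```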
Final result
The final report is written in Markdown. We can render this Markdown in the notebook with the following:
from IPython.display import display_markdown
display_markdown(report, raw=True)
Common Complaints summarized across all tickets:
Customer Service Issues:
- Many complaints center on poor customer service experiences, including unhelpful staff, inaccessible customer support (long wait times), and lack of clear communication regarding flight cancellations or changes.
- High frustration level tickets often express feelings of being disregarded, misled, or treated poorly, while medium frustration tickets may highlight confusion or disappointment rather than outright anger.
Flight Cancellations and Delays:
- A significant number of complaints relate to flight cancellations or delays and the inadequate responsiveness of airlines in addressing the situation. High frustration tickets emphasize the resulting inconveniences, such as missed connections and financial losses.
- Medium frustration tickets highlight general irritation with the situation without an extreme emotional reaction.
Pricing and Hidden Charges:
- Numerous complaints address unexpected costs or fees that passengers encountered, such as higher charges for luggage and unanticipated pricing discrepancies.
- High frustration tickets denote a sense of being exploited or frustrated by the pricing policies, while medium frustration tickets express confusion or annoyance at the difficulty of navigating costs.
Baggage Issues:
- Complaints about lost or damaged luggage are prevalent, with many expressing disbelief over the compensation or support available for such incidents. High frustration tickets articulate distress over valuable items lost or the impact on travel plans.
- Medium frustration tickets may relay concern without reaching the level of significant anxiety.
Tech Issues and Booking Complications:
- Several complaints relate to technical problems during the booking process or confusion regarding ticket rules and regulations, particularly around connecting flights and baggage handling.
- High frustration tickets are marked by feelings of helplessness regarding procedural issues, while medium frustration tickets reveal confusion about technology but not an overwhelming sense of urgency.
Seating Assignments:
- Problems with seating arrangements and failures to honor pre-allocated seats are reported frequently, often leading to significant frustrations when families are separated or downgraded.
- High frustration complaints typically involve outright anger over the perceived injustices, while medium-level frustration might indicate disappointment or uncertainty.
Frustration Levels:
High Frustration Level:
- Complaints are intense, often expressing a mixture of anger, betrayal, or a feeling of being wronged. The customers frequently demand action, compensation, or vocalize a firm decision to avoid using that airline again. There’s often a strong emotional reaction tied to personal losses or stressful situations.
Medium Frustration Level:
- These complaints convey dissatisfaction but often include a more measured tone. Customers are frustrated but might be seeking advice, expressing confusion about policies, or sharing experiences without an extreme emotional response. They may still be disappointed or annoyed but less likely than high frustration customers to express outrage or threat of action.
Low Frustration Level:
- Complaints at this level generally reflect positive remarks or minor issues. Customers express satisfaction with some aspects of service, appreciate efforts made by staff, or confirm that their experiences were generally acceptable, aside from small complaints or suggestions for improvement.
In summary, the overall complaints indicate major issues with customer service, flight management, and pricing policies, with the intensity of complaints varying significantly based on the frustration levels of the customers. High frustration tickets tend to express feelings of helplessness and betrayal, while medium frustration tickets reflect confusion and irritation without extreme emotional responses. Low frustration complaints were rare and often offered constructive feedback instead of outright complaints.