Disneyland Reviews Synthesis¶
This notebook analyzes tens of thousands of reviews of Disneyland to produce a list of the top complaints, ordered by frequency, with citations.
The analysis is implemented with the following pipeline:
- (map) Extract criticism from each review, using one LLM call per review.
- (non-semantic filter) Filter out reviews that do not contain any complaints.
- (reduce) Combine criticism into a single summary with O(n) LLM calls, using an associative combinator to reduce latency (to O(log n) sequential steps, given unlimited concurrency).
- (apply) Parse the criticism summary into separate items.
- (map) Compute citations: for each review, determine which of the criticisms are supported by the review.
- (non-semantic sort) Order the criticism by frequency.
This notebook uses the OpenAI API and costs about $5 to run. If you want to reduce costs and/or running time, you can sub-sample the data (e.g., reviews = reviews[:100]) or switch to a faster and cheaper model (e.g., gpt-4.1-nano).
Install and configure Semlib¶
If you don't already have Semlib installed, run:
%pip install semlib
We start by initializing a Semlib Session. A session provides a context for performing Semlib operations. We configure the session to cache LLM responses on disk in cache.db, set the default model to OpenAI's gpt-4o-mini, and set max concurrency to 100.
We use a high max_concurrency= kwarg because we'll be making tens of thousands of LLM queries, and this will speed up the computation. If you run into issues due to your OpenAI rate limits, you can decrease this number. You can also edit the notebook to sub-sample data if you want to reduce costs and running time.
If your OPENAI_API_KEY is not already set in your environment, you can uncomment the line at the bottom of the next cell and set your API key there.
If you wish to use a different LLM (e.g., anthropic/claude-3-5-sonnet-20240620), change the model= kwarg below. Make sure that the appropriate environment variable (e.g., ANTHROPIC_API_KEY) is set in your environment. You can use any LLM supported by LiteLLM.
from semlib import Bare, Box, OnDiskCache, Session
session = Session(cache=OnDiskCache("cache.db"), model="openai/gpt-4o-mini", max_concurrency=100)
# Uncomment the following lines and set your OpenAI API key if not already set in your environment
# import os
# os.environ["OPENAI_API_KEY"] = "..."
Download and preview dataset¶
!curl -s -L -o disneyland-reviews.zip https://www.kaggle.com/api/v1/datasets/download/arushchillar/disneyland-reviews
!unzip -q -o disneyland-reviews.zip
In this notebook, we only consider the reviews for Disneyland California.
import csv
with open("DisneylandReviews.csv", encoding="latin-1") as f_in:
    csv_file = csv.reader(f_in)
    header = next(csv_file)
    reviews = [dict(zip(header, row, strict=False)) for row in csv_file]
reviews = [r for r in reviews if r["Branch"] == "Disneyland_California"]
print(f"Loaded {len(reviews)} reviews\n")
print(f"Example review: {reviews[0]['Review_Text']}")
Loaded 19406 reviews

Example review: This place has always been and forever will be special. The feeling you get entering the park, seeing the characters and different attractions is just priceless. This is definitely a dream trip for all ages, especially young kids. Spend the money and go to Disneyland, you will NOT regret it
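If you sub-sample as suggested at the top of this notebook, note that reviews[:100] takes the first 100 rows, which may not be representative if the file is ordered in some way. The following is a minimal sketch of a seeded random sample instead. The helper name subsample and the toy list are hypothetical, used here for illustration only; they are not part of Semlib.

```python
import random


def subsample(items: list, k: int, seed: int = 0) -> list:
    """Return up to k items chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    if k >= len(items):
        return list(items)
    return rng.sample(items, k)


# Toy stand-in for the `reviews` list loaded above.
toy_reviews = [{"Review_Text": f"review {i}"} for i in range(1000)]
sample = subsample(toy_reviews, 100)
print(len(sample))
```

Keeping the seed fixed means that reruns select the same reviews, so the prompts stay the same from run to run.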
extracted_criticism = await session.map(
    reviews,
    template=lambda r: f"""
Extract any criticism from this review of Disneyland California, as a succinct bulleted list. If there is none, respond '(none)'.
{r["Review_Text"]}
""".strip(),
)
We can see what some of these items look like.
print(f"Criticism from review 0:\n{extracted_criticism[0]}\n")
print(f"Criticism from review 1:\n{extracted_criticism[1]}")
Criticism from review 0:
(none)

Criticism from review 1:
- Nothing is cheap; it may be expensive for some visitors.
- Restrictions on items allowed at entry gates (selfie sticks, glass refill bottles, etc).
Filter out reviews without complaints¶
For downstream analysis, we only want to process the non-empty criticism. We already asked the LLM to output "(none)" when the review contains no criticism, so we don't need a semantic operator for this step.
criticism = [text for text in extracted_criticism if text != "(none)"]
print(f"{len(criticism)} out of {len(reviews)} reviews contain criticism")
13293 out of 19406 reviews contain criticism
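The exact comparison above assumes the model always answers with the literal string "(none)". If a model drifts slightly (extra whitespace, quotes, a trailing period, or a lone bullet), a more tolerant check may help. The following is a sketch under that assumption; is_no_criticism is a hypothetical helper, and the responses are made-up examples rather than actual model output.

```python
def is_no_criticism(text: str) -> bool:
    """Heuristically detect a '(none)' response despite minor formatting drift."""
    # Remove surrounding whitespace, quotes, and a trailing period, then lowercase.
    normalized = text.strip().strip("'\"`").strip().rstrip(".").lower()
    # Also accept a lone bullet such as "- (none)".
    normalized = normalized.lstrip("-* ").strip()
    return normalized in {"(none)", "none"}


responses = ["(none)", " (None). ", "'(none)'", "- (none)", "- Long lines.\n- Expensive food."]
kept = [r for r in responses if not is_no_criticism(r)]
print(kept)  # ['- Long lines.\n- Expensive food.']
```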
Combine criticism into a single summary¶
Now we have tens of thousands of individual complaints, and we want to combine them into a single summary of criticism. One option is to dump all the complaints into a long-context LLM and ask it to produce a summary in one go. This could work, but some research has shown that LLMs can perform poorly when processing such large amounts of data in a single shot; this is one motivation behind research works like DocETL (Shankar et al., 2024) and LOTUS (Patel et al., 2024).
In this notebook, we use Semlib's reduce operator to process the data, which is analogous to the reduce or fold higher-order function in functional programming. The reduce operator takes as an argument a template that explains how to combine an "accumulator" with a single item, and applies this template n times to process each item in the dataset.
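For readers less familiar with fold, Python's functools.reduce has the same shape. In the sketch below, the plain function combine (a hypothetical name) stands in for the LLM-backed template: it takes the accumulator and one item, and returns the new accumulator.

```python
from functools import reduce


def combine(accumulator: str, item: str) -> str:
    """Stand-in for the LLM call: merge one item into the running summary."""
    return f"{accumulator}; {item}" if accumulator else item


complaints = ["long lines", "high prices", "crowds"]
summary = reduce(combine, complaints, "")
print(summary)  # long lines; high prices; crowds
```

Each step depends on the result of the previous one, so a fold over n items needs n sequential calls.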
As a performance optimization, if we implement an associative template, we can pass associative=True to the call to Semlib's reduce(). Rather than issuing O(n) LLM calls serially, Semlib then arranges the computation in a tree of depth O(log n), which substantially reduces the latency of the operation.
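To make the effect of associativity concrete, here is a minimal pure-Python sketch of a tree-shaped reduction, using addition in place of an LLM call. It illustrates the idea only and is not Semlib's implementation; tree_reduce is a hypothetical name. Adjacent pairs are combined level by level, so the combinator needs to be associative but not commutative.

```python
import math
from typing import Callable, TypeVar

T = TypeVar("T")


def tree_reduce(items: list[T], combine: Callable[[T, T], T]) -> tuple[T, int]:
    """Reduce by combining adjacent pairs level by level.

    Returns the result and the number of levels (sequential rounds) used.
    """
    if not items:
        raise ValueError("cannot reduce an empty list")
    level = list(items)
    rounds = 0
    while len(level) > 1:
        # All pairs within a level are independent, so they could run concurrently.
        next_level = [combine(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            next_level.append(level[-1])  # the odd element carries over to the next level
        level = next_level
        rounds += 1
    return level[0], rounds


total, rounds = tree_reduce(list(range(1, 1001)), lambda a, b: a + b)
print(total, rounds)  # 500500 10
```

The total number of combine calls is still n - 1; only the number of sequential rounds shrinks, here from 999 to 10 for 1000 items.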
For computations that are "naturally" associative (e.g., addition), this is straightforward. Our case needs a little more care: the individual items in our data (the leaf nodes in the reduction tree) are individual complaints, while internal nodes are summaries of complaints. To support this, we "tag" leaf nodes in the data with Box, and write a template that behaves differently based on whether its inputs are leaf nodes or internal nodes of the reduction tree.
In the case of this particular task, merging individual complaints into a running summary, a single LLM prompt (regardless of whether it's handling two individual complaints, a complaint and a summary, or two summaries) would probably work fine, but for other more complex tasks, this complexity may be necessary, so this tutorial demonstrates how to do it.
# common formatting instructions in all cases
FORMATTING_INSTRUCTIONS = """Ensure that each bullet point is as succinct as possible, representing a single logical idea. Write separate criticisms as separate bullet points. Combine any similar criticism into the same bullet point. Output your answer as a single-level bulleted list with no other formatting."""
def merge_template(a: str | Box[str], b: str | Box[str]) -> str:
    if isinstance(a, Box) and isinstance(b, Box):
        # both are leaf nodes in the reduction tree (raw criticism)
        return f"""
Consider the following two lists of criticisms about Disneyland California, and return a bulleted list summarizing the criticism from the two lists.
<criticism 1>
{a.value}
</criticism 1>
<criticism 2>
{b.value}
</criticism 2>
{FORMATTING_INSTRUCTIONS}
""".strip()
    if not isinstance(a, Box) and not isinstance(b, Box):
        # both are internal nodes in the reduction tree (summaries)
        return f"""
Consider the following two lists summarizing criticism about Disneyland California, combine them into a single summary of criticism.
<criticism summary 1>
{a}
</criticism summary 1>
<criticism summary 2>
{b}
</criticism summary 2>
{FORMATTING_INSTRUCTIONS}
""".strip()
    # when the tree isn't perfectly balanced, there will be cases where one input is a leaf node and the other is an internal node
    # so we need to handle the case where one input is a raw criticism and the other is a summary
    if isinstance(a, Box) and not isinstance(b, Box):
        feedback = b
        criticism = a.value
    if not isinstance(a, Box) and isinstance(b, Box):
        feedback = a
        criticism = b.value
    return f"""
Consider the following summary of criticism about Disneyland California, and the following criticism from a single individual. Merge that individual's criticism into the summary.
<criticism summary>
{feedback}
</criticism summary>
<criticism>
{criticism}
</criticism>
{FORMATTING_INSTRUCTIONS}
""".strip()
After we've implemented the template, we can kick off our reduce() call. We wrap all of the inputs with Box, and we pass associative=True to the operator.
The following cell takes about 5 minutes to run, with the max_concurrency=100 set at the top of this notebook.
merged_criticism = await session.reduce(map(Box, criticism), template=merge_template, associative=True)
Parse criticism summary into individual items¶
The merged_criticism is a single string that contains a bulleted list. If the LLM performed well and output a well-formed Markdown list with each item on a single line, we could parse the result into separate items using some basic Python code, like [item.strip("- ").strip() for item in merged_criticism.split("\n") if item.strip().startswith("-")]. This could break if the feedback was hard-wrapped across multiple lines or used a different bullet point format (like * bullets instead of - bullets), for example. So instead of parsing using Python code, here we parse the response using an LLM.
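The fragility described above is easy to see on a toy input. The sketch below wraps the one-line parser from the text in a function (parse_bullets_naive is a hypothetical name) and runs it on a well-formed list and on a made-up list that is hard-wrapped and mixes bullet styles.

```python
def parse_bullets_naive(text: str) -> list[str]:
    """The simple approach from the text: one '-' bullet per line."""
    return [item.strip("- ").strip() for item in text.split("\n") if item.strip().startswith("-")]


well_formed = "- Long lines\n- High prices"
hard_wrapped = "- Long lines for\n  popular rides\n* High prices"

print(parse_bullets_naive(well_formed))   # ['Long lines', 'High prices']
print(parse_bullets_naive(hard_wrapped))  # ['Long lines for']
```

On the second input, the continuation line and the * bullet are silently dropped, so the first item is truncated and the second is lost.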
We want a list[str], so we use Semlib's Bare to annotate the return type.
criticism_items = await session.apply(
    merged_criticism, "Turn this list into a JSON array of strings.\n\n{}", return_type=Bare(list[str])
)
For display purposes, we switch to numbered items.
numbered_criticism = "\n".join(f"{i+1:d}. {item}" for i, item in enumerate(criticism_items))
print(numbered_criticism)
1. High admission prices and expensive food and merchandise contribute to perceptions of Disneyland as overpriced and unaffordable for average families.
2. Long wait times for rides and character interactions often exceed 30 minutes to over 2 hours, with some attractions reaching up to 6 hours, leading to visitor frustration.
3. Severe overcrowding year-round complicates navigation, creates discomfort, and can leave parks declared 'full.'
4. The FastPass system is largely ineffective due to limited availability, poor management, and confusion among guests.
5. Many attractions cater primarily to younger children, resulting in limited options for older kids and adults, leading to dissatisfaction.
6. Frequent ride breakdowns and closures occur without prior notice, causing disappointment and safety concerns.
7. Food quality is often low, with few healthy options available, and high prices contribute to visitor dissatisfaction.
8. Maintenance and cleanliness issues, such as overflowing trash cans, dirty restrooms, and unkempt areas, detract from the overall experience.
9. Limited seating areas, quiet spots, and insufficient shading result in discomfort and difficulty resting throughout the park.
10. Reports of declining staff enthusiasm, rudeness, and poor service diminish overall service quality and atmosphere.
11. Limited character interactions and excessive wait times for meet-and-greets leave guests feeling underwhelmed.
12. Safety concerns related to poor lighting, crowd management, and stroller congestion affect visitor comfort.
13. Fireworks shows are limited to weekends or canceled without clear communication, disappointing guests.
14. Guests feel the experience lacks the magic of past visits and is increasingly commercialized with a focus on merchandise sales.
15. Parking navigation issues and traffic congestion during peak hours add to travel frustrations.
16. The presence of homeless individuals near the park contributes to visitor discomfort.
17. Many visitors can only experience a limited number of rides per day, necessitating extensive planning that detracts from enjoyment.
18. Unpleasant odors and sewer issues further diminish the guest experience.
19. Insufficient accommodations for individuals with disabilities raise concerns for families with young children.
20. Poor communication regarding events and logistics leads to guest frustration and confusion.
21. Major construction disrupts access to attractions and amenities, impacting the overall experience.
22. Excessive walking and long park hours lead to fatigue and detract from enjoyment.
23. Limited viewing spots and seating for shows result in discomfort and detract from entertainment experiences.
24. Many merchandise offerings are repetitive and of disappointing quality, impacting the shopping experience.
25. Weather discomfort, including excessive heat during the day and cold temperatures at night, affects visitor comfort.
26. Cumbersome security checks and bag searches add frustration to the entry process.
27. The World of Color show is often considered underwhelming and fails to meet expectations.
28. Recommendations suggest visiting during off-peak times for better experiences with lighter crowds.
def citation_template(review):
    return f"""
Which of the following pieces of criticism, if any, about Disneyland California is substantiated by the following review?
<criticism>
{numbered_criticism}
</criticism>
<review>
{review["Review_Text"]}
</review>
Respond with a list of the numbers of the pieces of criticism that are substantiated by the review.
""".strip()
Again, we use the Bare annotation to get back structured data.
The following cell takes about 2 minutes to run, with the max_concurrency=100 set at the top of this notebook.
per_review_citations = await session.map(reviews, citation_template, return_type=Bare(list[int]))
Now, we can see, for example, which criticisms from the summary are substantiated by a particular review.
print(per_review_citations[1])
print(reviews[1]["Review_Text"])
[1, 4]
A great day of simple fun and thrills. Bring cash, nothing is cheap, but we knew that it's Disney. But they are great letting you bring in your own food, drinks, etc but read the list closely, we list several items at the entry gates (selfy sticks, glass refill bottles, etc). It is worth buying the photo pass and fastpass. Have fun!
Sort criticism by frequency¶
Now that we have citations on a per-review basis, we can figure out citations on a per-criticism basis, and that'll let us sort by frequency, to find the most frequent criticisms of Disneyland California.
# map from criticism index (1-based) to set of review indices (0-based) that cite it
citations: dict[int, set[int]] = {i + 1: set() for i in range(len(criticism_items))}
for review_idx, cited in enumerate(per_review_citations):
    for feedback_idx in cited:
        # sometimes the structured output isn't perfect, and it includes numbers that are out of range,
        # so we filter those out here
        if 1 <= feedback_idx <= len(criticism_items):
            citations[feedback_idx].add(review_idx)
We don't need a semantic sort for this step; a regular old Python sort will do.
by_count = [
    (i[1], i[2])
    for i in sorted(((len(citations[i + 1]), feedback, i) for i, feedback in enumerate(criticism_items)), reverse=True)
]
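The same ordering can also be written with a key function, which some readers may find easier to follow. The sketch below uses made-up toy data in place of criticism_items and citations; the toy_ names are hypothetical. One difference from the tuple-based sort above: ties in citation count keep their original order here, rather than being broken by comparing the criticism text.

```python
# Toy stand-ins for `criticism_items` and `citations` (1-based keys, as above).
toy_items = ["high prices", "long waits", "crowds"]
toy_citations = {1: {0, 4}, 2: {1, 2, 3}, 3: {5}}

# Sort 0-based item indices by citation count, most cited first.
order = sorted(range(len(toy_items)), key=lambda i: len(toy_citations[i + 1]), reverse=True)
toy_by_count = [(toy_items[i], i) for i in order]
print(toy_by_count)  # [('long waits', 1), ('high prices', 0), ('crowds', 2)]
```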
Results¶
Finally, we can look at the results. Here, we look at the top 10 criticisms, showing the number of reviews that cite each one, the summary of the criticism, the first few citations (so we could follow up by looking at individual reviews), and a single review that substantiates that criticism.
for feedback, i in by_count[:10]:
    sorted_citations = sorted(citations[i + 1])
    cite_str = f"{', '.join([str(c) for c in sorted_citations][:3])}, ..."
    print(f"({len(citations[i+1])}) {feedback} [{cite_str}]\n")
    some_cite = sorted_citations[min(i * 10, len(sorted_citations) - 1)]  # get some variety
    print(f" Review {some_cite}: {reviews[some_cite]['Review_Text']}\n\n")
(7267) Severe overcrowding year-round complicates navigation, creates discomfort, and can leave parks declared 'full.' [2, 3, 9, ...]

Review 76: We had a great time at Disneyland. It was nice to see Fantasmic after missing the last few times although we didn't have a great spot. We did our first trip 12 years ago and it's likely our last with the kids so it was special. We stayed off site but close enough to walk. Pros: It's Disney. Love Pirates, Indiana Jones, Space Mountain,etc. We got to see a little of Star Wars Land which looks pretty impressive. Cons: It was very crowded. The Castle was being refurbished . Not an issue for us, but if you had a 6 year old Princess lover it could be a problem. The current fireworks are not nearly as good as the old ones. Kind of odd for us.I think this is our 6th trip so we know the park very well and move around it all the time. Our kids had the app down to get quick time estimates. We found some times when ride times were low and were generally out of park mid afternoon but there early and late. We quickly found out it was best to hit rope drop a the park not on early hours to get in some quick rides.

(5619) Long wait times for rides and character interactions often exceed 30 minutes to over 2 hours, with some attractions reaching up to 6 hours, leading to visitor frustration. [2, 7, 10, ...]

Review 43: Long lines 2 1 2 hrs for one ride was a little much for active kiddos. But the ride was great once we got on it..

(5184) High admission prices and expensive food and merchandise contribute to perceptions of Disneyland as overpriced and unaffordable for average families. [1, 8, 9, ...]

Review 1: A great day of simple fun and thrills. Bring cash, nothing is cheap, but we knew that it's Disney. But they are great letting you bring in your own food, drinks, etc but read the list closely, we list several items at the entry gates (selfy sticks, glass refill bottles, etc). It is worth buying the photo pass and fastpass. Have fun!

(4532) The FastPass system is largely ineffective due to limited availability, poor management, and confusion among guests. [1, 2, 3, ...]

Review 104: Spring break crowds were noticeable in mid march. Max pass worth the extra money for the popular rides. The Magic Mix Fireworks was a great show! We arrived at the Fantasmic Fast Pass Area about 15 minutes before the time on the fast pass and the line was already very long and we were not really able to see once we made it to the fast past viewing area. Really enjoyed the Minnie and Friends Breakfast at the Plaza Inn.

(2063) Reports of declining staff enthusiasm, rudeness, and poor service diminish overall service quality and atmosphere. [6, 16, 30, ...]

Review 766: Disneyland hasn't change much. It been 15 years ago. But was nice to visit again. I love just spend time with family. But park is okay need improvement.

(1987) Food quality is often low, with few healthy options available, and high prices contribute to visitor dissatisfaction. [12, 18, 26, ...]

Review 612: The happiest place on earth as they say. Disneyland has a magical atmosphere especially with the fancy names of rides and restaurants. Was slightly disappointing that the characters don't pop up more often like they used to years ago. But it's quite organised that they have characters in different locations at particular times for autograph signing. Food is very expensive and there isn't enough choice for vegetarians considering people visit from all over the world. Overall, disney has a way to bring out the kid in everyone and we deffo made the most of our 6th trip

(1959) Many visitors can only experience a limited number of rides per day, necessitating extensive planning that detracts from enjoyment. [3, 7, 18, ...]

Review 1506: We spent 2 days at Disneyland and California Adventure park on a hopper tickets and despite the crowds had a great time. In fact the crowds never really were an issue.We used the Disneyland App extensively and I would highly recommend this.We also used Max pass which allowed us to do fast pass selections on the mobile device at a higher frequency then standard this was awesome and allowed us to do heaps of rides we otherwise would have done.Also we used e tickets on entry and didn t do the print thing. The other piece of advice was with kids 2 days is required and if you want to sit down for lunch do it by mid day latest!What a fantastic time we had

(1872) Limited character interactions and excessive wait times for meet-and-greets leave guests feeling underwhelmed. [51, 67, 82, ...]

Review 1041: Los Angeles Disneyland located in Anaheim, it s not as big as the other Disneyland around the world, but it s still fun and the photo package is cheaper compared to other Disneyland, some of the characters meet and greet are not stationary, they walk around the park, so please do approach them if you see them in the park and ask for a photograph or signature, have fun!

(1615) Frequent ride breakdowns and closures occur without prior notice, causing disappointment and safety concerns. [12, 48, 51, ...]

Review 551: It s Disney, of course it s going to be great!!! Even the bad experience we had with rides breaking down was more than put right by Guest Relations. A magical time at the happiest place on earth!

(1540) Major construction disrupts access to attractions and amenities, impacting the overall experience. [12, 22, 25, ...]

Review 2184: Most of the cast members were average or slightly rude. I'm used to the happy cast members of the Magic Kingdom that want to brighten everyone's day. We went on a November 2nd (Thursday) the park was not overly busy. Be aware: The line to search your bag is long and the line to enter took much longer than expected. It would be nice to see some updates. Many parts of the park appear dated. We met Snow White she had chipped nail polish. Just not what I expected. We loved the Royal Theater performances!!! All the actors were fabulous!!! Highly recommend it!Overall there were some cool things to experience, but no need to come back. We will go to Disney World's Magic Kingdom from now on.