arXiv Paper Recommendations¶
This notebook analyzes the latest papers published on arXiv and surfaces reading recommendations based on your interests.
The analysis is a two-stage pipeline:
- (filter) Filter out irrelevant papers: ones on topics you're not interested in.
- (sort) Sort the remaining papers by relevance to your research interests and by other factors, such as the reputation of the authors.
This notebook uses the OpenAI API and costs about $2.50 to run. You can reduce costs by sub-sampling the dataset or using cheaper models.
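One way to sub-sample is sketched below with the standard library. The helper name `subsample`, the sample size, and the seed are illustrative assumptions rather than part of Semlib; the same function could be applied to the list of papers fetched later in this notebook.

```python
import random
from typing import TypeVar

T = TypeVar("T")


def subsample(items: list[T], k: int, seed: int = 0) -> list[T]:
    """Return up to k items chosen uniformly at random, reproducibly."""
    rng = random.Random(seed)  # local RNG, so the global random state is untouched
    return rng.sample(items, min(k, len(items)))


# Stand-in data; in the notebook this would be the list of fetched papers.
demo = [f"paper-{i}" for i in range(900)]
subset = subsample(demo, 200)
print(len(subset))
```

Fixing the seed keeps the subset stable across runs, which should also let an on-disk response cache be reused.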
%pip install semlib arxiv
We start by initializing a Semlib Session. A session provides a context for performing Semlib operations. We configure the session to cache LLM responses on disk in cache.db.
This notebook uses OpenAI models. If your OPENAI_API_KEY is not already set in your environment, you can uncomment the lines at the bottom of the next cell and set your API key there.
import semlib
from semlib import OnDiskCache, Session
session = Session(cache=OnDiskCache("cache.db"))
# Uncomment the following lines and set your OpenAI API key if not already set in your environment
# import os
# os.environ["OPENAI_API_KEY"] = "..."
Download and preview data¶
We start by defining a function to fetch arXiv paper metadata given a set of categories along with a date range.
from datetime import date
import arxiv
def get_papers(categories: list[str], start_date: date, end_date: date) -> list[arxiv.Result]:
    query_cat = " OR ".join(f"cat:{cat}" for cat in categories)
    query_date = f"submittedDate:[{start_date.strftime('%Y%m%d')} TO {end_date.strftime('%Y%m%d')}]"
    query = f"({query_cat}) AND {query_date}"
    search = arxiv.Search(query)
    client = arxiv.Client()
    return list(client.results(search))
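To see what query string the function sends, the sketch below repeats only the string-building lines and makes no network call. The helper name `build_query` is illustrative and not part of the notebook.

```python
from datetime import date


def build_query(categories: list[str], start_date: date, end_date: date) -> str:
    """Mirror the query-building lines of get_papers without calling the API."""
    query_cat = " OR ".join(f"cat:{cat}" for cat in categories)
    query_date = f"submittedDate:[{start_date.strftime('%Y%m%d')} TO {end_date.strftime('%Y%m%d')}]"
    return f"({query_cat}) AND {query_date}"


query = build_query(["cs.AI", "cs.LG"], date(2025, 8, 29), date(2025, 9, 4))
print(query)
# (cat:cs.AI OR cat:cs.LG) AND submittedDate:[20250829 TO 20250904]
```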
Next, we fetch a batch of papers. Feel free to edit the list of categories to match your interests, or update the date range to get the most recent papers at the time you're running this notebook.
papers = get_papers(["cs.AI", "cs.LG"], date(2025, 8, 29), date(2025, 9, 4))
print(f"Number of papers: {len(papers)}\n")
print(f"Example title: {papers[0].title}\n")
abstract = papers[0].summary[:400].replace("\n", " ")  # backslashes inside f-string expressions require Python 3.12+
print(f"Example abstract: {abstract}...")
Number of papers: 907

Example title: MyGO: Memory Yielding Generative Offline-consolidation for Lifelong Learning Systems

Example abstract: Continual or Lifelong Learning aims to develop models capable of acquiring new knowledge from a sequence of tasks without catastrophically forgetting what has been learned before. Existing approaches often rely on storing samples from previous tasks (experience replay) or employing complex regularization terms to protect learned weights. However, these methods face challenges related to data priva...
Find most relevant papers¶
Filter out irrelevant papers¶
We start by filtering out irrelevant papers, given a list of topics you're definitely not interested in. We do this with Semlib's filter method. By default, this method keeps the items that match a criterion. LLMs sometimes perform better on the "inverse" binary classification problem ("is this paper about any of the following topics?") than on the negated one ("is this paper NOT about any of the following topics?"), so the method supports a negate=True argument, which keeps all items that do not match the criterion given to the LLM.
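The effect of negation can be illustrated with a plain-Python sketch. The `keep` helper and the keyword-based `is_medical` predicate are hypothetical stand-ins for the LLM's yes/no answer; this is not Semlib's implementation.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def keep(items: list[T], matches: Callable[[T], bool], negate: bool = False) -> list[T]:
    """Keep items where matches(item) is True, or where it is False when negate is set."""
    return [item for item in items if matches(item) != negate]


def is_medical(title: str) -> bool:
    """Hypothetical stand-in for the LLM's answer to 'is this about medicine?'."""
    return "artery" in title.lower()


titles = [
    "Coronary artery segmentation with distillation",
    "Verifying neural networks with SMT solvers",
]
print(keep(titles, is_medical, negate=True))
# ['Verifying neural networks with SMT solvers']
```

The question posed to the classifier stays positive ("is it about X?"), and only the keep/drop decision is flipped.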
For this rough filtering stage, we use a low-cost model, gpt-4.1-nano.
Feel free to edit the list of topics in the prompt below to match your preferences.
The following cell takes about 30 seconds to run with the default max_concurrency level (feel free to change it in the Session constructor above).
relevant = await session.filter(
    papers,
    template=lambda p: f"""
Your task is to determine if the following academic paper is on any of the following topics.
Paper title: {p.title}
Paper abstract: {p.summary}
Is the paper about any of the following topics?
- Medicine
- Healthcare
- Biology
- Chemistry
- Physics
""".strip(),
    model="openai/gpt-4.1-nano",
    negate=True,
)
We can see how many papers we managed to filter out, and also take a look at some of the irrelevant papers, to make sure the filter worked well.
print(f"Filtered out {len(papers) - len(relevant)} irrelevant papers, including:")
for paper in list({i.title for i in papers} - {i.title for i in relevant})[:5]:
    print(f"- {paper}")
Filtered out 228 irrelevant papers, including:
- Abex-rat: Synergizing Abstractive Augmentation and Adversarial Training for Classification of Occupational Accident Reports
- Deep Self-knowledge Distillation: A hierarchical supervised learning for coronary artery segmentation
- Quantum-Enhanced Natural Language Generation: A Multi-Model Framework with Hybrid Quantum-Classical Architectures
- Multimodal learning of melt pool dynamics in laser powder bed fusion
- Temporally-Aware Diffusion Model for Brain Progression Modelling with Bidirectional Temporal Regularisation
Sort papers by relevance¶
Next, we sort the list of papers by relevance using Semlib's sort method, which sorts items by using an LLM to perform pairwise comparisons. The API supports framing the comparison task in a number of ways. Here, we ask the LLM to choose the better fit between options "A" and "B".
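Sorting by pairwise comparison can be sketched with the standard library alone. Here a toy keyword score stands in for the LLM's choice between "A" and "B"; the `INTERESTS` set, `score`, and `compare` are illustrative assumptions and say nothing about how Semlib works internally.

```python
from functools import cmp_to_key

INTERESTS = {"security", "verification", "systems"}


def score(title: str) -> int:
    """Count interest keywords in a title (a toy proxy for relevance)."""
    return sum(word in INTERESTS for word in title.lower().split())


def compare(a: str, b: str) -> int:
    """Stand-in for the A-versus-B choice: a positive result means A is the better fit."""
    return score(a) - score(b)


titles = [
    "Security verification for distributed systems",
    "Trends in online video engagement",
    "Formal verification of a compiler",
]
ranked = sorted(titles, key=cmp_to_key(compare))  # least relevant first
print(ranked[-1])
# Security verification for distributed systems
```

As in this notebook's results, the ordering is ascending, so the best matches sit at the end of the list.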
We start by defining the prompt for the LLM. We put the static instructions at the start of the LLM prompt to take advantage of prompt caching.
Feel free to edit the list of interests to match yours.
COMPARISON_TEMPLATE = """
You are a research assistant. Help me pick a research paper to read, based on what is most relevant to my interests and what is most likely to be high-quality work based on the title, authors, and abstract.
You will be given context on my interests, and two paper abstracts.
My research interests include:
- Machine learning and artificial intelligence
- Systems
- Security
- Formal methods
Here is paper Option A:
<option A>
{}
</option A>
Here is paper Option B:
<option B>
{}
</option B>
Choose the option (either A or B) that is more relevant to my interests and likely to be a high-quality work.
""".strip()
The sort API supports a variety of alternatives for supplying a prompt template, such as providing a callable that takes a pair of items and returns a string. In this notebook, we supply a to_str function that converts items to a string representation, and a prompt template that is a format string with two placeholders.
Next, we define the to_str function, which converts a paper (metadata object) to a string.
def to_str(paper: arxiv.Result) -> str:
    return f"""
Title: {paper.title}
Authors: {', '.join(author.name for author in paper.authors)}
Abstract: {paper.summary}
""".strip()
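To preview what a filled-in comparison prompt might look like, the sketch below combines a string-conversion function with a two-placeholder format string using `str.format`. That Semlib fills the template this way is an assumption; the `Paper` dataclass, `paper_to_str`, and the shortened `TEMPLATE` are stand-ins so the example runs without the arXiv client.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Minimal stand-in for arxiv.Result with only the fields used here."""
    title: str
    summary: str


def paper_to_str(paper: Paper) -> str:
    """Render a paper as the text that fills one placeholder."""
    return f"Title: {paper.title}\nAbstract: {paper.summary}"


# Shortened template with the same two positional placeholders as above.
TEMPLATE = "<option A>\n{}\n</option A>\n<option B>\n{}\n</option B>"

a = Paper("Paper one", "About security.")
b = Paper("Paper two", "About biology.")
prompt = TEMPLATE.format(paper_to_str(a), paper_to_str(b))
print(prompt)
```

With `str.format`, any literal braces in the template itself would need to be doubled, while braces inside the inserted paper text are left alone.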
Finally, we're ready to call sort(). Earlier, we used the gpt-4.1-nano model to filter papers because that's an easy task and this model is cheaper. For the following sort operation, we use the gpt-4.1-mini model. Semlib lets you choose the model on a per-operation basis to control the cost-quality-latency tradeoff.
Here, we use the Quicksort algorithm, which needs O(n log n) LLM calls on average. By default, sort performs O(n^2) LLM calls to achieve a higher-quality result.
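To get a feel for the difference in call counts, the sketch below counts comparisons made by a textbook quicksort on random integers and contrasts that with comparing every pair once. The quicksort here is a generic illustration, not Semlib's implementation, and the all-pairs figure is only an assumed model of an O(n^2) strategy.

```python
import random


class CountingLessThan:
    """Stand-in for one LLM comparison; records how often it is invoked."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, a: int, b: int) -> bool:
        self.calls += 1
        return a < b


def quicksort(items: list[int], less_than: CountingLessThan) -> list[int]:
    """Textbook quicksort that orders items using only the supplied comparison."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller: list[int] = []
    larger: list[int] = []
    for item in rest:
        (smaller if less_than(item, pivot) else larger).append(item)
    return quicksort(smaller, less_than) + [pivot] + quicksort(larger, less_than)


n = 679  # papers left after the filter step in this run (907 - 228)
values = random.Random(0).sample(range(10_000), n)
less_than = CountingLessThan()
result = quicksort(values, less_than)
all_pairs = n * (n - 1) // 2
print(f"quicksort comparisons: {less_than.calls}, all pairs: {all_pairs}")
```

Each comparison corresponds to one LLM call in the real pipeline, so the gap between the two numbers translates directly into cost and latency.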
The following cell takes about 7 minutes to run with the default max_concurrency setting.
sorted_results = await session.sort(
    relevant,
    to_str=to_str,
    template=COMPARISON_TEMPLATE,
    algorithm=semlib.sort.QuickSort(randomized=False),
    model="openai/gpt-4.1-mini",
)
Cost analysis¶
f"${session.total_cost():.2f}"
'$2.53'
def format_paper(paper: arxiv.Result) -> str:
    authors = ", ".join(author.name for author in paper.authors)
    snippet = paper.summary[:200].replace("\n", " ")  # built outside the f-string for pre-3.12 compatibility
    return f"{paper.title} ({authors})\n{paper.entry_id}\n{snippet}..."
for i, p in enumerate(reversed(sorted_results[-5:])):
    print(f"{i+1}. {format_paper(p)}\n\n")
1. Enabling Trustworthy Federated Learning via Remote Attestation for Mitigating Byzantine Threats (Chaoyu Zhang, Heng Jin, Shanghao Shi, Hexuan Yu, Sydney Johns, Y. Thomas Hou, Wenjing Lou)
http://arxiv.org/abs/2509.00634v1
Federated Learning (FL) has gained significant attention for its privacy-preserving capabilities, enabling distributed devices to collaboratively train a global model without sharing raw data. However...

2. zkLoRA: Fine-Tuning Large Language Models with Verifiable Security via Zero-Knowledge Proofs (Guofu Liao, Taotao Wang, Shengli Zhang, Jiqun Zhang, Shi Long, Dacheng Tao)
http://arxiv.org/abs/2508.21393v1
Fine-tuning large language models (LLMs) is crucial for adapting them to specific tasks, yet it remains computationally demanding and raises concerns about correctness and privacy, particularly in unt...

3. An Information-Flow Perspective on Explainability Requirements: Specification and Verification (Bernd Finkbeiner, Hadar Frenkel, Julian Siber)
http://arxiv.org/abs/2509.01479v1
Explainable systems expose information about why certain observed effects are happening to the agents interacting with them. We argue that this constitutes a positive flow of information that needs to...

4. Poisoned at Scale: A Scalable Audit Uncovers Hidden Scam Endpoints in Production LLMs (Zhiyang Chen, Tara Saba, Xun Deng, Xujie Si, Fan Long)
http://arxiv.org/abs/2509.02372v1
Large Language Models (LLMs) have become critical to modern software development, but their reliance on internet datasets for training introduces a significant security risk: the absorption and reprod...

5. ANNIE: Be Careful of Your Robots (Yiyang Huang, Zixuan Wang, Zishen Wan, Yapeng Tian, Haobo Xu, Yinhe Han, Yiming Gan)
http://arxiv.org/abs/2509.03383v1
The integration of vision-language-action (VLA) models into embodied AI (EAI) robots is rapidly advancing their ability to perform complex, long-horizon tasks in humancentric environments. However, EA...
Least aligned¶
for i, p in enumerate(sorted_results[:5]):
    print(f"{i+1}. {format_paper(p)}\n\n")
1. Content and Engagement Trends in COVID-19 YouTube Videos: Evidence from the Late Pandemic (Nirmalya Thakur, Madeline D Hartel, Lane Michael Boden, Dallas Enriquez, Boston Joyner Ricks)
http://arxiv.org/abs/2509.01954v1
This work investigated about 10,000 COVID-19-related YouTube videos published between January 2023 and October 2024 to evaluate how temporal, lexical, linguistic, and structural factors influenced eng...

2. Generative KI für TA (Wolfgang Eppler, Reinhard Heil)
http://arxiv.org/abs/2509.02053v1
Many scientists use generative AI in their scientific work. People working in technology assessment (TA) are no exception. TA's approach to generative AI is twofold: on the one hand, generative AI is ...

3. Why it is worth making an effort with GenAI (Yvonne Rogers)
http://arxiv.org/abs/2509.00852v1
Students routinely use ChatGPT and the like now to help them with their homework, such as writing an essay. It takes less effort to complete and is easier to do than by hand. It can even produce as go...

4. Community-Centered Spatial Intelligence for Climate Adaptation at Nova Scotia's Eastern Shore (Gabriel Spadon, Oladapo Oyebode, Camilo M. Botero, Tushar Sharma, Floris Goerlandt, Ronald Pelot)
http://arxiv.org/abs/2509.01845v1
This paper presents an overview of a human-centered initiative aimed at strengthening climate resilience along Nova Scotia's Eastern Shore. This region, a collection of rural villages with deep ties t...

5. Quantifying the Social Costs of Power Outages and Restoration Disparities Across Four U.S. Hurricanes (Xiangpeng Li, Junwei Ma, Bo Li, Ali Mostafavi)
http://arxiv.org/abs/2509.02653v1
The multifaceted nature of disaster impact shows that densely populated areas contribute more to aggregate burden, while sparsely populated but heavily affected regions suffer disproportionately at th...