arXiv is Overwhelming — Here's How to Filter What Matters

If you are a researcher who checks arXiv every morning, you already know the feeling. You open the new submissions page, scroll through dozens of titles, and twenty minutes later you have three tabs open and zero confidence that you haven't missed something important. arXiv is the backbone of open science, but it was never designed to help you find the right paper. It was designed to host all of them.

The numbers tell the story. arXiv now receives over 500 new submissions per day across all categories. In cs.AI alone, you can expect 50 to 100 new papers on any given weekday. Subfields like machine learning, computer vision, and NLP each add their own firehose on top of that. If your research sits at the intersection of two or three areas — as most interesting work does — the volume becomes genuinely unmanageable.

The question every active researcher eventually asks is simple: how do you filter arXiv papers so that the important ones surface and the noise fades away? This article walks through what works, what doesn't, and a modern approach that finally gets it right.

Why arXiv's Built-in Tools Fall Short

arXiv is a preprint server, not a recommendation engine. Its tools reflect that origin.

Category subscriptions are too coarse. You can subscribe to cs.AI or cs.LG, but these categories are enormous tents. cs.AI covers everything from knowledge representation to autonomous driving to AI ethics. If you study reinforcement learning for robotics, you will wade through dozens of irrelevant papers for every one that matters. There is no way to subscribe to a sub-topic — only to an entire category.

The email digest is a wall of text. arXiv's daily email notification sends you a plain list of titles and abstracts. No ranking, no personalization, no indication of which papers are close to your interests versus tangentially related. You are expected to read every title and mentally filter. For a category with 80 new papers, that is a brutal daily chore.

Search is keyword-matching, not semantic. arXiv's search works, but it is a basic keyword index. It does not understand that "policy optimization in manipulation tasks" is related to "reinforcement learning for robotic grasping." If you don't guess the exact terminology the authors used, you miss the paper.
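
To make the gap concrete, here is a minimal sketch of plain keyword matching applied to the two phrases above. The stopword list and the content_terms helper are illustrative assumptions, not arXiv's actual search implementation.

```python
import re

# Assumption: a tiny illustrative stopword list, not the one any real index uses.
STOPWORDS = {"in", "for", "of", "the", "a", "an", "and", "to", "on"}


def content_terms(text: str) -> set[str]:
    """Lowercase the text, split it into words, and drop common stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


query = "policy optimization in manipulation tasks"
title = "reinforcement learning for robotic grasping"

# Terms that a pure keyword matcher could use to connect the two phrases.
shared = content_terms(query) & content_terms(title)
print(shared)  # no content words in common, so a keyword index sees no match
```

The two phrases describe closely related work but share no content words. Connecting them takes some notion of similarity beyond shared terms, for example text embeddings.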

No learning from your behavior. arXiv treats every user identically. It doesn't know that you saved three papers on diffusion models last week or that you always skip papers about theorem proving. There is no feedback loop, no personalization, no memory.

For years, researchers have patched over these gaps with external tools. Some of those tools are excellent. Most have significant limitations.

The Landscape of arXiv Filtering Tools

arxiv-sanity (and arxiv-sanity-lite)

Andrej Karpathy's arxiv-sanity was the gold standard for personalized arXiv filtering. It let you build a library of papers and used that library to rank new submissions by relevance. The concept was ahead of its time.

The problem: the original arxiv-sanity is effectively discontinued. arxiv-sanity-lite still runs, but it covers a narrow slice of categories, the interface is minimal, and there is no mobile experience. For many researchers, it was a proof of concept that the ecosystem never fully replaced.

Hugging Face Daily Papers

Hugging Face curates a daily selection of noteworthy papers, driven by community upvotes and editorial picks. If you work in NLP or core ML, this is a genuinely useful resource.

But it is narrowly scoped by design. If your field is biomedical imaging, computational neuroscience, or materials science, Hugging Face Daily Papers has little to offer. It is a community feed for one corner of research, not a general-purpose filtering tool.

arXiv Vanity / ar5iv

These tools render arXiv papers as responsive HTML instead of PDF. That is a real quality-of-life improvement for reading on phones and tablets, but they solve a formatting problem, not a discovery problem. You still need to find the paper before you can read it in a nicer layout.

Semantic Scholar / Connected Papers

Semantic Scholar provides excellent citation-based discovery and paper recommendations. Connected Papers builds visual graphs of related work. Both are powerful for exploratory research — following citation chains, mapping a field.

They are less effective for daily monitoring. If you want to know what dropped on arXiv today that matches your interests, citation graphs built on established papers won't surface brand-new preprints that haven't been cited yet.

RSS Readers and Custom Scripts

Some researchers build their own pipelines: arXiv RSS feeds piped into Feedly, filtered by regex, sometimes with a GPT layer on top. This works surprisingly well if you enjoy maintaining personal infrastructure. Most people don't, and the setup breaks every time arXiv changes its feed format.
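
As a rough illustration of the regex step in such a pipeline, the sketch below filters a handful of feed entries by pattern. The entries are made-up placeholders and the pattern is just an example. A real setup would first fetch and parse the feed, which is left out here.

```python
import re

# Hypothetical feed entries; a real pipeline would parse these from an RSS feed.
entries = [
    {
        "title": "Policy Gradient Methods for Robotic Grasping",
        "summary": "We study reinforcement learning for manipulation.",
    },
    {
        "title": "A Survey of Automated Theorem Proving",
        "summary": "We review proof search techniques.",
    },
    {
        "title": "Diffusion Models for Image Synthesis",
        "summary": "A new sampler for diffusion models.",
    },
]

# Example interest pattern: edit the alternatives to match your own topics.
PATTERN = re.compile(r"reinforcement learning|robot\w*|diffusion", re.IGNORECASE)


def matches(entry: dict) -> bool:
    """Return True if the pattern appears in the entry's title or summary."""
    return bool(PATTERN.search(entry["title"] + " " + entry["summary"]))


kept = [e["title"] for e in entries if matches(e)]
print(kept)
```

The filter is only as good as the pattern. It inherits the keyword-matching blind spots described earlier and never adapts unless you edit it by hand.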

The common thread across all these approaches is that none of them close the feedback loop. They push papers at you. They don't learn from what you actually read, save, or ignore.

A Better Approach: Keywords In, Personalized Papers Out

What if figuring out how to filter arXiv papers didn't require subscribing to categories, configuring RSS feeds, or writing scripts? What if you just told a system what you care about, and it searched everywhere for you?

That is the core idea behind ZiNote. The workflow is radically simple:

1. Enter your research keywords. Type something like "reinforcement learning for robotics" or "single-cell RNA sequencing normalization." Be as specific or as broad as you want.
2. The system searches across all sources. Not just arXiv but also PubMed, Semantic Scholar, and other academic databases. You don't pick categories. You don't select sources. The system handles all of that behind the scenes.
3. Papers arrive in a swipe interface. Each paper is presented with its title, authors, abstract, and an AI-generated summary. Swipe right to save, left to skip.
4. Every swipe teaches the system. The papers you save and the papers you skip create a feedback signal. Over days and weeks, your feed gets sharper. The system learns not just your keywords but your taste: which sub-topics you actually engage with, which writing styles you prefer, which author clusters you gravitate toward.
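
To give a feel for how a save/skip feedback loop can sharpen a feed, here is a deliberately simple sketch. It is not ZiNote's actual algorithm, which this article does not describe. The SwipeProfile class, the per-word weights, and the example titles are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def terms(text: str) -> set[str]:
    """Split a title into a set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


class SwipeProfile:
    """Hypothetical preference model: one weight per word, nudged by each swipe."""

    def __init__(self, learning_rate: float = 1.0):
        self.weights = defaultdict(float)
        self.learning_rate = learning_rate

    def record(self, title: str, saved: bool) -> None:
        """Raise the weights of words in saved titles, lower them for skipped ones."""
        delta = self.learning_rate if saved else -self.learning_rate
        for t in terms(title):
            self.weights[t] += delta

    def score(self, title: str) -> float:
        """Sum the learned weights of the words in a candidate title."""
        return sum(self.weights.get(t, 0.0) for t in terms(title))

    def rank(self, titles: list[str]) -> list[str]:
        """Order candidate titles from highest to lowest score."""
        return sorted(titles, key=self.score, reverse=True)


profile = SwipeProfile()
profile.record("Diffusion models for robot motion planning", saved=True)
profile.record("Automated theorem proving with transformers", saved=False)

candidates = [
    "Theorem proving benchmarks",
    "Diffusion policies for robot grasping",
]
ranked = profile.rank(candidates)
print(ranked)
```

After just two swipes, the robotics paper outranks the theorem-proving one. A production system would presumably use richer signals than title words, but the loop has the same shape: every action updates the profile, and the profile reorders what you see next.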

This is what modern paper discovery should feel like. You are not manually checking arXiv listings. You are not scanning email digests. You are spending your limited reading time on papers that a system — trained on your behavior — thinks you should see.

Why Cross-Source Search Matters

One underappreciated problem with arXiv-only tools is that not all important papers land on arXiv first. Biomedical research often appears on PubMed or bioRxiv before (or instead of) arXiv. Conference papers sometimes go directly to proceedings. Workshop papers might only exist on OpenReview.

Once you have learned how to filter arXiv papers effectively, the next realization is that arXiv is only one source. A system that searches across multiple databases with a single set of keywords eliminates the need to maintain separate workflows for each platform. You set your interests once and get results from everywhere.
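
One practical wrinkle with cross-source search is that the same paper can show up in several databases. The sketch below shows one simple way to merge results and collapse duplicates by normalized title. The source names, the paper titles, and the merge_results helper are hypothetical, and a real system would likely also match on identifiers such as DOIs.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase the title and collapse punctuation and spacing for comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(results_by_source: dict) -> list:
    """Combine per-source result lists, keeping one record per distinct title."""
    merged = {}
    for source, papers in results_by_source.items():
        for paper in papers:
            key = normalize_title(paper["title"])
            record = merged.setdefault(key, {"title": paper["title"], "sources": []})
            record["sources"].append(source)
    return list(merged.values())


# Hypothetical search results for one keyword set.
results_by_source = {
    "arxiv": [{"title": "Single-Cell RNA Sequencing Normalization: A Benchmark"}],
    "pubmed": [
        {"title": "Single-cell RNA sequencing normalization: a benchmark."},
        {"title": "Batch Effects in scRNA-seq"},
    ],
}

merged = merge_results(results_by_source)
for record in merged:
    print(record["title"], record["sources"])
```

Three raw hits become two papers, and the first one records that it was found in both sources.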

From Zero to First Swipe in Five Minutes

Here is what the onboarding actually looks like:

1. Download ZiNote on your phone (iOS or Android).
2. Create an account. It takes thirty seconds.
3. Add your first keyword set. The app prompts you to describe your research interests. You can add multiple keyword groups if you work across fields. For example, one set for "graph neural networks for drug discovery" and another for "LLM evaluation benchmarks."
4. Your feed populates immediately. The system runs your keywords against its sources and returns an initial batch of papers, sorted by relevance.
5. Start swiping. Each paper takes 10-15 seconds to evaluate. The AI summary gives you the gist without reading the full abstract. Swipe right on anything worth reading later. Swipe left to move on.

Within a single commute, you can process more papers — with better precision — than you would in an hour of manual arXiv browsing. And unlike that hour of browsing, every action you take makes tomorrow's feed better.

The AI translation feature deserves a mention here too. If you encounter papers with dense technical language or if English is not your first language, the built-in AI translation can render abstracts and summaries in your preferred language. It is a small feature that removes a real friction point for the global research community.

Bonus: One-Click Zotero Sync

If you use Zotero — and a huge number of researchers do — you know the pain of the save-to-Zotero workflow on mobile. Find the paper, copy the DOI, open Zotero, add by identifier, wait, tag it, file it.

ZiNote shortcuts all of that. Papers you save sync directly to your Zotero library. The metadata is already structured. You swipe right, and the paper appears in Zotero, ready to be organized into collections and annotated. No copy-pasting, no manual entry, no browser extension required.

This turns your swiping session into a genuine literature review workflow. At the end of the week, your Zotero library has a curated set of new papers that reflect your actual interests, not a random sample of what you happened to stumble across.

The Real Cost of Not Filtering

Researchers often treat the time spent browsing arXiv as unavoidable overhead, just part of the job. But consider the math. If you spend 30 minutes a day scanning new submissions manually, every day of the year, that is over 180 hours per year. That is more than four 40-hour work weeks spent not reading papers, not writing, not running experiments, just looking for papers. Even if you only check on weekdays, it still comes to about 130 hours, or more than three work weeks.
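
The arithmetic is easy to check and to adapt to your own habits. In the sketch below, the 40-hour work week and the 260-weekday year are assumptions. Change the constants to match your routine.

```python
MINUTES_PER_DAY = 30   # daily time spent scanning new submissions
WORK_WEEK_HOURS = 40   # assumption: a standard full-time week


def yearly_hours(days_per_year: int) -> float:
    """Convert a daily browsing habit into total hours per year."""
    return MINUTES_PER_DAY * days_per_year / 60


every_day = yearly_hours(365)  # checking seven days a week
weekdays = yearly_hours(260)   # assumption: 52 weeks x 5 weekdays

print(every_day, every_day / WORK_WEEK_HOURS)
print(weekdays, weekdays / WORK_WEEK_HOURS)
```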

Learning how to filter arXiv papers efficiently is not a productivity hack. It is a structural improvement to how you do research. The less time you spend on discovery, the more time you have for the work that actually moves your field forward.

Start Filtering Smarter

If your current arXiv workflow involves email digests, manual category browsing, or a prayer that Twitter surfaces the right papers, there is a better way.

Download ZiNote and set up your first keyword feed in under five minutes. Let the AI search arXiv and every other source for you. Swipe through what it finds. Save what matters. Skip what doesn't. Watch your feed get smarter every day.

Your next important paper is already on arXiv. The question is whether you'll find it — or whether it'll get buried on page three of a category listing you forgot to check.