Ronak Pradeep

PhD Student
Apple PhD Fellow

Hi! I am a PhD student in the David R. Cheriton School of Computer Science at the University of Waterloo, advised by Jimmy Lin. For the past few months, I've been working on LLMs (and that sort of thing) at Yupp AI. During my PhD, I've also had the chance to work at Google and Apple.

Previously, I completed my undergraduate studies at the University of Waterloo, where I majored in Computer Science and Combinatorics and Optimization. During my undergrad, I had the chance to intern at the Quebec Artificial Intelligence Institute (Mila), ContextLogic, and RBC Research.

Research

My research interests lie at the intersection of Information Retrieval and Natural Language Processing. More specifically, I'm interested in tasks such as Open Domain Question Answering, Fact Verification, and Document Ranking. In recent months, I have also been investigating the memory component of Large Language Models and the interplay between the inherent reasoning and memory modules, entangled in a single LLM or otherwise. I look forward to contributing to the next generation of reasoners capable of working with a constantly evolving ocean of both structured and unstructured data. Some of my earlier work explores how to build neural search systems that promote correct and reliable information and work well in low-resource domains such as biomedical texts.

Updates

Apr 2025

We have five papers accepted at SIGIR 2025! See you all in Padova!

Nov 2024

We had a successful first year of the TREC RAG 2024 Track with many submissions! Looking forward to Y2.

Feb 2024

Organizing the TREC RAG 2024 Track! Do submit your systems :))

Dec 2023

We introduced RankZephyr which garnered great community engagement!

Dec 2023

I'm excited to visit Singapore for EMNLP 2023 to present our work "How Does Generative Retrieval Scale to Millions of Passages?"

Nov 2023

I will be leading the TREC 2024 Retrieval-Augmented Generation Track! More information coming soon!

Sep 2023

We introduced RankVicuna, the first zero-shot listwise reranker that leverages open-source LLMs!

Papers

[39] RankLLM: A Python Package for Reranking with LLMs

Sahel Sharifymoghaddam, Ronak Pradeep, Andre Slavescu, Ryan Nguyen, Andrew Xu, Zijian Chen, Yilin Zhang, Yidi Chen, Jasper Xian, Jimmy Lin

SIGIR 2025

[38] Gosling Grows Up: Retrieval with Learned Dense and Sparse Representations Using Anserini

Jimmy Lin, Arthur Haonan Chen, Carlos Lassance, Xueguang Ma, Ronak Pradeep, Tommaso Teofili, Jasper Xian, Jheng-Hong Yang, Brayden Zhong, Vincent Zhong

SIGIR 2025

[37] Support Evaluation for the TREC 2024 RAG Track: Comparing Human versus LLM Judges

Nandan Thakur, Ronak Pradeep, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

SIGIR 2025

[36] The Great Nugget Recall: Automating Fact Extraction and RAG Evaluation with Large Language Models

Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

SIGIR 2025

[35] Initial Nugget Evaluation Results for the TREC 2024 RAG Track with the AutoNuggetizer Framework

Ronak Pradeep, Nandan Thakur, Shivani Upadhyay, Daniel Campos, Nick Craswell, Jimmy Lin

arXiv

[34] A Large-Scale Study of Relevance Assessments with Large Language Models: An Initial Look

Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Daniel Campos, Nick Craswell, Ian Soboroff, Hoa Trang Dang, Jimmy Lin

Under Review for a Suitable Conference

[32] Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track

Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin

Under Review for a Suitable Conference

[31] Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels

Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab

Under Review for a Suitable Conference

[30] UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor

Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin

Under Review for a Suitable Conference

[29] ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models

Ronak Pradeep, Daniel Lee, Ali Mousavi, Jeffrey Pound, Yisi Sang, Jimmy Lin, Ihab Ilyas, Saloni Potdar, Mostafa Arefiyan, Yunyao Li

EMNLP 2024 Industry Track, EACL 2024 KaLLM + Knowledgeable Language Models Workshop

[28] Entity Disambiguation via Fusion Entity Decoding

Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander Rush, Umar Farooq Minhas, Yunyao Li

NAACL 2024

[27] Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages

Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, Jimmy Lin

ACL 2024

[26] Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models

Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin

Under Review for a Suitable Conference

[25] RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!

Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin

Under Review for a Suitable Conference

[24] RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin

Under Review for a Suitable Conference

[23] Vector Search with OpenAI Embeddings: Lucene Is All You Need

Jimmy Lin, Ronak Pradeep, Tommaso Teofili, Jasper Xian

WSDM 2024 Demo

[22] End-to-End Health Misinformation-Free Search with a Large Language Model

Ronak Pradeep, Jimmy Lin

Under Review for a Suitable Conference

[21] How Does Generative Retrieval Scale to Millions of Passages?

Ronak Pradeep, Kai Hui, Jai Gupta, Adam D Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q Tran

EMNLP 2023, SIGIR 2023 GenIR Workshop

[19] Zero-Shot Listwise Document Reranking with a Large Language Model

Xueguang Ma, Xinyu Zhang, Ronak Pradeep, Jimmy Lin

arXiv

[18] Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering

Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin

ECIR 2023 Reproducibility

[17] PyGaggle: A Gaggle of Resources for Open-Domain Question Answering

Ronak Pradeep, Haonan Chen, Lingwei Gu, Manveer Singh Tamber, Jimmy Lin

ECIR 2023 Reproducibility

[15] Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO v1 and v2

Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin

SIGIR 2022 Reproducibility

[14] Another Look at DPR: Reproduction of Training and Replication of Retrieval

Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li, Jimmy Lin

ECIR 2022 Reproducibility

[13] New Nails for Old Hammers: Anserini and Pyserini at TREC 2021

Jimmy Lin, Haonan Chen, Chengcheng Hu, Sheng-Chieh Lin, Yilin Li, Xueguang Ma, Ronak Pradeep, Jheng-Hong Yang, Chuan-Ju Wang, Andrew Yates, Xinyu Zhang

TREC 2021 Proceedings

[11] Chatty Goose: A Python Framework for Conversational Search

Edwin Zhang, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin

SIGIR 2021 Demo

[10] Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira

SIGIR 2021 Resource

[9] H₂oloo at TAC 2020: Epidemic Question Answering

Justin Borromeo, Ronak Pradeep, Jimmy Lin

TAC 2020 Proceedings

[7] H₂oloo at TREC 2020: When all you got is a Hammer... Deep Learning, Health Misinformation, and Precision Medicine

Ronak Pradeep, Xueguang Ma, Xinyu Zhang, Hang Cui, Ruizhou Xu, Rodrigo Nogueira, Jimmy Lin

TREC 2020 Proceedings

[6] Scientific Claim Verification with VerT5erini

Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin

LOUHI: EACL 2021 Workshop

[5] A Replication Study of Dense Passage Retriever

Xueguang Ma, Kai Sun, Ronak Pradeep, Jimmy Lin

Will be submitted to a suitable venue

[4] Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin

Scholarly Document Processing: EMNLP 2020 Workshop

[3] The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models

Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin

Will be submitted to a suitable venue

[2] Document Ranking with a Pretrained Sequence-to-Sequence Model

Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, Jimmy Lin

EMNLP 2020 Findings

[1] Foveated Down-Sampling Techniques

Parsa Torabian, Ronak Pradeep, Jeff Orchard, Bryan Tripp

CVIS 2020

... and a few more papers I might have missed! See my Google Scholar for the complete list.

Playlists

Thanks for making it here :-) As a token of gratitude, and since you asked nicely for it, I shall also introduce you to a few of my Spotify playlists.

And everything under the sun is in tune

A music dump of sorts. Updated regularly.

A Day In The Life

An allusion to the Beatles song. Curated by a younger me for someone who stole my heart. Not updated anymore.

Liebesträume

And what exactly is a dream of love? Here I take on Liszt and attempt to provide a longer answer to aid with my sleep. Updated semi-regularly.