Ronak Pradeep

Hi! I am a PhD student in the David R. Cheriton School of Computer Science at the University of Waterloo, advised by Jimmy Lin. I am an Apple PhD Fellow. During my PhD, I've also had the chance to work at Google and Apple.

Previously, I completed my undergraduate studies at the University of Waterloo, where I majored in Computer Science and Combinatorics and Optimization. During my undergrad, I had the chance to intern at the Quebec Artificial Intelligence Institute (Mila), ContextLogic, and RBC Research.

I am actively looking for internships in Fall 2024.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub  /  Spotify

Research

My research interests lie at the intersection of Information Retrieval and Natural Language Processing. More specifically, I'm interested in tasks such as Open Domain Question Answering, Fact Verification, and Document Ranking. In recent months, I have also been investigating the memory component of Large Language Models and the interplay between the inherent reasoning and memory modules, entangled in a single LLM or otherwise. I look forward to contributing to the next generation of reasoners capable of working with a constantly evolving ocean of both structured and unstructured data. Some of my earlier work explores how to build neural search systems that promote correct and reliable information and work well in low-resource domains such as biomedical texts.

Updates
  • Feb 2023: Organizing the TREC RAG 2024 Track! Do submit your systems :)
  • Dec 2023: We introduced RankZephyr which garnered great community engagement ([1] & [2])!
  • Dec 2023: I'm excited to visit Singapore for EMNLP 2023 to present our work "How Does Generative Retrieval Scale to Millions of Passages?"
  • Nov 2023: I will be leading the TREC 2024 Retrieval-Augmented Generation Track! More information coming soon!
  • Sep 2023: We introduced RankVicuna, the first zero-shot listwise reranker that leverages open-source LLMs!
Papers
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin
Under Review for a Suitable Conference
paper
Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab
Under Review for a Suitable Conference
paper
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin
Under Review for a Suitable Conference
paper
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
Ronak Pradeep, Daniel Lee, Ali Mousavi, Jeffrey Pound, Yisi Sang, Jimmy Lin, Ihab Ilyas, Saloni Potdar, Mostafa Arefiyan, Yunyao Li
ACL 2024 KaLLM + Knowledgeable Language Models Workshop
Under Review for a Suitable Conference
paper
Entity Disambiguation via Fusion Entity Decoding
Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander Rush, Umar Farooq Minhas, Yunyao Li
NAACL 2024
paper
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, Jimmy Lin
ACL 2024
paper
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin
Under Review for a Suitable Conference
code / paper
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin
Under Review for a Suitable Conference
code / paper
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin
Under Review for a Suitable Conference
code / paper
Vector Search with OpenAI Embeddings: Lucene Is All You Need
Jimmy Lin, Ronak Pradeep, Tommaso Teofili, Jasper Xian
WSDM 2024 Demo
End-to-End Health Misinformation-Free Search with a Large Language Model
Ronak Pradeep, Jimmy Lin
Under Review for a Suitable Conference
How Does Generative Retrieval Scale to Millions of Passages?
Ronak Pradeep, Kai Hui, Jai Gupta, Adam D Lelkes, Honglei Zhuang, Jimmy Lin, Donald Metzler, Vinh Q Tran
EMNLP 2023, SIGIR 2023 GenIR Workshop
ReadProbe: A Demo of Retrieval-Enhanced Large Language Models to Support Lateral Reading
Dake Zhang, Ronak Pradeep
arXiv
Zero-Shot Listwise Document Reranking with a Large Language Model
Xueguang Ma, Xinyu Zhang, Ronak Pradeep, Jimmy Lin
arXiv
Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering
Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin
ECIR 2023 Reproducibility
PyGaggle: A Gaggle of Resources for Open-Domain Question Answering
Ronak Pradeep, Haonan Chen, Lingwei Gu, Manveer Singh Tamber, Jimmy Lin
ECIR 2023 Reproducibility
Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching
Ronak Pradeep, Yilin Li, Yuetong Wang, Jimmy Lin
SIGIR 2022
Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO v1 and v2
Xueguang Ma, Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
SIGIR 2022 Reproducibility
Another Look at DPR: Reproduction of Training and Replication of Retrieval
Xueguang Ma, Kai Sun, Ronak Pradeep, Minghan Li, Jimmy Lin
ECIR 2022 Reproducibility
code
New Nails for Old Hammers: Anserini and Pyserini at TREC 2021
Jimmy Lin, Haonan Chen, Chengcheng Hu, Sheng-Chieh Lin, Yilin Li, Xueguang Ma, Ronak Pradeep, Jheng-Hong Yang, Chuan-Ju Wang, Andrew Yates, Xinyu Zhang
TREC 2021 Proceedings
code
Vera: Prediction Techniques for Reducing Harmful Misinformation In Consumer Health Search
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
SIGIR 2021
code / paper
Chatty Goose: A Python Framework for Conversational Search
Edwin Zhang, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
SIGIR 2021 Demo
code / paper
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
SIGIR 2021 Resource
code / paper
H2oloo at TAC 2020: Epidemic Question Answering
Justin Borromeo, Ronak Pradeep, Jimmy Lin
TAC 2020 Proceedings
Code and paper to be added.
Exploring Listwise Evidence Reasoning with T5 for Fact Verification
Kelvin Jiang, Ronak Pradeep, Jimmy Lin
ACL 2021
code / paper
H2oloo at TREC 2020: When all you got is a Hammer... Deep Learning, Health Misinformation, and Precision Medicine
Ronak Pradeep, Xueguang Ma, Xinyu Zhang, Hang Cui, Ruizhou Xu, Rodrigo Nogueira, Jimmy Lin
TREC 2020 Proceedings
code / paper
Scientific Claim Verification with VerT5erini
Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
LOUHI: EACL 2021 Workshop
code / paper
A Replication Study of Dense Passage Retriever
Xueguang Ma, Kai Sun, Ronak Pradeep, Jimmy Lin
Will be submitted to a suitable venue
code / paper
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin
Scholarly Document Processing: EMNLP 2020 Workshop
code / paper / website
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
Will be submitted to a suitable venue
code / paper
Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, Jimmy Lin
EMNLP 2020 Findings
code / paper
Foveated Down-Sampling Techniques
Parsa Torabian, Ronak Pradeep, Jeff Orchard, Bryan Tripp
CVIS 2020
paper
Playlists

Thanks for making it this far :-) As a token of gratitude, and since you asked nicely, I shall also introduce you to a few of my Spotify playlists.

And everything under the sun is in tune

A music dump of sorts. Updated regularly.

A Day In The Life

An allusion to the Beatles song. Curated by a younger me for someone who stole my heart. Not updated anymore.

Liebesträume

And what exactly is a dream of love? Here I take on Liszt and attempt to provide a longer answer to aid with my sleep. Updated semi-regularly.



Code based on Jon Barron's wonderful website!