Research
My research interests lie at the intersection of Information Retrieval and Natural Language Processing.
More specifically, I'm interested in tasks such as Open Domain Question Answering, Fact Verification, and Document Ranking.
In recent months, I have also been investigating the memory component of Large Language Models and the interplay between the inherent reasoning and memory modules, entangled in a single LLM or otherwise.
I look forward to contributing to the next generation of reasoners capable of working with a constantly evolving ocean of both structured and unstructured data.
Some of my earlier work explores how to build neural search systems that promote correct and reliable information and work well in low-resource domains such as biomedical texts.
|
- Feb 2023: Organizing the TREC RAG 2024 Track! Do submit your systems :)
- Dec 2023: We introduced RankZephyr, which garnered great community engagement ([1] & [2])!
- Dec 2023: I'm excited to visit Singapore for EMNLP 2023 to present our work "How Does Generative Retrieval Scale to Millions of Passages?"
- Nov 2023: I will be leading the TREC 2024 Retrieval-Augmented Generation Track! More information coming soon!
- Sep 2023: We introduced RankVicuna, the first zero-shot listwise reranker that leverages open-source LLMs!
Ragnarök: A Reusable RAG Framework and Baselines for TREC 2024 Retrieval-Augmented Generation Track
Ronak Pradeep, Nandan Thakur, Sahel Sharifymoghaddam, Eric Zhang, Ryan Nguyen, Daniel Campos, Nick Craswell, Jimmy Lin
Under Review for a Suitable Conference
paper
|
Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab
Under Review for a Suitable Conference
paper
|
UMBRELA: UMbrela is the (Open-Source Reproduction of the) Bing RELevance Assessor
Shivani Upadhyay, Ronak Pradeep, Nandan Thakur, Nick Craswell, Jimmy Lin
Under Review for a Suitable Conference
paper
|
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
Ronak Pradeep, Daniel Lee, Ali Mousavi, Jeffrey Pound, Yisi Sang, Jimmy Lin, Ihab Ilyas, Saloni Potdar, Mostafa Arefiyan, Yunyao Li
ACL 2024 KaLLM + Knowledgeable Language Models Workshop
Under Review for a Suitable Conference
paper
|
Entity Disambiguation via Fusion Entity Decoding
Junxiong Wang, Ali Mousavi, Omar Attia, Ronak Pradeep, Saloni Potdar, Alexander Rush, Umar Farooq Minhas, Yunyao Li
NAACL 2024
paper
|
Zero-Shot Cross-Lingual Reranking with Large Language Models for Low-Resource Languages
Mofetoluwa Adeyemi, Akintunde Oladipo, Ronak Pradeep, Jimmy Lin
ACL 2024
paper
|
Scaling Down, LiTting Up: Efficient Zero-Shot Listwise Reranking with Seq2seq Encoder-Decoder Models
Manveer Singh Tamber, Ronak Pradeep, Jimmy Lin
Under Review for a Suitable Conference
code / paper
|
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
Ronak Pradeep, Sahel Sharifymoghaddam, Jimmy Lin
Under Review for a Suitable Conference
code / paper
|
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Ronak Pradeep,
Sahel Sharifymoghaddam,
Jimmy Lin
Under Review for a Suitable Conference
code / paper
|
Vector Search with OpenAI Embeddings: Lucene Is All You Need
Jimmy Lin,
Ronak Pradeep,
Tommaso Teofili,
Jasper Xian
WSDM 2024 Demo
|
End-to-End Health Misinformation-Free Search with a Large Language Model
Ronak Pradeep,
Jimmy Lin
Under Review for a Suitable Conference
|
How Does Generative Retrieval Scale to Millions of Passages?
Ronak Pradeep,
Kai Hui,
Jai Gupta,
Adam D Lelkes,
Honglei Zhuang,
Jimmy Lin,
Donald Metzler,
Vinh Q Tran
EMNLP 2023, SIGIR 2023 GenIR Workshop
|
ReadProbe: A Demo of Retrieval-Enhanced Large Language Models to Support Lateral Reading
Dake Zhang,
Ronak Pradeep
arXiv
|
Zero-Shot Listwise Document Reranking with a Large Language Model
Xueguang Ma,
Xinyu Zhang,
Ronak Pradeep,
Jimmy Lin
arXiv
|
Pre-processing Matters! Improved Wikipedia Corpora for Open-Domain Question Answering
Manveer Singh Tamber,
Ronak Pradeep,
Jimmy Lin
ECIR 2023 Reproducibility
|
PyGaggle: A Gaggle of Resources for Open-Domain Question Answering
Ronak Pradeep,
Haonan Chen,
Lingwei Gu,
Manveer Singh Tamber,
Jimmy Lin
ECIR 2023 Reproducibility
|
Neural Query Synthesis and Domain-Specific Ranking Templates for Multi-Stage Clinical Trial Matching
Ronak Pradeep,
Yilin Li,
Yuetong Wang,
Jimmy Lin
SIGIR 2022
|
Document Expansion Baselines and Learned Sparse Lexical Representations for MS MARCO v1 and v2
Xueguang Ma,
Ronak Pradeep,
Rodrigo Nogueira,
Jimmy Lin
SIGIR 2022 Reproducibility
|
Another Look at DPR: Reproduction of Training and Replication of Retrieval
Xueguang Ma,
Kai Sun,
Ronak Pradeep,
Minghan Li,
Jimmy Lin
ECIR 2022 Reproducibility
code
|
New Nails for Old Hammers: Anserini and Pyserini at TREC 2021
Jimmy Lin,
Haonan Chen,
Chengcheng Hu,
Sheng-Chieh Lin,
Yilin Li,
Xueguang Ma,
Ronak Pradeep,
Jheng-Hong Yang,
Chuan-Ju Wang,
Andrew Yates,
Xinyu Zhang
TREC 2021 Proceedings
code
|
Vera: Prediction Techniques for Reducing Harmful Misinformation In Consumer Health Search
Ronak Pradeep,
Xueguang Ma,
Rodrigo Nogueira,
Jimmy Lin
SIGIR 2021
code / paper
|
Chatty Goose: A Python Framework for Conversational Search
Edwin Zhang,
Sheng-Chieh Lin,
Jheng-Hong Yang,
Ronak Pradeep,
Rodrigo Nogueira,
Jimmy Lin
SIGIR 2021 Demo
code / paper
|
Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations
Jimmy Lin,
Xueguang Ma,
Sheng-Chieh Lin,
Jheng-Hong Yang,
Ronak Pradeep,
Rodrigo Nogueira
SIGIR 2021 Resource
code / paper
|
H2oloo at TAC 2020: Epidemic Question Answering
Justin Borromeo,
Ronak Pradeep,
Jimmy Lin
TAC 2020 Proceedings
Code and paper to be added.
|
Exploring Listwise Evidence Reasoning with T5 for Fact Verification
Kelvin Jiang,
Ronak Pradeep,
Jimmy Lin
ACL 2021
code / paper
|
H2oloo at TREC 2020: When all you got is a Hammer... Deep Learning, Health Misinformation, and Precision Medicine
Ronak Pradeep,
Xueguang Ma,
Xinyu Zhang,
Hang Cui,
Ruizhou Xu,
Rodrigo Nogueira,
Jimmy Lin
TREC 2020 Proceedings
code / paper
|
Scientific Claim Verification with VerT5erini
Ronak Pradeep,
Xueguang Ma,
Rodrigo Nogueira,
Jimmy Lin
LOUHI: EACL 2021 Workshop
code / paper
|
A Replication Study of Dense Passage Retriever
Xueguang Ma,
Kai Sun,
Ronak Pradeep,
Jimmy Lin
Will be submitted to a suitable venue
code / paper
|
Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset
Edwin Zhang,
Nikhil Gupta,
Raphael Tang,
Xiao Han,
Ronak Pradeep,
Kuang Lu,
Yue Zhang,
Rodrigo Nogueira,
Kyunghyun Cho,
Hui Fang,
Jimmy Lin
Scholarly Document Processing: EMNLP 2020 Workshop
code / paper / website
|
The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models
Ronak Pradeep,
Rodrigo Nogueira,
Jimmy Lin
Will be submitted to a suitable venue
code / paper
|
Document Ranking with a Pretrained Sequence-to-Sequence Model
Rodrigo Nogueira,
Zhiying Jiang,
Ronak Pradeep,
Jimmy Lin
EMNLP 2020 Findings
code / paper
|
Foveated Down-Sampling Techniques
Parsa Torabian,
Ronak Pradeep,
Jeff Orchard,
Bryan Tripp
CVIS 2020
paper
|
Playlists
Thanks for making it this far :-) As a token of gratitude, and since you asked nicely for it, I shall also introduce you to a few of my Spotify playlists.
|
ॐ
And everything under the sun is in tune. A music dump of sorts. Updated regularly.
|
Liebesträume
And what exactly is a dream of love? Here I take on Liszt and attempt to provide a longer answer to aid with my sleep. Updated semi-regularly.
|