As an undergraduate researcher in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley, I collaborated with Alexander J. Spangher under the guidance of Professor Costas J. Spanos. Our research focused on Conditional Information Retrieval (CIR), aiming to enhance the relevance and accuracy of information retrieval systems by incorporating contextual conditions into query processing.
We investigated human information retrieval behaviors and analyzed prior work to establish a comprehensive understanding of CIR. A significant aspect of our research involved producing a large silver-standard dataset to serve as ground truth for training and evaluating retrieval models. Additionally, we curated a query-context relationship dataset from 18,000 articles to train a Haystack Dense Passage Retriever, which uses learned embeddings of each datapoint to improve retrieval performance.
Developed a large silver-standard dataset by studying human information retrieval behaviors and analyzing existing literature. This dataset serves as a ground-truth benchmark for evaluating CIR models.

Curated a dataset encompassing query-context relationships from 18,000 articles, facilitating the training of retrieval models to understand and leverage contextual information effectively.
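
To make the shape of such a dataset concrete, here is a minimal sketch of turning query-context pairs into training records. It assumes the JSON layout used by the original DPR code release (`question`, `positive_ctxs`, `negative_ctxs`, `hard_negative_ctxs`); the helper names, the toy passages, and the random negative sampling are all hypothetical and are not taken from the project itself.

```python
import json
import random


def make_ctx(passage_id, passages):
    """Build one context entry in the DPR-style layout."""
    passage = passages[passage_id]
    return {
        "title": passage["title"],
        "text": passage["text"],
        "passage_id": passage_id,
    }


def to_dpr_records(pairs, passages, num_negatives=1, seed=0):
    """Convert (query, positive_passage_id) pairs into DPR-style records.

    Negatives are sampled at random from the other passages. This is an
    assumption made for illustration; mined hard negatives would go in
    "hard_negative_ctxs" instead.
    """
    rng = random.Random(seed)
    records = []
    for query, positive_id in pairs:
        # Every passage except the positive one is a candidate negative.
        candidates = [pid for pid in passages if pid != positive_id]
        sampled = rng.sample(candidates, min(num_negatives, len(candidates)))
        records.append({
            "question": query,
            "answers": [],
            "positive_ctxs": [make_ctx(positive_id, passages)],
            "negative_ctxs": [make_ctx(pid, passages) for pid in sampled],
            "hard_negative_ctxs": [],
        })
    return records


# Hypothetical toy data standing in for the article collection.
passages = {
    "a1": {"title": "Grid storage",
           "text": "Utilities are adding battery storage to balance solar output."},
    "a2": {"title": "Chip shortage",
           "text": "Automakers cut production as semiconductor supply tightened."},
    "a3": {"title": "Heat pumps",
           "text": "Rebates are driving heat pump adoption in cold climates."},
}
pairs = [
    ("How are utilities handling variable solar generation?", "a1"),
    ("Why did car production slow down?", "a2"),
]

records = to_dpr_records(pairs, passages)
print(json.dumps(records[0], indent=2))
```

Each record pairs one query with the passage that answers it and at least one passage that does not, which is the contrast a dense retriever learns from.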

Used the curated dataset to train a Haystack Dense Passage Retriever, enhancing the model's ability to retrieve relevant information by understanding the context of queries through datapoint embeddings.
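
As a rough illustration of how a dense retriever uses embeddings at query time, the sketch below ranks passages by the dot product between a query vector and each passage vector, which is the similarity used in the original DPR formulation. The three-dimensional vectors are hand-made placeholders, not outputs of a trained encoder, and `rank_passages` is a hypothetical helper rather than part of Haystack.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(query_vec, passage_vecs, top_k=2):
    """Return the top_k (passage_id, score) pairs, highest score first."""
    scored = [(pid, dot(query_vec, vec)) for pid, vec in passage_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Placeholder embeddings; a trained model would produce these with
# separate query and passage encoders.
query_vec = [1.0, 0.0, 1.0]
passage_vecs = {
    "p1": [0.9, 0.1, 0.8],
    "p2": [0.0, 1.0, 0.0],
    "p3": [0.5, 0.5, 0.5],
}

top = rank_passages(query_vec, passage_vecs)
print([pid for pid, _ in top])
```

Training adjusts the encoders so that queries land close to their relevant passages under this score, which is why the quality of the query-context pairs matters so much.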

This research project provided valuable insights into the complexities of information retrieval and the importance of context in enhancing retrieval performance.
Overall, this experience deepened my understanding of natural language processing and machine learning techniques applied to information retrieval, highlighting the critical role of context in developing more effective and user-centric retrieval systems.