This is a concise and complete review of the history of IR research up to 2010. It gives an excellent
historical view of some of the most important ideas and techniques and their impact on applications.
Read the whole article without worrying about understanding all the details.
This is an excellent overview paper on IR (up to 2001, obviously), slightly biased toward empirically effective techniques. Your goal in reading it is to learn the general history of IR and to get a summary of IR techniques from an empirical perspective. Read the whole paper.
The goal of reading these notes is to learn some basic concepts in probability, statistics, and information theory. You should read at least Section 3 of the estimation note and all of the information theory note except
for Section 1.1.6. You should fully understand
the derivation of the maximum likelihood estimate for the binomial distribution, and most of the
content of the information theory note. If you can't understand these, you may want to read the
relevant discussions in a textbook on probability and
statistics and in a book on information theory. Any book on these topics should be sufficient.
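As a self-check for the maximum likelihood estimate of the binomial distribution mentioned above, the derivation can be sketched as follows. The symbols are assumptions made here for illustration and may differ from the notation in the notes: $n$ is the number of trials, $k$ the number of successes, and $\theta$ the success probability.

```latex
\begin{align*}
L(\theta) &= \binom{n}{k}\,\theta^{k}(1-\theta)^{n-k} \\
\log L(\theta) &= \log\binom{n}{k} + k\log\theta + (n-k)\log(1-\theta) \\
\frac{d}{d\theta}\log L(\theta) &= \frac{k}{\theta} - \frac{n-k}{1-\theta} = 0 \\
\Rightarrow\quad k(1-\theta) &= (n-k)\,\theta
\quad\Rightarrow\quad \hat{\theta}_{\mathrm{ML}} = \frac{k}{n}
\end{align*}
```

The second derivative of the log-likelihood is negative for $0 < \theta < 1$, so this stationary point is a maximum.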
This is truly a classic paper. Read it to appreciate Bush's great vision of more than 6 decades ago, which has still NOT been completely realized today. At a minimum, read everything starting from section 6.
This is required reading for completing assignment #1.
This is a classic paper about early SMART experiments. Your goal in reading this paper should be to understand how the authors designed their experiments to test many different hypotheses, and
how they used statistical significance tests to check all the hypotheses. You should also know what the major conclusions drawn in this paper are. Read the whole paper.
This is a nice review of the test collection evaluation method (i.e., the Cranfield method) and of research on this evaluation methodology. Your goal in reading this paper is to understand precisely the major IR measures (including precision, recall, fallout, average precision, precision at k documents, MRR, F measure, nDCG, bPREF, etc.) and when to use each. Another goal is to get an overview of the research done in IR evaluation. Read the entire paper, but the most important content
is in Chapters 2-4.
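To make a few of the measures named above concrete, here is a minimal sketch for a single query. The function names and the toy inputs are hypothetical, and nDCG is computed with one common convention (linear gain, log base 2 discount); the paper may define it with a different gain or discount.

```python
import math

def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    return sum(1 for d in ranked[:k] if d in relevant) / k

def recall(ranked, relevant):
    """Fraction of all relevant documents that appear in the ranking."""
    return sum(1 for d in ranked if d in relevant) / len(relevant)

def average_precision(ranked, relevant):
    """Mean of precision values at the rank of each retrieved relevant
    document; relevant documents never retrieved contribute zero."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant document (0 if none is retrieved).
    MRR is the mean of this value over a set of queries."""
    for rank, d in enumerate(ranked, start=1):
        if d in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, gains, k):
    """nDCG@k with linear gain and a log2(rank + 1) discount.
    `gains` maps document id -> graded relevance (missing means 0)."""
    dcg = sum(gains.get(d, 0) / math.log2(rank + 1)
              for rank, d in enumerate(ranked[:k], start=1))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, with the ranking `["d1", "d2", "d3", "d4"]` and relevant set `{"d1", "d3"}`, precision at 2 is 0.5 and average precision is (1/1 + 2/3) / 2.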
This is a good example of an empirical exploration of retrieval models. Read the entire paper to understand
how the pivoted length normalization formula was derived through experimental study.
This is a nice introduction to how one of the most effective retrieval functions, BM25, was developed and
later extended. Read the entire survey.
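As a rough orientation before reading, the following is a minimal sketch of one common BM25 variant, not necessarily the exact form used in the survey. The function name, its arguments, the smoothed IDF (with 1 added inside the logarithm to keep it non-negative), and the defaults `k1=1.2` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
               k1=1.2, b=0.75):
    """Score one document for one query with a common BM25 variant.

    doc_freq maps term -> number of documents containing the term.
    k1 controls term-frequency saturation; b controls how strongly
    document length is normalized (both are assumed defaults).
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in set(query_terms):
        f = tf.get(term, 0)
        if f == 0:
            continue  # terms absent from the document contribute nothing
        df = doc_freq.get(term, 0)
        # Smoothed IDF; the "+ 1.0" is an assumption that avoids negative weights.
        idf = math.log((num_docs - df + 0.5) / (df + 0.5) + 1.0)
        # Term frequency saturates as f grows; long documents are penalized.
        norm = f + k1 * (1.0 - b + b * doc_len / avg_doc_len)
        score += idf * f * (k1 + 1.0) / norm
    return score
```

The two parameters correspond to the two ideas to look for in the survey: saturation of term frequency and normalization by document length.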
This book has a chapter (Chapter 2) giving a general survey of major retrieval models, as well as extensive coverage of statistical language models for IR, which might be useful if you want to have a
good picture of retrieval models in general. Chapters 3-5 are most useful for understanding
language models for retrieval. Chapter 7 is useful for understanding topic models. Read other chapters if interested.
This thesis is a systematic study of a new way of developing retrieval models based on an axiomatic framework, with promising research results. Your goal in reading it should be to understand the basic idea of this axiomatic approach
and to know how formal constraints may help evaluate a retrieval function without doing experiments. Read Chapter 3 and Chapter 4.
Read the entire note.
Read the entire note to understand the EM algorithm rigorously.
Read whatever you can to understand the basic idea of learning to rank.
This is an excellent tutorial on the implementation of a search engine. Read
whatever you can to understand how an inverted index is constructed and how it
can be used to score documents quickly for a query. Make sure you know
the basic idea of variable-length encoding and how it can be used to compress integers.
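As one illustration of variable-length encoding, here is a minimal sketch of variable-byte coding applied to the gaps between sorted document IDs. This is an assumed example, not necessarily the scheme the tutorial describes: the byte layout (low-order 7 bits first, high bit set when more bytes follow) and all function names are choices made here.

```python
def vbyte_encode(n):
    """Encode a non-negative integer in 7-bit groups, low-order group first.
    The high bit of a byte is set when more bytes follow."""
    if n < 0:
        raise ValueError("only non-negative integers can be encoded")
    out = bytearray()
    while n >= 128:
        out.append((n & 0x7F) | 0x80)  # 7 payload bits + continuation flag
        n >>= 7
    out.append(n)  # final byte, high bit clear
    return bytes(out)

def vbyte_decode_all(data):
    """Decode a byte string produced by concatenating vbyte_encode outputs."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7          # more bytes belong to this number
        else:
            numbers.append(value)
            value, shift = 0, 0
    return numbers

def encode_postings(doc_ids):
    """Store sorted document IDs as gaps so most values are small."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += vbyte_encode(doc_id - prev)
        prev = doc_id
    return bytes(out)

def decode_postings(data):
    """Rebuild the document IDs by summing the decoded gaps."""
    doc_ids, prev = [], 0
    for gap in vbyte_decode_all(data):
        prev += gap
        doc_ids.append(prev)
    return doc_ids
```

Small numbers take one byte and larger ones take more, which is why encoding gaps between sorted document IDs, rather than the IDs themselves, tends to compress well.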
Read chapters 2-4 to understand how MapReduce works and how it can be used for parallel indexing.
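To give a feel for how indexing fits the MapReduce model, here is a single-process sketch that only imitates the map, shuffle, and reduce phases in plain Python. It does not use Hadoop or any real MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_doc(doc_id, text):
    """Map phase: emit one (term, (doc_id, term_frequency)) pair per
    distinct term in a document."""
    counts = defaultdict(int)
    for term in text.lower().split():
        counts[term] += 1
    return [(term, (doc_id, tf)) for term, tf in counts.items()]

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key (the term).
    A real framework does this across machines."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_term(term, postings):
    """Reduce phase: build the postings list for one term, sorted by doc_id."""
    return term, sorted(postings)

def build_index(docs):
    """Run the three phases sequentially over a dict of doc_id -> text."""
    emitted = []
    for doc_id, text in docs.items():
        emitted.extend(map_doc(doc_id, text))
    return dict(reduce_term(term, postings)
                for term, postings in shuffle(emitted).items())
```

Because each call to `map_doc` looks at one document and each call to `reduce_term` looks at one term, both phases can in principle be spread over many machines, which is the property the chapters explain in detail.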
This is an excellent survey of probabilistic retrieval models, with a rigorous treatment of all the major ideas up to the early 1990s, including early ideas on learning to rank.
This paper gives a general decision-theoretic framework for modeling information retrieval that can cover many existing
retrieval models. Read the entire paper except for section 5.3.