CS598CXZ Advanced Topics in Information Retrieval (Fall 2014)

Instructor: ChengXiang Zhai





Readings


Note: In general, the lecture slides are the best "definition" of the core contents -- the contents to be tested in the midterm. That is, you are expected to understand all the major points, models, and techniques that we have discussed in the class; anything beyond the slides can be regarded as optional. Thus, all the lecture slides are required readings.

Required Readings

  1. V. Bush, As We May Think, 1945.

    This is truly a classic paper. Read it to appreciate Bush's great vision from more than six decades ago, which still has NOT been completely realized today. At a minimum, read everything starting from section 6.

  2. IR History

    This is a concise and complete review of the history of IR research up to 2010. It gives an excellent historical view of some of the most important ideas and techniques and their impact on applications. Read the whole article without worrying about understanding all the details.

  3. A. Singhal, Modern Information Retrieval: A Brief Overview, In IEEE Data Engineering Bulletin 24(4), pages 35-43, 2001.

    This is an excellent overview paper of IR (up to 2001, obviously), slightly biased toward empirically effective techniques. Your goal in reading it is to learn the general history of IR and get a summary of IR techniques from an empirical perspective. Read the whole paper.

  4. Rosenfeld's notes (estimation and information theory)

    The goal of reading these notes is to know about some basic concepts in probability, statistics, and information theory. You should read at least Section 3 of the estimation note and all of the information theory note except for section 1.1.6. You should fully understand the derivation of the maximum likelihood estimate for the binomial distribution, and most of the contents in the information theory notes. If you can't understand these, you may want to read relevant discussions in a textbook on probability and statistics, and a book on information theory. Any book on these topics should be sufficient.

  5. [Singhal et al. 96]

    This is a good example of an empirical exploration of retrieval models. Read the entire paper to understand how the pivoted length normalization formula was derived through experimental study.

  6. [Robertson and Zaragoza 2009]

    This is a nice introduction to how one of the most effective retrieval functions, BM25, was developed and later extended. Read the entire survey.

  7. SLMIR

    This book has a chapter (Chapter 2) that gives a general survey of major retrieval models, and it covers statistical language models for IR extensively, which might be useful if you want a good picture of all kinds of retrieval models in general. Chapters 3-6 are most useful for understanding language models for retrieval. Chapter 7 is useful for understanding topic models.

  8. [Fang 07]

    This thesis is a systematic study of a new way of developing retrieval models based on an axiomatic framework, with promising research results. Your goal in reading it should be to understand the basic idea of this axiomatic approach and to know how formal constraints may help evaluate a retrieval function without doing experiments. Read Chapter 3 and Chapter 4.

  9. Note on KL-div Retrieval Model

    Read the entire note.

  10. Note on EM

    Read the entire note to understand the EM algorithm rigorously.

  11. Introduction to Learning to Rank

    Read whatever you can to understand the basic idea of learning to rank.