(due 11:59pm, Sunday, Sept. 14, 2014)

**Probabilistic Reasoning and Bayes Rule [50 points]**Consider the problem of detecting email messages that may carry a virus. This problem can be modeled probabilistically by treating each email message as representing an observation of values of the following 4 random variables: (1) A: whether the message has an attachment (1 for yes); (2) K: whether the sender is unknown to the receiver (1 for yes); (3) L: whether the message is not longer than 10 words (1 for yes); and (4) V: whether the message carries a virus (1 for yes). Given a message, we can observe the values of A, L, and K, and we want to infer its value of V. In terms of probabilistic reasoning, we are interested in evaluating the conditional probability p(V|A, L, K), and we would say that the message carries a virus if p(V=1|A, L,K)>p(V=0|A,L,K).

We make a further assumption that p(A,L,K|V)=p(A|V)p(L|V)p(K|V) for V=0 and V=1, i.e., given the status whether a message carries a virus, the values of A, K, and L are independent.

**A. [5 points]**Suppose we observe 12 samples as in Table 1:Table 1

------------------------------------ | sample # | A | K | L | V | |----------|-----|-----|-----|-----| | 0 | 1 | 1 | 1 | 1 | |----------|-----|-----|-----|-----| | 1 | 1 | 1 | 0 | 1 | |----------|-----|-----|-----|-----| | 2 | 1 | 1 | 0 | 1 | |----------|-----|-----|-----|-----| | 3 | 1 | 0 | 0 | 0 | |----------|-----|-----|-----|-----| | 4 | 1 | 1 | 1 | 0 | |----------|-----|-----|-----|-----| | 5 | 0 | 1 | 0 | 1 | |----------|-----|-----|-----|-----| | 6 | 0 | 1 | 1 | 1 | |----------|-----|-----|-----|-----| | 7 | 0 | 0 | 0 | 0 | |----------|-----|-----|-----|-----| | 8 | 0 | 1 | 0 | 0 | |----------|-----|-----|-----|-----| | 9 | 0 | 1 | 1 | 0 | |----------|-----|-----|-----|-----| | 10 | 0 | 0 | 0 | 0 | |----------|-----|-----|-----|-----| | 11 | 1 | 0 | 0 | 1 | -----------------------------------|

Fill in Table 2 with conditional probabilities based on*only*information in the 12 examples.Table 2

---------------------------------------------- | V |p(A=1|V)|p(K=1|V) |p(L=1|V) |prior P(V) | |----|--------|---------|---------|-----------| | 0 | | | | 1/2 | |----|--------|---------|---------|-----------| | 1 | 4/6 | | | | ----------------------------------------------|

**B. [10 points]**With the independence assumption, use the Bayes formula and probabilities you just computed in problem A to compute the probability that a message M with A=0, K=1, and L=0 carries a virus. i.e., compute p(V=1|A=0,K=1,L=0) and p(V=0|A=0,K=1,L=0). Would we conclude that message M carries a virus?**C. [5 points]**Now, compute p(V=1|A=0, K=1, L=0) and p(V=0|A=0,K=1,L=0) directly from the 12 examples in Table 1, just like what you did in problem A. Do you get the same value as in problem B? Why?**D. [5 points]**Now, ignore Table 1, and consider any possibilities you can fill in Table 2. Are there any constraints on these values that we must respect when assigning these values? In other words, can we fill in Table 2 with 8 arbitrary values between 0 and 1?**E. [5 points]**Can you change your conclusion of problem B (i.e., whether message M carries a virus) by only changing the value A (i.e., if the message has an attachment) in 1 example of Table 1?**F. [10 points]**Note that the conditional independency assumption p(A,L,K|V)=p(A|V)p(L|V)p(K|V) helps simplify the computation of p(A,L,K|V). In particular, with this assumption, we can compute p(A,L,K|V) based on p(A|V), p(L|V), and p(K|V). If we were to specify the values for p(A,L,K|V) directly, what is the minimum number of probability values that we would have to specify in order to fully characterize the conditional probability distribution p(A,L,K|V)? Why? Note that all the probability values must sum to 1.**G. [10 points]**Explain why the independence assumption p(A,L,K|V)=p(A|V)p(L|V)p(K|V) does not necessarily hold in reality.**Maximum Likelihood Estimation [50 points]**A Poisson distribution is often used to model the word frequency. Specifically, the number of occurrences of a word in a document with fixed length can be assumed to follow a Poisson distribution given by

u^x * exp(-u) p(wordcount=x | u) = --------------, u > 0 x!

where "u^x" means u raised to the power of x, and u is the parameter of the Poisson distribution, which happens to be its mean. Now, suppose we observe a sample of counts of a word w, {x_{1},...,x_{N}}, from N documents with the same length (x_{i}is the counts of w in one document). We want to estimate the parameter u of the Poisson distribution for word w. One commonly used method is the maximum likelihood method, in which we choose a value for u that maximizes the likelihood of our data {x_{1},...,x_{N}}, i.e., û= argmax_{u}p(x_{1}, ..., x_{N}| u), u > 0**A. [40 points]**Derive a closed form formula for this estimate.(

*Hint: Write down the log likelihood of {x*)_{1},...,x_{N}}, which would be a function of u. Set the derivative of this function w.r.t. u to zero, and solve the equation for u.**B. [10 points]**Now, suppose u has a prior exponential distributionp(u) = λ e^{- λ * u}, u > 0

where λ is a given parameter. Derive a closed form solution for the Maximum A Posteriori estimation, i.e., û= argmax_{u}p(x_{1},...,x_{N}| u) p(u), u > 0(

*Hint: For Maximum A Posteriori estimation, please refer to this Wikipedia page, and find 'Section 4 Example'.*)

**Please submit your solutions via Compass. The Compass submission page will come out very soon.**