
CSCI 739.02 - Human Behavior Modeling Homework 2 October 8, 2019

Homework 2: Making Inferences from the Posterior Distribution

Solutions to this assignment are to be submitted in myCourses via Assignment (formerly

known as Dropbox). The submission deadline is Wednesday October 16, 2019 at

11:59pm. You should submit a zipped file containing a PDF (this could be a scanned

handwritten document or a LaTeX- or Word-generated PDF) with your written answers and the

Jupyter notebook with any code needed for the assignment. Use comments to explain your

code. All code and plots should be in the notebook while

descriptions/explanations/derivations should be in the PDF.

Question: Inferring the posterior for a Gaussian likelihood example [PDF]

The goal of this

assignment is to review Module 2 in class where we discussed different techniques for deriving the

parameters of the posterior distribution. We reviewed the direct estimation of parameters using conjugate

priors, point estimations (MAP and MLE) of the posterior, and lastly, simulating and testing parameters

in order to eventually generate samples directly (Metropolis-Hastings sampling, a form of Markov chain

Monte Carlo, or MCMC). Statistics of the samples can then be calculated.

If x_1, x_2, ..., x_n are independent observations of a random variable x in a dataset of size n, then the

likelihood for the model (or the joint probability of all the x_i's) is:

$$f(X \mid \theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$$
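As a minimal sketch of this product form for a Gaussian likelihood, the snippet below sums per-observation log densities, which is the logarithm of the product over i. The function names and the four data values are hypothetical, chosen only for illustration; they are not part of the assignment or of Sampling.ipynb.

```python
import math


def gaussian_log_pdf(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def log_likelihood(data, mu, sigma):
    """Log of the product of f(x_i | mu): a sum of per-observation log densities."""
    return sum(gaussian_log_pdf(x, mu, sigma) for x in data)


# Hypothetical observations, invented purely for illustration.
data = [72.0, 81.0, 69.0, 78.0]

# The log-likelihood is larger for a mean close to the data than for one far away.
print(log_likelihood(data, mu=75.0, sigma=10.0))
print(log_likelihood(data, mu=60.0, sigma=10.0))
```

Working with the sum of logs rather than the raw product avoids numerical underflow when n is large.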

Because the Gaussian distribution is used quite a bit in behavior modeling, we will dive into working

with the Gaussian likelihood function in the following exercises:

(a) (20 points) Derive the form of the posterior distribution if the likelihood is a Gaussian with known

variance σ², but unknown mean µ, where the conjugate prior is also of the Gaussian form. This

is a contrived example since we generally do not know σ², but it keeps the mathematics simpler

while still making the concepts clear. Use f(µ|X) ∝ f(X|µ) f(µ).

What can you say about the relationship between the parameters of the posterior, prior and likelihood

functions?
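One way to sanity-check a derived posterior is to evaluate f(µ|X) ∝ f(X|µ) f(µ) numerically on a grid of µ values and normalize. The sketch below does this under assumed inputs: the data, the known σ, the prior parameters, and the grid range are all hypothetical and serve only to illustrate the proportionality. It is not the analytical derivation the question asks for.

```python
import math


def log_normal_pdf(x, mean, sd):
    """Log density of a Gaussian with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)


def grid_posterior(data, sigma, prior_mean, prior_sd, grid):
    """Normalized posterior weights over `grid`, from f(mu|X) ∝ f(X|mu) f(mu)."""
    log_post = [
        sum(log_normal_pdf(x, mu, sigma) for x in data)
        + log_normal_pdf(mu, prior_mean, prior_sd)
        for mu in grid
    ]
    peak = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical inputs, invented for illustration only.
data = [4.1, 5.3, 6.0, 4.8, 5.6]
sigma = 1.0
prior_mean, prior_sd = 0.0, 2.0
grid = [i * 0.01 for i in range(-500, 1001)]  # mu from -5.00 to 10.00

weights = grid_posterior(data, sigma, prior_mean, prior_sd, grid)
post_mean = sum(mu * w for mu, w in zip(grid, weights))
sample_mean = sum(data) / len(data)

# The grid posterior mean falls between the prior mean and the sample mean.
print(prior_mean, post_mean, sample_mean)
```

Comparing the grid result against the closed-form parameters you derive is a quick check that no term was dropped in the algebra.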

(b) (30 points) For the example described above, derive the expressions for the maximum likelihood

estimate θ_MLE and the maximum a posteriori estimate θ_MAP.

(HR) are measured and the mean HR is ¯x = 75, with a standard deviation σ = 10 (in line with the

derivations above, variance is known). Heart rate can give a measure of how stressed the students

are going into an exam. But having taken similar measurements before, over different semester

exams, the past HR means have given us an overall mean µ of 70. The past means have varied from

semester to semester giving us a standard deviation of the means of τ = 5, i.e. τ reflects how much

our past means have varied but does not really reflect the variability of the individual heart rates.

Your goal is ultimately to update the knowledge of µ in f(µ|x). Using the expressions obtained

above, find the value of θ_MAP. Be careful when substituting the different values of means and

variances/standard deviations in your formula.

Which function has more influence on the posterior in this problem? The prior or the likelihood?

Why do you conclude this?
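A numerical check can complement the analytical answer. The sketch below locates the mode of the unnormalized log posterior on a grid, using the stated values (sample mean 75, σ = 10, prior mean 70, τ = 5). The sample size is not given in the text above, so `n` is treated here as a hypothetical parameter and several values are tried. The sketch also relies on the fact that, with σ known, the Gaussian likelihood depends on the data only through the sample mean, up to a constant in µ. The function names and grid bounds are assumptions made for illustration.

```python
def log_posterior_unnormalized(mu, xbar, sigma, n, prior_mean, tau):
    """Log likelihood plus log prior for the mean, with constants in mu dropped."""
    log_lik = -n * (xbar - mu) ** 2 / (2 * sigma ** 2)
    log_prior = -(mu - prior_mean) ** 2 / (2 * tau ** 2)
    return log_lik + log_prior


def map_by_grid(xbar, sigma, n, prior_mean, tau, lo=60.0, hi=85.0, step=0.001):
    """Grid value of mu that maximizes the unnormalized log posterior."""
    steps = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(steps + 1)]
    return max(
        grid,
        key=lambda mu: log_posterior_unnormalized(mu, xbar, sigma, n, prior_mean, tau),
    )


# n is hypothetical: the sample size is not stated in the text above.
for n in (1, 4, 25, 100):
    print(n, round(map_by_grid(75.0, 10.0, n, 70.0, 5.0), 2))
```

Watching how the grid maximum moves between the prior mean and the sample mean as `n` changes is one way to reason about which function dominates the posterior.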

(d) (15 points) Using the Metropolis-Hastings algorithm, write your own sampler to simulate points

from the posterior. The steps to accomplish this are:

1. Establish the starting value for the parameter, $\theta^{j=0}$; set the counter j = 1.

2. Draw a “candidate” parameter (or proposal) $\theta^{c}$ from a proposal distribution (usually another Gaussian).

3. Compute the ratio $\rho = \min\left(1, \frac{f(\theta^{c})\, f(X \mid \theta^{c})}{f(\theta^{j-1})\, f(X \mid \theta^{j-1})}\right)$.

4. Compare ρ with a random draw u from U(0, 1). If ρ > u, then accept the proposal by setting $\theta^{j} = \theta^{c}$; otherwise keep $\theta^{j} = \theta^{j-1}$. Record the number of accepted proposals. Efficiency of the algorithm will be computed as (number of accepts) / (number of iterations).

5. Set j = j + 1 and return to step 2 until enough draws are obtained.
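The five steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the code from Sampling.ipynb: the function names, the random-walk Gaussian proposal centered on the current value, the proposal standard deviation, the burn-in length, and the sample size `n = 4` are all hypothetical choices. The ratio is computed in log space and then exponentiated, which gives the same ρ as step 3 while avoiding underflow. Omitting the proposal density from the ratio, as step 3 does, is valid because this proposal is symmetric.

```python
import math
import random


def log_target(theta, xbar, sigma, n, prior_mean, tau):
    """Unnormalized log posterior: log f(X|theta) + log f(theta), constants dropped."""
    log_lik = -n * (xbar - theta) ** 2 / (2 * sigma ** 2)
    log_prior = -(theta - prior_mean) ** 2 / (2 * tau ** 2)
    return log_lik + log_prior


def metropolis_hastings(log_post, theta0, n_iter, proposal_sd, rng):
    """Random-walk sampler; returns the chain and the fraction of accepted proposals."""
    samples = [theta0]                                        # step 1: starting value
    accepts = 0
    for _ in range(n_iter):                                   # step 5: loop over j
        current = samples[-1]
        candidate = rng.gauss(current, proposal_sd)           # step 2: draw a proposal
        log_ratio = log_post(candidate) - log_post(current)   # step 3, in log space
        rho = math.exp(min(0.0, log_ratio))                   # equals min(1, ratio)
        if rho > rng.random():                                # step 4: accept or reject
            samples.append(candidate)
            accepts += 1
        else:
            samples.append(current)
    return samples, accepts / n_iter


# Values from the heart-rate example; n = 4 is a hypothetical sample size.
rng = random.Random(0)  # seeded so that the run is reproducible
chain, efficiency = metropolis_hastings(
    lambda t: log_target(t, xbar=75.0, sigma=10.0, n=4, prior_mean=70.0, tau=5.0),
    theta0=70.0,
    n_iter=20000,
    proposal_sd=5.0,
    rng=rng,
)
kept = chain[2000:]  # discard an assumed burn-in period
print("posterior mean estimate:", sum(kept) / len(kept))
print("efficiency:", efficiency)
```

The proposal standard deviation trades off acceptance rate against step size: very small steps are accepted often but explore slowly, while very large steps are rejected often.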

You are provided with a Jupyter notebook Sampling.ipynb that was written for a binomial

likelihood and beta prior. Modify this sample code to (i) write your MCMC sampler for the

problem described in part (c) and (ii) plot the true posterior, the distribution of your simulated

samples as well as the distribution of prior samples, all on the same figure. Note that your

likelihood is a Gaussian with known variance and there is a lot more code here than is required

for your homework.