辅导MATH4/68091、讲解Statistical Computing

2019.10.18 - 首页 >> Matlab编程

MATH4/68091 Statistical Computing Coursework 1 (Oct 2019)

To be submitted by 7.00 pm on Sunday 27 October 2019

This coursework is worth 40 marks all together.

Late Submission of Work: Any student’s work that is submitted after the given deadline will be

classed as late, unless an extension has already been agreed via mitigating circumstances or a DASS

extension. The following rules for the application of penalties for late submission are quoted from the

University guidance on late submission document, version 1.3 (dated July 2019).

”Any work submitted at any time within the first 24 hours following the published submission deadline

will receive a penalty of 10% of the maximum amount of marks available. Any work submitted at any

time between 24 hours and up to 48 hours late will receive a deduction of 20% of the marks available, and

so on, at the rate of an additional 10% of available marks deducted per 24 hours, until the assignment is

submitted or no marks remain.”

Your submitted solutions should all be in one document. You are strongly advised to produce this

document using LaTex, but if you do feel that you have to use Word, then this will not be penalised.

For each part of the three questions you should provide explanations as to how you completed what is

required, show your working and also comment on computational results, where applicable.

When you include a plot be sure to give it a title and label the axes correctly.

When you have written or used R code to answer any of the parts, then you should list this R code after

the particular written answer to which it applies. This may be the R code for a function you have written

and/or code you have used to produce numerical results and plots.

If you use Latex to produce your report, you can include R code and output from R using the verbatim

environment. ie. type

\begin{verbatim}

copy and paste lines of text from R here

\end{verbatim}

Your file should be submitted through the module site on Blackboard to the Turnitin assessment entitled

”SC Coursework 1” by the above time and date. The work will be marked anonymously on Blackboard so

please ensure that your filename is clear but that it does not contain your name and student id number.

Similarly, do not include your name and id number in the document itself.

Turnitin will generate a similarity report for your submitted document and indicate matches to other

sources, including billions of internet documents (both live and archived), a subscription repository of

periodicals, journals and publications, as well as submissions from other students . Please ensure that the

document you upload represents your own work and is written in your own words. The Turnitin report

will be available for you to see shortly after the due date.

This coursework should hopefully help to reinforce some of the methodology you have been studying, as

well as the skills in R you have been developing in the module so far.

1. A sequence of n pseudo-randomized observations on the interval [0, 1] can be obtained using the

linear congruential generator which works as follows:

yi = (ayi−1 + c) (mod m), ui =

m − 1

, i = 1, . . . , n

for suitable choices of the modulus m, the multiplier a, the increment c and the seed 0 ≤ y0 < m.

(See Section 1.8 (Week 1) of the course notes.)

(i) Write an R function with the five arguments n, m, a, c and initial value y0 to generate a sequence

of n random observations on [0, 1] which are output from the function in a single vector of

length n.

Run your function with the parameter values m = 231 − 1, a = 48271, c = 0 and y0 = 7948 to

generate a sample of size n = 1000.

Please list the first six values u1, . . . , u6 that you simulate, but do not print out all 1000 values

in your report!

[3 marks]

(ii) Manually produce (ie. write the code to do it yourself, rather than relying on any generic R

functions to directly compute the desired results) a probability-probability (P-P) plot (explained

below) of your simulated data (u1, . . . , u1000) to obtain a graphical impression of

whether they can be regarded as being a random sample from a U(0, 1) distribution.

[The general procedure for producing a P-P plot for a sample of data to check whether it

looks plausible that it may be drawn from a particular hypothesised distribution is as follows:

If we have a random sample which is denoted by x1, . . . , xn then the empirical distribution

function of the data is given by

where the indicator function IA is defined to be equal to 1 if the event A occurs and equal to

0 otherwise. Hence,

Fˆn(x) = number of sample observations ≤ xn

Let the hypothesised distribution from which the sample is assumed to be drawn from be

denoted by F0(x).

To obtain the probability plot for your data, construct a scatterplot of the pairs

(Fˆn(x1), F0(x1)), . . . ,(Fˆn(xn), F0(xn))

and look for how well they agree. The reasoning is that we are comparing the nonparametric

Fˆn(x), which can be shown to be an unbiased estimator of the true distribution function of

the random sample, with the hypothesised parametric distribution function F(x). If our hypothesis is correct, then the two should match well when they are evaluated at each of the

sample values.]

Can you graphically enhance your probability plot by providing an aid for subjectively judging

whether the hypothesis of a U(0, 1) distribution is plausible?

Calculate an appropriate summary statistic to quantify the strength of agreement in some

sense between Fˆ

n(x) and F0(x). Give your view as to the suitability of your chosen statistic

and comment on its numerical value.

[Nb. You do not have to write your own code here to manually calculate your choice of

summary statistic; you may use a function already available in R.]

[7 marks]

[10 marks for Q1]

2. It is proposed to use the inverse cdf method to simulate random data from the standard double

exponential distribution with pdf f(x) = 1

(i) Find the cdf of the standard double exponential distribution.

[2 marks]

(ii) Describe in detail the steps of the inverse cdf method to obtain a random sample of size n

from f(x).[3 marks]

(iii) Write a function to implement the steps you have described in part (ii) which will output a

random sample of size n from f(x).

Run your function written for part (iii) to generate a random sample of size n = 5000 from

f(x). (Note that you can use the R function runif to generate the required random Uniform

data here.)

[4 marks]

(iv) Construct a histogram of the data generated in part(ii), superimpose the pdf f(x) and comment

on the goodness-of-fit.

[3 marks]

[12 marks for Q2]

3. Suppose now that we want to simulate random data from the standard Normal distribution whose

pdf is given by

It is proposed to use a rejection sampling algorithm where the proposal distribution is the standard

double exponential distribution from Q2 with pdf

(i) Show that the value of the constant K such that

(ii) Produce on the same plot graphs of the functions f(x) and Kg(x) using the value of K

determined in part(i).

[2 marks]

(iii) Describe the steps of the rejection sampling algorithm which can be used to sample from the

standard Normal f(x) here with the proposed g(x).

[4 marks]

(iv) Theoretically, how efficient should your rejection sampling algorithm described in part (iii)

be?

[1 mark]

(v) Write a function which will generate a random sample of size n from f(x) using your algorithm

from part (iii). Your function should include a calculation of an empirical estimate of its

efficiency.

(Note that you should use the R function you have written in Q2(iii) to generate random

observations from the standard double exponential distribution. If this does not appear to be

working correctly for you, then as an alternative, you can generate random standard double

exponential data using the rexp R function, together with random U(0, 1) (using R function

runif) observations in order to decide whether sampled Ex(1) values should be positive or

negative. A sampled Ex(1) observation will be assigned either a positive or negative sign ,

each with probability 0.5, to produce a random observation from g(x) here.)

Run your algorithm to generate a random sample of size n = 5000 from the standard Normal

distribution f(x).

[5 marks]

(vi) Choose a single type of plot yourself to enable you to graphically explore whether your random

data is consistent with being sampled from a standard Normal distribution.

Note that, as along as your plot is appropriate for the purpose, then one choice will not be

regarded as being more correct than another. The important things are being able to explain

and interpret the plot you produce correctly.

[3 marks]

[18 marks for Q3]