辅导MATH4/68091、讲解Statistical Computing
- 首页 >> Matlab编程MATH4/68091 Statistical Computing Coursework 1 (Oct 2019)
To be submitted by 7.00 pm on Sunday 27 October 2019
This coursework is worth 40 marks all together.
Late Submission of Work: Any student’s work that is submitted after the given deadline will be
classed as late, unless an extension has already been agreed via mitigating circumstances or a DASS
extension. The following rules for the application of penalties for late submission are quoted from the
University guidance on late submission document, version 1.3 (dated July 2019).
”Any work submitted at any time within the first 24 hours following the published submission deadline
will receive a penalty of 10% of the maximum amount of marks available. Any work submitted at any
time between 24 hours and up to 48 hours late will receive a deduction of 20% of the marks available, and
so on, at the rate of an additional 10% of available marks deducted per 24 hours, until the assignment is
submitted or no marks remain.”
Your submitted solutions should all be in one document. You are strongly advised to produce this
document using LaTex, but if you do feel that you have to use Word, then this will not be penalised.
For each part of the three questions you should provide explanations as to how you completed what is
required, show your working and also comment on computational results, where applicable.
When you include a plot be sure to give it a title and label the axes correctly.
When you have written or used R code to answer any of the parts, then you should list this R code after
the particular written answer to which it applies. This may be the R code for a function you have written
and/or code you have used to produce numerical results and plots.
If you use Latex to produce your report, you can include R code and output from R using the verbatim
environment. ie. type
\begin{verbatim}
copy and paste lines of text from R here
\end{verbatim}
Your file should be submitted through the module site on Blackboard to the Turnitin assessment entitled
”SC Coursework 1” by the above time and date. The work will be marked anonymously on Blackboard so
please ensure that your filename is clear but that it does not contain your name and student id number.
Similarly, do not include your name and id number in the document itself.
Turnitin will generate a similarity report for your submitted document and indicate matches to other
sources, including billions of internet documents (both live and archived), a subscription repository of
periodicals, journals and publications, as well as submissions from other students . Please ensure that the
document you upload represents your own work and is written in your own words. The Turnitin report
will be available for you to see shortly after the due date.
This coursework should hopefully help to reinforce some of the methodology you have been studying, as
well as the skills in R you have been developing in the module so far.
1
1. A sequence of n pseudo-randomized observations on the interval [0, 1] can be obtained using the
linear congruential generator which works as follows:
yi = (ayi−1 + c) (mod m), ui =
yi
m − 1
, i = 1, . . . , n
for suitable choices of the modulus m, the multiplier a, the increment c and the seed 0 ≤ y0 < m.
(See Section 1.8 (Week 1) of the course notes.)
(i) Write an R function with the five arguments n, m, a, c and initial value y0 to generate a sequence
of n random observations on [0, 1] which are output from the function in a single vector of
length n.
Run your function with the parameter values m = 231 − 1, a = 48271, c = 0 and y0 = 7948 to
generate a sample of size n = 1000.
Please list the first six values u1, . . . , u6 that you simulate, but do not print out all 1000 values
in your report!
[3 marks]
(ii) Manually produce (ie. write the code to do it yourself, rather than relying on any generic R
functions to directly compute the desired results) a probability-probability (P-P) plot (explained
below) of your simulated data (u1, . . . , u1000) to obtain a graphical impression of
whether they can be regarded as being a random sample from a U(0, 1) distribution.
[The general procedure for producing a P-P plot for a sample of data to check whether it
looks plausible that it may be drawn from a particular hypothesised distribution is as follows:
If we have a random sample which is denoted by x1, . . . , xn then the empirical distribution
function of the data is given by
where the indicator function IA is defined to be equal to 1 if the event A occurs and equal to
0 otherwise. Hence,
Fˆn(x) = number of sample observations ≤ xn
Let the hypothesised distribution from which the sample is assumed to be drawn from be
denoted by F0(x).
To obtain the probability plot for your data, construct a scatterplot of the pairs
(Fˆn(x1), F0(x1)), . . . ,(Fˆn(xn), F0(xn))
and look for how well they agree. The reasoning is that we are comparing the nonparametric
Fˆn(x), which can be shown to be an unbiased estimator of the true distribution function of
the random sample, with the hypothesised parametric distribution function F(x). If our hypothesis is correct, then the two should match well when they are evaluated at each of the
sample values.]
Can you graphically enhance your probability plot by providing an aid for subjectively judging
whether the hypothesis of a U(0, 1) distribution is plausible?
Calculate an appropriate summary statistic to quantify the strength of agreement in some
sense between Fˆ
n(x) and F0(x). Give your view as to the suitability of your chosen statistic
and comment on its numerical value.
[Nb. You do not have to write your own code here to manually calculate your choice of
summary statistic; you may use a function already available in R.]
[7 marks]
[10 marks for Q1]
2. It is proposed to use the inverse cdf method to simulate random data from the standard double
exponential distribution with pdf f(x) = 1
(i) Find the cdf of the standard double exponential distribution.
[2 marks]
(ii) Describe in detail the steps of the inverse cdf method to obtain a random sample of size n
from f(x).[3 marks]
(iii) Write a function to implement the steps you have described in part (ii) which will output a
random sample of size n from f(x).
Run your function written for part (iii) to generate a random sample of size n = 5000 from
f(x). (Note that you can use the R function runif to generate the required random Uniform
data here.)
[4 marks]
(iv) Construct a histogram of the data generated in part(ii), superimpose the pdf f(x) and comment
on the goodness-of-fit.
[3 marks]
[12 marks for Q2]
4
3. Suppose now that we want to simulate random data from the standard Normal distribution whose
pdf is given by
It is proposed to use a rejection sampling algorithm where the proposal distribution is the standard
double exponential distribution from Q2 with pdf
(i) Show that the value of the constant K such that
(ii) Produce on the same plot graphs of the functions f(x) and Kg(x) using the value of K
determined in part(i).
[2 marks]
(iii) Describe the steps of the rejection sampling algorithm which can be used to sample from the
standard Normal f(x) here with the proposed g(x).
[4 marks]
(iv) Theoretically, how efficient should your rejection sampling algorithm described in part (iii)
be?
[1 mark]
(v) Write a function which will generate a random sample of size n from f(x) using your algorithm
from part (iii). Your function should include a calculation of an empirical estimate of its
efficiency.
(Note that you should use the R function you have written in Q2(iii) to generate random
observations from the standard double exponential distribution. If this does not appear to be
working correctly for you, then as an alternative, you can generate random standard double
exponential data using the rexp R function, together with random U(0, 1) (using R function
runif) observations in order to decide whether sampled Ex(1) values should be positive or
negative. A sampled Ex(1) observation will be assigned either a positive or negative sign ,
each with probability 0.5, to produce a random observation from g(x) here.)
Run your algorithm to generate a random sample of size n = 5000 from the standard Normal
distribution f(x).
[5 marks]
(vi) Choose a single type of plot yourself to enable you to graphically explore whether your random
data is consistent with being sampled from a standard Normal distribution.
Note that, as along as your plot is appropriate for the purpose, then one choice will not be
regarded as being more correct than another. The important things are being able to explain
and interpret the plot you produce correctly.
[3 marks]
[18 marks for Q3]