Assignment III
This assignment requires you to perform a simple Monte Carlo experiment.
The physics of the process under study dictates that the real-valued observations we measure, given the classes, follow Laplace distributions. These distributions are characterized by the probability density function
$$f_X(x; \mu, \lambda) = \frac{1}{2\lambda} \exp\left(-\frac{|x - \mu|}{\lambda}\right),$$
where λ > 0 is the scale parameter and μ ∈ ℝ is the location parameter. We denote this situation by X ~ Lap(μ, λ).
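As a minimal sketch, the density can be coded directly from the standard Laplace form f(x; μ, λ) = exp(−|x − μ| / λ) / (2λ). The function names `laplace_pdf` and `laplace_logpdf` are illustrative and are not part of the assignment. The log-density is included because it is piecewise linear in x, which is what a semilogarithmic plot of the density displays.

```python
import math


def laplace_pdf(x, mu=0.0, lam=1.0):
    """Density of Lap(mu, lam): exp(-|x - mu| / lam) / (2 * lam)."""
    if lam <= 0:
        raise ValueError("the scale parameter lam must be positive")
    return math.exp(-abs(x - mu) / lam) / (2.0 * lam)


def laplace_logpdf(x, mu=0.0, lam=1.0):
    """Natural logarithm of the Lap(mu, lam) density."""
    if lam <= 0:
        raise ValueError("the scale parameter lam must be positive")
    return -abs(x - mu) / lam - math.log(2.0 * lam)


# Tabulate the standard case Lap(0, 1) on a few points.
# The log-density decreases linearly on each side of mu.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:+.1f}  f = {laplace_pdf(x):.4f}  log f = {laplace_logpdf(x):+.4f}")
```

Changing `lam` rescales both the peak height, 1 / (2λ), and the slope of the log-density, ±1 / λ, while changing `mu` only shifts the curve.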
A deeper investigation into the phenomenon and into how the measurements are obtained leads to the further assumption that all scale parameters are equal to 1; that is, different classes are characterized only by different location parameters.
Assume that we have two classes, A and B. The observations, given the classes, follow Lap(μ_ξ, 1) distributions, where ξ ∈ {A, B}.
We want to understand the effect of the distribution of the observations on the classification performance of the Bayes classifier. To that end, design a Monte Carlo experiment to estimate the overall accuracy.
Your response to this assignment must be organised as a research report, i.e., in four sections: Introduction (your wording of the problem, including your scientific question), Methodology (what you did), Results (what you observed: the plot), and Discussion (your answer to the scientific question).
The following tasks split the problem into a logical sequence of steps that, if well executed, will produce all the material you need for the research report. Optionally, your report may refer to the step numbers.
1. [No marks:] Familiarise yourself with the Laplace distribution. Browse Chapter 24 of Johnson et al. (1995) (this book is available at our library). In particular, note that if X = (X_1, X_2, …, X_n) is a random sample from the Lap(μ, λ) model, then the maximum likelihood estimator of μ is μ̂ = q_{1/2}(X), the median of X.
2. [2 marks:] Plot the densities in linear and semilogarithmic scales, following the recommendations stated in the “Presentations” part of this course. Observe how they change with the scale and/or location parameters. Discuss the plots (a missing discussion will result in losing the marks for this task).
3. [2 marks:] Obtain the Bayes classifier that discriminates observations between classes A and B. This classifier is defined by the point x* that satisfies
$$\Pr(Y = A)\, f_X(x^*; \mu_A, 1) = \Pr(Y = B)\, f_X(x^*; \mu_B, 1),$$
where f_X(x; μ, λ) is the density that characterises the Lap(μ, λ) model, and Pr(Y = A) is the prior probability of class A. Show your working; failure to do so will result in a loss of marks for this task.
4. [6 marks:] Carry out a simulation study with the following parameters:
• A unique seed for the pseudorandom number generator, fixed before the replications loop begins.
• n_A = 1000, the sample size of class A;
• n_B = 300, the sample size of class B;
• λ_A = λ_B = 1, the scales of models A and B;
• μ_A = 0, the location parameter of model A;
• μ_B ∈ {−1, −0.5, 3}, the location parameters of model B.
• For each μ_B, replicate R = 500 times the following experiment:
  ◦ Simulate a sample of size n_A from the Lap(μ_A, λ_A) model (you may use the rlaplace function from the extraDistr package in R).
  ◦ Simulate a sample of size n_B from the Lap(μ_B, λ_B) model.
  ◦ Randomly select the training samples from each class with probability P = 4/5, and use the remaining observations as test samples.
  ◦ Using the training samples, compute p̂_A, μ̂_A, and μ̂_B, the maximum likelihood estimators of the unknown parameters p_A, μ_A, and μ_B. The maximum likelihood estimator of a probability, in our case, is the sample proportion.
  ◦ Find x* using p̂_A, μ̂_A, and μ̂_B.
  ◦ Apply x* to the test sample, and compute the overall accuracy attained in this replication.
  ◦ Compute the average overall accuracy using the R values. Also compute the sample standard deviation of these R values.
  ◦ Make a plot of the average overall accuracy as a function of μ_B. Add a measure of accuracy to these average values. Justify your choice. This plot must follow the recommendations stated in the “Presentations” part of this course.
• Discuss your findings (a missing discussion will result in losing the marks for this task).
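The simulation steps above can be sketched as follows, using only the Python standard library. This is one possible reading of the tasks, not a reference solution, and it rests on these assumptions:

• `rlaplace` is a hypothetical helper named after the R function. It draws Laplace variates as the difference of two unit exponentials.
• The seed value 2024 is arbitrary.
• Each observation is sent to the training set independently with probability 4/5.
• With unit scales, equating the prior-weighted densities gives |x − μ_A| − |x − μ_B| = ln(p_A / p_B). A crossing point x* exists only when |ln(p_A / p_B)| < |μ_B − μ_A|. Otherwise the class with the larger prior wins everywhere, and `bayes_threshold` returns `None`.

```python
import math
import random
import statistics


def rlaplace(n, mu, lam, rng):
    """Draw n values from Lap(mu, lam) as a scaled difference of two Exp(1) variates."""
    return [mu + lam * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]


def bayes_threshold(p_a, mu_a, mu_b):
    """Crossing point x* of the prior-weighted unit-scale densities, or None if none exists."""
    log_odds = math.log(p_a / (1.0 - p_a))
    gap = mu_b - mu_a
    if abs(log_odds) >= abs(gap):
        return None  # the weighted densities never cross: one class dominates
    shift = 0.5 * log_odds if gap > 0 else -0.5 * log_odds
    return 0.5 * (mu_a + mu_b) + shift


def predict(x, x_star, p_a, mu_a, mu_b):
    """Assign x to the class whose location lies on the same side of x* as x."""
    if x_star is None:
        return "A" if p_a >= 0.5 else "B"
    return "A" if (x < x_star) == (mu_a < mu_b) else "B"


def one_replication(mu_b, rng, n_a=1000, n_b=300, mu_a=0.0, p_train=0.8):
    """Simulate both classes, split, estimate, classify; return the test accuracy."""
    data = [(x, "A") for x in rlaplace(n_a, mu_a, 1.0, rng)]
    data += [(x, "B") for x in rlaplace(n_b, mu_b, 1.0, rng)]
    train, test = [], []
    for obs in data:
        (train if rng.random() < p_train else test).append(obs)
    xs_a = [x for x, y in train if y == "A"]
    xs_b = [x for x, y in train if y == "B"]
    p_hat = len(xs_a) / len(train)      # sample proportion of class A
    mu_hat_a = statistics.median(xs_a)  # MLE of the location is the median
    mu_hat_b = statistics.median(xs_b)
    x_star = bayes_threshold(p_hat, mu_hat_a, mu_hat_b)
    hits = sum(predict(x, x_star, p_hat, mu_hat_a, mu_hat_b) == y for x, y in test)
    return hits / len(test)


def monte_carlo(mu_b_values=(-1.0, -0.5, 3.0), reps=500, seed=2024):
    """Return {mu_B: (mean accuracy, sample standard deviation)} over reps replications."""
    rng = random.Random(seed)  # a single seed, fixed before the replication loop
    summary = {}
    for mu_b in mu_b_values:
        acc = [one_replication(mu_b, rng) for _ in range(reps)]
        summary[mu_b] = (statistics.mean(acc), statistics.stdev(acc))
    return summary


if __name__ == "__main__":
    for mu_b, (mean_acc, sd_acc) in monte_carlo().items():
        print(f"mu_B = {mu_b:+.1f}: mean accuracy = {mean_acc:.3f}, sd = {sd_acc:.3f}")
```

The returned means and standard deviations are the raw material for the accuracy plot. Which measure of variability to draw around each mean (for example, the standard deviation or a standard error) remains a choice that the report must justify.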
References
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous univariate distributions (2nd ed., Vol. 2). John Wiley & Sons.