STSCI 5740 Machine Learning and Data Mining
Problem 1 (6 points)
1. Express Var(X1 − X2) in terms of the variances and the covariance of X1 and X2 (assuming all variances exist).
2. Assume that X1, . . . , Xn are i.i.d. real-valued random variables with finite variance. Show that Var(X̄) = Var(X1)/n, where X̄ = (X1 + · · · + Xn)/n.
3. Assume that X, Y are independent random variables with E[X] = 0, E[Y] = 1, Var(X) = 1, Var(Y) = 2. Compute E[(3X + Y)(5Y + 2X − 1)]. (A simulation check follows the problem statement.)
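Both parts are pencil-and-paper identities, but they can be sanity-checked numerically. Below is a minimal R sketch; the normal distributions are illustrative assumptions chosen only to match the stated moments, and the printed values should agree with the analytic answers up to Monte Carlo error.

set.seed(1)
n <- 1e6
# Part 1: Var(X1 - X2) = Var(X1) + Var(X2) - 2 * Cov(X1, X2)
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)               # correlated with x1 by construction
var(x1 - x2)                            # empirical left-hand side
var(x1) + var(x2) - 2 * cov(x1, x2)     # empirical right-hand side
# Part 3: independent X, Y with E[X] = 0, Var(X) = 1, E[Y] = 1, Var(Y) = 2
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 1, sd = sqrt(2))
mean((3 * x + y) * (5 * y + 2 * x - 1)) # Monte Carlo estimate of the expectation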
Problem 2 (8 points)
Assume that we have the regression model
Y = f(X) + ε,
where ε is independent of X and E[ε] = 0, E[ε²] = σ². Assume that the training data (x1, y1), . . . , (xn, yn) are used to construct an estimate of f(x), denoted by f̂(x). Given a new random vector (X, Y) (test data independent of the training data):
1. Show that E[(Y − f̂(X))²] = E[(f(X) − f̂(X))²] + σ².
2. Show that, at a fixed test point x0 with y0 = f(x0) + ε, E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [E f̂(x0) − f(x0)]² + σ², where the expectation is over both the training data and ε.
3. Explain the bias–variance trade-off based on the above equation.
4. Explain the difference between training MSE and test MSE. Can the expected test MSE be smaller than σ²? (A simulation sketch follows the problem statement.)
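To build intuition for parts 3 and 4, here is a minimal R simulation sketch. The true function f(x) = sin(2πx), the noise level σ = 0.5, and the polynomial-degree grid are illustrative assumptions, not part of the problem; compare how training and test MSE behave as the degree (flexibility) grows.

set.seed(2)
f <- function(x) sin(2 * pi * x)  # assumed true regression function
sigma <- 0.5                      # assumed noise standard deviation
n <- 100
train <- data.frame(x = runif(n))
train$y <- f(train$x) + rnorm(n, sd = sigma)
test <- data.frame(x = runif(n))
test$y <- f(test$x) + rnorm(n, sd = sigma)
for (d in c(1, 3, 10, 20)) {
  fit <- lm(y ~ poly(x, d), data = train)   # polynomial fit of degree d
  cat(sprintf("degree %2d: train MSE = %.3f, test MSE = %.3f\n", d,
              mean((train$y - fitted(fit))^2),
              mean((test$y - predict(fit, newdata = test))^2)))
}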
Problem 3 (6 points)
Consider a classification problem where the response Y takes values in C = {1, 2, 3}. For a fixed x0, suppose
P(Y = 1 | X = x0) = 0.6, P(Y = 2 | X = x0) = 0.3, P(Y = 3 | X = x0) = 0.1.
1. What is the value of the Bayes classifier at X = x0?
2. What is the Bayes error rate at X = x0?
3. Consider a naive classifier f̂(x0), called random guessing: we pick one label uniformly at random from C = {1, 2, 3}. Compute its expected test error rate, and show that the Bayes error rate is smaller. (A numeric check follows the problem statement.)
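A quick numeric check of the three parts in R. This only evaluates the error rates implied by the stated conditional probabilities; it is not a substitute for the written argument.

p <- c(0.6, 0.3, 0.1)   # P(Y = k | X = x0) for k = 1, 2, 3
which.max(p)            # label chosen by the Bayes classifier at x0
1 - max(p)              # Bayes error rate at x0
sum((1 / 3) * (1 - p))  # expected error rate of uniform random guessing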
Problem 4 (8 points)
Solve Problem 1 on page 52 (Section 2.4) in the textbook An Introduction to Statistical Learning (2nd edition).
Problem 5 (12 points)
Solve Problem 9 on page 56 (Section 2.4) in the textbook An Introduction to Statistical Learning (2nd edition). The dataset Auto.data can be found at:
https://www.statlearning.com/resources-second-edition
Follow the code in Section 2.3.4 to load the data; a loading sketch is given below. For this and later problems, include all R code and R output that you use.
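For reference, a minimal loading sketch along the lines of the code in Section 2.3.4. It assumes Auto.data has been downloaded from the link above into the working directory; in this file "?" encodes missing values.

Auto <- read.table("Auto.data", header = TRUE,
                   na.strings = "?", stringsAsFactors = TRUE)
Auto <- na.omit(Auto)  # drop the few rows with missing values
dim(Auto)
summary(Auto)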
Problem 6 (5 points, required for STSCI 5740, optional for 3740)
Classification is an important research area. In this problem, we study the excess risk of a classifier.
Let Y ∈ {0, 1} and write p1(x) = P(Y = 1 | X = x). The Bayes classifier is f*(x) = 1 if p1(x) > 1/2, and 0 otherwise.
Since p1(x) is unknown, we estimate it with p̂1(x) ∈ [0, 1] and define the plug-in classifier f̂(x) = 1 if p̂1(x) > 1/2, and 0 otherwise.
Define the excess risk as
R(f̂) − R(f*),
where R(f) = P(Y ≠ f(X)). Prove that
R(f̂) − R(f*) ≤ 2 E[|p̂1(X) − p1(X)|].
Hint: You may first prove that R(f̂) − R(f*) = E[|2p1(X) − 1| · 1{f̂(X) ≠ f*(X)}].
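A standard fact that may help in proving the hinted identity, stated here in LaTeX. It is obtained by conditioning on X and holds for any classifier f, not only f̂ or f*.

% Conditional misclassification probability, in terms of p_1(x) = P(Y = 1 | X = x):
\[
  P\bigl(Y \neq f(X) \mid X = x\bigr)
    = p_1(x)\,\mathbf{1}\{f(x) = 0\}
    + \bigl(1 - p_1(x)\bigr)\,\mathbf{1}\{f(x) = 1\},
\]
% and taking expectations over X gives the risk:
\[
  R(f) = E\Bigl[\, p_1(X)\,\mathbf{1}\{f(X) = 0\}
       + \bigl(1 - p_1(X)\bigr)\,\mathbf{1}\{f(X) = 1\} \,\Bigr].
\]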