STSCI 5740 Machine Learning and Data Mining
Problem 1 (6 points)
1. Express Var(X1 − X2) in terms of the variances and the covariance of X1 and X2 (assuming all variances exist).
2. Assume that X1, . . . , Xn are i.i.d. real-valued random variables with finite variance. Show that Var(X̄) = Var(X1)/n, where X̄ = (X1 + · · · + Xn)/n.
3. Assume that X, Y are independent random variables with E[X] = 0, E[Y] = 1, Var(X) = 1, Var(Y) = 2. Compute E[(3X + Y)(5Y + 2X − 1)]. (A simulation check follows the problem statement.)
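Both parts are pencil-and-paper identities, but they can be sanity-checked numerically. Below is a minimal R sketch; the normal distributions are illustrative assumptions chosen only to match the stated moments, and the printed values should agree with the analytic answers up to Monte Carlo error.

set.seed(1)
n <- 1e6
# Part 1: Var(X1 - X2) = Var(X1) + Var(X2) - 2 * Cov(X1, X2)
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)               # correlated with x1 by construction
var(x1 - x2)                            # empirical left-hand side
var(x1) + var(x2) - 2 * cov(x1, x2)     # empirical right-hand side
# Part 3: independent X, Y with E[X] = 0, Var(X) = 1, E[Y] = 1, Var(Y) = 2
x <- rnorm(n, mean = 0, sd = 1)
y <- rnorm(n, mean = 1, sd = sqrt(2))
mean((3 * x + y) * (5 * y + 2 * x - 1)) # Monte Carlo estimate of the expectation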
Problem 2 (8 points)
Assume that we have the regression model
Y = f(X) + ε,
where ε is independent of X and E[ε] = 0, E[ε²] = σ². Assume that the training data (x1, y1), . . . , (xn, yn) are used to construct an estimate of f(x), denoted by f̂(x). Given a new random vector (X, Y) (test data independent of the training data):
1. Show that E[(Y − f̂(X))²] = E[(f(X) − f̂(X))²] + σ².
2. Show that, at a fixed test point x0 with y0 = f(x0) + ε, E[(y0 − f̂(x0))²] = Var(f̂(x0)) + [E f̂(x0) − f(x0)]² + σ², where the expectation is over both the training data and ε.
3. Explain the bias–variance trade-off based on the above equation.
4. Explain the difference between training MSE and test MSE. Can the expected test MSE be smaller than σ²? (A simulation sketch follows the problem statement.)
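To build intuition for parts 3 and 4, here is a minimal R simulation sketch. The true function f(x) = sin(2πx), the noise level σ = 0.5, and the polynomial-degree grid are illustrative assumptions, not part of the problem; compare how training and test MSE behave as the degree (flexibility) grows.

set.seed(2)
f <- function(x) sin(2 * pi * x)  # assumed true regression function
sigma <- 0.5                      # assumed noise standard deviation
n <- 100
train <- data.frame(x = runif(n))
train$y <- f(train$x) + rnorm(n, sd = sigma)
test <- data.frame(x = runif(n))
test$y <- f(test$x) + rnorm(n, sd = sigma)
for (d in c(1, 3, 10, 20)) {
  fit <- lm(y ~ poly(x, d), data = train)   # polynomial fit of degree d
  cat(sprintf("degree %2d: train MSE = %.3f, test MSE = %.3f\n", d,
              mean((train$y - fitted(fit))^2),
              mean((test$y - predict(fit, newdata = test))^2)))
}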
Problem 3 (6 points)
Consider a classification problem where the response Y takes values in C = {1, 2, 3}. For a fixed x0, suppose
P(Y = 1 | X = x0) = 0.6, P(Y = 2 | X = x0) = 0.3, P(Y = 3 | X = x0) = 0.1.
1. What is the value of the Bayes classifier at X = x0?
2. What is the Bayes error rate at X = x0?
3. Consider a naive classifier f̂(x0), called random guessing: we pick one label uniformly at random from C = {1, 2, 3}. Compute its expected test error rate, and show that the Bayes error rate is smaller. (A numeric check follows the problem statement.)
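A quick numeric check of the three parts in R. This only evaluates the error rates implied by the stated conditional probabilities; it is not a substitute for the written argument.

p <- c(0.6, 0.3, 0.1)   # P(Y = k | X = x0) for k = 1, 2, 3
which.max(p)            # label chosen by the Bayes classifier at x0
1 - max(p)              # Bayes error rate at x0
sum((1 / 3) * (1 - p))  # expected error rate of uniform random guessing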
Problem 4 (8 points)
Solve Problem 1 on page 52 (Section 2.4) in the textbook An Introduction to Statistical Learning (2nd edition).
Problem 5 (12 points)
Solve Problem 9 on page 56 (Section 2.4) in the textbook An Introduction to Statistical Learning (2nd edition). The dataset Auto.data can be found at:
https://www.statlearning.com/resources-second-edition
Follow the code in Section 2.3.4 to load the data; a loading sketch is given below. For this and later problems, include all R code and R output that you use.
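For reference, a minimal loading sketch along the lines of the code in Section 2.3.4. It assumes Auto.data has been downloaded from the link above into the working directory; in this file "?" encodes missing values.

Auto <- read.table("Auto.data", header = TRUE,
                   na.strings = "?", stringsAsFactors = TRUE)
Auto <- na.omit(Auto)  # drop the few rows with missing values
dim(Auto)
summary(Auto)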
Problem 6 (5 points, required for STSCI 5740, optional for 3740)
Classification is an important research area. In this problem, we study the excess risk of a classifier.
Let Y ∈ {0, 1} and write p1(x) = P(Y = 1 | X = x). The Bayes classifier is f*(x) = 1 if p1(x) > 1/2, and 0 otherwise.
Since p1(x) is unknown, we estimate it with p̂1(x) ∈ [0, 1] and define the plug-in classifier f̂(x) = 1 if p̂1(x) > 1/2, and 0 otherwise.
Define the excess risk as
R(f̂) − R(f*),
where R(f) = P(Y ≠ f(X)). Prove that
R(f̂) − R(f*) ≤ 2 E[|p̂1(X) − p1(X)|].
Hint: You may first prove that R(f̂) − R(f*) = E[|2p1(X) − 1| · 1{f̂(X) ≠ f*(X)}].
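A standard fact that may help in proving the hinted identity, stated here in LaTeX. It is obtained by conditioning on X and holds for any classifier f, not only f̂ or f*.

% Conditional misclassification probability, in terms of p_1(x) = P(Y = 1 | X = x):
\[
  P\bigl(Y \neq f(X) \mid X = x\bigr)
    = p_1(x)\,\mathbf{1}\{f(x) = 0\}
    + \bigl(1 - p_1(x)\bigr)\,\mathbf{1}\{f(x) = 1\},
\]
% and taking expectations over X gives the risk:
\[
  R(f) = E\Bigl[\, p_1(X)\,\mathbf{1}\{f(X) = 0\}
       + \bigl(1 - p_1(X)\bigr)\,\mathbf{1}\{f(X) = 1\} \,\Bigr].
\]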