MA 575 – LINEAR REGRESSION – FALL 2024
Chapter 2: Simple linear regression
After a general introduction, this chapter focuses on the simple linear regression model. There are two main results to learn: Theorem 2.1, which deals with estimation of the model parameters, and Theorem 3.1, which deals with inference on the parameters.
1. Introduction
In many projects we measure and collect several variables on each statistical unit. We may then be interested in one of the variables, say y, called the response, outcome, or dependent variable; more specifically, in the conditional distribution of y given some other variables, say x1, ..., xp, called regressors, predictors, or explanatory variables. Throughout this class, we will assume that the response variable y is continuous, or can reasonably be viewed as continuous. Assuming that we have n statistical units, our dataset is {(yi, xi1, ..., xip), 1 ≤ i ≤ n}, where yi (resp. xij) is the value of variable y (resp. xj) on the i-th unit. As statisticians we shall view the collected data as realizations of random vectors that we shall also write as {(yi, xi1, ..., xip), 1 ≤ i ≤ n} (unlike most statistics textbooks, we will not bother to distinguish between random variables and their realizations). Because we are only interested in the conditional distribution of y given (x1, ..., xp), it is somewhat equivalent to assume, as we will do, that the variables xij are non-random, and to focus on y1, ..., yn as random variables that we assume to be independent (but not identically distributed), and where
yi ∼ f_{y|(x1,...,xp)}(· | xi1, ..., xip). (1)
Our aim is to use the collected data to estimate the function f in (1). Statistical models used to estimate f are called regression models. The linear regression model is the simplest of all regression models. It is a statistical model for yi that postulates that the expectation of yi is a linear function of xi1, ..., xip:
E(yi|xi1,...,xip) = β0 + β1xi1 + ... + βpxip , (2)
where β0, ..., βp are called regression coefficients. The model expresses the idea that if xij changes by one unit, everything else being equal, then on average we expect yi to change by βj units. Hence βj captures the importance of xj in explaining the variations of y. Equivalently, model (2) can be written as
yi = β0 + β1xi1 + ... + βpxip + ϵi, where E(ϵi) = 0. (3)
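The following short computation is a sketch of why (2) and (3) describe the same model, and why βj can be read as a per-unit effect. It uses only the quantities defined above; the shorthand $\mu_i$ for the right-hand side of (2) is introduced here for convenience and is not part of the original notes.

$$\mu_i := \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad \epsilon_i := y_i - \mu_i \;\Longrightarrow\; E(\epsilon_i) = E(y_i \mid x_{i1},\ldots,x_{ip}) - \mu_i = 0.$$

$$E(y_i \mid x_{ij} + 1,\ \text{other } x_{ik} \text{ fixed}) - E(y_i \mid x_{ij},\ \text{other } x_{ik} \text{ fixed}) = \beta_j (x_{ij} + 1) - \beta_j x_{ij} = \beta_j.$$

The first line shows that defining the error as the deviation of yi from its conditional mean automatically gives it mean zero; the second shows that increasing xij by one unit, with the other predictors held fixed, shifts the mean response by exactly βj.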
One particular distribution that we can give to ϵi so that E(ϵi) = 0 is N(0, σ²) for some variance σ². If we make that choice, we get the following special case of model (3):
yi = β0 + β1xi1 + ... + βpxip + ϵi, where ϵi ~ N(0, σ²). (4)
Model (4) is the main model that we will study in this class. The parameter set of model (4) is then (σ², β0, ..., βp). It is important to keep in mind that when we write a model such as (4), we are implicitly assuming that there exists one specific parameter value (σ⋆², β0⋆, ..., βp⋆) (the true value of the parameter) such that
y = β0⋆ + β1⋆x1 + ... + βp⋆xp + ϵ, where ϵ ~ N(0, σ⋆²).
In that case we say that the model is correctly specified. In some cases, one may be interested, for various reasons, in fitting a misspecified model. However, these are advanced topics that we will not consider in this class.
Remark 1. Whether the linear regression model is appropriate or not is a complicated question in general. The linear model may be inappropriate if y is not continuous, or if there is a well-established scientific theory that contradicts our linear model assumption. The linear model may also be inappropriate if some of the key technical assumptions do not hold: the responses (y1, ..., yn) are not independent, or do not have the same variance. For some of these technical issues, we do have statistical methods to help evaluate the fit of the model.
2. Simple linear regression model
When p = 1 (that is, we have only one predictor, which we write simply as x), we call model (4) a simple linear regression model. Our data is therefore {(yi, xi), 1 ≤ i ≤ n}. Assuming independence between the statistical units, model (4) becomes
yi ~ N(β0 + β1xi, σ²), where (y1, ..., yn) are independent. (5)
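To make model (5) concrete, the following R sketch draws one dataset from it. All numeric values (sample size, coefficients, standard deviation, design points) are hypothetical and chosen only for illustration; none of them come from the notes.

```r
# Simulate one dataset from model (5): y_i ~ N(beta0 + beta1 * x_i, sigma^2).
# All numeric values below are hypothetical, chosen only for illustration.
set.seed(575)                     # make the draw reproducible
n     <- 60                       # number of statistical units
beta0 <- 0.5                      # assumed true intercept
beta1 <- 2.0                      # assumed true slope
sigma <- 1.0                      # assumed true standard deviation

x <- seq(-3, 3, length.out = n)   # fixed (non-random) predictor values

# Independent responses, each with its own mean beta0 + beta1 * x_i
y <- rnorm(n, mean = beta0 + beta1 * x, sd = sigma)

# Equivalent error formulation: y = beta0 + beta1 * x + eps, eps ~ N(0, sigma^2)
eps   <- rnorm(n, mean = 0, sd = sigma)
y_alt <- beta0 + beta1 * x + eps
```

Both `y` and `y_alt` follow the same distribution. The responses are independent but not identically distributed, because each yi has a different mean.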
Example 1. The file 'grades.csv' on Blackboard contains the mid-term and final exam grades for a group of 60 students in a statistics course. Each variable is transformed and given in the form of a z-score. We are interested in the conditional distribution of the final exam score given the mid-term score.
FIG 1. Scatter plot of (x, y) for the grades dataset (horizontal axis: x, from −3 to 3).
Figure 1 shows the scatter plot of the data, which suggests a linear trend. Hence it makes sense to consider the model
y = β0 + β1x + ϵ, where ϵ ~ N(0, σ²),
with parameters (σ², β0, β1).
2.1. Parameter estimation
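Here we show how to obtain the maximum likelihood estimate of (σ⋆², β0⋆, β1⋆) from data. By model (5), the density of yi is given by

$$f(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y_i - \beta_0 - \beta_1 x_i)^2}{2\sigma^2}\right).$$

The likelihood function of (σ², β0, β1) is then

$$L(\sigma^2, \beta_0, \beta_1) = \prod_{i=1}^n f(y_i) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2\right).$$

Taking the log, we see that, up to an additive constant that we ignore, the log-likelihood is given by

$$\ell(\sigma^2, \beta_0, \beta_1) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2.$$

Recall the usual statistical notation: if we have a dataset (x1, y1), ..., (xn, yn), we set

$$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^n y_i, \qquad s_x^2 = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})^2, \qquad s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}).$$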
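As a small illustration, the summary statistics in this notation (sample means, sample variance of x, sample covariance of x and y) can be computed in R as sketched below. The toy data are hypothetical, and the n − 1 divisor is an assumption matching R's built-in `var()` and `cov()`. The ratio of covariance to variance does not depend on whether n or n − 1 is used.

```r
# Toy data (hypothetical values, for illustration only)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)
n <- length(x)

x_bar <- mean(x)                                   # sample mean of x
y_bar <- mean(y)                                   # sample mean of y
s2_x  <- sum((x - x_bar)^2) / (n - 1)              # sample variance of x
s_xy  <- sum((x - x_bar) * (y - y_bar)) / (n - 1)  # sample covariance of x and y

# R's built-ins var() and cov() use the same n - 1 divisor
same_var <- isTRUE(all.equal(s2_x, var(x)))
same_cov <- isTRUE(all.equal(s_xy, cov(x, y)))

# The ratio below is unchanged if both divisors are replaced by n
ratio <- s_xy / s2_x
```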
The next result shows how to estimate the true value of the parameter by maximum likelihood.
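Theorem 2.1. Suppose that $s_x^2 > 0$. Then the maximum likelihood estimate of (σ², β0, β1) is $(\hat\sigma^2_{\mathrm{mle}}, \hat\beta_0, \hat\beta_1)$, defined as follows.

$$\hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}, \qquad \hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}, \qquad \hat\sigma^2_{\mathrm{mle}} = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$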
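The following is a sketch of where these estimates come from, not a full proof. It starts from the log-likelihood of model (5), which up to an additive constant is $\ell(\sigma^2,\beta_0,\beta_1) = -\frac{n}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$. The symbol $S$ for the sum of squares is introduced here for convenience.

For any fixed σ², maximizing $\ell$ over (β0, β1) amounts to minimizing $S(\beta_0,\beta_1) = \sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i)^2$. Setting the partial derivatives to zero gives

$$\frac{\partial S}{\partial \beta_0} = -2\sum_{i=1}^n (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Longrightarrow\; \beta_0 = \bar{y} - \beta_1 \bar{x},$$

$$\frac{\partial S}{\partial \beta_1} = -2\sum_{i=1}^n x_i (y_i - \beta_0 - \beta_1 x_i) = 0 \;\Longrightarrow\; \sum_{i=1}^n x_i (y_i - \bar{y}) = \beta_1 \sum_{i=1}^n x_i (x_i - \bar{x}).$$

Since $\sum_i x_i (y_i - \bar{y}) = \sum_i (x_i - \bar{x})(y_i - \bar{y})$ and $\sum_i x_i (x_i - \bar{x}) = \sum_i (x_i - \bar{x})^2$, the second equation can be solved for β1 whenever $\sum_i (x_i - \bar{x})^2 > 0$, which is where the condition on the sample variance of x is used. Finally, plugging the minimizers into $\ell$ and differentiating in σ²,

$$\frac{\partial \ell}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = 0 \;\Longrightarrow\; \sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \hat\beta_0 - \hat\beta_1 x_i)^2.$$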
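It is customary to define

$$\hat{y}_i = \hat\beta_0 + \hat\beta_1 x_i, \qquad \hat\epsilon_i = y_i - \hat{y}_i, \qquad \text{and} \qquad \mathrm{RSS} = \sum_{i=1}^n \hat\epsilon_i^2.$$

$\hat{y}_i$ is called the fitted value of yi, $\hat\epsilon_i$ is called the residual of yi, and RSS is the residual sum of squares. With this notation we can write

$$\hat\sigma^2_{\mathrm{mle}} = \frac{\mathrm{RSS}}{n}.$$

For reasons that will become clear later, we will not use $\hat\sigma^2_{\mathrm{mle}}$ to estimate σ⋆². We will use instead

$$\hat\sigma^2 = \frac{\mathrm{RSS}}{n-2}.$$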
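The notes defer their own explanation of why RSS/(n − 2) is preferred to RSS/n. As background, and not as the notes' argument, one standard fact is that under model (5) RSS/(n − 2) is unbiased for σ², whereas RSS/n has expectation (n − 2)σ²/n. The Monte Carlo sketch below illustrates this; the sample size, coefficients, variance, and number of replications are all hypothetical choices.

```r
# Monte Carlo comparison of RSS / n and RSS / (n - 2) as estimators of sigma^2.
# All numeric settings are hypothetical, chosen only for illustration.
set.seed(1)
n      <- 10
beta0  <- 1
beta1  <- 2
sigma2 <- 1                        # assumed true variance
x      <- seq(-1, 1, length.out = n)
n_rep  <- 5000                     # number of simulated datasets

rss <- replicate(n_rep, {
  y  <- rnorm(n, mean = beta0 + beta1 * x, sd = sqrt(sigma2))
  b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
  b0 <- mean(y) - b1 * mean(x)
  sum((y - b0 - b1 * x)^2)         # residual sum of squares for this dataset
})

mean_mle <- mean(rss / n)          # average of the maximum likelihood estimate
mean_adj <- mean(rss / (n - 2))    # average of the adjusted estimate
print(c(mean_mle = mean_mle, mean_adj = mean_adj))
```

With these settings the first average should land near (n − 2)/n = 0.8 and the second near 1, up to simulation noise.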
Example 2. We continue with the grades example described above.
grades = read.csv('grades.csv'); y = grades$y; x = grades$x;
# fit the model using R
model = lm(y ~ x)
summary(model)
######################
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.02819    0.02470  -1.141    0.258
x            0.88227    0.02237  39.439   <2e-16 ***
Let’s compare these estimates with the direct calculations using the estimators derived above.
n = length(y);  # sample size
x_bar = mean(x); y_bar = mean(y);
beta_1_hat = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2);
beta_0_hat = y_bar - beta_1_hat * x_bar;
y_hat = beta_0_hat + beta_1_hat * x;
sigmasq_hat = sum((y - y_hat)^2) / (n - 2);
print(c(beta_0_hat, beta_1_hat, sigmasq_hat))
[1] -0.02819114 0.88226847 0.03656782
The two sets of estimates match: the directly computed intercept and slope agree with the lm output to the displayed precision.
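The coefficient table printed above does not show an estimate of σ², so the third number has nothing to be compared with there. The sketch below shows one way to recover it from a fitted `lm` object, using the `sigma` component of `summary()`. Because 'grades.csv' is not reproduced here, the data are a simulated stand-in with hypothetical values.

```r
# Stand-in data (hypothetical), used only because grades.csv is not available here
set.seed(2024)
n <- 60
x <- rnorm(n)
y <- 0.9 * x + rnorm(n, sd = 0.2)

model <- lm(y ~ x)

# Direct calculation: residual sum of squares divided by n - 2
rss         <- sum(residuals(model)^2)
sigmasq_hat <- rss / (n - 2)

# summary(model)$sigma is the residual standard error; square it to compare
sigmasq_lm <- summary(model)$sigma^2

# Direct slope and intercept, compared with coef(model)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

sigma_matches <- isTRUE(all.equal(sigmasq_hat, sigmasq_lm))
coef_matches  <- isTRUE(all.equal(c(b0, b1), unname(coef(model))))
```

Both `sigma_matches` and `coef_matches` should be `TRUE`, since `lm` uses the same formulas with n − 2 residual degrees of freedom in the simple linear regression case.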