R code: linear algebra programming


Homework 7 – due April 4

Note: You must provide the necessary R code with your answers. You will not receive any credit if you don’t show the code for the part you are answering.

1.     (6 total points) Use the FG2013 data file posted on Canvas to analyze what affects the probability of making a field goal in football.

(a)    (2 points) Write out the logistic regression model using Yards as the explanatory variable and Outcome as the response.

> FG2013 <- read.csv(file = "~/Desktop/FG2013.csv")
> logit.reg <- glm(Outcome ~ Yards, data = FG2013, family = binomial(link = "logit"))
> summary(logit.reg)

Call:
glm(formula = Outcome ~ Yards, family = binomial(link = "logit"),
    data = FG2013)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.6626   0.2421   0.3803   0.5883   1.3412

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.85106    0.50126  11.673   <2e-16 ***
Yards       -0.09731    0.01121  -8.683   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 799.91  on 1015  degrees of freedom
Residual deviance: 705.71  on 1014  degrees of freedom
AIC: 709.71

Number of Fisher Scoring iterations: 5
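Reading the two estimates off the summary output gives the fitted model for part (a). This assumes Outcome is coded 1 for a made field goal, so that π is the probability of success:

```latex
\operatorname{logit}(\hat{\pi})
  = \log\!\left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)
  = 5.85106 - 0.09731\,\text{Yards},
\qquad
\hat{\pi}
  = \frac{\exp(5.85106 - 0.09731\,\text{Yards})}
         {1 + \exp(5.85106 - 0.09731\,\text{Yards})}
```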

(b)   (4 points) Provide a plot of the logistic regression model from above with 95% confidence bands.

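> # Plot the estimated logistic curve
> curve(expr = predict(logit.reg, data.frame(Yards = x), type = "response"), col = "red",
+       xlim = c(min(FG2013$Yards), max(FG2013$Yards)), ylim = c(0, 1),
+       ylab = expression(hat(pi)), xlab = "Yards",
+       main = "Estimated probability of making a field goal in football",
+       panel.first = grid())
> # 95% Wald confidence bands: interval on the logit scale, then back-transformed
> # band() is an illustrative helper, not part of base R
> band <- function(x, sign) {
+   lp <- predict(logit.reg, data.frame(Yards = x), type = "link", se.fit = TRUE)
+   plogis(lp$fit + sign * qnorm(0.975) * lp$se.fit)
+ }
> curve(expr = band(x, -1), col = "blue", lty = "dotdash", add = TRUE)
> curve(expr = band(x, +1), col = "blue", lty = "dotdash", add = TRUE)
> legend("bottomleft", legend = c("Estimated curve", "95% confidence band"),
+        col = c("red", "blue"), lty = c("solid", "dotdash"), bty = "n")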

2.     (12 total points) The next step is to validate the model. Take the placekick data set used in class and remove all PATs (i.e., all extra points) so that only field goals remain. The code below shows how to do so.

> placekick <- read.csv(file = "~/Placekick.csv")
> kick <- placekick[placekick$PAT == 0, ]

(a)   (2 points) Fit the logistic regression model where distance is the only explanatory variable.
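A minimal sketch for part (a). It assumes, as in the Bilder and Loughin version of the placekick data, that the response column is named `good` (1 = made, 0 = missed); check `names(kick)` and rename if your file differs. The helper name `fit.distance` is hypothetical.

```r
# Hypothetical helper: fit logit(pi) = beta0 + beta1 * distance to field goals only.
# ASSUMPTION: the response column is `good` (1 = made, 0 = missed); rename if needed.
fit.distance <- function(kick) {
  glm(good ~ distance, data = kick, family = binomial(link = "logit"))
}

# Usage with the PAT-free subset `kick` created above:
# mod.kick <- fit.distance(kick)
# summary(mod.kick)
```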

(b)  (2 points) Calculate the relative change in each regression coefficient between this model and the FG2013 model from question 1.

(c)   (4 points) Assume a relative change of over 50% is considered large. If the relative change is large, explain why this is not desirable. If the relative change is small, explain why this is desirable.
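The relative change in parts (b) and (c) is (comparison − reference) / reference, computed coefficient by coefficient. The assignment does not say which fit is the reference. The usage line below treats the placekick fit as the reference, which is an assumption. `rel.change`, `is.large` and `mod.kick` are hypothetical names.

```r
# Relative change of each coefficient against a reference fit:
#   (comparison - reference) / reference
rel.change <- function(reference, comparison) {
  stopifnot(identical(length(reference), length(comparison)))
  (comparison - reference) / reference
}

# Flag coefficients whose relative change exceeds the 50% cutoff from part (c).
is.large <- function(rc, cutoff = 0.5) abs(rc) > cutoff

# Usage (mod.kick = placekick fit; logit.reg = FG2013 fit from question 1).
# Both have coefficients in the order (Intercept), slope.
# rc <- rel.change(coef(mod.kick), coef(logit.reg))
# rc
# is.large(rc)
```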

(d)  (4 points) Construct ONE plot that shows the model obtained from the placekick data together with the model obtained from the FG2013 data. Your plot should therefore have two logistic regression curves, one for each data set. Make sure your plot also has a legend to differentiate the two curves. Has the probability of a successful field goal increased or decreased by 2013?
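One way to overlay the two fitted curves uses base graphics `curve()` with `add = TRUE` plus `legend()`. This is a sketch only. The helper `plot.two.curves` and the name `mod.kick` for the placekick fit are hypothetical, and the course may prefer a different plotting style.

```r
# Hypothetical helper: overlay two simple logistic curves on one plot.
# b1 and b2 are c(intercept, slope) on the logit scale, e.g. from coef().
plot.two.curves <- function(b1, b2, xlim,
                            labels = c("Placekick data", "FG2013 data")) {
  curve(plogis(b1[1] + b1[2] * x), from = xlim[1], to = xlim[2],
        col = "blue", lwd = 2, ylim = c(0, 1),
        xlab = "Distance (yards)", ylab = expression(hat(pi)),
        main = "Estimated probability of a successful field goal")
  curve(plogis(b2[1] + b2[2] * x), from = xlim[1], to = xlim[2],
        col = "red", lwd = 2, lty = "dashed", add = TRUE)
  legend("bottomleft", legend = labels, col = c("blue", "red"),
         lty = c("solid", "dashed"), lwd = 2, bty = "n")
}

# Usage (mod.kick = placekick fit, logit.reg = FG2013 fit):
# plot.two.curves(coef(mod.kick), coef(logit.reg),
#                 xlim = range(c(kick$distance, FG2013$Yards)))
```

Where one curve lies above the other across the plotted distances, that data set has the higher estimated success probability at those distances.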

3.     (42 total points) From the FG2013 data set, use the variables Yards, PointsAhead, and Quarter to predict the probability of a successful field goal. Assume that Quarter is quantitative.

(a)   (3 points) Write out the logistic regression model.
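In general form, with π the probability of a successful field goal and Quarter entered as a single numeric term, the model has four parameters. The estimated version replaces each β with the β̂ printed by `summary()` for the fitted glm.

```latex
\operatorname{logit}(\pi)
  = \log\!\left(\frac{\pi}{1-\pi}\right)
  = \beta_0 + \beta_1\,\text{Yards} + \beta_2\,\text{PointsAhead} + \beta_3\,\text{Quarter}
```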

(b)  (4 points) Estimate the probability that a 40-yard field goal is successful when the game is tied and in the 4th quarter. Do the same for 20- and 30-yard field goals.
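A sketch of the fit and the predictions for part (b). A tied game is encoded as PointsAhead = 0 and the 4th quarter as Quarter = 4. `type = "response"` makes `predict()` return π̂ rather than the logit. The helper names and `mod.fit3` are hypothetical.

```r
# Hypothetical helpers for question 3 (names are illustrative only).
fit.three <- function(data) {
  glm(Outcome ~ Yards + PointsAhead + Quarter, data = data,
      family = binomial(link = "logit"))
}

# Estimated P(success) in a tied game (PointsAhead = 0) in the 4th quarter
# (Quarter = 4); type = "response" returns pi-hat rather than the logit.
pred.tied.q4 <- function(model, yards = c(20, 30, 40)) {
  new.data <- data.frame(Yards = yards, PointsAhead = 0, Quarter = 4)
  predict(model, newdata = new.data, type = "response")
}

# Usage:
# mod.fit3 <- fit.three(FG2013)
# pred.tied.q4(mod.fit3)
```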

(c)   (3 points) Show how to calculate the probability of making a 40-yard field goal in a tied game in the 4th quarter. Do NOT use R to answer this question!
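By hand, substitute Yards = 40, PointsAhead = 0 (tied) and Quarter = 4 into the fitted linear predictor, then apply the inverse logit. The β̂ values come from the three-variable model's summary output, which is not reproduced here.

```latex
\hat{\pi}
  = \frac{\exp\!\left(\hat\beta_0 + 40\,\hat\beta_1 + 0\cdot\hat\beta_2 + 4\,\hat\beta_3\right)}
         {1 + \exp\!\left(\hat\beta_0 + 40\,\hat\beta_1 + 0\cdot\hat\beta_2 + 4\,\hat\beta_3\right)}
```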

(d)  (3 points) Construct a 95% confidence interval for the probability of making a 40-yard field goal when the game is tied in the 4th quarter AND interpret the interval.
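A common approach is a Wald interval built on the logit scale and back-transformed with `plogis()`, so the endpoints stay inside (0, 1). This sketch uses that Wald method. `ci.pi.hat` and `mod.fit3` (the three-variable fit) are hypothetical names.

```r
# Wald (1 - alpha) CI for pi: interval on the logit scale, then back-transformed.
ci.pi.hat <- function(model, new.data, alpha = 0.05) {
  lp <- predict(model, newdata = new.data, type = "link", se.fit = TRUE)
  z <- qnorm(1 - alpha / 2)
  data.frame(pi.hat = plogis(lp$fit),
             lower  = plogis(lp$fit - z * lp$se.fit),
             upper  = plogis(lp$fit + z * lp$se.fit))
}

# Usage: 40-yard attempt, tied game, 4th quarter.
# ci.pi.hat(mod.fit3, data.frame(Yards = 40, PointsAhead = 0, Quarter = 4))
```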

(e)   (6 points)

(i)          Interpret the estimated odds ratio for a 10-yard decrease in distance (i.e., use c = -10).

(ii)        Interpret the estimated odds ratio for a 3-point increase in lead (i.e., use c = 3).

(iii)      Interpret the estimated odds ratio after quarter increases by 1 (i.e., use c = 1).
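For a change of c units in one explanatory variable, with the others held fixed, the estimated odds ratio is OR = exp(c β̂) for that variable's coefficient. The sketch computes all three at once. `odds.ratios` and `mod.fit3` are hypothetical names, and the argument is called `change` to avoid masking R's `c()`.

```r
# Estimated odds ratio exp(change * beta-hat) for each named variable,
# holding the other explanatory variables fixed.
odds.ratios <- function(model, change) {
  # `change` is a named vector; its names must match coefficient names in `model`
  exp(change * coef(model)[names(change)])
}

# Usage with the changes from part (e):
# odds.ratios(mod.fit3, c(Yards = -10, PointsAhead = 3, Quarter = 1))
```

An OR of, say, 2 would read as "the estimated odds of a successful field goal are 2 times as large" after the stated change.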

(f)    (15 points) Obtain and interpret the 95% confidence intervals for each of the three odds ratios calculated above. Discuss which intervals do or do not contain 1 and why this is meaningful in terms of the problem.
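This sketch uses Wald intervals: exponentiate c·(β̂ ± z·SE). When c < 0 the two endpoints swap, so each interval is reordered with `pmin`/`pmax`. Calling `confint()` on a glm gives profile-likelihood intervals for β instead, which the course may prefer. An interval containing 1 is consistent with no change in the odds. `or.ci` and `mod.fit3` are hypothetical names.

```r
# Wald (1 - alpha) CIs for the odds ratios exp(change * beta_j).
or.ci <- function(model, change, alpha = 0.05) {
  vars <- names(change)
  beta <- coef(model)[vars]
  se <- sqrt(diag(vcov(model)))[vars]
  z <- qnorm(1 - alpha / 2)
  a <- exp(change * (beta - z * se))
  b <- exp(change * (beta + z * se))
  lower <- pmin(a, b)   # endpoints swap when change < 0
  upper <- pmax(a, b)
  data.frame(OR = exp(change * beta), lower = lower, upper = upper,
             contains.1 = lower <= 1 & upper >= 1, row.names = vars)
}

# Usage:
# or.ci(mod.fit3, c(Yards = -10, PointsAhead = 3, Quarter = 1))
```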

(g)  (3 points) Conduct a formal hypothesis test using the anova() function to see whether points ahead and quarter need to be included in the model. Write out the hypotheses and give a conclusion.
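`anova()` on two nested glm fits with `test = "Chisq"` gives a likelihood-ratio test. Here the hypotheses are H0: β_PointsAhead = β_Quarter = 0 versus Ha: at least one is nonzero, with 2 degrees of freedom. The reduced model is the Yards-only model from question 1. `lrt.pa.q` is a hypothetical name. Both fits must use the same rows, so missing values in PointsAhead or Quarter would need handling first.

```r
# Likelihood-ratio test of H0: beta_PointsAhead = beta_Quarter = 0
# against Ha: at least one is nonzero, by comparing nested fits with anova().
lrt.pa.q <- function(data) {
  reduced <- glm(Outcome ~ Yards, data = data, family = binomial(link = "logit"))
  full <- glm(Outcome ~ Yards + PointsAhead + Quarter, data = data,
              family = binomial(link = "logit"))
  anova(reduced, full, test = "Chisq")
}

# Usage: lrt.pa.q(FG2013)
```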

(h)  (2 points) Does your answer in part (g) confirm the conclusions about the importance of points ahead and quarter that you obtained in part (f)? Explain why or why not.

(i)    (3 points) Fit a model that includes the three variables from above as well as an interaction between points ahead and quarter. Based on this new model, should points ahead and quarter be included in the model?
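In an R formula the interaction term is written `PointsAhead:Quarter`. `PointsAhead * Quarter` expands to the same two main effects plus the interaction. This is a sketch, and `fit.interaction` and `mod.int` are hypothetical names. `anova()` on a single glm reports sequential tests in formula order, so the interaction is tested after the main effects.

```r
# Three main effects plus the PointsAhead x Quarter interaction.
fit.interaction <- function(data) {
  glm(Outcome ~ Yards + PointsAhead + Quarter + PointsAhead:Quarter,
      data = data, family = binomial(link = "logit"))
}

# Usage:
# mod.int <- fit.interaction(FG2013)
# summary(mod.int)                  # Wald z test for each term
# anova(mod.int, test = "Chisq")    # sequential LR tests, in formula order
```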

