# BUS 462 Midterm Exam


March 20, 2020

For Part A of this exam, you will use the flight_data_cleaned dataset to answer the questions. With everything changing right now, I wanted to select a dataset you have likely already spent some time with. I also wanted to use the exam as an opportunity to reinforce what you have learned in the course and bring the journey we started full circle.

For Part A, you will use your computer to write R code, which you will submit. Submit the code, all generated output, and the required graphs for Part A in a single zipped folder named LastName_FirstName_BUS462_Midterm.zip.

For the written portion, answer all questions on this exam paper and submit it as a separate PDF along with the .zip file containing your code and raw data. Name the PDF LastName_FirstName_BUS462_Midterm1.pdf, and do not put it in the .zip file.

This exam is meant to be completed individually. Please adhere to all of the university's academic honesty rules.

Part A – Using Flight Data

Section A – Summary Statistics (15 marks)

For each question in this section, paste the graph along with your answer to the question asked.

1) Create a histogram of the PRICE_CND data for all flights and save this to your exam folder. Does the data appear to be normally distributed? (Histogram) (2 marks)
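A minimal R sketch for question 1 follows. It assumes the data sit in a data frame called `flights` (a hypothetical name) with a numeric `PRICE_CND` column. The real flight_data_cleaned file is not reproduced here, so the sketch simulates placeholder prices; replace that step with your own import. It writes a histogram and a normal Q-Q plot to PDF files in `tempdir()`. Point those paths at your exam folder instead. `png()` works the same way if you prefer images to paste.

```r
# Placeholder data: simulated prices standing in for flight_data_cleaned.
# Replace with your own import, e.g. flights <- read.csv("<path to your file>").
set.seed(462)
flights <- data.frame(PRICE_CND = round(rlnorm(500, meanlog = 6, sdlog = 0.5), 2))

# Write the histogram to a file (swap tempdir() for your exam folder).
hist_file <- file.path(tempdir(), "price_histogram.pdf")
pdf(hist_file)
hist(flights$PRICE_CND, breaks = 30,
     main = "Histogram of PRICE_CND", xlab = "Price (CND)")
dev.off()

# A normal Q-Q plot is a second visual check: normal data hug the reference line.
qq_file <- file.path(tempdir(), "price_qq.pdf")
pdf(qq_file)
qqnorm(flights$PRICE_CND, main = "Normal Q-Q plot of PRICE_CND")
qqline(flights$PRICE_CND)
dev.off()

# A mean well above the median is a quick numeric hint of right skew.
summary(flights$PRICE_CND)
```

Signs of right skew rather than normality:

- a long right tail in the histogram;
- Q-Q points bending above the reference line at the upper end.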

2) Create a boxplot showing the PRICE_CND segmented by airline and save this to your exam folder. Are certain airlines more or less expensive on average than others? (Boxplot) (2 marks)

3) Create a boxplot showing the PRICE_CND segmented by destination and save this to your exam folder. Are certain destinations more expensive on average than others? (Boxplot) (2 marks)

4) Create a boxplot showing the PRICE_CND segmented by the Month_Cleaned and save this to your exam folder. Are certain months more or less expensive on average than other months? (Boxplot) (2 marks)
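Questions 2 to 4 repeat one technique with three grouping variables, so one loop covers them. This is a sketch on simulated placeholder data. The airline and destination labels (`Airline_A`, `Dest_X`, and so on) and the data frame name `flights` are hypothetical; substitute the real columns from your copy of the dataset.

```r
# Placeholder data with hypothetical airline/destination labels (not the real dataset).
set.seed(462)
n <- 300
flights <- data.frame(
  Airline       = sample(c("Airline_A", "Airline_B", "Airline_C"), n, replace = TRUE),
  Destination   = sample(c("Dest_X", "Dest_Y", "New Delhi"), n, replace = TRUE),
  Month_Cleaned = sample(c("March", "April", "May"), n, replace = TRUE),
  PRICE_CND     = round(rlnorm(n, meanlog = 6, sdlog = 0.5), 2)
)
# Put months in calendar order instead of the default alphabetical order.
flights$Month_Cleaned <- factor(flights$Month_Cleaned,
                                levels = c("March", "April", "May"))

group_vars <- c("Airline", "Destination", "Month_Cleaned")
for (g in group_vars) {
  # One PDF per grouping variable (swap tempdir() for your exam folder).
  pdf(file.path(tempdir(), paste0("price_by_", g, ".pdf")))
  boxplot(reformulate(g, response = "PRICE_CND"), data = flights,
          main = paste("PRICE_CND by", g), xlab = g, ylab = "Price (CND)")
  dev.off()
}

# Boxplots mark medians; the questions ask about averages, so report group means too.
group_means <- lapply(group_vars, function(g) tapply(flights$PRICE_CND, flights[[g]], mean))
names(group_means) <- group_vars
group_means
```

The boxplot's centre line is the median, so the group means are printed alongside to answer the "on average" part of each question directly.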

5) Create a scatterplot showing the relationship between Total_Minutes and Price_CND and save this to your exam folder. What do you observe about the relationship? (2 marks)
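A sketch for question 5, again on simulated placeholder data. The made-up relationship, in which longer flights cost more, exists only so the plot has something to show. The exam spells the price column both `PRICE_CND` and `Price_CND`. R is case-sensitive, so use whichever spelling your copy of the dataset actually has; the sketch uses `PRICE_CND`.

```r
# Placeholder data: a made-up relationship in which longer flights tend to cost more.
set.seed(462)
n <- 300
flights <- data.frame(Total_Minutes = round(runif(n, 120, 900)))
flights$PRICE_CND <- round(exp(5.5 + 0.001 * flights$Total_Minutes + rnorm(n, sd = 0.4)), 2)

scatter_file <- file.path(tempdir(), "price_vs_minutes.pdf")
pdf(scatter_file)
plot(PRICE_CND ~ Total_Minutes, data = flights,
     main = "PRICE_CND vs Total_Minutes",
     xlab = "Total minutes", ylab = "Price (CND)")
# Overlay the least-squares line to make the overall trend easier to judge.
abline(lm(PRICE_CND ~ Total_Minutes, data = flights), col = "red")
dev.off()

# Pearson's r summarizes the direction and strength of the linear association.
r <- cor(flights$Total_Minutes, flights$PRICE_CND)
r
```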

Section B – Regression Analysis – Simple Linear Regression (10 marks)

6) Fit a simple linear regression model with Total_Minutes as the explanatory variable and Price_CND as the response variable. (You may transform the Price_CND variable if you like; be sure to reverse the transformation when answering.) You may also want to check whether the Total_Minutes variable is normally distributed.

a. What is the final regression equation? (Save the output generated in R to your folder, then write out the equation here.) You may also include a screenshot of the output. (2 marks)

b. Interpret the slope of the regression equation in words. (2 marks)

c. What are the null and alternative hypotheses being tested? Complete the hypothesis test and state whether you reject or fail to reject the null hypothesis. (2 marks)

d. For a given flight that is 405 minutes long, use the model to make the relevant prediction and construct a 95% confidence interval. (2 marks)

e. Print the four summary (diagnostic) graphs from the model and save each of them to your exam folder. Comment on these graphs here. Are any regression assumptions violated? (2 marks)
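The steps in question 6 can be sketched in R as below. This is a minimal example on simulated placeholder data; `flights`, `fit` and the file names are hypothetical. It uses a log transform of the price, one of the transformations the question allows. The sketch does four things:

- writes the equation on both the log and the original scale;
- extracts the slope's t test (H0: beta1 = 0 vs H1: beta1 != 0);
- predicts at 405 minutes;
- saves the four default `plot.lm()` panels.

Applying `exp()` to an interval computed on the log scale gives an interval for the median (geometric-mean) price rather than the arithmetic mean. That interpretation assumes roughly symmetric log-scale residuals.

```r
# Placeholder data (a made-up process standing in for flight_data_cleaned).
set.seed(462)
n <- 300
flights <- data.frame(Total_Minutes = round(runif(n, 120, 900)))
flights$PRICE_CND <- round(exp(5.5 + 0.001 * flights$Total_Minutes + rnorm(n, sd = 0.4)), 2)

# Fit on the log scale, one of the transformations the question permits.
fit <- lm(log(PRICE_CND) ~ Total_Minutes, data = flights)
b <- coef(fit)
cat(sprintf("log(PRICE_CND) = %.4f + %.6f * Total_Minutes\n", b[1], b[2]))
# Reversing the log: PRICE_CND = exp(b0) * exp(b1 * Total_Minutes).
cat(sprintf("PRICE_CND = %.2f * exp(%.6f * Total_Minutes)\n", exp(b[1]), b[2]))
# Each extra minute multiplies the typical price by exp(b1), i.e. this % change.
pct_per_minute <- 100 * (exp(b[2]) - 1)
pct_per_minute

# H0: beta1 = 0 vs H1: beta1 != 0; summary() reports the t statistic and p-value.
slope_test <- coef(summary(fit))["Total_Minutes", c("t value", "Pr(>|t|)")]
slope_test

# Interval for the mean at 405 minutes, plus a prediction interval for one flight.
new_flight <- data.frame(Total_Minutes = 405)
conf_int <- exp(predict(fit, newdata = new_flight, interval = "confidence", level = 0.95))
pred_int <- exp(predict(fit, newdata = new_flight, interval = "prediction", level = 0.95))
conf_int
pred_int

# The four default plot.lm() panels on one page (swap tempdir() for your exam folder).
diag_file <- file.path(tempdir(), "slr_diagnostics.pdf")
pdf(diag_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```

Both intervals are printed because the question's wording ("a given flight") arguably calls for a prediction interval. The confidence interval describes the average price of all 405-minute flights; the prediction interval covers a single flight and is always wider.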

Section C – Multiple Regression Analysis (10 marks)

7) Fit a multiple regression model with Price_CND as the response variable and Airline, Month_Cleaned, Destination, Check-in_Baggage_Included, Business_Class, and Change_Airports as explanatory variables.

a. What is the final regression equation? (Save a copy of your output to your exam folder and then write out the equation here.) (3 marks)

b. Which variables are significant within the multiple regression model? How did you make this decision? (2 marks)

c. For a given flight that is 405 minutes long, occurs in April, has New Delhi as its destination, includes check-in baggage, is NOT business class, and involves a change of airports, what is the predicted value? Write the equation and state the predicted value. Create a 95% confidence interval for this prediction. If a variable in this model is not significant, simply note it and carry out the calculation anyway. (3 marks)

d. Print the four summary (diagnostic) graphs from the model and save them to your exam folder. Comment on these graphs. Are any regression assumptions violated? (2 marks)
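A sketch of question 7 in R on simulated placeholder data. It assumes:

- Airline, Month_Cleaned and Destination are text or factor columns.
- The three yes/no variables are coded 0/1. Recode them if your copy uses labels such as Yes/No.
- Level names like `Airline_A` are hypothetical.
- The question does not name an airline, so the prediction uses a hypothetical one.

`Total_Minutes` is not among the explanatory variables listed in question 7, so the 405-minute duration does not enter this model. The hyphen in `Check-in_Baggage_Included` makes it a non-syntactic R name, which is why it needs backticks in the formula and `check.names = FALSE` in `data.frame()`.

```r
# Placeholder data; every label and the 0/1 coding of the indicators are assumptions.
set.seed(462)
n <- 400
flights <- data.frame(
  Airline       = sample(c("Airline_A", "Airline_B", "Airline_C"), n, replace = TRUE),
  Month_Cleaned = sample(c("March", "April", "May"), n, replace = TRUE),
  Destination   = sample(c("Dest_X", "Dest_Y", "New Delhi"), n, replace = TRUE),
  `Check-in_Baggage_Included` = rbinom(n, 1, 0.7),
  Business_Class  = rbinom(n, 1, 0.1),
  Change_Airports = rbinom(n, 1, 0.2),
  check.names = FALSE  # keep the hyphen in the column name
)
flights$PRICE_CND <- round(300 + 40 * flights$Change_Airports +
                             600 * flights$Business_Class + rnorm(n, sd = 50), 2)

# Backticks are needed because "Check-in_Baggage_Included" is not a syntactic R name.
fit <- lm(PRICE_CND ~ Airline + Month_Cleaned + Destination +
            `Check-in_Baggage_Included` + Business_Class + Change_Airports,
          data = flights)
summary(fit)  # coefficients for writing out the regression equation

# Partial F test for dropping each term; a multi-level factor is tested as a whole.
drop1(fit, test = "F")

# The described flight; the airline is a hypothetical choice since the question names none.
new_flight <- data.frame(
  Airline = "Airline_A", Month_Cleaned = "April", Destination = "New Delhi",
  `Check-in_Baggage_Included` = 1, Business_Class = 0, Change_Airports = 1,
  check.names = FALSE
)
pred <- predict(fit, newdata = new_flight, interval = "confidence", level = 0.95)
pred

# Four diagnostic panels on one page (swap tempdir() for your exam folder).
diag_file <- file.path(tempdir(), "mlr_diagnostics.pdf")
pdf(diag_file)
par(mfrow = c(2, 2))
plot(fit)
dev.off()
```

Each factor is coded against a baseline level, the first level alphabetically unless you change it with `relevel()`. Its coefficients are therefore shifts from that baseline. `drop1()` judges the factor as a whole rather than level by level.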

Section D – Variable Selection (5 marks)

8) Perform variable selection using forward selection, backward elimination, or all-subsets regression on all of the variables listed in Section C.

a. What is the final regression equation you obtain from the variable selection process? Write the best model here. What is the adjusted R^2 for this model? Show any supporting output, graphs, or printouts from R here. (5 marks)
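One way to carry out question 8 is backward elimination with base R's `step()`, sketched below on the same kind of simulated placeholder data as the multiple regression example. The column coding and level names are again assumptions. The sketch prints the selected formula, its coefficients, and its adjusted R^2.

```r
# Placeholder data built the same way as a stand-in for flight_data_cleaned (assumed coding).
set.seed(462)
n <- 400
flights <- data.frame(
  Airline       = sample(c("Airline_A", "Airline_B", "Airline_C"), n, replace = TRUE),
  Month_Cleaned = sample(c("March", "April", "May"), n, replace = TRUE),
  Destination   = sample(c("Dest_X", "Dest_Y", "New Delhi"), n, replace = TRUE),
  `Check-in_Baggage_Included` = rbinom(n, 1, 0.7),
  Business_Class  = rbinom(n, 1, 0.1),
  Change_Airports = rbinom(n, 1, 0.2),
  check.names = FALSE
)
flights$PRICE_CND <- round(300 + 40 * flights$Change_Airports +
                             600 * flights$Business_Class + rnorm(n, sd = 50), 2)

# Start from the full model with every variable from Section C.
full <- lm(PRICE_CND ~ Airline + Month_Cleaned + Destination +
             `Check-in_Baggage_Included` + Business_Class + Change_Airports,
           data = flights)

# Backward elimination by AIC; trace = 0 suppresses the step-by-step log.
# Forward selection would instead start from lm(PRICE_CND ~ 1, data = flights)
# with scope = formula(full) and direction = "forward".
best <- step(full, direction = "backward", trace = 0)
formula(best)
coef(best)                    # coefficients of the selected equation
summary(best)$adj.r.squared   # adjusted R^2 of the selected model
```

`step()` ranks candidate models by AIC rather than by individual p-values, so its choice can differ from simply dropping every non-significant term. For all-subsets selection, the add-on `leaps` package provides `regsubsets()`.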

Part B – Short Answer (No Computer Needed)

9) Within Major League Baseball (MLB), some teams earn much more than others. This allows those teams to spend more money on player salaries, while others have to "Moneyball" their way to success. In general, teams that spend more on salaries expect to win more games on the field. During the last 4 years (2014-2018), data were collected for each of the 30 MLB teams, for a total of 120 observations. The dataset has columns for wins (number of games won), payroll (opening payroll in millions of dollars for a given year), AL (an indicator of whether the team is in the American League), and year (the year in which the data were gathered). Below are the results from a regression model that predicts the number of wins a team had from ln(payroll). (10 marks)

a) What is the correlation (r) between wins and log(payroll)? (2 points)

b) Interpret, in words, the slope coefficient for log(payroll). (2 points)

c) The New York Yankees are projected to spend approximately $205 million on payroll in 2019 (this current year). Provide a 95% interval for the number of games they will win during the 2019 season. (2 points)

d) Below are the residual graphs generated in R for this model. List all four assumptions of this regression model and comment specifically on whether each one is reasonable. (4 marks)
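The regression output for question 9 is not reproduced here, so the relations below are written symbolically. The symbols b_0 (intercept), b_1 (coefficient on ln(payroll)), s (residual standard error), R^2, and x̄ (mean of ln(payroll)) are placeholders for values read from that output. They assume the fitted model is a simple regression of wins on the natural log of payroll, with payroll measured in millions of dollars and n = 120.

$$
\begin{aligned}
\widehat{\text{wins}} &= b_0 + b_1 \ln(\text{payroll}) \\
r &= \operatorname{sign}(b_1)\,\sqrt{R^2} \\
\Delta \widehat{\text{wins}} &= b_1 \ln(1.01) \approx 0.01\, b_1 \qquad \text{(for a 1\% increase in payroll)} \\
x_0 &= \ln(205) \approx 5.323 \\
\hat{y}_0 &= b_0 + b_1 x_0 \\
\hat{y}_0 &\pm t_{0.025,\,118}\; s \sqrt{1 + \frac{1}{120} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{120} (x_i - \bar{x})^2}}, \qquad t_{0.025,\,118} \approx 1.98
\end{aligned}
$$

Because payroll is recorded in millions, the Yankees' value enters as ln(205), not ln(205,000,000). The leading "1 +" under the square root is what turns a confidence interval for the mean number of wins into a prediction interval for a single team's season.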

10) Consider the model summary below from a regression analysis. If the researcher is conducting variable selection, which model should they choose for predicting Y, and why? (5 marks)
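The model summary for question 10 is not reproduced here. For reference, the criteria most often reported in a variable-selection summary are written out below. Here n is the number of observations, p the number of predictors in a candidate model, SSE_p that model's residual sum of squares, and MSE_full the residual mean square of the model with all predictors. Higher adjusted R^2 is better; a good C_p is small and close to p + 1.

$$
\begin{aligned}
R^2_{\text{adj}} &= 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1} \\
C_p &= \frac{SSE_p}{MSE_{\text{full}}} - n + 2(p + 1)
\end{aligned}
$$

Here is a purely hypothetical illustration with n = 120. Suppose one model has p = 2 and R^2 = 0.40, and another has p = 5 and R^2 = 0.41. Their adjusted values are:

$$
1 - (0.60)\tfrac{119}{117} \approx 0.390, \qquad 1 - (0.59)\tfrac{119}{114} \approx 0.384
$$

The three extra predictors raise R^2 slightly but lower adjusted R^2, so the smaller model would be preferred on that criterion.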
