STAT802: Advanced Topics in Analytics - Semester 1 2024
Assignment 1 – Part A
Due: 9am on Monday 25 March 2024
Outline: Assignment 1 – Part A comprises three questions worth 15% of your final grade. Total: 55 marks.
Instructions:
. Only documents in portable format (PDF) will be accepted. You can use, e.g., Word, knitr or Sweave to create your report, and RStudio as the editor of the source files.
. Formats other than PDF will be ignored and the author will be asked to re–submit the assignment within 24 hours after the due date & time at the cost of 5% of the total marks. If the assignment is not resubmitted within this time frame, then it will be assigned a mark of zero and deemed as non–submission.
. Any SAS code required to complete this assignment, especially the code to support your conclusions & answers, must be self-explanatory and must be embedded in the corresponding answer as text (not image). SAS code submitted in separate files will be ignored and not considered for marking.
. Optionally, you may submit only your answers and avoid copying & pasting each question in the PDF document. If this is the case, then just make reference to each question, e.g., Answer Question 1 (a), Answer Question 1 (b), ..., etc.
. Answer all the questions as requested. Any material or information unrelated to the correct answer may result in a significant reduction of marks for that question.
. Several questions will come to light while solving these tasks. You may need to visit the SAS–support website for additional information about specific statements/steps to complete them.
. Finally, fill in and sign the cover sheet, which must be the very first page in the PDF. Use, e.g., Adobe Acrobat Pro on Uni computers. Do not submit the cover sheet separately.
LATE SUBMISSIONS: You can submit up to three days after the submission deadline with a 5% penalty per day. If you need an extension (with no penalty) because your performance has been impacted by some extenuating, unexpected circumstances, then you need to submit an SCA along with relevant evidence using the submission link from our STAT802 Home page. Bear in mind that SCA processing may take up to 5 working days. If you have questions, contact [email protected].
Question 1. Consider the days absent / academic data involving maths and language test scores collected from 316 students. File: Math Language Scores S12024.sas7bdat, Week 2 Canvas.
The data dictionary is on slide 23, file: STAT802 Week 2 GLMs II S12024.
Perform the following tasks and write an Executive Summary (ES), as requested. Note that Parts a), b) and c) carry NO marks; however, you are required to address these points in your Executive Summary.
a) (0 marks) Using regression models, run a proper analysis to investigate 1) the effect of the ‘math’ and ‘language’ scores on X2 = the average number of days absent during the school year, and 2) the extent to which the influence of the ‘math’ scores on X2 fluctuates across the different bilingual statuses.
SHOW / EXPLAIN YOUR WORK - Or else, marks will be deducted from the ES.
b) (0 marks) Explore options to address issues related to overdispersion, IF IT APPLIES. Include solution/s if such excess variability is observed.
SHOW / EXPLAIN YOUR WORK - Or else, marks will be deducted from the ES.
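One common way to screen a Poisson-type count model for overdispersion is the Pearson chi-square statistic divided by the residual degrees of freedom: values close to 1 are consistent with the Poisson variance assumption, while values well above 1 suggest overdispersion. The Python sketch below only illustrates that arithmetic; the function name, the observed counts and the fitted means are made up for illustration and are not taken from the assignment data. Submitted work must still use SAS.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual df for a Poisson-type fit.

    observed : observed counts (e.g. days absent)
    fitted   : fitted means from the model, same length as observed
    n_params : number of estimated regression parameters
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    # Under a Poisson model the variance equals the mean, so each squared
    # residual is scaled by the fitted mean.
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / resid_df


# Hypothetical values, for illustration only (NOT the assignment data).
observed = [0, 2, 9, 1, 14, 3, 0, 7]
fitted = [2.0, 3.0, 5.0, 2.5, 6.0, 4.0, 1.5, 4.5]

ratio = pearson_dispersion(observed, fitted, n_params=2)
print(f"Pearson chi-square / df = {ratio:.2f}")
```

If the ratio is far above 1, typical remedies to explore include a scale-adjusted (quasi-Poisson) fit or a negative binomial model.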
c) (0 marks) Use the selected model to estimate the average number of days absent for three students: Andrea, Curtis and Jessica. According to the School records, Andrea showed the highest attendance rate (from past years) and this trend is expected to continue on.
The data is the following:
Andrea: Math score = 38.5, Language score = 49.4, biling = 1.
Curtis: Math score = 56.21, Language score = 52.11, biling = 2.
Jessica: Math score = 61.39, Language score = 42.90, biling = 3.
SHOW / EXPLAIN YOUR WORK - Or else, marks will be deducted from the ES.
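For a count model with a log link, the estimated mean for a student is the exponential of the linear predictor. The Python sketch below shows that calculation for the three students listed above, assuming a model with main effects for the math and language scores, a bilingual-status effect, and a math-by-bilingual-status interaction. Every coefficient value is a hypothetical placeholder, not an estimate from the data, and the function and key names are illustrative; the actual estimates must come from your own SAS output.

```python
import math


def predict_mean_days(coefs, math_score, lang_score, biling):
    """Mean days absent under a log-link model: mu = exp(linear predictor)."""
    eta = (
        coefs["intercept"]
        + coefs["math"] * math_score
        + coefs["language"] * lang_score
        + coefs["biling"][biling]                      # status-specific shift
        + coefs["math_x_biling"][biling] * math_score  # status-specific slope
    )
    return math.exp(eta)


# HYPOTHETICAL placeholder coefficients (biling = 1 used as reference level).
coefs = {
    "intercept": 2.5,
    "math": -0.010,
    "language": -0.005,
    "biling": {1: 0.0, 2: 0.2, 3: 0.1},
    "math_x_biling": {1: 0.0, 2: -0.002, 3: 0.001},
}

# Student profiles as given in the question.
students = {
    "Andrea": (38.5, 49.4, 1),
    "Curtis": (56.21, 52.11, 2),
    "Jessica": (61.39, 42.90, 3),
}

for name, (m, l, b) in students.items():
    print(name, round(predict_mean_days(coefs, m, l, b), 2))
```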
d) (25 marks) Write an executive summary addressing points a), b) and c). Include your SAS code in an Appendix (15 marks - Executive Summary + 5 marks SAS code + 5 marks work).
Question 2. The file binary.csv contains information from 400 students who applied to graduate school in 2022. The attributes are the following:
. admit, binary: 1 if the student was admitted to graduate school, 0 otherwise,
. gre, the student’s gre score when the application was submitted,
. gpa, the student’s gpa when the application was submitted, and
. rank, which takes on the values 1 through 4 and indicates the prestige of the institution from which the student obtained their bachelor’s degree. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.
Using regression models, your manager (Cathy) wants to explore gre, gpa, and institution rank as factors that may influence students’ chances of being admitted to graduate school. Specifically, she believes that gpa has the strongest influence on predicting the admission (and non-admission) of these students to graduate school across the four institution ranks. Cathy also believes that the differences among institution prestige levels in the chances of students being ‘admitted’ or ‘not admitted’ differ based on the gre scores. Is your manager correct with both assumptions? These results will be used in the next Executive Board meeting.
a) (1 mark - model + 4 marks - justification = 5 marks) Propose and EXPLAIN an appropriate modelling framework to deal with your manager’s concern. Name the model (e.g., ordinary regression, logistic regression, etc.)
b) (3 marks) Write down the full (theoretical) model. Derive the reduced models, if any. If no reduced models are to be considered, then write down a short paragraph explaining this point.
c) You should be aware by now of the exceedingly large difference between the GPA and GRE supports. While GPA ranges from 1 to 5 points, GRE’s minimum is 220 units. Interpreting regression output with GRE or GPA as response and the other as predictor may hardly be intuitive. Before going through d) - f), you are required to re-scale or even standardize GRE and/or GPA.
HINT:
https://scc.ms.unimelb.edu.au/resources/reporting-statistical-inference/rescaling-explanatory-variables-in-linear-regression
(5 marks) Write down 2-3 sentences outlining the approach you have adopted to deal with this matter. Don’t go through d) - f) with this issue still unresolved.
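Two simple options for putting GRE and GPA on comparable scales are dividing by a constant (so that one unit of GRE means, say, 100 points) and standardizing to z-scores (subtract the mean, divide by the standard deviation). The Python sketch below illustrates both transformations. The function names are illustrative, and the three GRE and GPA values are simply those of the students named later in this question; in practice the mean and standard deviation would be computed from all 400 records, and the work itself must be done in SAS.

```python
from statistics import mean, stdev


def rescale(values, factor):
    """Divide every value by a constant, e.g. GRE / 100."""
    return [v / factor for v in values]


def standardize(values):
    """Convert values to z-scores using the sample mean and sample SD."""
    m = mean(values)
    s = stdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(v - m) / s for v in values]


# Values of the three students named in part e), used only as a tiny example.
gre = [680, 530, 600]
gpa = [3.5, 4.18, 4.34]

print("GRE per 100 points:", rescale(gre, 100))
print("GRE z-scores:", [round(z, 3) for z in standardize(gre)])
print("GPA z-scores:", [round(z, 3) for z in standardize(gpa)])
```

Dividing by a constant keeps coefficients interpretable in original units (per 100 GRE points), while z-scores make coefficient magnitudes comparable across predictors; whichever is chosen should be stated in the answer.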
d) (3 marks) Generate SAS code to estimate your model, AND appropriately address any issue related to OVERDISPERSION, if it APPLIES.
e) (6 marks) For the following students, your manager wants to know how likely (or unlikely) it is for them to be admitted to graduate school. See Slide 12 (predicted probabilities) from the STAT802 Week 2 GLMs II S12024 deck!
Teresa: gre = 680, gpa = 3.5, and rank = 2.
Johanna: gre = 530, gpa = 4.18, and rank = 3.
Tim: gre = 600, gpa = 4.34, and rank = 4.
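If a logistic regression is the chosen framework, a predicted probability is obtained by applying the inverse logit to the linear predictor. The Python sketch below shows the calculation for the three students above using a main-effects model with gre rescaled to units of 100 points. All coefficient values are hypothetical placeholders rather than estimates from binary.csv, the function and key names are illustrative, and any interaction terms in your own model would simply add further terms to the linear predictor.

```python
import math


def inv_logit(x):
    """Numerically stable inverse logit: maps a log-odds value to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def admit_probability(coefs, gre, gpa, rank):
    """Predicted admission probability for a main-effects logistic model."""
    eta = (
        coefs["intercept"]
        + coefs["gre_per_100"] * (gre / 100.0)  # gre rescaled to 100-point units
        + coefs["gpa"] * gpa
        + coefs["rank"][rank]                   # rank 1 is the reference level
    )
    return inv_logit(eta)


# HYPOTHETICAL placeholder coefficients, not estimates from the data.
coefs = {
    "intercept": -3.5,
    "gre_per_100": 0.2,
    "gpa": 0.8,
    "rank": {1: 0.0, 2: -0.7, 3: -1.3, 4: -1.6},
}

# Student profiles as given in the question.
students = {
    "Teresa": (680, 3.5, 2),
    "Johanna": (530, 4.18, 3),
    "Tim": (600, 4.34, 4),
}

for name, (gre, gpa, rank) in students.items():
    print(name, round(admit_probability(coefs, gre, gpa, rank), 3))
```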
f) (8 marks) Write an executive summary (avoid technical jargon). Focus on the question ‘Is your manager correct with both assumptions?’. Include a short discussion on Part d) - Overdispersion and Part e).
NOTE: Present the output relevant to this question in an Appendix, correctly cited and with captions!