STAT0045. In-course Assessment 3 (2023/24 Session)
Department of Statistical Science
General Instructions
This assessment is classified as Coursework as defined in the UCL Student Regulations for Exams and Assessments (link (https://www.ucl.ac.uk/academic-manual/chapters/chapter-4-assessment-framework-taught-programmes/student-regulations-exams-and-assessments)). It contributes 40% to the overall mark for this module.
The release date for this assessment is 16:00 (UK time) on Tuesday, 12 March 2024.
The submission deadline is 16:00 (UK time) on Monday, 22 April 2024.
This assessment is group work. Work in a group of three students; see also the section on group work below.
There are no automatic extensions for students who have been issued with a Summary of Reasonable Adjustments (SoRA). If you have a SoRA, and would like to apply for an extension, please contact the module lead at the earliest opportunity. If an extension is applied, it is applied to the entire group.
Extenuating circumstances are handled by your parent department and all claims should be submitted via Portico (link (https://www.ucl.ac.uk/academic-manual/chapters/chapter-2-student-support-framework/2-short-term-illness-and-other-extenuating-1)). Depending on the nature and severity of the circumstances, an alternative type of mitigation to a deadline extension may be considered more suitable.
In preparation for this assessment, please ensure that you are familiar with the Department of Statistical Science’s guidance on academic integrity (link (https://www.ucl.ac.uk/statistics/sites/statistics/files/shbpc.pdf)). When submitting your work, you will be required to make a declaration that you have read and understood this guidance.
Parts of your submission may be scanned using similarity detection software. If any breach of the assessment regulations is suspected, it will be investigated in accordance with UCL’s Student Academic Misconduct Procedure (link (https://www.ucl.ac.uk/academic-manual/chapters/chapter-6-student-casework-framework/section-9-student-academic-misconduct-procedure)).
To facilitate anonymous marking, you should not write your name anywhere on your work, including in file names or file descriptions requested as part of the submission process.
You must only submit your work via the designated portal in Moodle. If you try to submit via email or any other channel this will not count as a submission and will not be marked.
There are strict, non-negotiable penalties for late submission, which for coursework are as follows.
Up to 2 working days late: deduction of 10 percentage points, but no lower than the pass mark.
2-5 working days late: capped at the pass mark.
More than 5 working days late: mark of 1.00%.
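The penalty tiers above can be sketched as a small function. This is only an illustration, not the official calculation: the pass mark is not stated in these instructions, so it is passed in as a parameter (`pass_mark` is a hypothetical name), a submission exactly 2 working days late is assumed to fall in the first tier, and a mark already below the pass mark is assumed to be left unchanged in the first two tiers.

```python
def apply_late_penalty(mark, working_days_late, pass_mark):
    """Return the mark (in %) after the late-submission penalty.

    Assumptions (not stated in the instructions): exactly 2 working days
    late counts as the first tier, and a mark that is already below the
    pass mark is not reduced further in the first two tiers.
    """
    if working_days_late <= 0:
        return mark  # submitted on time
    if working_days_late <= 2:
        if mark < pass_mark:
            return mark
        # Deduct 10 percentage points, but go no lower than the pass mark.
        return max(mark - 10.0, pass_mark)
    if working_days_late <= 5:
        # Capped at the pass mark.
        return min(mark, pass_mark)
    # More than 5 working days late.
    return 1.0


# Hypothetical example: a mark of 72% with an assumed pass mark of 50%.
for days in (0, 1, 4, 6):
    print(days, apply_late_penalty(72.0, days, pass_mark=50.0))
```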
If the module lead becomes aware of a significant technical issue or outage affecting Moodle during the assessment, a message will be circulated to explain what has happened and the steps being taken to mitigate the issue. If you do not receive notification of a more widespread issue and you experience technical difficulties, you should refer to the Help & Support resources provided by UCL’s central IT service (link (https://www.ucl.ac.uk/isd/help-support)). However, last-minute technical issues will not be considered as valid grounds for missing the deadline, so ensure that you leave plenty of time to prepare, upload and check your submission.
Non-submission (in the absence of any valid extenuating circumstances) will mean that your mark for this component is recorded as 0.00% and you will be deemed to have made an attempt.
You should expect to receive feedback on this assessment within one calendar month of the submission deadline. In the event of a delay, the module lead will contact students directly with details of the revised timeline.
The group work
The registration of the groups is organised via Moodle; see the entry Group choice for ICA 3. If you are looking for group members, feel free to use the course forum to find someone.
The aim is to have groups of three students. A group of two may be asked to consider accepting a third member as part of a group allocation organised by the module lead. Group allocation will not be enforced, and groups of two will be accepted.
Group choice should be finalised by March 22nd. If you are not part of a group by that time, please contact the module lead by email.
All members of the group are responsible for ensuring that they work well together and that everyone is contributing equally. All group members will receive identical marks.
Part of the “work” in “group coursework” is the necessity for groups to communicate and co-ordinate themselves. It is strongly recommended that each group plans a schedule for how they will complete the assessment shortly after these instructions are released. Failure to communicate regularly throughout the duration of the assessment will likely lead to dramatically more stress as the deadline approaches, and more often than not a poor submission which results in a low mark for all group members.
Allowing you to choose your own groups should mean that there is no conflict within groups. If any group issues do arise and you have exhausted all efforts for resolving these issues within the group, please contact me as soon as possible at [email protected]. The sooner I am contacted about any such issues, the more likely it is that I will be able to help. It is highly unlikely that I will be able to take any meaningful action if you contact me shortly before, or after, the deadline.
The assessment
This assessment consists of Part A and Part B. For Part A and the first part of Part B, you can submit scanned/photographed hand-written solutions. You can use any writing tool you like as long as the submitted work can be read clearly. Note the UCL advice on submitting scanned/photographed work (link (https://www.ucl.ac.uk/news/2020/apr/seven-simple-steps-submit-handwritten-answers-moodle-exams-or-assessments)). The report in Part B must be typed. Include a word count for the report.
Part A and Part B are each marked on a scale of 0-100, and are equally weighted for the final mark. Marks for the constituent parts are listed in bold face. Marks are given for correct answers, but also for succinctness and clarity of explanation.
To ensure anonymous marking, provide all Student ID numbers at the top of Part A and B (no names). The order of the IDs does not matter. Part A and B should be submitted together in one PDF file. Submit the file with the first Student ID as name; for example, if the first ID is 20001234, use the name 20001234.pdf.
You can use R for the questions in Part A and B, but do not hand in R code. R code in the submission will be ignored in the marking.
You are allowed to use an AI tool (such as ChatGPT), but you should acknowledge the use of this and explain how you used it.
You can use the course Forum to raise queries during the assessment, but only if the queries concern clarification of tasks in the assessment. The forum will close at 12 noon on April 18th, and reopen on April 23rd.
Part A
Question 1
(a) A study is to be undertaken to investigate the association between dyslexia and academic performance. The study will involve two groups of children aged 12, one group with dyslexia and one group without dyslexia. The proportions of children in each group who achieve a given academic degree by age 24 years will be compared. Is this study a cohort study or a case-control study? Give a reason for your answer. [5]
(b) Explain briefly why the study in (a) cannot be undertaken as an experimental study. [5]
(c) Consider data from a case-control study of the form.
Let $E$ and $C$ be the events that an individual is exposed and is a case respectively, and let $\bar{C}$ be the event that an individual is a control. It is desired to estimate the odds $O = P(C \mid E)/P(\bar{C} \mid E)$ of an exposed individual being a case. Assuming that the ratios and are unbiased estimators of and respectively, deduce that the observed odds will be a reasonable estimate of $O$ provided that the observed ratio of cases to controls in the sample reflects that in the population. Provide the details of your deduction. [15]
(d) A case-control study of the role of heredity in Crohn’s disease included 210 cases and 358 controls. An individual is exposed if he or she has one or more relatives with Crohn’s disease, and unexposed otherwise. In the notation of part (c), the data are , , , and . Estimate the odds ratio of Crohn’s disease with and without exposure and give an approximate 95% confidence interval for this ratio. Explain your calculation and interpret the numerical results. [10]
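For orientation, one common way to obtain an odds ratio and an approximate 95% confidence interval from a 2x2 case-control table is the log-based (Woolf) interval. The sketch below is illustrative only: the function name and argument names are hypothetical, the four counts are made-up numbers and not the Crohn’s disease data, and the normal quantile 1.96 is the usual approximation for a 95% interval.

```python
import math


def odds_ratio_ci(exposed_cases, unexposed_cases,
                  exposed_controls, unexposed_controls, z=1.96):
    """Odds ratio with an approximate confidence interval on the log scale."""
    # Cross-product ratio of the 2x2 table.
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Approximate standard error of log(odds ratio).
    se_log = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                       + 1 / exposed_controls + 1 / unexposed_controls)
    log_or = math.log(odds_ratio)
    # Build the interval on the log scale, then transform back.
    lower = math.exp(log_or - z * se_log)
    upper = math.exp(log_or + z * se_log)
    return odds_ratio, lower, upper


# Hypothetical counts, chosen only to show the arithmetic.
estimate, lower, upper = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {estimate:.2f}, approximate 95% CI ({lower:.2f}, {upper:.2f})")
```

An interval that contains 1 indicates that the data are compatible with no association at the 5% level.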
Question 2
Let be a variable which represents the 4 treatment combinations of a factorial design for treatment variables and . If there are also two blocking variables and each with 4 levels, then you can use a Latin square design to investigate factorial effects. Let the 4 treatment combinations represented by be given by, where in each pair the first entry is the level for and the second is the level for . Data for outcome variable are given by
where each of the 16 cells contains an observed value of in bold face, and the corresponding treatment combination.
Assume significance level . For the following questions you can use software but do not hand in the code.
(a) Why is this a Latin square design for ? [5]
(b) Using as the outcome variable, write down the model equation for this design using main effects only for and , and main effects and interaction effects for and . Provide a complete specification of the model equation. [10]
(c) Fit the model in (b) to the data. Use an F-test to test the hypothesis that all interaction effects for and are equal to zero. Explain this F-test by defining the hypothesis using parameters in the model in (b), by reporting the distribution of the test statistic, and by explaining why you would reject or not reject the hypothesis. [10]
(d) Provide the point estimates of all the interaction effects in the model in (b). [5]
(e) Produce a graph depicting residuals against fitted values. The graph should be clear and well presented. Include it in the submission of this ICA. Interpret the graph. [5]
(f) Derive the variance of the main effect of in the model in (b) as a function of the common variance , the number of levels for denoted by , and as used in the denotation of the dimension of the Latin square design. Clearly explain your derivation. Mind that you should use the symbols , , and in your answer (and not their values). [20]
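As a reminder of the defining property of a Latin square (each treatment label occurs exactly once in every row and every column), here is a minimal check. The function name and the 4x4 layout are hypothetical and are not the layout given in this question.

```python
def is_latin_square(square):
    """Return True if every symbol occurs exactly once per row and per column."""
    size = len(square)
    if size == 0:
        return False
    symbols = set(square[0])
    if len(symbols) != size:
        return False
    # Every row must have the right length and contain each symbol once.
    if not all(len(row) == size and set(row) == symbols for row in square):
        return False
    # Every column must contain each symbol once.
    return all({row[j] for row in square} == symbols for j in range(size))


# Hypothetical cyclic 4x4 layout with treatment labels A-D.
layout = [
    ["A", "B", "C", "D"],
    ["B", "C", "D", "A"],
    ["C", "D", "A", "B"],
    ["D", "A", "B", "C"],
]
print(is_latin_square(layout))
```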
Part B
The experiment
Simulated experimental data are available. You will have to answer questions on experimental design, analyse the data, and report the data analysis. Nota bene: data were simulated, but for this assignment the data are presented and should be analysed as if they are the results of a real-life experiment.
The experiment investigates the effect of tulip variety on the maximum height of the tulip when flowering. Treatment variable has three levels: , , and . The response variable is measured in cm.
The tulips are grown in individual pots on two small balconies; one at the front of the flat, and one at the back. Each balcony can accommodate a maximum of 10 pots. is the blocking variable in the design and has levels and .
Sample size considerations
Significance level is fixed at . The experiment is set up to detect a difference of 5cm in mean height across the varieties if this difference is present. A sample-size calculation was undertaken and a sample size of 12 bulbs was chosen. This calculation was based on a standard linear model (with interaction terms) for a two-way ANOVA and used as a preliminary guess of the common variance.
Hint: These considerations are relevant for what follows and should be referred to where appropriate.
Data
Data were collected following a protocol that tried to control the experimental environment as much as possible. The data on the measured responses are given (in cm) as follows:
Questions on experimental design
1) Given the experimental setting, why does it make sense to use the balconies as blocks in the design? [10]
2) Give an explicit example of how randomisation could have been applied in this setting. This example should provide concrete information on how to apply randomisation when the experiment is repeated. [10]
3) Succinctly, give a protocol for controlling the experiment in all stages; that is, when planting the bulbs, when caring for the plants during the grow stage, and when collecting the data on the heights. [10]
4) Show that given the information in the above section on sample size consideration, the sample size of 12 observations is indeed adequate. [10]
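Randomisation within blocks can be made concrete with a few lines of code. The following is one possible sketch, not a model answer: the variety labels `V1`, `V2`, `V3`, the block labels `front` and `back`, and the equal split of two pots per variety on each balcony (giving 12 bulbs in total) are assumptions made for illustration.

```python
import random


def randomise_within_blocks(varieties, blocks, pots_per_variety, seed=None):
    """Randomly assign varieties to pot positions separately within each block."""
    rng = random.Random(seed)  # a recorded seed makes the plan reproducible
    plan = {}
    for block in blocks:
        labels = [v for v in varieties for _ in range(pots_per_variety)]
        rng.shuffle(labels)
        # Entry i of the list is the variety planted in pot position i + 1.
        plan[block] = labels
    return plan


# Hypothetical labels; two pots per variety on each balcony.
plan = randomise_within_blocks(["V1", "V2", "V3"], ["front", "back"],
                               pots_per_variety=2, seed=2024)
for block, labels in plan.items():
    print(block, labels)
```

Shuffling separately within each balcony keeps the design balanced across blocks while leaving the pot positions to chance.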
Report
Analyse the data using the model that is used for the sample-size calculation. The data analysis should be presented in a report. Interpretation of the numerical results with respect to the aim of the experiment should be part of this analysis. [60]
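For readers who want to see the arithmetic behind a balanced two-way ANOVA with interaction, a minimal sketch follows. It is illustrative only: the function name, the variety and balcony labels, and the heights are all hypothetical (they are not the data of this experiment), and the code assumes the same number of replicates (at least two) in every cell.

```python
from statistics import mean


def two_way_anova(cells):
    """Sums of squares, degrees of freedom and F statistics for a balanced
    two-way layout. `cells` maps (treatment, block) to a list of responses."""
    treatments = sorted({t for t, _ in cells})
    blocks = sorted({bl for _, bl in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(cells[(t, bl)]) != n
                    for t in treatments for bl in blocks):
        raise ValueError("need a balanced design with at least 2 replicates")
    a, b = len(treatments), len(blocks)
    grand = mean(y for ys in cells.values() for y in ys)
    t_mean = {t: mean(y for bl in blocks for y in cells[(t, bl)])
              for t in treatments}
    b_mean = {bl: mean(y for t in treatments for y in cells[(t, bl)])
              for bl in blocks}
    c_mean = {key: mean(ys) for key, ys in cells.items()}
    ss = {
        "treatment": b * n * sum((t_mean[t] - grand) ** 2 for t in treatments),
        "block": a * n * sum((b_mean[bl] - grand) ** 2 for bl in blocks),
        "interaction": n * sum(
            (c_mean[(t, bl)] - t_mean[t] - b_mean[bl] + grand) ** 2
            for t in treatments for bl in blocks),
        "residual": sum((y - c_mean[key]) ** 2
                        for key, ys in cells.items() for y in ys),
    }
    df = {"treatment": a - 1, "block": b - 1,
          "interaction": (a - 1) * (b - 1), "residual": a * b * (n - 1)}
    ms = {name: ss[name] / df[name] for name in ss}
    # Each effect is tested against the residual mean square.
    f_stat = {name: ms[name] / ms["residual"]
              for name in ("treatment", "block", "interaction")}
    return ss, df, f_stat


# Hypothetical heights in cm: 3 varieties x 2 balconies x 2 replicates.
heights = {
    ("V1", "front"): [40.0, 42.0], ("V1", "back"): [38.0, 41.0],
    ("V2", "front"): [45.0, 47.0], ("V2", "back"): [44.0, 43.0],
    ("V3", "front"): [50.0, 49.0], ("V3", "back"): [46.0, 48.0],
}
ss, df, f_stat = two_way_anova(heights)
for name in ("treatment", "block", "interaction"):
    print(name, df[name], round(ss[name], 3), round(f_stat[name], 3))
```

Each F statistic would then be compared with the F distribution on the effect and residual degrees of freedom; statistical software reports the corresponding p-values directly.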
Follow the instructions carefully:
The report should be written clearly and simply so that it is accessible to readers in other scientific disciplines and to readers for whom English is not their first language.
Assume that readers know the basic elements of experimental design as taught in STAT0045. For example, no need to explain why an F-test is appropriate in an ANOVA table.
Type the report in a text editor and add the word count at the end of the report. Use font size 12.
Maximum word count for the report (excluding text for the tables and figures) is 1200 words.
The title of the report should be “Data Analysis”.
Write the report in paragraphs and complete sentences. Using a few bullet points is OK, but do not write the report as a list of bullet points.
The report should include clear information about the model chosen for the analysis. The report should not include motivation for the experiment or its design, sample-size considerations, or the way the data were collected (this is covered by the questions above).
It is up to you how you divide the main text in sections. Here is one possibility: Introduction, Model, Results, Discussion.
If you use tables or graphs, then number the displays and refer to them in the main text.
The quality of the presentation is included in the marking. Presentation refers here to mathematical content as well as graphs and tables. Mind that graphs and tables need to be relevant, clear and well presented. For graphs, pay attention to the caption, appropriate choices of symbols, line types, axis labels, units of measurement and so forth. For tables, pay attention to the caption, row and column headings, units of measurement and so forth.
If you use an AI tool (see instructions), then use an appendix to acknowledge this use. This appendix does not count towards the maximum word count.
You can add literature references to the report. References do not count towards the maximum word count. No need to add references to the STAT0045 course material.
Hints:
There is no specific need to use AI tools for this report. Mind the danger of using AI tools; see Slides 23-25.
Although it is fine to refer to literature beyond the course material, there is no specific need to do so.
Marking criteria:
The aim of the report assignment is to see whether 1) you understand the main aspects of experimental design in a specific setting, and 2) you can report clearly on an experiment. Additional criteria: adherence to the above instructions and guidelines, and the quality of the presentation (readability, structure, language).
Submission check
Make sure that you use your student IDs in the submission (and not your names), that you answer all the questions in Part A and B, that you include a word count for the report in Part B, and that you include the graph for Question 2(e) in Part A.