DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE
STAT7005 Multivariate Methods (Fall 2024)
Assignment 2
Due: October 22, 2024 via Moodle
1. The file cat.txt contains the cerebral bleeding volumes of 14 cats at different blood pressures. The columns of the data file are:
• Column 1: Cat ID;
• Column 2: Volume at blood pressure equal to 90 mmHg;
• Column 3: Volume at blood pressure equal to 70 mmHg;
• Column 4: Volume at blood pressure equal to 50 mmHg;
• Column 5: Group identifier, where 0 indicates control group and 1 indicates the treatment group.
(a) Plot the mean profiles for the three pressures and two treatment groups. Do the segments appear to (i) be parallel, (ii) be horizontal, (iii) coincide?
(b) Conduct a profile analysis on the cerebral bleeding volumes for the two treatment groups. Describe the procedure and your conclusions in detail. Use α = 0.05 for the test(s) conducted.
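As a starting point for Question 1, the quantities behind the mean-profile plot can be sketched as below. This is only an illustration under stated assumptions: the rows are made-up values in the column order listed above (they are not taken from cat.txt), and the helper names `mean_profiles` and `segment_slopes` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows in the column order described above:
# (cat_id, vol_90, vol_70, vol_50, group). These values are made up for
# illustration and are NOT taken from cat.txt.
rows = [
    (1, 1.0, 2.0, 4.0, 0),
    (2, 3.0, 4.0, 6.0, 0),
    (3, 2.0, 2.5, 3.0, 1),
    (4, 4.0, 4.5, 5.0, 1),
]


def mean_profiles(rows):
    """Return {group: [mean at 90 mmHg, mean at 70 mmHg, mean at 50 mmHg]}."""
    by_group = defaultdict(list)
    for _cat_id, v90, v70, v50, group in rows:
        by_group[group].append((v90, v70, v50))
    return {
        g: [sum(col) / len(col) for col in zip(*vals)]
        for g, vals in by_group.items()
    }


def segment_slopes(profile):
    """Differences between the means at adjacent pressures."""
    return [b - a for a, b in zip(profile, profile[1:])]


profiles = mean_profiles(rows)
slopes = {g: segment_slopes(p) for g, p in profiles.items()}
print(profiles)  # one mean profile per group
print(slopes)    # one list of segment differences per group
```

Roughly equal segment differences across the two groups would suggest parallel profiles, differences near zero would suggest horizontal profiles, and nearly identical mean vectors would suggest coincident profiles. The formal tests in part (b) still have to account for sampling variability.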
2. A two-factor experiment was conducted on the growth of peanuts. The data set is available in peanut.txt. Three varieties (5, 6 and 8; second column) of peanut were grown at two locations (1, 2; first column). The variables of interest are:
X1: Yield (plot weight; third column);
X2: Sound mature kernels (weight in grams; fourth column);
X3: Seed size (weight of 100 seeds in grams; fifth column).
There are two replications for each location–variety combination.
(a) Perform a two-way MANOVA. In particular, test for the location–variety interaction effect and each of the main effects. Use α = 0.05.
(b) Perform three separate univariate two-way ANOVAs to check whether there is an interaction effect for some of the variables only.
(c) Larger numbers correspond to better yield and grade–grain characteristics. Using location 2, can we conclude that one variety is better than the other two for each characteristic (variable)? Construct 95% Bonferroni confidence intervals for each pair of varieties to answer this question.
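The Bonferroni bookkeeping for part (c) can be sketched as follows. It assumes one reading of the question, namely that the family consists of every variety pair for every variable (3 pairs × 3 variables). The `interval` helper is hypothetical, and the point estimate, standard error, and t critical value are left as inputs because they depend on the fitted model and its error degrees of freedom.

```python
from itertools import combinations

varieties = [5, 6, 8]            # variety codes from the problem statement
variables = ["X1", "X2", "X3"]   # yield, sound mature kernels, seed size
alpha = 0.05                     # family-wise level for 95% intervals

# Assumption: one interval per (variety pair, variable) combination.
pairs = list(combinations(varieties, 2))
m = len(pairs) * len(variables)  # total number of simultaneous intervals
tail_prob = alpha / (2 * m)      # upper-tail probability for each t quantile


def interval(diff, std_err, t_crit):
    """Two-sided interval diff ± t_crit * std_err (hypothetical helper)."""
    half = t_crit * std_err
    return diff - half, diff + half


print(pairs)      # the variety pairs being compared
print(m)          # number of intervals sharing the 5% error budget
print(tail_prob)  # look up the t quantile with this upper-tail probability
```

A variety would be judged better than another on a given variable only when the corresponding interval for the difference excludes zero.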
3. The file river.csv contains data on 60 river localities. For each locality, there are 17 biological metric records (X1, X2, ..., X17) which are used as measures of biological activity within each locality. The response variables are the concentrations of seven chemical substances (Y1, Y2, ..., Y7) at those localities. The columns in the data set are in the order X1, X2, ..., X17, Y1, Y2, ..., Y7.
(a) Perform a multivariate regression of Y1, Y2, ..., Y7 on X1, X2, ..., X17 and obtain the matrix of estimated coefficients B̂ and the estimated error covariance matrix Σ̂. Report your results to at least 2 decimal places.
(b) At the 5% significance level, perform a test on whether all regression coefficients (except the intercepts) are jointly equal to zero.
(c) Determine which individual predictor variable(s) can explain all response variables at the 5% significance level. Use Wilks’ lambda in performing the tests.
(d) Consider a backward elimination method to select appropriate predictor variables. Start with all predictor variables in the model. Remove the least significant predictor variable, provided its p-value (based on Wilks’ lambda) is greater than 0.2, and refit the model without it. Repeat until all predictor variables have p-values equal to or less than 0.2. Which predictor variables are retained in the final multivariate regression model? Explain why the result here may be different from that in part (c).
(e) The file river_new.csv has two new observations with corresponding values of X1, X2, ..., X17. Using the fitted model in part (a), construct 95% Scheffé simultaneous confidence intervals for the seven response variables for each new observation.
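The control flow of the backward elimination in part (d) can be sketched as below. The function `backward_eliminate` and its `pvalues_fn` argument are hypothetical names: `pvalues_fn` stands in for whatever routine refits the multivariate regression on the retained predictors and returns a Wilks' lambda p-value for each. The toy p-values are made up and, unrealistically, do not change after a refit.

```python
def backward_eliminate(predictors, pvalues_fn, threshold=0.2):
    """Drop the least significant predictor until all p-values <= threshold.

    pvalues_fn(kept) is a caller-supplied (hypothetical) function that refits
    the model on `kept` and returns {predictor name: p-value}.
    """
    kept = list(predictors)
    while kept:
        pvals = pvalues_fn(kept)                      # refit on current set
        worst = max(kept, key=lambda name: pvals[name])
        if pvals[worst] <= threshold:                 # everything passes
            break
        kept.remove(worst)                            # drop one, then refit
    return kept


# Toy illustration with fixed, made-up p-values (NOT from river.csv).
toy = {"X1": 0.01, "X2": 0.45, "X3": 0.15, "X4": 0.30}


def toy_pvalues(kept):
    return {name: toy[name] for name in kept}


selected = backward_eliminate(["X1", "X2", "X3", "X4"], toy_pvalues)
print(selected)  # predictors retained under the toy p-values
```

Only one predictor is removed per pass, and the p-values are recomputed before the next decision. With real data this refitting is what allows the retained set to differ from the one-at-a-time tests in part (c).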