DEPARTMENT OF STATISTICS AND ACTUARIAL SCIENCE

STAT7005 Multivariate Methods (Fall 2024)

Assignment 2

Due: October 22, 2024 via Moodle

1. The file cat.txt contains the cerebral bleeding volumes of 14 cats at different blood pressures. The columns of the data file are:

• Column 1: Cat ID;

• Column 2: Volume at blood pressure equal to 90 mmHg;

• Column 3: Volume at blood pressure equal to 70 mmHg;

• Column 4: Volume at blood pressure equal to 50 mmHg;

• Column 5: Group identifier, where 0 indicates the control group and 1 indicates the treatment group.

(a) Plot the mean profiles for the three pressures and two treatment groups. Do the segments appear to (i) be parallel, (ii) be horizontal, (iii) coincide?

(b) Conduct a profile analysis on the cerebral bleeding volumes for the two treatment groups. Describe the procedure and your conclusions in detail. Use α = 0.05 for the test(s) conducted.
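As a rough illustration of the first step of a profile analysis, the sketch below computes the two mean profiles and the Hotelling T² statistic for the parallelism hypothesis, using successive-difference contrasts and a pooled covariance matrix. The data are synthetic stand-ins, not the contents of cat.txt; the 7/7 group split, the function name parallelism_T2 and the variable names are assumptions made for the example.

```python
import numpy as np

def parallelism_T2(x1, x2):
    """Hotelling T^2 and F statistic for H0: the two mean profiles are parallel."""
    n1, p = x1.shape
    n2 = x2.shape[0]
    # Successive-difference contrasts: rows (-1, 1, 0), (0, -1, 1) when p = 3
    C = np.diff(np.eye(p), axis=0)
    d = C @ (x1.mean(axis=0) - x2.mean(axis=0))
    # Pooled sample covariance of the two groups
    S = ((n1 - 1) * np.cov(x1, rowvar=False)
         + (n2 - 1) * np.cov(x2, rowvar=False)) / (n1 + n2 - 2)
    T2 = d @ np.linalg.solve((1 / n1 + 1 / n2) * C @ S @ C.T, d)
    # Under H0, F follows an F(p - 1, n1 + n2 - p) distribution
    F = T2 * (n1 + n2 - p) / ((n1 + n2 - 2) * (p - 1))
    return T2, F

# Synthetic stand-in data (assumed 7 cats per group, 3 pressures)
rng = np.random.default_rng(0)
control = rng.normal(loc=[10.0, 12.0, 15.0], size=(7, 3))
treated = rng.normal(loc=[11.0, 13.0, 16.0], size=(7, 3))

print("control mean profile:", np.round(control.mean(axis=0), 2))
print("treated mean profile:", np.round(treated.mean(axis=0), 2))
T2, F = parallelism_T2(control, treated)
print(f"T2 = {T2:.3f}, F = {F:.3f}")
```

The F value would then be compared with the upper 5% point of the F(p − 1, n1 + n2 − p) distribution. The tests for coincident and level profiles follow the same pattern with different contrasts and are usually run only if parallelism is not rejected.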

2. A two-factor experiment was conducted on the growth of peanuts. The data set is available in peanut.txt. Three varieties (5, 6 and 8; second column) of peanut were grown at two locations (1, 2; first column). The variables of interest are:

X1 : Yield (plot weight; third column);

X2 : Sound mature kernels (weight in grams; fourth column);

X3 : Seed size (weight of 100 seeds in grams; fifth column).

There are two replications for each location–variety combination.

(a) Perform a two-way MANOVA. In particular, test for the location–variety interaction effect and each of the main effects. Use α = 0.05.

(b) Perform three separate univariate two-way ANOVAs to check whether there is an interaction effect for some of the variables only.

(c) Larger numbers correspond to better yield and grade–grain characteristics. Using location 2, can we conclude that one variety is better than the other two for each characteristic (variable)? Construct 95% Bonferroni confidence intervals for each pair of varieties to answer this question.
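For the two-way MANOVA in part (a), one possible way to organise the computation is to build the sums-of-squares-and-cross-products (SSP) matrices for each factor, the interaction and the residual, and then form Wilks' lambda as det(E) / det(H + E). The sketch below does this for a balanced design on synthetic data laid out like the description above (2 locations, 3 varieties, 2 replications, 3 responses). The function names two_way_ssp and wilks_lambda are assumptions for the example, and the numbers are not the peanut data.

```python
import numpy as np

def two_way_ssp(X, a, b):
    """SSP matrices of a balanced two-way MANOVA: A, B, interaction AB, residual E."""
    p = X.shape[1]
    grand = X.mean(axis=0)
    mean_a = {l: X[a == l].mean(axis=0) for l in np.unique(a)}
    mean_b = {k: X[b == k].mean(axis=0) for k in np.unique(b)}
    ssp = {name: np.zeros((p, p)) for name in ("A", "B", "AB", "E")}
    for l, ma in mean_a.items():
        ssp["A"] += np.sum(a == l) * np.outer(ma - grand, ma - grand)
    for k, mb in mean_b.items():
        ssp["B"] += np.sum(b == k) * np.outer(mb - grand, mb - grand)
    for l, ma in mean_a.items():
        for k, mb in mean_b.items():
            cell = X[(a == l) & (b == k)]
            cell_mean = cell.mean(axis=0)
            d = cell_mean - ma - mb + grand
            ssp["AB"] += len(cell) * np.outer(d, d)
            R = cell - cell_mean
            ssp["E"] += R.T @ R
    return ssp

def wilks_lambda(H, E):
    """Wilks' lambda for hypothesis SSP matrix H against residual SSP matrix E."""
    return np.linalg.det(E) / np.linalg.det(H + E)

# Synthetic stand-in data: 2 locations x 3 varieties x 2 replications, 3 responses
rng = np.random.default_rng(1)
a = np.repeat([1, 2], 6)                      # location labels
b = np.tile(np.repeat([5, 6, 8], 2), 2)       # variety labels
X = rng.normal(size=(12, 3))

ssp = two_way_ssp(X, a, b)
for effect in ("AB", "A", "B"):
    print(effect, round(wilks_lambda(ssp[effect], ssp["E"]), 4))
```

Each lambda would then be referred to an F or chi-square approximation with the degrees of freedom appropriate to that effect. The interaction is normally tested first, since the main effects are harder to interpret when it is present.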

3. The file river.csv contains data on 60 river localities. For each locality, there are 17 biological metric records (X1, X2, ..., X17) which are used as measures of biological activity within each locality. The response variables are the concentrations of seven chemical substances (Y1, Y2, ..., Y7) at those localities. The columns in the data set are in the order X1, X2, ..., X17, Y1, Y2, ..., Y7.

(a) Perform a multivariate regression of Y1, Y2, ..., Y7 on X1, X2, ..., X17 and obtain the matrices of estimated coefficients B̂ and estimated error covariance Σ̂. Report your results to at least 2 decimal places.

(b) At the 5% significance level, perform a test on whether all regression coefficients (except the intercepts) are jointly equal to zero.

(c) Determine which individual predictor variable(s) can explain all response variables at the 5% significance level. Use Wilks’ lambda in performing the tests.

(d) Consider a backward elimination method to select appropriate predictor variables. Start with all predictor variables in a model. Remove the most non-significant predictor variable with p-value greater than 0.2 (based on Wilks’ lambda) and re-fit a model without it. Repeat until all predictor variables have p-values equal to or less than 0.2. Which predictor variables can be retained in the final multivariate regression model? Explain why the result here may be different from that in part (c).

(e) The file river_new.csv has two new observations with corresponding values of X1, X2, ..., X17. Using the fitted model in part (a), construct 95% Scheffé simultaneous CIs for the seven response variables for each new observation.
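The fitting step in part (a) and the per-predictor Wilks' lambda tests in parts (c) and (d) can be sketched as follows. This is a minimal example on synthetic data with the same dimensions as the description above (60 rows, 17 predictors, 7 responses), not the contents of river.csv. The helper names design, fit and wilks_drop_one are assumptions, and dividing the residual SSP matrix by n − r − 1 is one common convention for Σ̂ (the maximum-likelihood version divides by n instead).

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the predictor matrix."""
    return np.column_stack([np.ones(X.shape[0]), X])

def fit(Z, Y):
    """Least-squares coefficients and residual SSP matrix for Y = Z B + error."""
    B, *_ = np.linalg.lstsq(Z, Y, rcond=None)
    R = Y - Z @ B
    return B, R.T @ R

def wilks_drop_one(X, Y):
    """Wilks' lambda and F statistic for dropping each predictor in turn."""
    n, r = X.shape
    m = Y.shape[1]
    Z = design(X)
    _, E_full = fit(Z, Y)
    results = []
    for j in range(r):
        # Column j + 1 of Z holds predictor j (column 0 is the intercept)
        _, E_red = fit(np.delete(Z, j + 1, axis=1), Y)
        lam = np.linalg.det(E_full) / np.linalg.det(E_red)
        # With one hypothesis degree of freedom, F ~ F(m, n - r - m) under H0
        F = (1 - lam) / lam * (n - r - m) / m
        results.append((lam, F))
    return results

# Synthetic stand-in data: only the first predictor has a real effect
rng = np.random.default_rng(0)
n, r, m = 60, 17, 7
X = rng.normal(size=(n, r))
B_true = np.zeros((r + 1, m))
B_true[1] = 5.0
Y = design(X) @ B_true + rng.normal(size=(n, m))

B_hat, E = fit(design(X), Y)
Sigma_hat = E / (n - r - 1)
print(np.round(B_hat[:3], 2))
print(np.round(Sigma_hat, 2))
for j, (lam, F) in enumerate(wilks_drop_one(X, Y), start=1):
    print(f"X{j}: Wilks lambda = {lam:.3f}, F = {F:.2f}")
```

Backward elimination as described in part (d) would repeat wilks_drop_one on the current predictor set, convert each F statistic to a p-value, and drop the predictor with the largest p-value while that value exceeds 0.2. Because each test is conditional on the other predictors still in the model, the retained set can differ from the predictors flagged in part (c).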