
2024.12.11

Final, 291 Section 2 (300 points)

1) A school has 1600 students, who are going to vote on whether to convert the school completely off fossil fuels. How many students would you have to poll to be 95% confident of the outcome to within +/- 0.5% of the vote? (25 points)
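
A minimal sketch of one standard calculation, in Python. It assumes the worst-case proportion p = 0.5 and applies a finite-population correction because the population (1600 students) is small; neither assumption is stated in the problem, and the function name `required_sample_size` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(population, margin, confidence=0.95, p=0.5):
    """Sample size for estimating a proportion to within +/- margin.

    Assumes the worst-case proportion p = 0.5 and applies the
    finite-population correction n = n0 / (1 + (n0 - 1) / N).
    """
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Shrink n0 because we sample without replacement from N students
    n = n0 / (1 + (n0 - 1) / population)
    return ceil(n)


if __name__ == "__main__":
    print(required_sample_size(population=1600, margin=0.005))  # 1537
```

Under these assumptions a +/- 0.5% margin requires polling nearly the whole school, because the uncorrected n0 (about 38,400) is far larger than the population.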

2) Earthquakes can be divided into two classes based on the direction the earth moves when it fractures. The classes can be compared across time: for a given region, record whether an earthquake occurs in one short time period and whether one occurs in the next short time period, then add these counts up across many successive time periods. The table below is for a region off Indonesia over a 30-year period.

	Second time period: earthquake	Second time period: no earthquake	Marginal sums
First time period: earthquake	148	274	422
First time period: no earthquake	276	2626	2902
Marginal sums	424	2900	3324

Is the occurrence of an earthquake in one time period statistically independent of an occurrence in the next time period? Test at the .01 level. (20 points)

These are earthquakes of the same type. Which cells have higher occurrence than expected if independence is true? (Use the deviation table.) (10 points)
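
A minimal sketch of the Pearson chi-square test of independence for a 2x2 table, assuming the uncorrected statistic with 1 degree of freedom (critical value about 6.635 at the .01 level). The helper name `independence_test` is hypothetical; the optional `yates` flag mirrors the continuity correction that R's `chisq.test` applies to 2x2 tables by default. The returned `deviations` (observed minus expected counts) are presumably the "deviation table" the question asks for.

```python
from math import erfc, sqrt


def independence_test(table, yates=False):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (chi2, p_value, expected, deviations), where
    deviations = observed - expected. With 1 degree of freedom,
    P(chi2 > x) = erfc(sqrt(x / 2)), so this p-value is only valid for 2x2.
    """
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    # Expected count under independence: row total * column total / N
    expected = [[r * c / total for c in col_sums] for r in row_sums]
    deviations = [[o - e for o, e in zip(orow, erow)]
                  for orow, erow in zip(table, expected)]
    chi2 = 0.0
    for drow, erow in zip(deviations, expected):
        for d, e in zip(drow, erow):
            # Optional Yates continuity correction shrinks each |O - E| by 0.5
            adj = max(abs(d) - 0.5, 0.0) if yates else abs(d)
            chi2 += adj ** 2 / e
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value, expected, deviations


if __name__ == "__main__":
    # Rows: first period quake / no quake; columns: second period quake / no quake
    same_type = [[148, 274], [276, 2626]]
    chi2, p, expected, dev = independence_test(same_type)
    print(f"chi2 = {chi2:.2f}, p = {p:.3g}, reject at .01: {p < 0.01}")
    for row in dev:
        print(["+" if d > 0 else "-" for d in row], [round(d, 2) for d in row])
```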

3) The earthquake chart below is set up the same way, except that it counts cases in which earthquakes of different types follow one another.

	Second time period: earthquake	Second time period: no earthquake	Marginal sums
First time period: earthquake	5	314	319
First time period: no earthquake	314	2691	3005
Marginal sums	319	3005	3324

Are they statistically independent now? (Test at the .01 level again.) (16 points)

How do the deviations from expectation under independence differ from those for the chart in problem 2? (Hint: look at the pattern of pluses and minuses.) (8 points)

Thinking about what each cell means, what do these differences imply about the way the two types of earthquakes interact? (6 points)
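
A short sketch, assuming the uncorrected Pearson chi-square with 1 degree of freedom: it tests this table and prints the sign pattern of observed minus expected counts for both earthquake tables so the patterns can be compared cell by cell. The function names are hypothetical.

```python
from math import erfc, sqrt


def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square_1df(table):
    """Uncorrected Pearson chi-square and its p-value for a 2x2 table (1 df)."""
    expected = expected_counts(table)
    chi2 = sum((o - e) ** 2 / e
               for orow, erow in zip(table, expected)
               for o, e in zip(orow, erow))
    return chi2, erfc(sqrt(chi2 / 2))


def deviation_signs(table):
    """Sign of observed - expected in each cell: '+', '-' or '0'."""
    signs = []
    for orow, erow in zip(table, expected_counts(table)):
        signs.append(["+" if o > e else "-" if o < e else "0"
                      for o, e in zip(orow, erow)])
    return signs


if __name__ == "__main__":
    # Rows: first period quake / no quake; columns: second period quake / no quake
    same_type = [[148, 274], [276, 2626]]
    different_type = [[5, 314], [314, 2691]]
    chi2, p = chi_square_1df(different_type)
    print(f"different types: chi2 = {chi2:.2f}, p = {p:.3g}, reject at .01: {p < 0.01}")
    print("same type signs:     ", deviation_signs(same_type))
    print("different type signs:", deviation_signs(different_type))
```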

4) Using the NCI60 data from ISLR (100 points):

a. Identify the cancer types with more than 3 cell lines present.

b. From those, identify cancers with hyperactive or hypoactive genes at the 0.2 FDR level (do not assume the tests are independent).

c. Identify the genes shared by every pair of the cancers identified in b.

d. Are there any genes shared as abnormally active among 3 of the cancers?
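
A dependency-free sketch of the bookkeeping in a-d, under stated assumptions. Each gene's p-value for a cancer type is assumed to come from a two-sided two-sample test of that type's cell lines against all other lines (for example `scipy.stats.ttest_ind`), computed elsewhere; the sign of the mean difference then separates hyperactive from hypoactive genes. "Not independent" is read here as calling for the Benjamini-Yekutieli procedure, which controls the FDR under arbitrary dependence. All function names are hypothetical, and the toy inputs are not NCI60 values.

```python
from collections import Counter
from itertools import combinations


def types_with_many_lines(labels, more_than=3):
    """a. Cancer types represented by more than `more_than` cell lines."""
    counts = Counter(labels)
    return sorted(t for t, n in counts.items() if n > more_than)


def benjamini_yekutieli(pvalues, q=0.2):
    """b. Indices rejected by the Benjamini-Yekutieli step-up procedure.

    The k-th smallest p-value is compared with k * q / (m * c(m)), where
    c(m) = 1 + 1/2 + ... + 1/m; this controls the FDR at level q under
    arbitrary dependence between the tests.
    """
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / (m * c_m):
            cutoff = rank  # step-up: the largest passing rank wins
    return {order[r] for r in range(cutoff)}


def shared_genes(significant, k):
    """c./d. Genes flagged in all cancers of every k-subset of cancer types."""
    return {
        combo: set.intersection(*(significant[c] for c in combo))
        for combo in combinations(sorted(significant), k)
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are NOT NCI60 values.
    labels = ["A"] * 5 + ["B"] * 4 + ["C"] * 4 + ["D"] * 2
    print(types_with_many_lines(labels))  # ['A', 'B', 'C']
    toy_pvalues = {  # per-gene p-values (4 genes) for each retained type
        "A": [0.00001, 0.0001, 0.3, 0.8],
        "B": [0.00001, 0.5, 0.001, 0.9],
        "C": [0.002, 0.6, 0.01, 0.7],
    }
    significant = {t: benjamini_yekutieli(p) for t, p in toy_pvalues.items()}
    print(shared_genes(significant, 2))  # genes shared by each pair
    print(shared_genes(significant, 3))  # genes shared by each triple
```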

5) The diabetes data set comes from a prospective study of the onset of adult diabetes, given a number of risk factors, among the Pima Indian tribe. Using the diabetes.csv data set (100 points):

a. Separate the first half of the data from the second half; use the first half for training and the second half for testing.

b. Using the training data:

i. Construct the full logistic regression model for the outcome.

ii. Using backward selection, construct the logistic regression model in which every coefficient has a p value < .05. (Show Steps!!!)

c. Predict the “response” (e.g., type="response") for the full logistic regression model for:

i. the training data set,

ii. the test data set.

d. Predict the “response” for the smallest logistic model from the backward selection exercise for:

i. the training data set,

ii. the test data set.

e. Using a random forest, build a model on the training data.

f. You now have 3 models: the full logistic, the smallest logistic, and the random forest. For the predictions of each, calculate and tabulate:

i. Number of correct positives

ii. Number of false positives

iii. Number of correct negatives

iv. Number of false negatives.

g. Using the results of f, does one of the 3 methods appear best at modeling new results, or does that depend on whether it is more important to identify positives (predict diabetes) or negatives (predict health)?

h. Now redo the analysis twice, each time using a random selection of 384 of the 768 records for training and the complement for testing. Can you conclude anything about the merits of each approach from this additional information?
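
For b through d, a dependency-free sketch of the mechanics: logistic regression fitted by Newton-Raphson (IRLS), two-sided Wald p-values from the inverse Fisher information, and backward elimination that drops the predictor with the largest p-value and refits until every remaining p-value is below .05, printing each step. This is an illustration under assumptions, not the course's reference method; in practice R's `glm(..., family = binomial)` with `predict(..., type = "response")`, or statsmodels' `Logit`, would be used. The function names are hypothetical, and the layout of diabetes.csv (numeric predictor columns plus a 0/1 outcome) is assumed; reading the file is left out.

```python
import math


def sigmoid(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        lead = m[col][col]
        m[col] = [v / lead for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def score_and_information(design, y, beta):
    """Gradient X'(y - mu) and Fisher information X'WX of the log-likelihood."""
    p = len(beta)
    grad = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for row, yi in zip(design, y):
        mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
        w = mu * (1.0 - mu)
        for j in range(p):
            grad[j] += (yi - mu) * row[j]
            for k in range(p):
                info[j][k] += w * row[j] * row[k]
    return grad, info


def fit_logistic(design, y, max_iter=50, tol=1e-8):
    """Newton-Raphson fit; design rows start with 1.0 for the intercept.

    Returns (coefficients, two-sided Wald p-values).
    """
    beta = [0.0] * len(design[0])
    for _ in range(max_iter):
        grad, info = score_and_information(design, y, beta)
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = score_and_information(design, y, beta)
    pvalues = []
    for j in range(len(beta)):
        unit = [1.0 if k == j else 0.0 for k in range(len(beta))]
        se = math.sqrt(solve(info, unit)[j])  # j-th diagonal of info^-1
        pvalues.append(math.erfc(abs(beta[j] / se) / math.sqrt(2)))
    return beta, pvalues


def design_matrix(rows, names, keep):
    """Intercept column plus the kept predictors, selected by name."""
    idx = [names.index(k) for k in keep]
    return [[1.0] + [row[i] for i in idx] for row in rows]


def backward_select(rows, y, names, alpha=0.05):
    """Drop the predictor with the largest p-value until all are < alpha."""
    keep = list(names)
    while True:
        beta, pvalues = fit_logistic(design_matrix(rows, names, keep), y)
        slopes = dict(zip(keep, pvalues[1:]))  # skip the intercept
        print("step:", {k: round(v, 4) for k, v in slopes.items()})
        if not slopes or max(slopes.values()) < alpha:
            return keep, beta, slopes
        worst = max(slopes, key=slopes.get)
        print("  dropping", worst)
        keep.remove(worst)


def predict_response(beta, design):
    """Predicted probabilities, the analogue of R's type = "response"."""
    return [sigmoid(sum(b * v for b, v in zip(beta, row))) for row in design]
```

Hypothetical usage: with `train_rows`, `train_y` and `names` built from the first half of diabetes.csv, `keep, beta, _ = backward_select(train_rows, train_y, names)` gives the smallest model, and `predict_response(beta, design_matrix(test_rows, names, keep))` gives its test-set probabilities. For the full model, fit `fit_logistic(design_matrix(train_rows, names, names), train_y)` and predict with `names` in place of `keep`.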
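
For a, f and h, a small dependency-free sketch with hypothetical helper names: confusion counts from any model's predictions, the first-half split, and random 384/768 splits. It assumes a 0.5 probability cutoff for calling a case "diabetes", since the problem does not specify one. Random-forest predictions can be passed in as 0/1 class labels, for example from scikit-learn's `RandomForestClassifier` (`fit`, then `predict`) or R's randomForest package; either choice of tooling is an assumption.

```python
import random


def confusion_counts(y_true, predicted, threshold=0.5):
    """f. i-iv: predicted may be probabilities or 0/1 class labels."""
    counts = {"correct positives": 0, "false positives": 0,
              "correct negatives": 0, "false negatives": 0}
    for truth, score in zip(y_true, predicted):
        says_diabetes = score >= threshold  # assumed 0.5 cutoff
        if says_diabetes and truth == 1:
            counts["correct positives"] += 1
        elif says_diabetes:
            counts["false positives"] += 1
        elif truth == 0:
            counts["correct negatives"] += 1
        else:
            counts["false negatives"] += 1
    return counts


def first_half_split(n):
    """a. First half of the rows for training, second half for testing."""
    half = n // 2
    return list(range(half)), list(range(half, n))


def random_split(n, n_train, seed=None):
    """h. n_train randomly chosen rows for training, the rest for testing."""
    rng = random.Random(seed)
    train = sorted(rng.sample(range(n), n_train))
    chosen = set(train)
    return train, [i for i in range(n) if i not in chosen]


def print_table(results):
    """Print one row of counts per model, e.g. {"full logistic": counts, ...}."""
    columns = list(next(iter(results.values())))
    print("model | " + " | ".join(columns))
    for name, counts in results.items():
        print(name + " | " + " | ".join(str(counts[c]) for c in columns))


if __name__ == "__main__":
    # Toy truth and predictions, for illustration only
    truth = [1, 0, 1, 0]
    print_table({
        "full logistic": confusion_counts(truth, [0.9, 0.8, 0.2, 0.1]),
        "random forest": confusion_counts(truth, [1, 0, 0, 0]),
    })
    for seed in (1, 2):  # h. two random 384/768 splits; seeds are arbitrary
        train, test = random_split(768, 384, seed=seed)
        print(seed, len(train), len(test))
```

Comparing the two random splits with the first-half split shows how much each model's counts move with the choice of training rows.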

6) Conceptual question: Suppose you have null and alternative hypotheses that are completely defined in terms of the specific probability distributions they represent. What is the main difference between using a likelihood ratio test and using Bayes' rule to decide between the two? (20 points)
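
For reference, a sketch of the two decision rules for simple hypotheses, with densities f_0 and f_1 under H_0 and H_1, data x, significance level α, and prior probabilities π_0 and π_1 (symbols introduced here, not in the question). The Bayes rule shown assumes equal costs for the two kinds of error.

```latex
\begin{aligned}
\Lambda(x) &= \frac{f_1(x)}{f_0(x)} \\
\text{LRT: reject } H_0 &\iff \Lambda(x) > k,
  \quad \text{with } k \text{ chosen so that } P\bigl(\Lambda(X) > k \mid H_0\bigr) = \alpha \\
\text{Bayes: choose } H_1 &\iff
  \frac{P(H_1 \mid x)}{P(H_0 \mid x)}
  = \Lambda(x)\,\frac{\pi_1}{\pi_0} > 1
\end{aligned}
```

In both rules the data enter only through Λ(x).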