Data Analysis and Statistical Inference with R


DUE: Friday, 06.04.2018, at 23:59

HOW: Electronically, in PDF format, via submission to www.turnitin.com

Class ID: depends on lab group (see announcement on piazza.com)

Enrollment password: 20TiTaNic18

Please register for the class on turnitin ahead of time.

GROUP WORK: Groups of at most 2 persons are allowed. Please stay in the same group throughout
the semester. Only one solution per group is accepted and graded. Please include the names of
all group members on each assignment.

HOW MANY: There will be a total of seven homework assignments this semester. A random
selection of questions will be graded. Up to ten points can be earned each week. Only the
five best homework assignments will be counted.

DUE DATES: 16.02., 23.02., 02.03., 09.03., 16.03., 23.03., 06.04. (tentatively, subject to change)

FORMAT: Perform the required analyses and answer in complete sentences. Provide the R syntax
for the commands you used. Extract and report only those statistics that are relevant; do not
copy complete R output without properly answering the assignment questions. Integrate
requested figures or tables into your document and give each a brief comment or caption.

Animals

Biologists believe that there is a relationship between body size and brain size. Data were
collected on 65 animals, for each of which body weight and brain weight were recorded. Using
the data in the file animals.Rdata, which is posted on Campusnet, you want to explore the
link between brain and body weight. In the data set, brain weight is measured in g and body
weight in kg.

1. In a first step, you graphically check the linearity of the relationship and look for
potential transformations to improve linearity.

(a) Assess whether there is a linear relationship between brain weight and body weight by
looking at a scatterplot of brain weight versus body weight. [Hint: The scatterplot command
in Rcmdr provides a ready-made function for adding the regression line and marginal boxplots.]

(b) Now plot the variables on a logarithmic scale (use the base-10 logarithm here). Draw
three scatterplots: two with only one of the axes transformed (one for each variable) and one
with both axes transformed. Which scatterplot shows the clearest linear relationship?


(c) Apply logarithmic transformations (using the natural logarithm, base e) to body weight
and brain weight and draw three scatterplots: two with one variable in its original form and
the other transformed, and one with both variables transformed. Which scatterplot shows the
clearest linear relationship? How do these plots differ from the ones obtained in Question 1b?
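A minimal sketch of the plotting mechanics in Question 1, run on simulated data because animals.Rdata is not reproduced here. The data frame sim and its columns body and brain are hypothetical names and the values are made up; substitute the names used in the actual file.

```r
# Simulated stand-in for the course data (hypothetical names, made-up values).
set.seed(1)
n <- 65
body  <- exp(rnorm(n, mean = 1, sd = 3))                 # body weight in kg
brain <- exp(2 + 0.75 * log(body) + rnorm(n, sd = 0.5))  # brain weight in g
sim <- data.frame(body = body, brain = brain)

# Three scatterplots using the base-10 logarithm:
# only body transformed, only brain transformed, both transformed.
op <- par(mfrow = c(1, 3))
plot(log10(sim$body), sim$brain, xlab = "log10(body)", ylab = "brain")
plot(sim$body, log10(sim$brain), xlab = "body", ylab = "log10(brain)")
plot(log10(sim$body), log10(sim$brain),
     xlab = "log10(body)", ylab = "log10(brain)")
par(op)

# The natural logarithm is the base-10 logarithm times the constant log(10),
# so switching bases only rescales an axis.
ratio <- log(sim$body) / log10(sim$body)
```

Replacing log10() with log() in the plot calls gives the natural-logarithm versions asked for in part (c).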

2. In a second step, you explore correlation and linear regression models on this data set
and perform some model checks as well.

(a) “Homoscedasticity”: Graphically inspect whether the variability in scores for logarithmic
brain weight is roughly the same at all values of logarithmic body weight.

(b) “Normality”: Graphically inspect whether the logarithmically transformed scores for
body weight and brain weight are normally distributed. Use the commands qqnorm and qqline
for that.
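The two graphical checks in Question 2 could be sketched as follows. The data are simulated and the names log_body and log_brain are hypothetical. Plotting residuals against fitted values is one common way to look at homoscedasticity; a plain scatterplot of the two logarithmic variables serves the same purpose.

```r
# Simulated log-scale data (hypothetical names, made-up values).
set.seed(2)
n <- 65
log_body  <- rnorm(n, mean = 1, sd = 3)
log_brain <- 2 + 0.75 * log_body + rnorm(n, sd = 0.5)

fit <- lm(log_brain ~ log_body)

op <- par(mfrow = c(1, 3))
# Homoscedasticity: the vertical spread of the residuals should look
# roughly constant from left to right.
plot(fitted(fit), resid(fit), xlab = "fitted values", ylab = "residuals")
abline(h = 0, lty = 2)
# Normality: qqnorm draws the points, qqline adds the reference line.
qqnorm(log_body, main = "Q-Q plot: log body weight")
qqline(log_body)
qqnorm(log_brain, main = "Q-Q plot: log brain weight")
qqline(log_brain)
par(op)
```

Note that qqline only adds a line to an existing plot, so it is called after qqnorm rather than instead of it.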

3. In the next step, you explore correlation and linear regression models on this data set.

(a) Calculate the Pearson correlation coefficient to determine whether logarithmic body
weight is related to logarithmic brain weight. Interpret the result.

(b) Compute a linear regression model for logarithmic body weight depending on logarithmic
brain weight.

4. Compute a linear regression model for logarithmic brain weight depending on logarithmic
body weight. How do you interpret the output in terms of the original variables, body weight
and brain weight?
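As a reminder of how a model on the logarithmic scale translates back to the original variables, here is the general algebra for natural logarithms. The symbols a (intercept) and b (slope) are placeholder names, not values from the data.

```latex
\[
\ln(\text{brain}) = a + b \,\ln(\text{body})
\quad\Longleftrightarrow\quad
\text{brain} = e^{a}\,\text{body}^{\,b}
\]
```

Under such a model, multiplying body weight by a factor k multiplies the predicted brain weight by k^b. If base-10 logarithms are used instead, e^a becomes 10^a and the exponent b is unchanged.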

5. Calculate the standard deviations of logarithmic body and logarithmic brain weights. Then
check that the regression slopes obtained in the two models above satisfy the equations

$$b_{x,y} = r \, \frac{s_y}{s_x}, \qquad b_{y,x} = r \, \frac{s_x}{s_y}$$
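A numerical check of these two identities might look like the sketch below. It uses simulated data, with x and y as hypothetical stand-ins for logarithmic body and brain weight, and it assumes that the slope equal to r times s_y over s_x is the one from regressing y on x, as the formula implies.

```r
# Simulated stand-ins for the two logarithmic variables (made-up values).
set.seed(3)
x <- rnorm(65, mean = 1, sd = 3)
y <- 2 + 0.75 * x + rnorm(65, sd = 0.5)

r  <- cor(x, y)   # Pearson correlation
sx <- sd(x)
sy <- sd(y)

# Slopes of the two regressions: y on x, and x on y.
slope_y_on_x <- unname(coef(lm(y ~ x))[2])
slope_x_on_y <- unname(coef(lm(x ~ y))[2])

# Compare each slope with the value given by the formula.
check <- c(y_on_x = isTRUE(all.equal(slope_y_on_x, r * sy / sx)),
           x_on_y = isTRUE(all.equal(slope_x_on_y, r * sx / sy)))
print(check)
```

A consequence of the two identities is that the product of the two slopes equals r squared.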

6. In the scatterplot of logarithmically transformed brain and body weight, you can see
three observations at the far right of the plot, representing animals with rather large body
weights and comparatively small brain weights. Which animals are these? Compute a linear
regression model that leaves out these points. Did the quality of the model, as measured by
adjusted R-squared, improve? Why?
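One way to refit a model without selected observations and compare adjusted R-squared is sketched below. The data are simulated, the row labels animal_1, animal_2, and so on are hypothetical, and three artificial points are appended to mimic the pattern described in Question 6. Picking the three rows with the largest body weight is a shortcut that works for this simulated example; with the real data the points should be identified from the plot.

```r
# Simulated log-scale data with three artificial points at the far right.
set.seed(4)
d <- data.frame(log_body = rnorm(62, mean = 1, sd = 2))
d$log_brain <- 2 + 0.75 * d$log_body + rnorm(62, sd = 0.5)
odd <- data.frame(log_body = c(11, 11.5, 12), log_brain = c(4, 4.5, 5))
d <- rbind(d, odd)
rownames(d) <- paste0("animal_", seq_len(nrow(d)))  # hypothetical labels

# Rows with the three largest body weights, and their labels.
heaviest <- order(d$log_body, decreasing = TRUE)[1:3]
print(rownames(d)[heaviest])

# Fit once with all rows and once without the three selected rows.
fit_all     <- lm(log_brain ~ log_body, data = d)
fit_reduced <- lm(log_brain ~ log_body, data = d[-heaviest, ])

adj_r2 <- c(all     = summary(fit_all)$adj.r.squared,
            reduced = summary(fit_reduced)$adj.r.squared)
print(adj_r2)
```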

7. Does the regression model in Question 6 prove that a higher body weight causes a higher
brain weight?

8. Now you are using the model obtained in Question 6 to predict brain weight for some animals.

(a) What brain weight would you predict for a Southern long-nosed armadillo with a body
weight of 3.6 kg, and what for a female blue whale with a body weight of 150 tons?

(b) Which of the two predictions you just made do you find more reliable? Why?
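The prediction step could be sketched as follows, again on simulated data with hypothetical column names body (kg) and brain (g). Two assumptions are made here: "tons" is read as metric tons, so 150 tons is taken as 150 * 1000 kg, and the model is fitted with natural logarithms, so predictions are converted back to grams with exp().

```r
# Simulated data on the original scales (made-up values).
set.seed(5)
n <- 62
d <- data.frame(body = exp(rnorm(n, mean = 1, sd = 2)))      # kg
d$brain <- exp(2 + 0.75 * log(d$body) + rnorm(n, sd = 0.5))  # g

fit <- lm(log(brain) ~ log(body), data = d)

# New animals; body weight must be in kg, the unit used for fitting.
new_animals <- data.frame(body = c(3.6, 150 * 1000),
                          row.names = c("armadillo", "blue_whale"))

# Predictions with 95% prediction intervals on the log scale,
# then converted back to grams.
pred_log <- predict(fit, newdata = new_animals, interval = "prediction")
pred_g   <- exp(pred_log)
print(pred_g)

# Is each new body weight inside the range the model was fitted on?
in_range <- new_animals$body >= min(d$body) & new_animals$body <= max(d$body)
names(in_range) <- rownames(new_animals)
print(in_range)
```

Checking whether a new value lies inside the observed range of the predictor is a simple way to see whether a prediction is an interpolation or an extrapolation.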

