


Lab 1

Enter Your Name and UNI Here

Sep 13, 2019

Instructions

Before you leave lab today, make sure that you upload an RMarkdown file to the Canvas page (this should have a .Rmd extension) as well as the PDF output produced by knitting the file (this will have a .pdf extension). Note that since you have already knitted this file, you should see both a Lab1_UNI.pdf and a Lab1_UNI.Rmd file in your GR4206 folder. Click on the Files tab to the right to see this. The files you upload to the Canvas page should be updated with the commands you provide to answer each of the questions below. You can edit this file directly to produce your final solutions.

Background: The Normal Distribution

Recall from your probability class that a random variable X is normally-distributed with mean µ and variance σ² (denoted X ∼ N(µ, σ²)) if it has a probability density function, or pdf, equal to

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$

In R we can simulate N(µ, σ²) random variables using the rnorm() function. For example,

rnorm(n = 5, mean = 10, sd = 3)

## [1] 8.120639 10.550930 7.493114 14.785842 10.988523

outputs 5 normally-distributed random variables with mean equal to 10 and standard deviation (this is σ) equal to 3. If the second and third arguments are omitted, the default values are mean = 0 and sd = 1, which is referred to as the “standard normal distribution”.
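As an illustration that is not part of the original lab, the sketch below shows how the defaults relate to the call above: a N(µ, σ²) draw is a standard normal draw scaled by σ and shifted by µ. The seed value 1 and the names x, z, and y are assumptions chosen for this example. With R's default generator this seed should reproduce the five values printed above, though that depends on the R version and RNG settings.

```r
# Fix the random number generator state so the draws are reproducible.
# The seed value is an arbitrary choice for this illustration.
set.seed(1)
x <- rnorm(n = 5, mean = 10, sd = 3)

# Reset the same seed and build the draws by hand from standard normals:
# X = mean + sd * Z, with Z ~ N(0, 1).
set.seed(1)
z <- rnorm(5)        # mean = 0 and sd = 1 by default
y <- 10 + 3 * z

print(x)
print(all.equal(x, y))
```

Calling set.seed() before rnorm() is optional, but it makes a knitted document produce the same numbers each time it is rebuilt.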

Tasks

Sample means as sample size increases

1) Generate 100 random draws from the standard normal distribution and save them in a vector named normal100. Calculate the mean and standard deviation of normal100. In words explain why these values aren’t exactly equal to 0 and 1.

# You'll want to type your response here. Your response should look like:
# normal100 <-
# Of course, your answer should not be commented out.

2) The function hist() is a base R graphing function that plots a histogram of its input. Use hist() with your vector of standard normal random variables from question (1) to produce a histogram of the standard normal distribution. Remember that typing ?hist in your console will provide help documents for the hist() function. If coded properly, these plots will be automatically embedded in your output file.

3) Repeat question (1) except change the number of draws to 10, 1000, 10,000, and 100,000, storing the results in vectors called normal10, normal1000, normal10000, and normal100000.

4) We want to compare the means of our five random draws. Create a vector called sample_means that has as its first element the mean of normal10, its second element the mean of normal100, its third element the mean of normal1000, its fourth element the mean of normal10000, and its fifth element the mean of normal100000. After you have created the sample_means vector, print the contents of the vector and use the length() function to find the length of this vector (it should be five). There are, of course, multiple ways to create this vector. Finally, explain in words the pattern we are seeing with the means in the sample_means vector.
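One way to reason about the pattern, sketched here under stated assumptions rather than as the lab's intended answer: the standard deviation of a sample mean of n independent standard normal draws is 1/√n, so larger samples tend to give means closer to 0. The names sizes, means_by_size, and std_error are hypothetical, and the seed is arbitrary.

```r
set.seed(2019)  # arbitrary seed, assumed only for reproducibility

# Hypothetical names; the sizes mirror those used in the tasks above.
sizes <- c(10, 100, 1000, 10000, 100000)

# Draw one standard normal sample per size and record its mean.
means_by_size <- sapply(sizes, function(n) mean(rnorm(n)))

# Theoretical standard deviation of the sample mean: sigma / sqrt(n), sigma = 1.
std_error <- 1 / sqrt(sizes)

print(data.frame(n = sizes, sample_mean = means_by_size, std_error = std_error))
```

Any single run can show a smaller sample landing closer to 0 than a larger one by chance; the std_error column describes the typical size of the deviation, not a guarantee.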

Sampling distribution of the sample mean

5) Let’s push this a little further. Generate 1 million random draws from a normal distribution with µ = 3 and σ² = 4 and save them in a vector named normal1mil. Calculate the mean and standard deviation of normal1mil.

6) Find the mean of all the entries in normal1mil that are greater than 3. You may want to generate a new vector first which identifies the elements that fit the criteria.

7) Create a matrix normal1mil_mat from the vector normal1mil that has 10,000 columns (and therefore should have 100 rows).

8) Calculate the mean of the 1234th column.

9) Use the colSums() function to calculate the means of each column of normal1mil_mat. Remember, ?colSums will give you help documents about this function. Save the vector of column means with an appropriate name as it will be used in the next task.

10) Finally, produce a histogram of the column means you calculated in task (9). What is the distribution that this histogram approximates (i.e. what is the distribution of the sample mean in this case)?
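The techniques in tasks (5) through (10) can be tried out on a scaled-down example first. The sketch below is an illustration under assumptions, not the lab's solution: it uses only 1,000 draws arranged into 100 columns of 10 rows, and the names draws, above_three, draws_mat, and col_means are hypothetical.

```r
set.seed(42)  # arbitrary seed, assumed only for reproducibility

# 1,000 draws from a normal with mean 3 and variance 4. rnorm() takes the
# standard deviation, so sd = sqrt(4) = 2.
draws <- rnorm(1000, mean = 3, sd = 2)

# Logical indexing: keep only the entries that exceed 3, then average them.
above_three <- draws[draws > 3]
mean_above_three <- mean(above_three)

# Reshape into a matrix with 100 columns. R fills matrices column by column,
# so each column holds 10 consecutive draws.
draws_mat <- matrix(draws, ncol = 100)

# Column means from column sums; colMeans() gives the same result directly.
col_means <- colSums(draws_mat) / nrow(draws_mat)
print(all.equal(col_means, colMeans(draws_mat)))

# Each column mean averages 10 independent draws, so in theory it has
# mean 3 and standard deviation 2 / sqrt(10).
hist(col_means, main = "Column means of draws_mat", xlab = "Column mean")
```

A single column can be inspected with draws_mat[, j] for a column index j, for example mean(draws_mat[, 12]).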


