Lab 1
Enter Your Name and UNI Here
Sep 13, 2019
Instructions
Before you leave lab today, make sure that you upload an RMarkdown file to the Canvas page (this should
have a .Rmd extension) as well as the PDF output produced by knitting the file (this will have a .pdf
extension). Note that since you have already knitted this file, you should see both a Lab1_UNI.pdf and a
Lab1_UNI.Rmd file in your GR4206 folder. Click on the Files tab to the right to see this. The files you
upload to the Canvas page should be updated with commands you provide to answer each of the questions
below. You can edit this file directly to produce your final solutions.
Background: The Normal Distribution
Recall from your probability class that a random variable X is normally distributed with mean µ and variance
σ² (denoted X ∼ N(µ, σ²)) if it has a probability density function, or pdf, equal to
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right).$$
In R we can simulate N(µ, σ²) random variables using the rnorm() function. For example,
rnorm(n = 5, mean = 10, sd = 3)
## [1] 8.120639 10.550930 7.493114 14.785842 10.988523
outputs 5 normally distributed random variables with mean equal to 10 and standard deviation (this is σ)
equal to 3. If the second and third arguments are omitted, the default values are mean = 0 and sd = 1,
which is referred to as the “standard normal distribution”.
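As a rough illustration of the background above, the sketch below draws from rnorm() with and without the optional arguments and checks the density formula against R's built-in dnorm(). The seed value and the helper name normal_pdf are assumptions made for this example only; they are not part of the lab.

```r
# Illustrative sketch only; the seed and the helper name are assumptions.
set.seed(1)                            # makes the random draws reproducible

x <- rnorm(n = 5, mean = 10, sd = 3)   # five draws from N(10, 3^2)
z <- rnorm(5)                          # defaults: mean = 0, sd = 1

# The normal pdf written out by hand (hypothetical helper).
normal_pdf <- function(x, mu, sigma) {
  1 / sqrt(2 * pi * sigma^2) * exp(-(x - mu)^2 / (2 * sigma^2))
}

# Should agree with R's built-in density function up to floating-point error.
all.equal(normal_pdf(x, mu = 10, sigma = 3), dnorm(x, mean = 10, sd = 3))
```

Note that rnorm() is parameterized by the standard deviation σ, not the variance σ², so a stated variance has to be converted with sqrt() before it is passed as sd.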
Tasks
Sample means as sample size increases
1) Generate 100 random draws from the standard normal distribution and save them in a vector named
normal100. Calculate the mean and standard deviation of normal100. In words explain why these
values aren’t exactly equal to 0 and 1.
# You'll want to type your response here. Your response should look like:
# normal100 <-
# Of course, your answer should not be commented out.
2) The function hist() is a base R graphing function that plots a histogram of its input. Use hist() with
your vector of standard normal random variables from question (1) to produce a histogram of the
standard normal distribution. Remember that typing ?hist in your console will provide help documents
for the hist() function. If coded properly, these plots will be automatically embedded in your output
file.
3) Repeat question (1) except change the number of draws to 10, 1000, 10,000, and 100,000 storing the
results in vectors called normal10, normal1000, normal10000, normal100000.
4) We want to compare the means of our five random draws. Create a vector called sample_means
that has as its first element the mean of normal10, its second element the mean of normal100, its
third element the mean of normal1000, its fourth element the mean of normal10000, and its fifth
element the mean of normal100000. After you have created the sample_means vector, print the
contents of the vector and use the length() function to find the length of this vector (it should be
five). There are, of course, multiple ways to create this vector. Finally, explain in words the pattern we
are seeing with the means in the sample_means vector.
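The functions used in this group of tasks (rnorm(), mean(), sd(), length(), and hist()) can be seen working together in the sketch below. It is one possible pattern among several and is not the required answer: the seed, the sample sizes, and the names sizes, draws, means, and sds are all hypothetical choices for illustration.

```r
# Illustrative sketch only; names, seed, and sizes are assumptions.
set.seed(42)                      # for reproducibility

sizes <- c(10, 100, 1000)         # hypothetical sample sizes
draws <- lapply(sizes, rnorm)     # one standard normal vector per size

means <- sapply(draws, mean)      # sample mean of each vector
sds   <- sapply(draws, sd)        # sample standard deviation of each vector

print(means)
length(means)                     # one entry per sample size

# Histogram of the middle vector (100 draws); in a knitted document the
# plot is embedded in the output automatically.
hist(draws[[2]], main = "100 standard normal draws", xlab = "value")
```

Storing the vectors in a list keeps the code short, but separately named vectors combined with c() work just as well.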
Sample distribution of the sample mean
5) Let’s push this a little further. Generate 1 million random draws from a normal distribution with µ = 3
and σ² = 4 and save them in a vector named normal1mil. Calculate the mean and standard deviation
of normal1mil.
6) Find the mean of all the entries in normal1mil that are greater than 3. You may want to generate a
new vector first which identifies the elements that fit the criteria.
7) Create a matrix normal1mil_mat from the vector normal1mil that has 10,000 columns (and
therefore should have 100 rows).
8) Calculate the mean of the 1234th column.
9) Use the colMeans() function to calculate the mean of each column of normal1mil_mat. Remember,
?colMeans will give you help documents about this function. Save the vector of column means with an
appropriate name as it will be used in the next task.
10) Finally, produce a histogram of the column means you calculated in task (9). What is the distribution
that this histogram approximates (i.e. what is the distribution of the sample mean in this case)?
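The building blocks used in tasks (6) through (10), namely logical indexing, matrix(), column extraction, and colMeans(), are sketched below on a deliberately tiny vector so the shapes are easy to check by hand. The seed and the names v, above, m, and col_means are assumptions for illustration and do not appear in the lab.

```r
# Illustrative sketch only; names, seed, and sizes are assumptions.
set.seed(7)
v <- rnorm(n = 12, mean = 3, sd = 2)   # sd = 2 corresponds to variance 4

above <- v > 3                 # logical vector: TRUE where the entry exceeds 3
mean(v[above])                 # mean of only those entries

m <- matrix(v, ncol = 4)       # filled column by column; 12 / 4 = 3 rows
dim(m)                         # rows, then columns

mean(m[, 2])                   # mean of the second column only
col_means <- colMeans(m)       # all column means at once

# colMeans() is equivalent to dividing the column sums by the row count.
all.equal(col_means, colSums(m) / nrow(m))

hist(col_means, main = "Column means", xlab = "mean")
```

Because matrix() fills column by column by default, each column here is a block of consecutive draws from v, which is what makes every column behave like its own random sample.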