
Please upload your solutions to A2L by December 19, 2022, 23:59 EST. The submission should be a pdf document.

This document consists of two questions: Q1, which is a combination of theory and

coding, and Q2, which is an applied question asking you to replicate the results in a

published paper. Take the time to read all explanations in this document, and pay

attention to how you report your results after running the code. Please, use full sentences

when answering questions. Treat this as a report that you may have to write for your

future job.

Most of the R code is already written for you: all R code appears in blue in this pdf

(except for this line). This means that everything that is blue can be run in R directly,

i.e. you can just copy-paste the blue code and it will run (provided that you followed all

directions given to you in this document).

You have to pay attention to how you copy-paste from the pdf!¹

One of the questions is asking you to use the gmm package. Those of you with

Macs may have issues loading it, so switch to working on the R Studio Cloud,

https://login.rstudio.cloud/login?redirect=%2F.

Warning vs Error Messages in R: You may still get "warnings" in R Studio Cloud, so pay attention to whether something is labeled as an "error" or a "warning." If it is a warning, it's fine; move ahead as if nothing happened. Warnings still produce correct answers. If it's an error, then you have to investigate.

¹ For example, when you copy-paste the code from the pdf into R Markdown, be careful with:
– the underscores: you will have to add them manually;
– the comments following ##: you may have to double check that everything that follows ## in the pdf is commented in your R Markdown file, else you may get an error;
– the left arrow in R, <-: pay attention to how this gets copied from the pdf; you may have to change it manually before you run the code.


You can suppress warning messages from your output by including warning=FALSE in your {r} chunk header, i.e. {r warning=FALSE}. For more on this, take a look at this document.

You can also suppress the output from a regression by including in your {r} chunk the following: {r echo = T, results = "hide"}. You may need this for Q2(e). Please look at page 18 of PS.pdf.

– So you can write {r echo = T, warning = F, results = "hide"} which would

suppress warnings and the output.

This project should not take you more than 3 days to complete (this is a generous estimate). However, you have a bit more than 2 weeks to work on this project. Given the fact that you have plenty of extra time to work on this project:

1. you are strongly advised to start working on the assignment as soon as you get it;

2. you are strongly incentivised to start working on the project as soon as possible by

having limited contact with myself and the TAs, in the sense that I will not

answer any questions about this project by email. If you have questions, you have

to sign up for my office hours and/or those of your TAs, or ask your questions on

the discussion board on A2L.

3. you are heavily penalized for late submissions. There will be a penalty of 25%

per 12 hours of being late, no matter the excuse.

Make sure that you start early and aim to submit at least 3 days earlier than the deadline. This should give you plenty of time to work through any troubleshooting that you may need to do, as well as to ask questions during office hours, and, in particular, it will help

you keep calm. As most code is provided to you (in blue), this project is mostly about

paying attention to what the problem is asking you, as well as to understanding the code.

Please, pay attention, read everything at least twice, and then double check that you are

answering the question that is asked.


Question 1

This question has a theoretical component and a coding (in R) component (99.9% of the

code is already written for you in this text). This exercise is based on Wooldridge (2001).²

You can find this paper on A2L, under the Individual PS module.

– Read pages 87 and 88 (at a minimum) in Wooldridge (2001) to remind yourselves

about the method of moments estimator (MM) and GMM.

Reminder: GMM is a method for constructing estimators that uses assumptions about specific moments of the random variables. The assumptions are called "moment conditions." Moment conditions are expected values that specify the model parameters in terms of the true moments. The sample moment conditions are the sample equivalents of the moment conditions. GMM finds the parameter values that are closest to satisfying the sample moment conditions. When there are more moment conditions than parameters, GMM finds the parameters that get as close as possible to solving weighted sample moment conditions. Uniform weights and optimal weights are two ways of weighting the sample moment conditions. The uniform weights use an identity matrix to weight the moment conditions. The optimal weights use the inverse of the covariance matrix of the moment conditions. [All this is in the videos and slides on GMM that you can find on A2L.] GMM generalizes the method of moments (MM) by allowing the number of moment conditions to be greater than the number of parameters. Using these extra moment conditions makes GMM more efficient than MM. When there are more moment conditions than parameters, the estimator is said to be overidentified. GMM can efficiently combine the moment conditions when the estimator is overidentified.
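The weighting logic described above can be written compactly. Denoting the sample moments by ḡₙ(θ), the GMM estimator minimizes a quadratic form in them (this is standard GMM notation, not something specific to this assignment):

```latex
\hat{\theta}_{\mathrm{GMM}}
  = \arg\min_{\theta}\; \bar{g}_n(\theta)^{\prime}\, W_n\, \bar{g}_n(\theta),
\qquad
\bar{g}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} g(y_i,\theta).
```

Uniform weighting corresponds to Wₙ = I, while the optimal choice sets Wₙ equal to the inverse of an estimate of the covariance matrix of g(yᵢ, θ₀).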

We will illustrate these points by estimating the mean of a normally distributed random variable with mean = θ₀ and variance = θ₀². We will be estimating θ₀ by MM, GMM, and an efficient GMM estimator.

Theoretical part

Let y = (y₁, y₂, . . . , yₙ) be an iid random sample with yᵢ ~ N(θ₀, θ₀²), that is, the yᵢ are normally distributed with mean θ₀ and variance θ₀². Note that there is one parameter here (rather than two).

² Wooldridge (2001). Applications of Generalized Method of Moments Estimation. Journal of Economic Perspectives, 15(4): 87-100.


(a) Write down the MM estimator for θ₀ that only uses the moment condition E(yᵢ) = θ₀. What is the asymptotic distribution of this estimator as n → ∞?

(b) Consider the GMM estimator for θ that uses the moment conditions E[g(yᵢ, θ₀)] = 0, where g(yᵢ, θ) = (yᵢ − θ, yᵢ² − 2θ²)′. These are two moment conditions for one parameter, so the choice of GMM weight matrix matters. We choose the weight matrix Wₙ.

Find an explicit expression for the resulting GMM estimator.

(c) What would have been the asymptotically optimal weight matrix choice in (b)? What is the probability limit of the chosen Wₙ in (b) as n → ∞? Is the choice of Wₙ in (b) asymptotically optimal?

(d) Calculate the asymptotic variance of the GMM estimator obtained in (b), and compare to the asymptotic variance of the estimator in (a).

Coding part

First, we generate our data. We draw a sample of size n = 500 from the population random variable Y that is normally distributed with mean 2 and variance 4, i.e. Y ~ N(θ₀, θ₀²) = N(2, 4). So for this part we assume that we know the true parameter θ₀ = 2. Note that this is something specific to simulations; in reality, we do not know the true value of θ₀, but in simulations, we do. This will help us understand whether GMM is indeed more efficient than MM. Without knowing the true θ₀ we would not be able to answer this question.

The lines below generate one sample of iid random variables drawn from this population random variable.

set.seed(1234) ## setting the seed to be able to replicate our results
n = 500 ## sample size
y = rnorm(n,2,2) ## y is a normally distributed random variable with mean=2 and variance=4 (the third argument of rnorm is the standard deviation); there is no typo here, to convince yourselves type ?rnorm in the R console and read the help file
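As a quick sanity check of the claim above (this snippet is only illustrative and is not part of the assignment code; the seed and sample size are made up): rnorm takes the standard deviation as its third argument, so a large sample drawn with sd = 2 should have variance close to 4, not 2.

```r
set.seed(42)                          # illustrative seed, not the assignment's
z <- rnorm(100000, mean = 2, sd = 2)  # large sample, mean 2, sd 2
mean(z)                               # should be close to 2
var(z)                                # should be close to 4, confirming sd (not variance) parameterization
```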


Second, we compute the MM estimator in (a) based on this one sample. Using your answer from (a) above, the MM estimator is given by

theta_hat_MM = mean(y)

when you copy-paste the code from the pdf into R Markdown, be careful with
– the underscores: you will have to add them manually
– the comments following ##: you may have to double check that everything that follows ## in the pdf is commented in your R Markdown file, else you may get an error

You will notice that there is a small bias by computing theta_hat_MM - 2.

The value of theta_hat_MM is based on just one draw from the population. What we would like now is to draw S times from the population random variable Y, where S is a large number. That is, we will generate j = 1, . . . , S samples of size n, where each sample is drawn from Y. Each sample is used to compute the value of θ̂_{j,n}, j = 1, . . . , S; that is, sample j = 1 produces a value for θ̂_{1,n}, ..., sample j = S produces a value for θ̂_{S,n}. We will have an S × 1 vector of values.

The value of our estimator will be the average across simulations, i.e. θ̂ = (1/S) Σ_{j=1}^{S} θ̂_{j,n}. (Note that n is the same number across simulations, i.e. the sample size n does not vary with S; we always draw the same sample size, but we draw it many times.)

If you remember, when we talk about inference, we say "if we were to repeat the experiment many times, ..." What this means is that we would like to draw different samples of size n repeatedly (and many times) from the same population random variable Y. We would then compute the estimator based on each sample, call it θ̂_{j,n}, j = 1, . . . , S, and then we would average across all draws, such that the estimator that we report in a simulation study will be

(1/S) Σ_{j=1}^{S} θ̂_{j,n}.


To do this, we add the lines:

S = 1000 ## number of draws
theta_hat_MM <- matrix(NA, nrow = S, ncol = 1) ## creates a vector of NAs that has S rows and one column (row j is the value θ̂_{j,n}, j = 1, . . . , S)
for(j in 1:S){
  y = rnorm(n,2,2) ## a new sample j is drawn from the population Y
  theta_hat_MM[j] = mean(y) ## θ̂_{j,n} = mean of sample j
}

– pay attention to { and }!

So your code so far should look like this:

set.seed(1234)
n = 500
S = 1000
theta_hat_MM <- matrix(NA, nrow=S, ncol=1)
for(j in 1:S){
  y = rnorm(n, mean=2, sd=2)
  theta_hat_MM[j] <- mean(y)
}

Run this code.

The vector theta_hat_MM contains 1000 values, each computed on a sample drawn from the population random variable.

(e) As we learned, estimators have sampling distributions. The estimator theta_hat_MM has a sampling distribution that can be plotted by writing (after running the code above):

hist(theta_hat_MM, freq=F, ylim=c(0,7)) ## histogram
lines(density(theta_hat_MM), col="red") ## sampling density


The CLT says that this sampling distribution is approximately normal. To see this, we add to this plot a normal density with mean = mean(theta_hat_MM) and standard deviation = sd(theta_hat_MM). So add to the code the following line:

lines(seq(1.1, 2.5, by=.01), dnorm(seq(1.1, 2.5, by=.01), mean(theta_hat_MM), sd(theta_hat_MM)), col="black")

You should now have a plot with a histogram, a red curve, and a black curve. The red curve is the sampling density. The normal density is the black curve.

Does theta_hat_MM that you plotted look unbiased? How can you tell simply by looking at the plot? Does this plot show us that the sampling distribution of the MM estimator is approximately normal? How can you tell? Report now mean(theta_hat_MM) and var(theta_hat_MM). What is the bias of mean(theta_hat_MM)? Does this match your plot?

(f) We will now implement the GMM estimator in (b), using the formula for the estimator that you derived in (b). To write this formula in R, we write the line below:

theta_hat_gmm <- ((1/2)*mean(y)*mean(y^2))^(1/3)

As before, we now want to compute this for each draw j = 1, 2, ..., S = 1000. So we add this to our for-loop:

theta_hat_MM <- matrix(NA, nrow=S, ncol=1) ## vector that stores the MM estimator after each draw
theta_hat_gmm <- matrix(NA, nrow=S, ncol=1) ## vector that stores the GMM estimator from (b) after each draw
for(j in 1:S){
  y = rnorm(n, mean=2, sd=2)
  theta_hat_MM[j] <- mean(y) ## computes the MM estimator
  theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3) ## computes the GMM estimator from (b)
}

So your entire code should look like:

set.seed(1234)
n = 500
S = 1000
theta_hat_MM <- matrix(NA, nrow=S, ncol=1)
theta_hat_gmm <- matrix(NA, nrow=S, ncol=1)
for(j in 1:S){
  y = rnorm(n, mean=2, sd=2)
  theta_hat_MM[j] <- mean(y)
  theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3)
}

Run this code.

To compare the MM and the GMM estimators, write

c(mean(theta_hat_MM), var(theta_hat_MM))
c(mean(theta_hat_gmm), var(theta_hat_gmm))

Which estimator has the smaller bias? Which one has the smaller variance? Is this what you expected given the theory?

(g) First, install and load the gmm package, i.e.

install.packages("gmm")
library(gmm)

Then create the moment conditions function g(y, θ) from (b). The lines below generate a function that takes as arguments tet (which is your theta) and x (which is your data), and returns an n × 2 matrix:

g <- function(tet, x){
  m1 <- (x - tet)
  m2 <- (x^2 - 2*tet^2)
  f <- cbind(m1, m2)
  return(f)
}

Write the lines above and run the code. Then test the function. To do this, write:

yy = c(1,2,3)
g(0,yy)

Print g(0,yy) on the screen. g(0,yy) should be a 3 × 2 matrix.

Compute g(y, θ) from (b) by hand, where y = yy (from above, yy = c(1,2,3)) and θ = 0 (again, you should get a 3 × 2 matrix). How do they compare, the g computed with the code and the g that you calculated by hand? Are they equal? Should they be?

Now, to compute the GMM estimator calculated with the optimal weight matrix, write the following code:

init <- 2 ## initial value to be used in the GMM algorithm
theta_hat_optm_gmm <- gmm(g, y, init, wmatrix="optimal")$coefficient ## computes the GMM estimator using the moments in g, the data y, the initial starting value in init, and the optimal weight matrix; it stores only the GMM estimate of the coefficient

We now want to compute S such GMM estimators. So we need to add two lines to the code below. What are these two lines? You can use what we did in (f) as an example. It is straightforward if you understand what happened in (f). Your code, with the two lines (you need to replace ??), is:

set.seed(1234)
n = 500
S = 1000
theta_hat_MM <- matrix(NA, nrow=S, ncol=1)
theta_hat_gmm <- matrix(NA, nrow=S, ncol=1)
theta_hat_optm_gmm <- ??
init <- 2
g <- function(tet, x){
  m1 <- (x - tet)
  m2 <- (x^2 - 2*tet^2)
  f <- cbind(m1, m2)
  return(f)
}
for(j in 1:S){
  y = rnorm(n, mean=2, sd=2)
  theta_hat_MM[j] <- mean(y)
  theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3)
  theta_hat_optm_gmm[j] <- ??
}

Replace ?? above and then run the code. Then report the following:

c(mean(theta_hat_MM), var(theta_hat_MM))
c(mean(theta_hat_gmm), var(theta_hat_gmm))
c(mean(theta_hat_optm_gmm), var(theta_hat_optm_gmm))

Which estimator has the smallest variance, and which estimator is efficient?

Is this as you expected given what you learned about these estimators in

this class?

(h) Take the plot in (e) above, and add curves for the sampling densities of theta_hat_gmm and theta_hat_optm_gmm. Color the density of theta_hat_gmm blue and that of theta_hat_optm_gmm green. (You have to write the code for this; you can follow the code in (e) as an example of how to do this.) So your plot will have: the histogram of the MM estimator, the sampling density of the MM estimator (in red), the sampling density of the GMM estimator (in blue), and the sampling density of the GMM estimator with the optimal weight matrix (in green). It should look like Figure 1 below:


Figure 1: Sampling densities for all three estimators.

Does this plot match your answer from (g) above? Explain what the red, green, and blue curves are and how they support (or not) your answer in (g) above.


Question 2

This question is based on the paper of Bedard and Dhuey (“The Persistence of Early

Childhood Maturity: International Evidence of Long-Run Age Effects”, QJE, 2006).

The paper is available on A2L under the Individual PS module.

The data used in this paper is on A2L under the same module.

You are expected to reproduce some of the results in this paper. To do this, you will have to write R code. I provide below code that is supposed to help you with this. For parts (a) and (b), you pretty much have to copy-paste the code that I wrote for you, while for (d) and (e) you have to adapt the code from the lm regression to ivreg, based on the code that I provided for you for part (b).

When you report your results for Q2, use the fixest package that Caleb introduces in his video "Ordinary Least Squares in R", which you can find on Eduflow under R Videos or on YouTube. You can also use the stargazer package if you find it easier to use; directions are on the last page of the document whose link you can find here or in the description of the module on Individual Project up on A2L.

Pay particular attention to what the question is asking you! Look at the code,

look at what the question is asking you. This is an exercise in attention more than

anything, as you have the code written and you have to adapt it to the question.

The paper uses data from multiple sources. We only want to reproduce a small fraction of the results in the paper. We only use the TIMSS (Trends in International Mathematics and Science Study) data from Canada for 3rd and 4th graders (9 year olds in 1995), while Bedard and Dhuey (2006) also consider many other countries and also another age group (13 year olds). The dataset is available on A2L under the same module ("timss-canada-1995-pop1.dta").

Every entry in the dataset corresponds to one student. The dataset contains the variables weight (sampling weights that can be used to obtain nationally representative estimation results) and idschool (identifier for the school of the student). The outcome variable is math_score (math test score, normalized to have mean 50 and standard deviation 10). The main explanatory variable of interest, which is treated as endogenous, is age (age in months), and the corresponding instrumental variable is r_age (relative age in months, see the paper for the definition). Additional control variables are grade (either 3rd or 4th grade, since students are 9 years old), female (gender dummy), mother_native (dummy for mother born in country), father_native (dummy for father born in country), both_parents (dummy for both parents present in household), calculator (dummy for calculator at home), computer (dummy for computer at home), books100 (dummy for whether more than 100 books at home), hh_size (a measure of household size), and rural (dummy for whether the school is in a rural area).

(a) Start by reading the paper. Explain – in your own words – why age is probably

endogenous here. Explain how the instrumental variable (relative age) is computed.

Explain why relative age is arguably exogenous. Explain why relative age is arguably

a relevant instrument for age.

(b) Reproduce the OLS and IV (i.e. 2SLS) results for math test scores in Canada in Table III of Bedard and Dhuey (2006). So these would be column 1 and column 4 for Math in Table III. Report your estimation results both for unweighted OLS and IV and for population-weighted OLS³ and IV (weighted 2SLS). Code to help you with this is provided at the end of this document; you just have to follow the directions given and then copy-paste the code. Report your findings for non-weighted regressions and for weighted regressions, both with heteroscedasticity-robust standard errors, and compare to Bedard and Dhuey (2006).

– You will not obtain the same answers as in the table, which is to be expected. Think about why this may be the case. Read the description of the data in the paper, look at the data that you have imported in R and at the exact regression that the authors ran vs the one that you ran, and read my note below. What could justify the fact that the numbers are not exactly reproduced? But even if you cannot reproduce their results numerically, can you reproduce their qualitative findings, e.g. how does the statistical significance of your coefficient in front of age compare to theirs?

³ However, the weights are not chosen as the inverse of the variance of the errors with the goal of minimizing the variance; instead, they are computed with the goal of making the estimates nationally representative. The WLS standard errors will be larger here than the OLS standard errors.


– Note that our normalization of the outcome variable is different from the one used in Bedard and Dhuey (2006), because they normalize over their whole sample, while we only normalize over the data from Canada. The household size variable might also be defined slightly differently.

(c) For the estimation results in (b), are the signs of the coefficients on the control

variables as expected? Explain.

In the following we only perform unweighted estimation, i.e. we do not use the weight variable anymore, which would allow us to obtain nationally representative results.⁴ It also turns out not to matter much whether heteroscedasticity-robust standard errors are reported here, so it is OK not to use them in the answers to the following questions.

(d) Repeat the IV estimation in (b) for the following eight subpopulations: only female students, only male students, only students whose mother is native to Canada, only students whose mother is not native to Canada, only students who report having a calculator at home, only students who report not having a calculator at home, only students who report having a computer at home, and only students who report not having a computer at home. Code to help you with this is provided at the end of the document. For this part, you have to adapt the code to answer the question asked. The code that I provide is only for the subpopulation of female students, so you have to adapt the code to the other subpopulations. Interpret your findings, focusing on the differences in the age effect for the different subpopulations (e.g. male vs. female age effect). Note that the calculator and computer dummies can be seen as rough proxies for socioeconomic status, and mother_native is probably also positively correlated with socioeconomic status.

– You have to run the code with all covariates for the different subpopulations, but you can report only the coefficient on age (with its standard error and t-stat). An example of how you can do this in code is at the end of the document.

(e) As a robustness check, repeat the analysis in (d), but this time including dummy variables for each school as regressors (so-called "school-specific fixed effects"). Code to help you with this is provided at the end of the document. For this part, you have to adapt the code to IV regression. The code I gave you is for OLS (it uses the lm function), but you need to report results for IV regressions. How do the differences in age effects (e.g. male vs. female age effects) change compared to (d)? Explain why the inclusion of school dummies in the regression might change the results.

⁴ The results using weights are less robust and have larger standard errors, because they are driven by a relatively small subset of individuals with large weights. Not using weights also simplifies the implementation somewhat, e.g. xtivreg does not work with individual-specific weights.

– You have to run the code with all covariates for the different subpopulations, but you can report only the coefficient on age (with its standard error and t-stat). An example of how you can do this in code is at the end of the document.

Directions and code to help you with Q2

First you have to load your data into R and install a few packages.

Go to RStudio, File, Import Dataset, From Stata..., Browse, select the data set that you downloaded from A2L, called "timss-canada-1995-pop1.dta", and click "Import." You have now imported the Stata data set into R.

Install and load the packages

– “lmtest”,

– “MASS”,

– and you may also need “sandwich”.

Directions and code to help you with Q2(b).

To run the unweighted OLS regression (similar to that in Table III, first column, for Canada), just regress math_score on all the variables mentioned in the footnote of that table that appear in your data set, i.e.

lmAPI <- lm(math_score ~ age+grade+female+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural, data=timss_canada_1995_pop1)

and then correct for heteroskedasticity by running:


coeftest(lmAPI, vcov=vcovHC(lmAPI, "HC1")) ## this line produces heteroskedasticity-robust standard errors; you may need to install and load "sandwich" for this, depending on your R version

To produce the weighted OLS regression that appears in Table III, column 1, run

lmAPI_w <- rlm(math_score ~ age+grade+female+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural, data=timss_canada_1995_pop1, weights=weight)

and then you correct for heteroskedasticity by running:

coeftest(lmAPI_w, vcov=vcovHC(lmAPI_w, "HC1"))

This is supposed to reproduce the weighted OLS regression with heteroskedasticity-robust standard errors in Table III, column 1 for Canada.⁵

To run an IV regression, install and load the AER package, and then run the unweighted regression

lmAPI_iv <- ivreg(math_score ~ age+grade+female+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural | r_age+grade+female+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural, data=timss_canada_1995_pop1)

Pay attention to the code above. After the conditioning "|" we add the instrument and ALL EXOGENOUS explanatory variables that we used in the regression. Since age is endogenous, it is not added after "|"; it is replaced there by its instrument r_age. Pay attention to how the code is written, and if you want to read more on this, look up ivreg in the R help or on Stack Exchange.
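To see the "|" syntax in isolation, here is a minimal simulated example (all names and numbers are made up for illustration and are not the assignment's variables; it assumes the AER package is installed). The endogenous regressor x appears only before "|", its instrument z appears only after it, and the exogenous control w appears on both sides:

```r
library(AER)                      # provides ivreg

set.seed(1)
n <- 1000
z <- rnorm(n)                     # instrument
u <- rnorm(n)                     # unobserved error, shared by x and y
w <- rnorm(n)                     # exogenous control
x <- z + u + rnorm(n)             # endogenous regressor (correlated with u)
y <- 1 + 2 * x + w + u            # outcome; true coefficient on x is 2

fit <- ivreg(y ~ x + w | z + w)   # exogenous w on both sides; z replaces x after "|"
coef(fit)                         # the coefficient on x should be near 2
```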

To run the weighted version of the IV regression above, add weights=weight after data=timss_canada_1995_pop1, just as we did for the weighted OLS regression.

⁵ You will not get the same numerical answers. But are they qualitatively the same?


Code to help you with Q2(d).

The code below is for lm; you have to adapt it for ivreg. To run an lm regression for different subsets of the data, we subset the data as follows. For example, for female students, we run

lmAPI_w_female <- lm(math_score ~ age+grade+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural, data=subset(timss_canada_1995_pop1, female==1))

Pay attention to the line that runs the regression! There is no female as a covariate anymore. If you include female as a covariate you will get an error saying that 'x' is singular: singular fits are not implemented. It is the same for ivreg – you must eliminate female as an exogenous covariate when you run ivreg! Pay attention, else you will get an error.

Code to help you with Q2(e).

The code below is for lm; you have to adapt it to ivreg. To generate school fixed effects, you can add "factor(idschool)" as an exogenous covariate, and run the following:

{r echo = T, results = "hide"} ## this chunk option hides the output (which contains coefficients for all school IDs and would be too difficult to read; run your chunk with and without results="hide" to see what happens)

lmAPI_w_female_schoolFE <- rlm(math_score ~ age+factor(idschool)+grade+mother_native+father_native+both_parents+calculator+computer+books100+hh_size+rural, data=subset(timss_canada_1995_pop1, female==1))

summary(lmAPI_w_female_schoolFE)$coefficients["age",] ## this line reports the result for the coefficient on age only

When you run ivreg with fixed effects, you have to include factor(idschool) both to the right of "|" and to the left of it, as it is an exogenous covariate!


Please upload your solutions to A2L by December 19, 2022 by 23:59 EST. The

submission should be a pdf document.

This document consists of two questions: Q1, which is a combination of theory and

coding, and Q2, which is an applied question asking you to replicate the results in a

published paper. Take the time to read all explanations in this document, and pay

attention to how you report your results after running the code. Please, use full sentences

when answering questions. Treat this as a report that you may have to write for your

future job.

Most of the R code is already written for you: all R code appears in blue in this pdf

(except for this line). This means that everything that is blue can be run in R directly,

i.e. you can just copy-paste the blue code and it will run (provided that you followed all

directions given to you in this document).

You have to pay attention to how you copy-paste from the pdf1!

One of the questions is asking you to use the gmm package. Those of you with

Macs may have issues loading it, so switch to working on the R Studio Cloud,

https://login.rstudio.cloud/login?redirect=%2F.

Warning vs Error Messages in R: You may still get "warnings" in R Studio Cloud --

so pay attention if something is labeled as "error" or "warning." If it is a warning,

it’s fine, move ahead as if nothing happened. Warnings still produce correct answers.

If it’s an error, then you have to investigate.

1For example, when you copy-paste the code from pdf into R markdown, be careful with

the underscores, you will have to add them manually

the comments following ##, you may have to double check that everything that follows ##

in the pdf is commented in your R Markdown file, else you may get an error

the left arrow in R is <-, pay attention to how this gets copied from pdf, you may have to change

it manually before you run the code

1

You can suppress warning message from your output by including in your {r chunk,

warning=FALSE}. For more on this, take a look at this document.

You can also suppress the output from a regression by including in your {r chunk}

the following: {r echo = T, results = "hide"}. You may need this for Q2(e). Please

look at page 18 of PS.pdf.

– So you can write {r echo = T, warning = F, results = "hide"} which would

suppress warnings and the output.

This project should not take you more than 3 days to complete (this is a generous esti-

mate). However, you have a bit more than 2 weeks to work on this project. Given the

fact that you have plenty of extra time to work on this project:

1. you are strongly advised to start working on the assignment as soon as you get it;

2. you are strongly incentivised to start working on the project as soon as possible by

having limited contact with myself and the TAs, in the sense that I will not

answer any questions about this project by email. If you have questions, you have

to sign up for my office hours and/or those of your TAs, or ask your questions on

the discussion board on A2L.

3. you are heavily penalized for late submissions. There will be a penalty of 25%

per 12 hours of being late, no matter the excuse.

Make sure that you start early and aim to submit at least 3 days earlier than the deadline.

This should give you plenty of time to work trough any trouble shooting that you may

need to do as well as to ask questions during office hours, and, in particular, it will help

you keep calm. As most code is provided to you (in blue), this project is mostly about

paying attention to what the problem is asking you, as well as to understanding the code.

Please, pay attention, read everything at least twice, and then double check that you are

answering the question that is asked.

2

Question 1

This question has a theoretical component and a coding (in R) component (99.9% of the

code is already written for you in this text). This exercise is based on Wooldridge (2001).2

You can find this paper on A2L, under the Individual PS module.

– Read pages 87 and 88 (at a minimum) in Wooldridge (2001) to remind yourselves about the method of moments estimator (MM) and GMM.

Reminder: GMM is a method for constructing estimators that uses assumptions about specific moments of the random variables. The assumptions are called "moment conditions." Moment conditions are expected values that specify the model parameters in terms of the true moments. The sample moment conditions are the sample equivalents of the moment conditions. GMM finds the parameter values that are closest to satisfying the sample moment conditions. When there are more moment conditions than parameters, GMM finds the parameters that get as close as possible to solving weighted sample moment conditions. Uniform weights and optimal weights are two ways of weighting the sample moment conditions. The uniform weights use an identity matrix to weight the moment conditions. The optimal weights use the inverse of the covariance matrix of the moment conditions. [All this is in the videos and slides on GMM that you can find on A2L.] GMM generalizes the method of moments (MM) by allowing the number of moment conditions to be greater than the number of parameters. Using these extra moment conditions makes GMM more efficient than MM. When there are more moment conditions than parameters, the estimator is said to be overidentified. GMM can efficiently combine the moment conditions when the estimator is overidentified.
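The "uniform weights" idea in the reminder above can be sketched numerically in a few lines of base R (a toy illustration only, not part of the assignment; it uses the same two moment conditions that appear later in this question):

```r
## Toy GMM with uniform (identity) weights: two moment conditions, one parameter.
set.seed(42)
y <- rnorm(500, mean = 2, sd = 2)        ## simulated data with true theta0 = 2

## Stacked sample moment conditions evaluated at a candidate theta:
## g_bar(theta) = ( mean(y) - theta , mean(y^2) - 2*theta^2 )
g_bar <- function(theta) c(mean(y) - theta, mean(y^2) - 2 * theta^2)

## Identity-weight GMM criterion: Q(theta) = g_bar(theta)' I g_bar(theta)
Q <- function(theta) sum(g_bar(theta)^2)

## GMM picks the theta that brings the sample moments as close to zero as possible
optimize(Q, interval = c(0, 10))$minimum ## close to the true value 2
```

Replacing the identity matrix with the inverse of the covariance matrix of the moments would give the optimal-weight version discussed above.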

We will illustrate these points by estimating the mean of a normally distributed random variable with mean θ0 and variance θ0². We will be estimating θ0 by MM, GMM, and an efficient GMM estimator.

Theoretical part

Let y = (y1, y2, . . . , yn) be an iid random sample with yi ~ N(θ0, θ0²), that is, the yi are normally distributed with mean θ0 and variance θ0². Note that there is one parameter here (rather than two).

Footnote 2: Wooldridge (2001). "Applications of Generalized Method of Moments Estimation." Journal of Economic Perspectives, 15(4): 87–100.


(a) Write down the MM estimator for θ0 that only uses the moment condition E(yi) =

θ0. What is the asymptotic distribution of this estimator as n → ∞?

(b) Consider the GMM estimator for θ that uses the moment conditions E[g(yi, θ0)] = 0, where g(yi, θ) = (yi − θ, yi² − 2θ²)′. These are two moment conditions for one parameter, so the choice of GMM weight matrix matters. We choose the weight matrix.

Find an explicit expression for the resulting GMM estimator.

(c) What would have been the asymptotically optimal weight matrix choice in (b)?

What is the probability limit of the chosen Wn in (b) as n → ∞? Is the choice of

Wn in (b) asymptotically optimal?

(d) Calculate the asymptotic variance of the GMM estimator obtained in (b), and compare it to the asymptotic variance of the estimator in (a).

Coding part

First, we generate our data. We draw a sample of size n = 500 from the population random variable Y that is normally distributed with mean 2 and variance 4, i.e. Y ~ N(θ0, θ0²) = N(2, 4). So for this part we assume that we know the true parameter θ0 = 2.

Note that this is something specific to simulations; in reality, we do not know the true

value of θ0, but in simulations, we do. This will help us understand whether GMM is

indeed more efficient than MM. Without knowing the true θ0 we would not be able to

answer this question.

The lines below generate one sample of iid random variables drawn from this population random variable.

set.seed(1234) ## setting the seed to be able to replicate our results

n = 500 ## sample size

y = rnorm(n,2,2) ## y is a normally distributed random variable with mean=2 and variance=4; there is no typo here: rnorm takes the standard deviation (here 2), not the variance. To convince yourselves, type ?rnorm in the R console and read the help file
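If you want a quick empirical check that rnorm's third argument is the standard deviation rather than the variance, you can run something like this (an optional sketch, not part of the assignment):

```r
set.seed(1)
z <- rnorm(1e6, mean = 2, sd = 2) ## third argument is the sd
mean(z) ## close to 2
var(z)  ## close to 4 = 2^2, i.e. the sd was 2 and the variance is 4
```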


Second, we compute the MM estimator in (a) based on this one sample. Using your

answer from (a) above, the MM estimator is given by

theta_hat_MM = mean(y)

when you copy-paste the code from pdf into R markdown, be careful with

– the underscores, you will have to add them manually

– the comments following ##, you may have to double check that everything

that follows ## in the pdf is commented in your R Markdown file, else you

may get an error

You will notice that there is a small bias by computing theta_hat_MM - 2.

The value of theta_hat_MM is based on just one draw from the population. What we would like now is to draw S times from the population random variable Y, where S is a large number. That is, we will generate j = 1, . . . , S samples of size n, where each sample is drawn from Y. Each sample is used to compute the value of θ̂j,n, j = 1, . . . , S; that is, sample j = 1 produces a value for θ̂1,n, ..., sample j = S produces a value for θ̂S,n. We will have a vector of size S × 1 of values.

The value of our estimator will be the average across simulations, i.e. θ̂ = (1/S) ∑_{j=1}^S θ̂j,n. (Note that n is the same number across simulations, i.e. the sample size n does not vary with S; we always draw the same sample size, but we draw it many times.)

If you remember, when we talk about inference, we say "if we were to repeat the experiment many times, ..." What this means is that we would like to draw different samples of size n repeatedly (and many times) from the same population random variable Y. We would then compute the estimator based on each sample, call it θ̂j,n, j = 1, . . . , S, and then we would average across all draws, such that the estimator that we report in a simulation study will be

(1/S) ∑_{j=1}^S θ̂j,n.


To do this, we add the lines:

S = 1000 ## number of draws

theta_hat_MM <- matrix(NA, nrow = S, ncol = 1) ## creates a vector of NAs that has S rows and one column (row j will hold the value θ̂j,n, j = 1, . . . , S)

for(j in 1:S){

y = rnorm(n,2,2) ## a new sample j is drawn from the population Y

theta_hat_MM[j] = mean(y) ## θ̂j,n = mean of sample j

}

– Pay attention to { and }!

So your code so far should look like this

set.seed(1234)

n = 500

S = 1000

theta_hat_MM <- matrix(NA, nrow = S, ncol = 1)

for(j in 1:S){

y = rnorm(n, mean = 2, sd = 2)

theta_hat_MM[j] <- mean(y)

}

Run this code.

The vector theta_hat_MM contains 1000 values, each computed on a sample drawn from the population random variable.

(e) As we learned, estimators have sampling distributions. The estimator theta_hat_MM

has a sampling distribution that can be plotted by writing (after running the code

above):

hist(theta_hat_MM, freq=F, ylim=c(0,7)) ## histogram

lines(density(theta_hat_MM), col="red") ## sampling density


The CLT says that this sampling distribution is approximately normal. To see this, we add to this plot a normal density with mean = mean(theta_hat_MM) and standard deviation = sqrt(var(theta_hat_MM)). So add to the code the following line:

lines(seq(1.1, 2.5, by=.01), dnorm(seq(1.1, 2.5, by=.01), mean(theta_hat_MM), sd(theta_hat_MM)), col="black")

You should now have a plot with a histogram, a red curve, and a black curve.

The red curve is the sampling distribution. The normal density is the black

curve.

Does theta_hat_MM that you plotted look unbiased? How can you tell simply by looking at the plot? Does this plot show us that the sampling distribution of the MM estimator is approximately normal? How can you tell? Report now mean(theta_hat_MM) and var(theta_hat_MM). What is the bias of mean(theta_hat_MM)? Does this match your plot?

(f) We will now implement the GMM estimator in (b), using the formula for the estimator that you derived in (b). To write this formula in R, we write the line below:

theta_hat_gmm <- ((1/2)*mean(y)*mean(y^2))^(1/3)

As before, we now want to compute this for each draw j = 1, 2, . . . , S = 1000. So we add this to our for-loop:

theta_hat_MM <- matrix(NA, nrow = S, ncol = 1) ## vector that stores the MM estimator after each draw

theta_hat_gmm <- matrix(NA, nrow = S, ncol = 1) ## vector that stores the GMM estimator from (b) after each draw

for(j in 1:S){

y = rnorm(n, mean = 2, sd = 2)

theta_hat_MM[j] <- mean(y) ## computes the MM estimator

theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3) ## computes the GMM estimator from (b)

}

So your entire code should look like


set.seed(1234)

n = 500

S = 1000

theta_hat_MM <- matrix(NA, nrow = S, ncol = 1)

theta_hat_gmm <- matrix(NA, nrow = S, ncol = 1)

for(j in 1:S){

y = rnorm(n, mean = 2, sd = 2)

theta_hat_MM[j] <- mean(y)

theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3)

}

Run this code.

To compare the MM and the GMM estimator, write

c(mean(theta_hat_MM), var(theta_hat_MM))

c(mean(theta_hat_gmm), var(theta_hat_gmm))

Which estimator has the smaller bias? Which one has the smaller variance? Is this what you expected given the theory?

(g) First, install and load the gmm package, i.e.

install.packages("gmm")

library(gmm)

Then create the moment conditions function g(y, θ) from (b). The lines below generate a function that takes as arguments x (which is your data) and tet (which is your theta), and returns an n x 2 matrix:

g <- function(tet, x){

m1 <- (x - tet)

m2 <- (x^2 - 2*tet^2)

f <- cbind(m1, m2)

return(f)

}

Write the lines above and run the code. Then test the function. To do this,

write:

yy = c(1,2,3)

g(0,yy)

Print g(0,yy) on the screen. g(0,yy) should be a 3 x 2 matrix.

Compute g(y, θ) from (b), where y = yy (from above, yy = c(1,2,3)) and θ = 0, by hand (again, you should get a 3 x 2 matrix). How does the g computed with the code compare to the g that you calculated by hand? Are they equal? Should they be?

Now, to compute the GMM estimator calculated with the optimal weight matrix, write the following code:

init <- 2 ## initial value to be used in the GMM algorithm

theta_hat_optm_gmm <- gmm(g, y, init, wmatrix="optimal")$coefficient ## computes the GMM estimator using the moments in g, the data y, the initial starting value in init, and the optimal weight matrix; it stores only the GMM estimate of the coefficient

We want now to compute S such GMM estimators. So we need to add two

lines to the code below. What are these two lines? You can use what we

did in (f) as an example. It is straightforward if you understand what happened in (f). Your code, with the two lines (you need to replace ??), is:

set.seed(1234)

n = 500

S = 1000

theta_hat_MM <- matrix(NA, nrow = S, ncol = 1)

theta_hat_gmm <- matrix(NA, nrow = S, ncol = 1)

theta_hat_optm_gmm <- ??

init <- 2

g <- function(tet, x){

m1 <- (x - tet)

m2 <- (x^2 - 2*tet^2)

f <- cbind(m1, m2)

return(f)

}

for(j in 1:S){

y = rnorm(n, mean = 2, sd = 2)

theta_hat_MM[j] <- mean(y)

theta_hat_gmm[j] <- ((1/2)*mean(y)*mean(y^2))^(1/3)

theta_hat_optm_gmm[j] <- ??

}

Replace ?? above and then run the code. Then report the following:

c(mean(theta_hat_MM), var(theta_hat_MM))

c(mean(theta_hat_gmm), var(theta_hat_gmm))

c(mean(theta_hat_optm_gmm), var(theta_hat_optm_gmm))

Which estimator has the smallest variance, and which estimator is efficient?

Is this as you expected given what you learned about these estimators in

this class?

(h) Take the plot in (e) above, and add curves for the sampling densities of theta_hat_gmm and theta_hat_optm_gmm. Color the density of theta_hat_gmm blue and that of theta_hat_optm_gmm green. (You have to write the code for this; you can follow the code in (e) as an example of how to do this.) So your plot will have: the histogram of the MM estimator, the sampling density of the MM estimator (in red), the sampling density of the GMM estimator (in blue), and the sampling density of the GMM estimator with the optimal weight matrix (in green). It should look like Figure 1 below:


Figure 1: Sampling densities for all three estimators.

Does this plot match your answer from (g) above? Explain what the red, green, and blue curves are and how they do or do not support your answer in (g).


Question 2

This question is based on the paper of Bedard and Dhuey (“The Persistence of Early

Childhood Maturity: International Evidence of Long-Run Age Effects”, QJE, 2006).

The paper is available on A2L under the Individual PS module.

The data used in this paper is on A2L under the same module.

You are expected to reproduce some of the results in this paper. To do this, you

will have to write R code. I provide below code that is supposed to help you with

this. For parts a and b, you pretty much have to copy-paste the code that I wrote

for you, while for d and e you have to adapt the code from lm regression to ivreg

based on the code that I provided for you for part b.

When you report your results for Q2, use the fixest package that Caleb introduces in his video "Ordinary Least Squares in R", which you can find on Eduflow under R Videos or on YouTube. You can also use the stargazer package if you find it easier to use; directions are on the last page of the document whose link you can find here or in the description of the module on Individual Project up on A2L.

Pay particular attention to what the question is asking you! Look at the code,

look at what the question is asking you. This is an exercise in attention more than

anything, as you have the code written and you have to adapt it to the question.

The paper uses data from multiple sources. We only want to reproduce a small fraction

of the results in the paper. We only use the TIMSS (Trends in International Mathematics

and Science Study) data from Canada for 3rd and 4th graders (9 year olds in 1995), while

Bedard and Dhuey (2006) also consider many other countries and also another age group

(13 year olds). The dataset is available on A2L under the same module ("timss-canada-1995-pop1.dta").

Every entry in the dataset corresponds to one student. The dataset contains the variables weight (sampling weights that can be used to obtain nationally representative estimation results) and idschool (an identifier for the school of the student). The outcome variable is math_score (math test score, normalized to have mean 50 and standard deviation 10). The main explanatory variable of interest, which is treated as endogenous, is age (age in months), and the corresponding instrumental variable is r_age (relative age in months; see the paper for the definition). Additional control variables are grade (either 3rd or 4th grade, since students are 9 years old), female (gender dummy), mother_native (dummy for mother born in country), father_native (dummy for father born in country), both_parents (dummy for both parents present in household), calculator (dummy for calculator at home), computer (dummy for computer at home), books100 (dummy for whether more than 100 books at home), hh_size (a measure of household size), and rural (dummy for whether the school is in a rural area).

(a) Start by reading the paper. Explain – in your own words – why age is probably

endogenous here. Explain how the instrumental variable (relative age) is computed.

Explain why relative age is arguably exogenous. Explain why relative age is arguably

a relevant instrument for age.

(b) Reproduce the OLS and IV (i.e. 2SLS) results for math test scores in Canada in Table III of Bedard and Dhuey (2006). So these would be column 1 and column 4 for Math in Table III. Report your estimation results for both unweighted OLS and IV and for population-weighted OLS3 and IV (weighted 2SLS). Code to help you with this is provided at the end of this document; you just have to follow the directions given and then copy-paste the code. Report your findings for non-weighted regressions and for weighted regressions, both with heteroskedasticity-robust standard errors, and compare to Bedard and Dhuey (2006).

– You will not obtain the same answers as in the table, which is to be expected.

Think why this may be the case. Read the description of the data in the paper,

look at the data that you have imported in R and the exact regression that

the authors ran vs the one that you ran, read my note below. What could

justify the fact that the numbers are not exactly reproduced? But even if you

cannot reproduce their results numerically, can you reproduce their qualitative

findings, e.g. how does the statistical significance of your coefficient in front of

age compare to theirs?

Footnote 3: However, the weights are not chosen as the inverse of the variance of the errors with the goal of minimizing the variance, but instead are computed with the goal of making the estimates nationally representative. The WLS standard errors will be larger here than the OLS standard errors.


– Note that our normalization of the outcome variable is different from the one

used in Bedard and Dhuey (2006), because they normalize over their whole

sample, while we only normalize over the data from Canada. The household

size variable might also be defined slightly differently.

(c) For the estimation results in (b), are the signs of the coefficients on the control

variables as expected? Explain.

In the following we only perform unweighted estimation, i.e. we do not use the weight

variable anymore, which would allow us to obtain nationally representative results.4 It

also turns out to not matter much whether heteroscedasticity robust standard errors are

reported here, so it is OK to not use them in the answer to the following questions.

(d) Repeat the IV estimation in (b) for the following eight subpopulations: only female

students, only male students, only students whose mother is native to Canada, only

students whose mother is not native to Canada, only students who report to have

a calculator at home, only students who report not to have a calculator at home,

only students who report to have a computer at home, only students who report

not to have a computer at home. Code to help you with this is provided at the end

of the document. For this part, you have to adapt the code to answer the question

asked. The code that I provide is only for the subpopulation of female students, so

you have to adapt the code to the other subpopulations. Interpret your findings,

focusing on the differences in age effect for the different subpopulations (e.g.

male vs. female age effect). Note that the calculator and computer dummies can

be seen as rough proxies for socioeconomic status, and mother_native is probably

also positively correlated with socioeconomic status.

– You have to run the code with all covariates for the different populations, but you can report only the coefficient on age (with its standard error and t-stat). An example of how you can do this is in the code at the end of the document.

Footnote 4: The results using weights are less robust and have larger standard errors, because they are driven by a relatively small subset of individuals with large weights. Not using weights also simplifies the implementation somewhat, e.g. xtivreg does not work with individual-specific weights.

(e) As a robustness check, repeat the analysis in (d), but this time including dummy variables for each school as regressors (so-called "school-specific fixed effects"). Code

to help you with this is provided at the end of the document. For this part, you

have to adapt the code to IV regression. The code I gave you is for OLS (it uses

the lm function) but you need to report results for IV regressions. How do the

differences in age effects (e.g. male vs. female age effects) change compared to

(d)? Explain why the inclusion of school dummies in the regression might change

the results.

– You have to run the code with all covariates for the different populations, but you can report only the coefficient on age (with its standard error and t-stat). An example of how you can do this is in the code at the end of the document.

Directions and code to help you with Q2

First you have to load your data into R and install a few packages.

Go to RStudio, File, Import Dataset, From Stata..., Browse, Select the data set

that you downloaded from A2L, called “timss-canada-1995-pop1.dta”, and click

“Import.” You have now imported the Stata data set in R.

Install and load the packages

– “lmtest”,

– “MASS”,

– and you may also need “sandwich”.

Directions and code to help you with Q2(b).

To run the unweighted OLS regression (similar to that in Table III, first column, for Canada), just regress math_score on all the variables mentioned in the footnote of that table that appear in your data set, i.e.

lmAPI <- lm(math_score ~ age + grade + female + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural, data = timss_canada_1995_pop1)

and then correct for heteroskedasticity by running:


coeftest(lmAPI, vcov = vcovHC(lmAPI, "HC1")) ## this line produces heteroskedasticity-robust standard errors; you may need to install and load "sandwich" for this, depending on your R version

To produce the weighted OLS regression that appears in Table III, column 1, run

lmAPI_w <- rlm(math_score ~ age + grade + female + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural, data = timss_canada_1995_pop1, weights = weight)

and then you correct for heteroskedasticity by running:

coeftest(lmAPI_w, vcov = vcovHC(lmAPI_w, "HC1"))

This is supposed to reproduce the weighted OLS regression with heteroskedasticity robust

standard errors in Table III, column 1 for Canada.5

To run an IV regression, install and load the AER package, and then run the unweighted regression

lmAPI_iv <- ivreg(math_score ~ age + grade + female + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural | r_age + grade + female + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural, data = timss_canada_1995_pop1)

Pay attention to the code above. After the conditioning "|" we add the instrument and ALL EXOGENOUS explanatory variables that we used in the regression. Since age is endogenous, it is not added after the "|"; its instrument r_age appears there instead. Pay attention to how the code is written, and if you want to read more on this, look up ivreg in the R help or on Stack Exchange.

To run the weighted version of the IV regression above, add weights=weight after data=timss_canada_1995_pop1, just as we did for the weighted OLS regression.
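To see what ivreg is doing under the hood, two-stage least squares can be sketched by hand in base R on simulated data (a toy illustration with made-up variables, not the assignment's data; note that the hand-rolled second stage does not produce correct standard errors, which is one reason to use ivreg in practice):

```r
## Toy 2SLS by hand: x is endogenous (correlated with the error), z is an instrument.
set.seed(123)
n <- 1000
z <- rnorm(n)             ## instrument: relevant for x, unrelated to u
u <- rnorm(n)             ## unobserved confounder
x <- z + u + rnorm(n)     ## endogenous regressor
y <- 2 * x + u + rnorm(n) ## true coefficient on x is 2

## Stage 1: regress the endogenous variable on the instrument
x_hat <- fitted(lm(x ~ z))

## Stage 2: regress the outcome on the first-stage fitted values
coef(lm(y ~ x_hat))["x_hat"] ## close to 2, while coef(lm(y ~ x))["x"] is biased upward
```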

Footnote 5: You will not get the same numerical answers. But are they qualitatively the same?


Code to help you with Q2(d).

The code below is for lm; you have to adapt it for ivreg. To run an lm regression for

different subsets of the data, we subset the data as follows. For example, for female

students, we run

lmAPI_w_female <- lm(math_score ~ age + grade + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural, data = subset(timss_canada_1995_pop1, female == 1))

Pay attention to the line that runs the regression! There is no female as a covariate anymore. If you include female as a covariate, you will get an error saying that 'x' is singular: singular fits are not implemented in lm. It is the same for ivreg – you must eliminate female as an exogenous covariate when you run ivreg! Pay attention, else you will get an error.

Code to help you with Q2(e).

The code below is for lm; you have to adapt it to ivreg. To generate school fixed effects, you can add "factor(idschool)" as an exogenous covariate, and run the following:

```{r echo = T, results = "hide"} ## this chunk option hides the output (which contains coefficients for all school IDs and would be too difficult to read); run your chunk with and without results="hide" to see what happens

lmAPI_w_female_schoolFE <- rlm(math_score ~ age + factor(idschool) + grade + mother_native + father_native + both_parents + calculator + computer + books100 + hh_size + rural, data = subset(timss_canada_1995_pop1, female == 1))

summary(lmAPI_w_female_schoolFE)$coef["age", ] ## this line reports the result for the coefficient on age only (indexing the row by name avoids picking the wrong coefficient)

When you run ivreg with fixed effects, you have to include factor(idschool) both to the right of "|" and to the left of it, as it is an exogenous covariate!
