Homework 1
The file ‘affairs.csv’ contains data from a survey of married adults, with measures of marital satisfaction and basic statistics about each marriage. Variable descriptions are as follows:
| variable | variable label |
| --- | --- |
| id | identifier |
| male | =1 if male |
| age | in years |
| yrsmarr | years married |
| kids | =1 if have kids |
| relig | 5 = very religious, 4 = somewhat, 3 = slightly, 2 = not at all, 1 = anti-religious |
| educ | years of schooling |
| occup | occupation, reverse Hollingshead scale |
| ratemarr | 5 = very happy marriage, 4 = happier than average, 3 = average, 2 = somewhat unhappy, 1 = very unhappy |
| naffairs | number of affairs within last year |
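As a minimal sketch of how these columns might be prepared in R, the block below builds a tiny made-up data frame with the same column names and derives three helper variables. The rows are invented for illustration and are not taken from ‘affairs.csv’; the names `anyaffair`, `agemarr`, and `religious` are assumptions, not part of the data set.

```r
# Toy rows with the column names from the variable table (values are made up).
d <- data.frame(
  id       = 1:4,
  male     = c(1, 0, 1, 0),
  age      = c(37, 27, 32, 57),
  yrsmarr  = c(10, 4, 15, 15),
  kids     = c(0, 0, 1, 1),
  relig    = c(3, 4, 1, 5),
  educ     = c(18, 14, 12, 18),
  occup    = c(7, 6, 1, 6),
  ratemarr = c(4, 4, 4, 5),
  naffairs = c(0, 0, 3, 0)
)

# With the real file, the data frame would instead come from:
# d <- read.csv("affairs.csv")

# Hypothetical derived variables (the names are assumptions):
d$anyaffair <- as.integer(d$naffairs > 0)  # 1 if any affair in the last year
d$agemarr   <- d$age - d$yrsmarr           # approximate age at marriage
d$religious <- as.integer(d$relig >= 4)    # 1 if 'very' or 'somewhat' religious

print(d[, c("id", "anyaffair", "agemarr", "religious")])
```

Keeping the recoding in one place like this makes it easier to label figures and tables consistently later on.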
Your answers should be explained well enough that we can tell what you did; good grades will be given for correct statistics and clear, concise interpretation. Good, clear labeling of figures and tables will be part of the grade for this homework. All figures and tables should be labeled clearly enough that a knowledgeable reader could get the point simply by looking at the graphic, without referring to external text.
a. It is plausible that the probability of having any affairs varies in a non-linear way with years of marriage. Does it? Give the OLS equation that tests this question in terms of the relationship with years of marriage, incorporating the linear and quadratic term in a single regression. Show the year at which the probability peaks using both a direct graph of the data as well as a calculus-driven technique based on your regression coefficients.
b. Generate a variable for the age at which the individual was married, and draw a graph which gives the average age at marriage across the seven education categories. What do you see?
c. ‘Gender is not an important determinant of the likelihood of having an affair.’
Do the data support this statement? Using both a linear and a non-linear model, conduct a thorough analysis and put together a single table of results and a short statement describing how to interpret the effects in this table to answer the question.
d. Estimate the linear probability (OLS) model explaining the probability of having an affair, using ‘relig’ as a continuous cardinal OLS explanatory variable. Give a succinct interpretation of what the marginal coefficient means. Is the variable ‘relig’ in fact cardinal? Is it ordinal?
e. Estimate the same relationship using a probit and calculate the marginal effects. Describe any difference you see relative to the OLS results in part d and what you attribute this difference to.
f. Now estimate a linear probability model having ‘dummied out’ religion, omitting ‘very religious’, and give an interpretation of the coefficients on the dummies.
g. Generate a dummy=1 if people report being ‘very’ or ‘somewhat’ religious, and 0 otherwise. Please create a single, well-labelled figure that shows the distribution of years married for each value of this dummy. Please comment on the timing of marital problems in non-religious versus religious households.
h. Look at the coding for the variable ‘ratemarr’. What kind of variable is this? What would be the problem with trying to use ‘ratemarr’ as a dependent variable in an OLS equation? Discuss an alternative estimation strategy, describing how we can use the underlying latent variable to help us think about the problem.
i. ‘Religion improves marital fidelity, and so programs seeking to promote stable families should include a religious aspect in order to increase their efficacy.’
Give a concise paragraph discussing this statement from a statistical (and only a statistical) perspective. Can you confirm this hypothesis of a causal relationship given your answers to questions f and g? Can you reject it?
j. Now imagine that you discover that serving in Vietnam had a strong impact (in whatever direction) on the probability that the men in the survey are religious, and that you have access to their Vietnam draft lottery numbers. Describe a strategy that you might employ to use this information as an instrument for religion in predicting the number of affairs, and give the R code that you would need to do it, using age and education as control variables in the second-stage regression.
k. Is this a valid instrument? Why or why not?
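For part j, one possible two-stage least squares strategy can be sketched as follows. The block uses simulated data because ‘affairs.csv’ contains no draft lottery variable; the column `lottery`, the confounder `u`, and every coefficient in the simulation are hypothetical. The first stage regresses `relig` on the lottery number and the controls, and the second stage replaces `relig` with its fitted values. Point estimates from this manual version match 2SLS, but the second-stage standard errors reported by `lm()` are not valid; a dedicated routine such as `ivreg()` from the AER package corrects them.

```r
set.seed(1)
n <- 500

# Simulated stand-in for the survey (all values are hypothetical).
d <- data.frame(
  lottery = sample(1:366, n, replace = TRUE),  # hypothetical draft lottery number
  age     = runif(n, 20, 60),
  educ    = sample(9:20, n, replace = TRUE)
)
u <- rnorm(n)  # unobserved confounder affecting both religiosity and affairs
d$relig    <- 3 - 0.004 * d$lottery + 0.5 * u + rnorm(n)
d$naffairs <- 2 - 0.3 * d$relig + 0.01 * d$age + 0.5 * u + rnorm(n)

# First stage: religiosity on the instrument plus the controls.
first <- lm(relig ~ lottery + age + educ, data = d)
d$relig_hat <- fitted(first)

# Second stage: outcome on predicted religiosity plus the same controls.
second <- lm(naffairs ~ relig_hat + age + educ, data = d)
print(coef(second))

# Equivalent one-step call if the AER package is installed (valid SEs):
# AER::ivreg(naffairs ~ relig + age + educ | lottery + age + educ, data = d)
```

The controls appear in both stages because 2SLS requires every exogenous regressor from the second stage to be included in the first. Whether the lottery number satisfies the exclusion restriction is the subject of part k and is not settled by this code.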