Department of Economics
ECON 221 - Statistical Methods I
Problem Set # 1
Instructions:
• Please write your answers in a separate file (e.g. in Word), save it as a PDF document, and name it with your last name and your student ID number joined by an underscore (e.g. Yourname_12345678.pdf).
• Answer all questions. Clearly indicate which question each answer belongs to.
• Include only the relevant parts of the R output; summarize the results in a separate table if needed.
• Upload two files to Moodle:
1. The PDF file described above with your answers and relevant R output.
2. Your R code file, with comments, named the same way as the PDF file but with the extension .R (e.g. Yourname_12345678.R). Your R code must NOT include your answers.
I. Problems - Use R for your computations. You have to show your work. No credit without an explanation (20 marks each).
1. Consider the Wages data set posted on Moodle: (2 marks each)
(a) Are there any missing values in any of the variables in the data set?
(b) If there are, how would you remove them?
(c) Create a new variable, WAGE, which is the exponential of LNWAGE. Is this new variable categorical or numerical? Qualitative or quantitative? Nominal, ordinal, interval or ratio?
(d) Create a histogram for that variable by choosing the appropriate number of bins.
(e) Compute the mean and median for the wage variable. Which is higher?
(f) Compute the skewness of the WAGE variable. Is the distribution skewed to the left or to the right, and why?
(g) Compute the 5-number summary for that variable and produce a boxplot. Interpret the numbers.
(h) Compute the range, IQR, min, max, population and sample variance and standard deviation for the WAGE variable.
(i) Are there differences between population and sample measures for variances and standard deviations? Why?
(j) Compute and interpret the maximum value of the z-score for the WAGE variable.
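As a hedged illustration of the kind of base R calls parts (a)-(c) and (f)-(j) point to, the sketch below uses a made-up five-row data frame, not the Wages data set on Moodle. The object names (`wages`, `clean`, `pop_var`, and so on) are assumptions chosen for this example, and the moment-based skewness formula is only one common definition; add-on packages may use a slightly different one.

```r
# Hypothetical toy data; the real Wages file is not reproduced here.
wages <- data.frame(
  LNWAGE = c(log(5), log(10), NA, log(20), log(8)),
  AGE    = c(25, 40, 31, 52, NA)
)

# (a) Count missing values in each variable.
na_counts <- colSums(is.na(wages))

# (b) One option: drop every row that contains at least one NA.
clean <- na.omit(wages)

# (c) WAGE is the exponential of LNWAGE.
clean$WAGE <- exp(clean$LNWAGE)
x <- clean$WAGE
n <- length(x)

# (f) Moment-based skewness: third central moment over variance^(3/2).
skew <- mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

# (g) Five-number summary (min, lower hinge, median, upper hinge, max).
five <- fivenum(x)

# (h) Range and IQR; var() and sd() divide by n - 1 (sample versions),
# so the population versions are obtained by rescaling with (n - 1) / n.
wage_range <- diff(range(x))
wage_iqr   <- IQR(x)
samp_var   <- var(x)
pop_var    <- samp_var * (n - 1) / n
samp_sd    <- sd(x)
pop_sd     <- sqrt(pop_var)

# (j) z-scores computed with the sample standard deviation.
z     <- (x - mean(x)) / samp_sd
max_z <- max(z)
```

Because the population variance divides by n and the sample variance by n - 1, the two differ noticeably in a tiny sample like this one and become nearly identical as n grows.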
2. Construct a new character variable, JOB, which contains “PROF” if PROF=1, “CLER” if CLER=1, etc. (5 marks each)
(a) Is this new variable categorical or numerical? Qualitative or quantitative? Nominal, ordinal, interval or ratio?
(b) Create appropriate table(s) and chart(s) for the JOB variable (frequency distribution, pie chart, Pareto chart, barplot).
(c) Convert the JOB variable into a factor variable and present the mean WAGE, AGE, EX for each group.
(d) Briefly interpret your findings in part (c) in terms of how wages (WAGE), age (AGE) and experience (EX) vary across job categories (JOB).
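One possible way to build a character variable from a set of 0/1 indicator columns, then tabulate it and compute group means, is sketched below. The data frame `d` is invented for illustration and carries only three of the job indicators; a row with no indicator equal to 1 would keep `NA` under this approach, which is an assumption about how such rows should be treated.

```r
# Hypothetical toy data with three of the job indicators from the appendix.
d <- data.frame(
  PROF  = c(1, 0, 0, 1, 0),
  CLER  = c(0, 1, 0, 0, 0),
  SALES = c(0, 0, 1, 0, 1),
  WAGE  = c(20, 10, 8, 24, 12),
  AGE   = c(45, 30, 28, 51, 36)
)

# Start with missing labels, then fill in the name of the indicator that is 1.
job_cols <- c("PROF", "CLER", "SALES")
d$JOB <- NA_character_
for (col in job_cols) {
  d$JOB[d[[col]] == 1] <- col
}

# Frequency distribution and relative frequencies of the categories.
freq     <- table(d$JOB)
rel_freq <- prop.table(freq)

# Convert to a factor and compute the mean of WAGE and AGE within each group.
d$JOB <- factor(d$JOB)
means <- aggregate(cbind(WAGE, AGE) ~ JOB, data = d, FUN = mean)
```

Passing `sort(freq, decreasing = TRUE)` to `barplot()` gives the ordered bars of a Pareto-style chart, and `pie(freq)` draws the pie chart.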
3. Now consider the relationships between variables (5 marks each):
(a) Provide a scatter plot between WAGE on the vertical axis and AGE on the horizontal. Label the axes and provide a title for the plot. Provide intuition about the observed nature of the relationship.
(b) Compute and interpret the covariance and correlation coefficient for WAGE and AGE.
(c) Redo (a) and (b) with the education variable (ED) and experience variable (EX), instead of AGE.
(d) Compute, present and interpret the entire correlation matrix between WAGE, AGE, ED, EX, FE and UNION.
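The link between covariance and correlation in parts (b) and (d) can be checked on a small example. The sketch below uses invented values for `WAGE`, `AGE` and `ED`, so the numbers say nothing about the actual data set; it only shows that the correlation coefficient is the covariance rescaled by the two standard deviations.

```r
# Hypothetical toy data; values are made up for illustration.
d <- data.frame(
  WAGE = c(8, 12, 11, 17),
  AGE  = c(20, 30, 40, 50),
  ED   = c(10, 12, 12, 16)
)

# Sample covariance and Pearson correlation between WAGE and AGE.
cov_wa <- cov(d$WAGE, d$AGE)
cor_wa <- cor(d$WAGE, d$AGE)

# The correlation is the covariance divided by both standard deviations.
cor_by_hand <- cov_wa / (sd(d$WAGE) * sd(d$AGE))

# Full correlation matrix for all columns of the data frame.
cor_mat <- cor(d)
```

For the scatter plot in part (a), `plot()` accepts `xlab`, `ylab` and `main` arguments for the axis labels and the title.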
4. Now consider relationships between different groups (5 marks each):
(a) Generate 2 new variables, WAGE_F and WAGE_M, for females and males, respectively, depending on the variable FE (FE=1 for WAGE_F). Compute the mean and median for those 2 variables. Which one is higher across the 2 groups? What does the comparison of mean and median tell us about the symmetry or skewness of the wage distributions?
(b) Repeat (a) with the non-white (NONWH) and union (UNION) variables instead of FE variable.
(c) Present a table ranking from highest to lowest the mean wage for the different job categories. Does the order make sense?
(d) Briefly summarize your results in parts (a)-(c), highlighting the factors that contribute to wage differences and the direction of their effects.
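A minimal sketch of the group comparison in part (a), assuming a data frame with an `FE` indicator and a `WAGE` column. The six observations are invented, and `summary_tab` is a hypothetical name for the comparison table.

```r
# Hypothetical toy data; FE = 1 marks females, as in the appendix.
d <- data.frame(
  FE   = c(1, 0, 1, 0, 1, 0),
  WAGE = c(7, 9, 8, 12, 15, 30)
)

# Split WAGE by group using logical indexing.
WAGE_F <- d$WAGE[d$FE == 1]
WAGE_M <- d$WAGE[d$FE == 0]

# Collect the mean and median of each group in one small table.
summary_tab <- data.frame(
  group  = c("female", "male"),
  mean   = c(mean(WAGE_F), mean(WAGE_M)),
  median = c(median(WAGE_F), median(WAGE_M))
)
```

In this toy example the mean exceeds the median in both groups, which is the usual sign of a right-skewed distribution. The same pattern of indexing applies to the NONWH and UNION indicators in part (b).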
5. Now consider some probabilities (4 marks each):
(a) Calculate the probability that a person taken at random is a member of a union (i.e. UNION=1).
(b) Calculate the probability that a person taken at random is a married female (i.e. MARR=1 and FE=1).
(c) Calculate the probability that a person makes less than $9 an hour given they are female.
(d) Present a contingency table between job category (JOB) and union membership (UNION).
(e) Continue with part (d) and calculate the respective joint, marginal, and conditional probabilities for all categories (i.e. P(UNION), P(JOB), P(JOB & UNION), P(JOB | UNION), and P(UNION | JOB)).
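The probabilities in this problem are all relative frequencies, so they can be computed as proportions of rows or of contingency-table cells. The sketch below is one way to do that in base R on an invented eight-row data frame; the column values and object names are assumptions for illustration only.

```r
# Hypothetical toy data; values are made up for illustration.
d <- data.frame(
  JOB   = c("PROF", "PROF", "CLER", "CLER", "CLER", "SALES", "SALES", "SALES"),
  UNION = c(1, 0, 0, 0, 1, 0, 0, 1),
  FE    = c(1, 0, 1, 1, 0, 0, 1, 1),
  MARR  = c(1, 1, 0, 1, 0, 0, 1, 0),
  WAGE  = c(22, 25, 9.5, 8, 11, 7, 8.5, 12)
)

# (a) Marginal probability: the mean of a logical vector is a proportion.
p_union <- mean(d$UNION == 1)

# (b) Joint probability of two conditions holding together.
p_marr_fe <- mean(d$MARR == 1 & d$FE == 1)

# (c) Conditional probability: restrict to females, then take the proportion.
p_low_given_fe <- mean(d$WAGE[d$FE == 1] < 9)

# (d) Contingency table of counts, JOB in rows and UNION in columns.
tab <- table(JOB = d$JOB, UNION = d$UNION)

# (e) Joint, marginal and conditional probabilities from the table.
joint             <- prop.table(tab)              # P(JOB & UNION)
p_job             <- margin.table(joint, 1)       # P(JOB)
p_union_marg      <- margin.table(joint, 2)       # P(UNION)
p_union_given_job <- prop.table(tab, margin = 1)  # rows sum to 1: P(UNION | JOB)
p_job_given_union <- prop.table(tab, margin = 2)  # columns sum to 1: P(JOB | UNION)
```

The `margin` argument of `prop.table()` decides which total each cell is divided by, which is what separates P(UNION | JOB) from P(JOB | UNION).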
APPENDIX - Data Set Variables Description
1. ED - years of education.
2. SOUTH - 1 if person resides in the South.
3. NONWH - 0 if person is Caucasian.
4. HISP - 1 if person is Hispanic.
5. FE - 1 if person is female.
6. MARR - 1 if person is married and spouse is present in the same household.
7. MARRFE - 1 if the person is a married female.
8. EX - years of experience in the labour market.
9. EXSQ - years of experience in the labour market, squared.
10. UNION - 1 if union member.
11. LNWAGE - natural logarithm of hourly wage (in USD).
12. AGE - age in years.
13. MANUF - 1 if person works in manufacturing sector.
14. CONSTR - 1 if person works in construction sector.
15. MANAG - 1 if person works in management or administration.
16. SALES - 1 if person works in sales sector.
17. CLER - 1 if person is an office worker.
18. SERV - 1 if person works in services sector.
19. PROF - 1 if person is a professional or technical worker.