STAT 675: Final project 2024

The final project must be completed in groups of two people (one group may have three people).

The final group presentations will be given on Mon, May 5 and Wed, May 7, 2025, during lecture time. Note that class time on these two days will be extended to 2 hours (10:45am-12:45pm); because of this extension, we will not have a lecture on April 30. Instead, I will hold an office hour during our lecture time (11am-12:20pm) on April 30 to answer your questions. Each group needs to prepare slides and give a presentation of about 20-25 minutes, followed by 10 minutes of Q&A. Each group is required to email me its final slides by 10:00am on the day of its presentation so that I can prepare.

Please send me the names of your group members by April 16. Each group only needs to send me one email with all members cc’ed. If you’d like me to choose a group member for you, please also send me an email by April 16. Each group is required to submit one final project proposal by Mon, April 28, 2025. Each group only needs to submit one proposal, but every group member should be involved in writing it. The proposal should describe the topic you chose, the specific data you plan to study (e.g., data name, sample size, variables, and sources), and your objectives. The proposal can be short (one or two paragraphs) and must not exceed one page. The proposal will not be graded, but I will review it.

The final report is due by 11:59pm, Friday, May 16, 2025. Each group member needs to submit an individually and independently written report of at most ten pages (12-point font, single or 1.5 line spacing), not including computing code. Computing code should be provided in a supplementary file, along with any other materials you would like to include. Note that although this is a group project, each group member must submit an independently written report. Group members are not allowed to copy from each other. If such copying is identified, the students involved may receive zero points on the report.

Note that since this is a group project, group members should collaborate and work together (through Zoom or in-person meetings) to solve problems you encounter. Please do not ask me questions without first discussing them as a group. When you email me questions (after group discussion), please cc all group members.

Please choose one of the following topics for your final project.

Topics

Topic 1:

Study topics on modeling categorical data. This topic contains Part 1 and Part 2 (you need to complete both parts):

Part 1: Apply Logistic Regression to a relatively large-scale dataset with binary outcomes (a large sample size, e.g., n > 200, and/or a large number of predictors, e.g., p > 10 or 20). Your analysis should be complete and should incorporate at least the following procedures: exploratory data analysis, model building, model selection, model diagnostics, model interpretation, and prediction. In addition, you are required to fit a Logistic Regression with a Lasso penalty and a Generalized Additive Model (GAM) to the same data. Discuss whether each might or might not work well, and compare them with standard Logistic Regression.

Part 2: Apply Multinomial Logistic Regression or Ordinal Logistic Regression to a large-scale dataset with multi-categorical or ordinal outcomes with more than two levels (a large sample size, e.g., n > 200, or a large number of predictors, e.g., p > 10). Your analysis should be complete and should incorporate at least the following procedures: exploratory data analysis, model building, model selection, model diagnostics, model interpretation, and prediction.

You might also try Lasso and GAM if possible and compare the results (optional).

For both Part 1 and Part 2, you can find data on your own (e.g., data you are particularly interested in) or choose datasets from online resources such as:

https://www.kaggle.com/

https://archive.ics.uci.edu/ml/datasets.html

Before data analysis, each group may need to perform data cleaning and data processing (e.g., dealing with missing values, converting data to the format needed for analysis). To encourage group collaboration and independent study, I will not provide help with this step. You might encounter highly unbalanced data, e.g., a very small proportion of events or non-events. In this case, you will need to study how to address highly unbalanced data before your analysis.

Topic 2:

Study topics on modeling count data. This topic contains Part 1 and Part 2 (you need to complete both parts):

Part 1: Apply Poisson and Negative Binomial regressions to a dataset with count outcome variables (different from lecture and textbook examples). Your analysis should be complete and should incorporate at least the following procedures: exploratory data analysis, model building, model selection, model diagnostics, model interpretation, and prediction.

Part 2: Study topics on modeling count data with zero-inflated models.

Real-life count data are frequently characterized by overdispersion and excess zeros (Lambert 1992; Greene 1994). Zero-inflated count models provide a way of modeling the excess zeros in addition to allowing for overdispersion. Such models assume that the data are a mixture of two separate data generation processes: one generates only zeros, and the other is either a Poisson or a negative binomial data-generating process. The result of a Bernoulli trial is used to determine which of the two processes generates an observation.
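The mixture described above can be made concrete with a short simulation. The following Python sketch is not part of the course materials and is only an illustration under stated assumptions: it uses a single zero-inflated Poisson (ZIP) distribution with no covariates, and the parameter values (π = 0.3 for the "zeros only" process, λ = 2 for the Poisson mean) and all function names are illustrative choices. Each observation is generated by first running a Bernoulli(π) trial and then returning either a structural zero or a Poisson(λ) count, so P(Y = 0) = π + (1 − π)e^(−λ), E[Y] = (1 − π)λ and Var[Y] = (1 − π)λ(1 + πλ). The script compares these values with the simulated ones, showing both the excess zeros and the overdispersion (variance larger than the mean).

```python
import math
import random

# Illustrative parameters (assumptions, not taken from any real dataset):
# PI  = probability that the Bernoulli trial selects the "zeros only" process
# LAM = mean of the Poisson process
PI = 0.3
LAM = 2.0


def zip_pmf(y: int, pi: float, lam: float) -> float:
    """P(Y = y) for a zero-inflated Poisson distribution."""
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # A zero can come from either process.
        return pi + (1 - pi) * poisson
    # Positive counts can only come from the Poisson process.
    return (1 - pi) * poisson


def sample_poisson(lam: float, rng: random.Random) -> int:
    """One Poisson(lam) draw via Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_zip(pi: float, lam: float, rng: random.Random) -> int:
    """One ZIP draw: a Bernoulli(pi) trial decides which process generates it."""
    if rng.random() < pi:
        return 0  # structural zero from the "zeros only" process
    return sample_poisson(lam, rng)


def mean_and_variance(values: list[int]) -> tuple[float, float]:
    """Sample mean and (n - 1)-denominator sample variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, var


def main() -> None:
    rng = random.Random(675)  # fixed seed so the simulation is reproducible
    draws = [sample_zip(PI, LAM, rng) for _ in range(100_000)]

    zero_share = draws.count(0) / len(draws)
    mean, var = mean_and_variance(draws)
    zip_mean = (1 - PI) * LAM
    zip_var = (1 - PI) * LAM * (1 + PI * LAM)

    print(f"P(Y=0): theoretical {zip_pmf(0, PI, LAM):.4f}, simulated {zero_share:.4f}")
    # A plain Poisson with the same mean would produce far fewer zeros.
    print(f"P(Y=0) for a plain Poisson with the same mean: {math.exp(-zip_mean):.4f}")
    # Var[Y] > E[Y] whenever pi > 0, i.e. the ZIP data are overdispersed.
    print(f"Mean: theoretical {zip_mean:.4f}, simulated {mean:.4f}")
    print(f"Variance: theoretical {zip_var:.4f}, simulated {var:.4f}")


if __name__ == "__main__":
    main()
```

In a regression setting, π and λ would typically each depend on covariates (for example through a logit link for π and a log link for λ); fitting such models is part of the project and is not shown here.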

Apply both Zero-inflated Poisson and Zero-inflated Negative Binomial models to a dataset with count outcomes (different from lecture and textbook examples). Your analysis should be complete and should incorporate at least the following procedures: exploratory data analysis, model building, model selection, model diagnostics, model interpretation, and prediction. You might compare them with regular Poisson or Negative Binomial models.

For both Part 1 and Part 2, you might find data on your own (e.g., data you are particularly interested in) or choose datasets from online resources such as those listed in Topic 1.

Topic 3:

A topic (related to logistic regression, GLM, or categorical data analysis) chosen by your group and approved by the instructor (it should be comprehensive and might include advanced topics). If your project involves topics that are not covered in STAT675, you are required to provide an introduction to these topics so that other students in the class can understand them.

Your analysis should be complete and should incorporate at least the following procedures: exploratory data analysis, model building, model selection, model diagnostics, model interpretation, and prediction.

NOTE: The evaluation of the project will be based on both your presentation and your final written report. Again, to encourage group collaboration and independent study, I will only provide suggestions/advice on questions you might have. I cannot help solve specific problems such as data cleaning/processing or programming/coding issues. Each group should start the final project as early as possible (I will not answer last-minute questions, e.g., on the day before or the day of the presentation). Don’t wait until the last moment. See the “Instructions and Grading Criteria” file for further requirements and information on the final project.