IB1500
Foundations of Data Analysis for Management
Project on Data Analysis Resit, 2023-24
Assignment Instructions
All assignments must be submitted ONLINE via my.wbs by 12pm (midday) UK time on the date displayed against this assessment.
Please ensure that you have inserted a completed assignment coversheet, which must be included as the first page of your script. This should include your Student ID number, but not your name.
Word Limit
3000 word limit.
Word Count Policy
WBS has a school-wide policy on word counts. This is strictly enforced to ensure consistency across modules and programmes. You can find more information about this policy in your Student Handbook under Academic Practice - 7i. Word count policy.
This is a strict limit, not a guideline: if a piece is submitted with more words than the limit, the excess will not be marked.
Academic Practice
Please ensure you read the full guidelines for Academic Practice in the Undergraduate Handbook and that you understand them. If in doubt, please seek clarification before you submit. The guidelines include important information on:
- Cheating, plagiarism and collusion
- Correct referencing
- Using internet sources in assessments
- Academic writing
- English Language support
- Word count policy
When you submit this assignment online, you will be required to tick a declaration box indicating that the work involved is entirely your own. Each assignment will be put through plagiarism software to identify any collusion or inadequate referencing of materials used from different sources. Please do not submit images of your typed work unless you have been specifically requested to do so.
We would consider taking action if your work:
1. is too reliant on the words of particular authors (rather than presenting your ideas in your own words), or uses the ideas or words of an author without referencing them or putting their words in quotation marks (plagiarism).
2. suggests that you have worked very closely with another student or students (unless explicitly asked to do so by your Module Leader/Tutor) (collusion).
3. includes unreferenced work that you have previously submitted for any accredited course of study (unless explicitly asked to do so by your Module Leader/Tutor) (self-plagiarism).
The Use of Artificial Intelligence (AI)
The University recognises that an increasing number of technologies, such as Artificial Intelligence (AI), may be applicable to completing this assessment. The assessment brief sets out specific requirements or restrictions, and your student handbook has further guidance and advice.
You are reminded that the inappropriate use of such a technology may constitute a breach of University policy, such as the Proofreading Policy or Regulation 11 (Academic Integrity). If you breach these policies, it may have significant consequences for your studies. Please make sure you read and understand the assessment brief and how AI may or may not be used.
If generative AI or a similar tool is permitted and has been used, you MUST make clear why you used it and what you used it for, and you will be obliged to confirm that you take sole intellectual ownership of any submitted work. As appendices forming part of your submitted work, you must provide screenshots of the question and the AI-generated response, alongside an explanation of how the content has been used. You should note the relevant reference alongside each screenshot.
When you submit you must complete (physically or electronically) a declaration. This requires you to explain the use of any AI. Failure to disclose at the point of submission may be prejudicial in any later investigations should they arise.
For this assessment the use of AI is: Prohibited
You MUST NOT use any generative Artificial Intelligence in this assessment unless specifically authorised for reasonable adjustments. You MAY use non-generative tools such as a spell-check, basic grammar check (non-generative), calculator or similar. If you have any doubts about a tool or service you plan to use, please contact the module leader.
Extensions and Self-certification
Late submissions will incur a penalty of 5% for every 24-hour period after the due date and time; the first period begins one minute after the submission deadline (at 12.01pm).
Requests for specific extensions (of up to 15 days), which are typically for longer-term and more serious concerns, must be submitted via my.wbs, ideally 72 hours BEFORE the deadline. Extensions can only be approved if you clearly detail your circumstances and provide supporting documentation (or a reason why you cannot provide the supporting documentation at the time), as set out in the Mitigating Circumstances Policy.
Self-certification is a university-wide policy whereby you are permitted an automatic extension of 5 working days on eligible written assessed work without the need for evidence. WBS permits self-certification for all types of written, assessed works such as essays and dissertations. It is not permitted for exams, course tests, or presentations.
You can self-certify twice within each year of study, starting from the anniversary of your course start date. This will cover all eligible written assessments that fall within the self-certification period, as long as they have not previously had an extension applied. For further details about the self-certification policy, please see: https://my.wbs.ac.uk/-/academic/20778/item/id/1244460/ .
If you wish to self-certify for an extension of 5 working days, please select 'Self-certification' in the Extension Type field. If you wish to request a longer extension than 5 working days, please leave the Extension Type as 'Standard'.
Your assignment instructions begin below.
**You must NOT use the same media and data you used for the initial Project for this module.**
This project consists of two parts.
In part 1, your task is to analyse a piece of media discussing a scientific study using the concepts from the module. This part consists of four sections. You will need to pick the piece of media yourself.
In part 2, your task is to analyse a dataset using the knowledge acquired in the module. This part consists of four sections. You will need to pick the data yourself.
Specific instructions for the sections in each part follow. Each part and section has a suggested approximate word count; it is approximate because you don’t need to follow it precisely as long as the overall count is 3000 words or under.
Your submission will consist of two documents: the essay and an Excel file that supports your analysis.
An essay without the Excel file will be considered, but it will receive lower marks on the Technical Capability component. The files should be clearly signposted: the essay should follow the structure outlined below, and the Excel file should follow the same structure (see the example Excel file on the module page). There should be a clear correspondence between the Excel file and the essay: where appropriate, assign numbers and titles to tables and figures so that they can be easily referenced in the Excel file. You can calculate statistics by hand or use the Analysis ToolPak where appropriate. The marker should be able to judge how the results reported in the essay were achieved. To make this possible, make sure that all formulas are readable (i.e., do not simply paste in numbers from somewhere else) and, if ToolPak is used, clearly indicate this with “Computed using ToolPak.”
Note on the word count: The following items are NOT counted towards the word count:
- Section/subsection names
- Appendix (the tables and figures in the appendix)
- Footnotes (footnotes should be used only when strictly necessary; no essential information should be put in footnotes)
- Numbers in the text
- Text in the infographic
Please make sure to provide the overall word count at the beginning of your project.
PLEASE SEE THE EXAMPLE ESSAY AND EXCEL FILE ON THE MODULE PAGE TO GET AN IDEA OF WHAT THIS PROJECT SHOULD LOOK LIKE.
See more detailed instructions for each part/section below.
Part 1: Analysis of a news article (1000 words)
Section 1.1. Data based on the news article (200 words)
In this section, please describe the data from the study covered in the news article. Use the following structure (see Week 2 materials):
Question -> Target Population -> Study Population -> Sample -> Data.
Provide a description of each part of the chain and note any information you are not sure about from the article (i.e., where the article does not give enough information for a precise description). For this section, use ONLY the article itself. Do not look up information about the research paper or anything else mentioned in the article.
Do not forget to provide the link to the article.
Section 1.2. Data based on the research article (200 words)
In this section, provide descriptions following the same structure but now using the research paper that the article discusses. If you identified any missing information in the previous section, attempt to fill in the gaps using the paper. Make sure to reference the part of the paper that supports your statements (pages are enough).
Do not forget to provide a link to the paper.
Section 1.3. Data quality (500 words)
In this section, given your analysis above, discuss the data quality of the study. Specifically, discuss measurement, sampling, and external validity. Identify any issues there might be and explain how they could affect the results of the study. For example, it is not enough to say that self-reports can have issues; you have to identify the issues and explain why they matter.
Issues of measurement: Identify the variables of interest, how they were measured and whether this measurement is appropriate or not.
Issues of sampling: Using the sample and the study population from the previous analysis, discuss whether the sampling might have introduced a bias into the data (or not).
Issues of external validity: Discuss whether the results of the study can be applied to the target population or other populations.
Section 1.4. Conclusion (100 words)
Provide a short conclusion about the quality of the data and the conclusions drawn by the article/study.
Part 2: Analysis of data
Section 2.1: Data (600 words)
In this section, provide a short description of your data. If you are using data licensed under Creative Commons, you should also provide a proper reference for the data here; see the specific license for how to attribute it properly. Provide the following information:
- Information about the sample size;
- Information about the origin of the data (e.g., whether it comes from a survey conducted by someone or is observational data on a particular topic).
Section 2.1.1. Descriptive Statistics
Provide a short description of ALL the variables you are USING in this project. This means that you don’t need to describe every variable in the data set, only the variables you use in the project (in any section of the project). Provide the following information:
- A table with the following columns, where you define the variables and determine their type. This table should be in the appendix.
  Variable | Definition | Data Type
- A table with the following columns, where you provide descriptive statistics for all the continuous variables:
  Table X. Description of Continuous Variables.
  Variable | Mean | Median | Mode | SD | Minimum | Maximum
- A table with the following columns, where you provide descriptive statistics for each of the categorical variables (don’t forget to provide the absolute counts behind the percentages). If your categorical variable has more than 5 outcomes, provide information only for the top 5:
  Table X. Description of Variable Y. Overall sample size Z.
  Value | Count | %
- For each variable:
  - Create a histogram (if it is continuous) or a bar chart (if it is categorical; the same rule as before applies to variables with more than 5 outcomes). The graphs should be in the appendix and appropriately titled and labelled.
  - Determine the skew of the variable. Comment on which descriptive statistics would be the most appropriate for this variable.
  - Comment on whether there are any outliers, what they might be (mistakes, uncommon observations), and what you did with them (if anything).
  - Comment on whether you see any strange or interesting patterns in the graphs and what the most likely explanation for them is.
The Excel file should contain all the calculations of the descriptive statistics and the histograms. In particular, any formulas used should be clickable (i.e., the marker should be able to read the formula used); if ToolPak is used, that should be clearly noted with “Computed using ToolPak.”
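The Excel file remains the required record of these calculations. As an optional cross-check, the statistics above can be reproduced with a short script. The sketch below assumes Python 3.8+ and only the standard library; the function names and example values are hypothetical. It is intended to be comparable to Excel's AVERAGE, MEDIAN, MODE.SNGL, STDEV.S, MIN, MAX and SKEW, and it flags outliers with the 1.5 × IQR rule, which is one common convention rather than something this brief prescribes.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Summary row for one continuous variable (needs at least 3 observations).

    SD is the sample standard deviation (as in Excel's STDEV.S) and skew uses
    the adjusted sample formula that Excel's SKEW reports.
    """
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / sd) ** 3 for x in values)
    return {
        "Mean": mean,
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),  # first mode encountered if there are ties
        "SD": sd,
        "Minimum": min(values),
        "Maximum": max(values),
        "Skew": skew,
    }


def iqr_outliers(values):
    """Values beyond 1.5 * IQR from the quartiles (a common rule of thumb).

    Quartiles use the "inclusive" method, comparable to Excel's QUARTILE.INC.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = 1.5 * (q3 - q1)
    return [x for x in values if x < q1 - fence or x > q3 + fence]


def describe_categorical(values, top=5):
    """(value, count, percent) rows for the `top` most common outcomes."""
    n = len(values)
    return [(value, count, 100 * count / n)
            for value, count in Counter(values).most_common(top)]
```

For example, describe_continuous([20, 22, 22, 25, 27, 30, 31, 95]) gives a mean of 34 against a median of 26 and a positive skew, and iqr_outliers flags 95; with a right-skewed variable like this, the median is usually the more representative centre.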
Section 2.1.2. Data Quality
Assess the quality of your data on three levels: measurement, data sampling, and external validity. Make sure to note how those issues affect the conclusions you can draw from your analysis.
First, consider the measurement in your data. What do you think they wanted to measure? What does the data actually measure?
Second, consider the study population and how the sample was selected. Are there any issues with who ended up in this sample?
Finally, consider whether what you can conclude from this data can be extended to larger populations.
Here you don’t need to provide very detailed descriptions of all possible biases and issues that each variable might suffer from. You should instead concentrate on the most important problems relating to the sampling of data and the validity of your conclusions. For example, if you identified a possibility of selection bias in some of your data, you should explain how selection bias affects your data and why that matters for your results.
Section 2.2: Confidence Intervals and Hypothesis Testing (550 words)
In this section, you will calculate confidence intervals and test your hypotheses. For each subsection, write a QUESTION you are trying to answer with your analysis.
In this section, there should not be any formulas. You should only provide the value of the statistic and the p-value and clearly show what these results mean for your question.
The Excel file should contain your estimations for the tests (the values of statistics and p-values) and should clearly indicate the data that was used in the tests.
Section 2.2.1. Confidence Interval Estimation
Question: Formulate your question here
Calculate two confidence intervals for two different means or proportions in your data and conclude whether the difference between the means/proportions is statistically reliable (think of answering the question “What did I learn from doing this analysis?”).
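One way to carry out the comparison above for two means is to build a t-based 95% interval for each group and check whether the two intervals overlap. The sketch below is a minimal standard-library Python version under that assumption; it evaluates the t distribution by numerical integration (Simpson's rule) rather than calling a statistics package, and all function names are hypothetical. In Excel, the half-width corresponds to CONFIDENCE.T(0.05, STDEV.S(range), COUNT(range)).

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|]."""
    if x < 0:
        return 1 - t_cdf(-x, df)
    steps = 2 * max(1000, int(100 * x))  # even number of panels, width about 0.005 or less
    h = x / steps
    area = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def t_critical(confidence, df):
    """Two-sided critical value, comparable to Excel's T.INV.2T(1 - confidence, df).

    Found by bisection on t_cdf; assumes the value lies below 100, which holds
    for 95% intervals at any df >= 1.
    """
    target = 1 - (1 - confidence) / 2
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_ci(values, confidence=0.95):
    """t-based confidence interval for a mean: AVERAGE +/- CONFIDENCE.T in Excel terms."""
    n = len(values)
    centre = statistics.mean(values)
    half = t_critical(confidence, n - 1) * statistics.stdev(values) / math.sqrt(n)
    return centre - half, centre + half


def intervals_overlap(a, b):
    """True if the intervals a = (low, high) and b = (low, high) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]
```

Non-overlapping 95% intervals for two independent means indicate a difference that is significant at the 5% level, but overlapping intervals do not by themselves rule out a difference, so a t-test is the more direct check.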
Section 2.2.2. T-test
Question: Formulate your question here
Conduct a 5% significance t-test (a one or two-sample version). Make conclusions from the test concerning your question (think of answering the question “What did I learn from doing this analysis?”).
NOTE: You need to compare similar things here. For example, means of the same variable for different subgroups. DO NOT COMPARE APPLES TO ORANGES.
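A comparison of one variable across two subgroups can be sketched as a two-sample t-test. The version below assumes unequal variances (Welch's test), which is a choice made here rather than something the brief requires; its p-value should be broadly comparable to Excel's T.TEST(range1, range2, 2, 3). It uses only the Python standard library, with the t distribution integrated numerically, and the function names are hypothetical.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df):
    """P(T <= x) for Student's t, via Simpson's rule on [0, |x|]."""
    if x < 0:
        return 1 - t_cdf(-x, df)
    steps = 2 * max(1000, int(100 * x))  # even number of panels, width about 0.005 or less
    h = x / steps
    area = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + area * h / 3


def welch_t_test(a, b):
    """Two-sided two-sample t-test that does not assume equal variances.

    Returns (t statistic, degrees of freedom, p-value).
    """
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    p = 2 * (1 - t_cdf(abs(t), df))
    return t, df, min(1.0, max(0.0, p))  # clamp tiny numerical overshoot
```

If the returned p-value is below 0.05, the difference between the two subgroup means is significant at the 5% level; the sign of t shows which group has the larger mean.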
Section 2.2.3. Chi-square test
Question: Formulate your question here
Conduct a 5% significance chi-square test (one or two samples). Make conclusions from the test concerning your question (think of answering the question “What did I learn from doing this analysis?”).
NOTE: The chi-square test is for CATEGORICAL VARIABLES. This means that it applies to COUNTS of cases, not to means of continuous variables.
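For a chi-square test of independence, the counts are arranged in a contingency table (rows for one categorical variable, columns for the other). The sketch below computes expected counts from the row and column totals, the Pearson statistic, and its p-value; no continuity correction is applied. In Excel, CHISQ.TEST returns a p-value from observed and expected ranges. The sketch assumes Python 3 with the standard library only; the function names are hypothetical, and the p-value uses a series expansion of the incomplete gamma function rather than a statistics package.

```python
import math


def chi2_sf(x, k):
    """Upper-tail probability P(X > x) for a chi-square variable with k df.

    Uses the power series of the regularised lower incomplete gamma function.
    """
    if x <= 0:
        return 1.0
    s, z = k / 2, x / 2
    log_term = s * math.log(z) - z - math.lgamma(s + 1)
    if log_term < -700:  # x lies so far into one tail that the answer is 0 or 1
        return 0.0 if z > s else 1.0
    term = math.exp(log_term)
    total, n = term, 0
    while n < z or term > 1e-17 * total:  # terms can only rise while n < z, then shrink
        n += 1
        term *= z / (s + n)
        total += term
    return max(0.0, 1.0 - total)


def chi_square_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, degrees of freedom, p-value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)
```

For the hypothetical table [[10, 20], [20, 10]], every expected count is 15 and the statistic is about 6.67 with 1 degree of freedom (p ≈ 0.01). A common rule of thumb is that every expected count should be at least 5; if that fails, merging categories is one option.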
Section 2.3: Regression Analysis (550 words)
Question: Formulate your question here (i.e., pick the dependent and the independent variables)
In this section, you will conduct a regression analysis. Specifically, estimate a multiple regression. First, provide a question that you are trying to answer with this regression. Then for each independent variable explain why you included it in the regression and what you think the sign of the coefficient would be. DO THIS BEFORE YOU RUN THE REGRESSION.
Section 2.3.1. Results
Discuss the results of the regression analysis here. Specifically, discuss the significance and sign of EVERY independent variable and whether your predictions from the section above were supported. Moreover, provide the following table of the regression results in the Appendix:
               | Coefficient | 95% CI
Coefficient    | XXX         | [XX,XX]
R-squared      | XX          |
# Observations | XXX         |
Note: * - p-value < 0.1, ** - p-value < 0.05, *** - p-value < 0.01
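The regression table above can be filled from the ToolPak output. As a cross-check, the sketch below estimates a multiple regression by ordinary least squares through the normal equations, in standard-library Python. It is a minimal illustration under stated assumptions: an intercept is always included, the function names are hypothetical, and the two-sided critical value t_crit is passed in (for a 95% CI it could come from Excel's T.INV.2T(0.05, n - k), where k counts the coefficients including the intercept). It does not compute p-values, so the significance stars would still come from the ToolPak output.

```python
import math


def invert(matrix):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: check for perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(y, xs, t_crit):
    """OLS of y on the predictor columns in xs, always with an intercept.

    Returns one (coefficient, standard error, t-statistic, (ci_low, ci_high))
    row per coefficient, intercept first, plus the R-squared.
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in xs] for i in range(n)]
    k = len(X[0])  # number of coefficients, including the intercept
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sigma2 = sse / (n - k)  # residual variance with n - k degrees of freedom
    rows = []
    for a in range(k):
        se = math.sqrt(sigma2 * inv[a][a])
        ci = (beta[a] - t_crit * se, beta[a] + t_crit * se)
        rows.append((beta[a], se, beta[a] / se, ci))
    return rows, 1 - sse / sst
```

Each returned row holds the coefficient, its standard error, the t-statistic and the 95% CI, which map onto the Coefficient and 95% CI columns; the second return value is the R-squared, and len(y) gives the number of observations.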
Section 2.3.2. Model Fit and Regression Assumptions
In this section discuss ALL the following points:
- Model Fit
- Normality of Residuals
- Heteroskedasticity
- Multicollinearity
- Non-linearity
- Reverse Causality
- Omitted variable bias
- Correlation vs Causation
- Comment on whether you identified any issues and what you would do to improve your model.
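For the multicollinearity point, a simple first screen is to look at the correlations between the independent variables. The sketch below (standard-library Python, hypothetical function names) ranks predictor pairs by the absolute Pearson correlation, as Excel's CORREL would compute it, and gives the variance inflation factor for the special case of exactly two independent variables, where VIF = 1 / (1 - r²). With more predictors, each VIF needs an auxiliary regression of that predictor on all the others, which this sketch does not do.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_screen(predictors):
    """Pairwise correlations between named predictor columns, strongest first."""
    pairs = [
        (a, b, pearson_r(predictors[a], predictors[b]))
        for a, b in combinations(predictors, 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the regression has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)
```

Thresholds such as a VIF above 5 or 10 are rules of thumb rather than fixed cut-offs.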
Section 2.4: Infographic (300 words)
In this section, present your infographic. Your infographic should consist of 3-5 graphs and 3-5 numbers. Don’t forget to provide statistical information (such as 95% CIs) in support of the claims.
To accompany the infographic, please provide its description: what is the main topic of the infographic and why did you choose it? Also, provide a short description of each element of the infographic and its purpose. Each element should represent a specific statistical claim, and you should provide statistics that support that claim. For example, if you are making a claim about two means, you should provide 95% CI information and/or the result of a t-test.
Remember, infographics are about telling stories visually, so make sure to clearly state which QUESTIONS your infographic is trying to answer. Moreover, the textual component should be minimal. The text should only provide the basic context for the statistics and graphs.
The Excel file should show how you estimated each statistic. It should also show how you calculated confidence intervals for means and/or proportions mentioned in this section.
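For a proportion shown in the infographic, one standard option is the normal-approximation (Wald) interval, p ± z·√(p(1 − p)/n). The sketch below assumes Python 3.8+ (for statistics.NormalDist); the function name is hypothetical. In Excel, the same half-width is NORM.S.INV(0.975)*SQRT(p*(1-p)/n). For means, a t-based interval is the usual choice.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half
```

The approximation is usually considered adequate when both n·p and n·(1 − p) are at least about 10; for small samples or proportions near 0 or 1, a Wilson interval is a common alternative.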
Below is guidance on the elements of the infographic:
MESSAGE - There should be a central question explicitly addressed in the infographic.
GRAPHS & STATISTICS - Appropriate statistics and graphs should be selected for the data. The graphs and statistics should help convey the main message.
LAYOUT - The infographic should be well structured and well spaced, with a clear conceptual basis for the organizational scheme. Where appropriate, spatial relationships should be used to convey meaning and show signs of creativity.
AESTHETICS - Colours should be used to convey meaning and be aesthetically appealing. The infographic as a whole should be aesthetically pleasing. Its main purpose is to convey a message and tell a story, but it is also an artistic medium, so if this is something you enjoy, put your skills to good use here.
CLARITY - Text and visuals should work together so that each is enhanced by the other.
SOURCES - Sources of data and ideas drawn from outside the course should be clearly provided, as should the ways they were used.