APH113 Final Project

Applying Machine Learning to Cardiovascular Disease Prediction

1. Project Overview & Learning Objectives

This final group project serves as the capstone for the APH113 module, "Artificial Intelligence in Healthcare & Pharmaceutical Research." It is designed to be a comprehensive, hands-on experience that allows you to apply the principles and techniques learned throughout the course to a real-world biomedical problem.

The core of this project is an inquiry-based investigation into a machine learning pipeline designed for the prediction of cardiovascular disease (CVD). This is not merely a programming exercise; it is a rigorous practice in scientific inquiry and communication. Your success will be measured not only by your ability to correctly execute code but also by your capacity to clearly and accurately convey your process and findings in the form of a professional scientific report. The project's final grade is determined by the written report (60%) and the programming code (40%).

Upon successful completion of this project, students will be able to:

· Apply Theoretical Knowledge: Demonstrate a fundamental understanding of data processing, machine learning theory, and algorithms within a real-world biomedical context.

· Enhance Technical Proficiency: Strengthen Python programming skills within a machine learning environment through code analysis, annotation, and execution.

· Develop Critical Analysis: Move beyond simple code execution to critically analyze model results, interpret their clinical significance, and evaluate the limitations of the methods used.

· Practice Scientific Communication: Structure, write, and format a professional-standard scientific report to systematically present research findings, adhering to academic conventions.

2. Thematic Scenario: Cardiovascular Disease (CVD) Prediction

2.1 Background

Cardiovascular diseases (CVDs) are the leading cause of death globally, presenting a formidable challenge to public health systems. A multitude of factors, including age, sex, lifestyle choices, and genetic predispositions, are recognized as potential risk factors for CVD. The complex interplay between these factors makes early diagnosis and risk assessment exceptionally difficult.

The machine learning (ML) paradigm offers powerful tools for analyzing such complex, multi-factorial datasets. By identifying subtle patterns and correlations within the data, ML models can assist clinicians in classifying patients, predicting disease risk, and helping to explore the primary factors associated with CVD. This project leverages a real-world clinical dataset to explore the potential of machine learning in CVD prediction.

2.2 Project Dataset & Code

For this project, you will be provided with a curated dataset and a corresponding Python script.

· Dataset: The dataset for this project contains 13 clinical indicators from 303 samples. These samples are categorized into two groups: patients with CVD and healthy individuals, as indicated by the "target" column in the Data Dictionary.

· Data Dictionary: The table below provides a detailed explanation of each column (indicator) in the dataset.

ID	Column (Indicator)	Explanation of Column Value
1	age	Age in years
2	sex	Sex: 1 = male; 0 = female
3	cp	Chest pain type: 0 = typical angina; 1 = atypical angina; 2 = non-anginal pain; 3 = asymptomatic
4	trestbps	Resting blood pressure (in mm Hg on admission to the hospital)
5	chol	Serum cholesterol in mg/dl
6	fbs	Fasting blood sugar > 120 mg/dl: 1 = true; 0 = false
7	restecg	Resting electrocardiographic results: 0 = normal; 1 = having ST-T wave abnormality; 2 = showing probable or definite left ventricular hypertrophy by Estes' criteria
8	thalach	Maximum heart rate achieved
9	exang	Exercise-induced angina: 1 = yes; 0 = no
10	oldpeak	ST depression induced by exercise relative to rest
11	slope	The slope of the peak exercise ST segment: 0 = upsloping; 1 = flat; 2 = downsloping
12	ca	Number of major vessels (0-3) colored by fluoroscopy
13	thal	Thalassemia: 0 = error/missing value; 1 = fixed defect; 2 = normal; 3 = reversible defect
14	target (Label)	0 = no disease; 1 = disease

· Python Code: Provided alongside the dataset is a Python script that implements a machine learning workflow for classification and prediction using this dataset. The script covers key stages such as data loading, preprocessing, model training, and performance evaluation. Crucially, this script is intentionally incomplete. Specific sections of the code have been “masked” (i.e., removed) and replaced with detailed comments. These comments will serve as precise instructions, guiding you to write the missing code to complete the entire workflow.
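
The data dictionary above lends itself to a quick sanity check before any modelling. The following is a minimal sketch and is not part of the provided script: it assumes each row has been read into a plain Python dict keyed by the column names in the table, and the names ALLOWED_CODES, check_record, good_row, and bad_row are hypothetical. The two example rows are invented for illustration and are not drawn from the dataset.

```python
# Allowed codes per column, transcribed from the data dictionary above.
# Continuous columns map to None because the dictionary gives no numeric bounds.
ALLOWED_CODES = {
    "age": None,
    "sex": {0, 1},
    "cp": {0, 1, 2, 3},
    "trestbps": None,
    "chol": None,
    "fbs": {0, 1},
    "restecg": {0, 1, 2},
    "thalach": None,
    "exang": {0, 1},
    "oldpeak": None,
    "slope": {0, 1, 2},
    "ca": {0, 1, 2, 3},
    "thal": {0, 1, 2, 3},
    "target": {0, 1},
}


def check_record(record):
    """Return a list of human-readable problems found in one row (a dict)."""
    problems = []
    for column, allowed in ALLOWED_CODES.items():
        if column not in record:
            problems.append(f"{column}: missing column")
            continue
        value = record[column]
        if allowed is not None and value not in allowed:
            problems.append(f"{column}: unexpected code {value!r}")
    # The dictionary documents thal == 0 as an error/missing value,
    # so flag it even though it is a listed code.
    if record.get("thal") == 0:
        problems.append("thal: 0 is documented as error/missing")
    return problems


# Hypothetical rows for illustration only (not taken from the dataset).
good_row = {
    "age": 54, "sex": 1, "cp": 2, "trestbps": 130, "chol": 240,
    "fbs": 0, "restecg": 1, "thalach": 150, "exang": 0, "oldpeak": 1.2,
    "slope": 1, "ca": 0, "thal": 2, "target": 1,
}
bad_row = dict(good_row, cp=7, thal=0)

print(check_record(good_row))  # []
print(check_record(bad_row))   # one problem for cp, one for thal
```

A check of this kind is one way to surface rows with thal = 0, so that the handling of those rows can be stated explicitly in the Methods section rather than left implicit.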

2.3 Your Task: Core Requirements & Path to Excellence

The project tasks are tiered to assess the depth of your understanding and creative application of the material. All groups must complete the core requirements, while groups aiming for higher marks should engage with the tasks on the path to excellence. This structure is directly linked to the assessment criteria, providing a clear roadmap for your efforts.

Core Requirements (To achieve a passing grade)

1. Code Completion, Comprehension & Annotation: This is a two-stage task that forms the foundation of your technical work.

a. Code Completion: First, you must carefully examine the provided Python script to identify the masked sections of code. Using the guiding comments as your technical specification, you are required to write the necessary Python code to make the script fully functional. The correctness of your implemented code is a primary criterion for assessment.

b. Code Comprehension & Annotation: After successfully completing the script, you are required to provide detailed and insightful comments for each line or logical block of code. Your annotations should not merely restate what the code does but explain why it is being done in the context of the overall machine learning pipeline.

2. Execution & Result Generation: Successfully execute the entire script in a suitable Python environment and generate all outputs, including any statistical summaries, visualizations, and model performance metrics.

3. Result Analysis & Summary: In your written report, analyze the results generated by the code. Discuss the function and purpose of the code, explain the meaning of the outputs, and write a conclusive summary based on the model's overall performance and findings.
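
When explaining the meaning of model performance metrics, it can help to see how the common ones derive from the four confusion-matrix counts. The sketch below is illustrative only: the provided script may report different metrics, the function names confusion_counts and summarize are hypothetical, and the two label lists are made up rather than real model output. It treats target = 1 (disease) as the positive class, following the data dictionary.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives, with 1 as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity, and precision as a dict."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # share of diseased cases detected
        "specificity": _ratio(tn, tn + fp),   # share of healthy cases cleared
        "precision": _ratio(tp, tp + fp),     # share of positive calls that are right
    }


# Made-up labels for illustration: 1 = disease, 0 = no disease.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0]

for name, value in summarize(y_true, y_pred).items():
    print(f"{name}: {value:.3f}")
```

In a clinical screening context, sensitivity and specificity often matter more than raw accuracy, because a missed case of disease and a false alarm carry different costs; the report's analysis can make that trade-off explicit.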

Path to Excellence (For high distinction, i.e., 70-100%)

To achieve an outstanding grade on this project, groups are encouraged to go beyond the basic requirements and demonstrate creative engagement and critical thinking. The following tasks correspond directly to the 'E' grade (Excellent) criteria in the assessment scheme:

1. Methodological Innovation: Integrate a new or alternative machine learning algorithm into the source code. Compare its performance to the original model and discuss the potential advantages or disadvantages of your chosen method. Any such modifications must be clearly specified in the Abstract and Methods sections of your report.

2. Critical Evaluation: In your code annotations and the Discussion section of your report, identify and explore potential limitations or problems with the provided code and methodology. This could include commentary on data quality, preprocessing choices, model assumptions, or potential biases.
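
When comparing an alternative algorithm with the original model, one possible way to judge whether a gap in accuracy is meaningful on a small test set is a bootstrap interval for the difference. The sketch below is a suggestion, not a requirement of the brief. It assumes both models were scored on the same held-out samples; the test-set size of 60 (roughly a 20% hold-out of 303 samples), the prediction lists, and all function names are assumptions made for illustration and are not part of the provided code.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_accuracy_gap(y_true, pred_a, pred_b, n_resamples=2000, seed=0):
    """Percentile 95% interval for accuracy(B) - accuracy(A).

    Test samples are resampled with replacement, and both models are
    scored on the same resampled indices so the comparison stays paired.
    """
    rng = random.Random(seed)
    n = len(y_true)
    gaps = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(y_true[i] == pred_a[i] for i in idx) / n
        acc_b = sum(y_true[i] == pred_b[i] for i in idx) / n
        gaps.append(acc_b - acc_a)
    gaps.sort()
    lower = gaps[int(0.025 * n_resamples)]
    upper = gaps[int(0.975 * n_resamples) - 1]
    return lower, upper


# Synthetic test set: alternating labels, with deliberate errors injected.
y_true = [i % 2 for i in range(60)]
pred_a = [1 - t if i % 5 == 0 else t for i, t in enumerate(y_true)]  # 12 errors
pred_b = [1 - t if i % 6 == 0 else t for i, t in enumerate(y_true)]  # 10 errors

print("accuracy A:", round(accuracy(y_true, pred_a), 3))
print("accuracy B:", round(accuracy(y_true, pred_b), 3))
print("95% interval for B - A:", bootstrap_accuracy_gap(y_true, pred_a, pred_b))
```

On this synthetic data model B scores slightly higher, yet the interval spans zero, which illustrates why a small difference on a test set of this size is weak evidence that one method is better. The same reasoning can feed directly into the limitations discussed under Critical Evaluation.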

3. Preparing Your Final Report

Your written report is a critical component of your project submission. It must adhere to the structural and formatting standards of a professional scientific publication to present your work in a clear and rigorous manner.

3.1 Report Structure

Your report must follow a formal scientific publication structure and should include, but need not be limited to, the following sections:

· Group ID and Author Names: To be included on the title page.

· Abstract: A concise summary (approx. 250 words) of the project's background, methods, key findings, and conclusions. If you have introduced a new method in your analysis, it must be specified here.

· Introduction: Provide background on CVD, the role of machine learning in its diagnosis, and the specific objectives of your study.

· Methods: Briefly describe the dataset and the computational methods employed in the provided code. If you have introduced a new method, provide a rationale for your choice. This section should be detailed enough for an experienced peer to understand your workflow but should not be a line-by-line protocol.

· Results: Present the key findings from your execution and analysis of the code. Make effective use of figures and tables to summarize and present your results. Ensure all figures and tables are numbered and have appropriate captions/titles.

· Discussion: Interpret your findings and explain their significance in the context of CVD prediction. Discuss any limitations of the model or dataset and suggest possible directions for future work or improvement.

· Conclusion: Briefly summarize the main takeaways from the project.

· References: List all external sources cited in your report.

· Appendices: This is a mandatory section and must contain: (1) your complete, well-annotated Python code, and (2) the raw output generated by your script.

3.2 Formatting & Style Guidelines

To ensure professionalism and consistency, please adhere strictly to the following formatting requirements. These specifications supersede any inconsistent instructions that may have been present in earlier documents.

· Length: The main body of the report (from Introduction to Conclusion) should not exceed 3,500 words (approx. 8 pages). This limit excludes the title page, abstract, references, and appendices.

· File Format: Microsoft Word (.docx), single-column format.

· Font: Times New Roman, 11-point for the main text. Figure and table captions should be 9-point.

· Line Spacing: 1.5 line-spacing for the main text. Captions for figures/tables, footnotes, and the reference list should be single-spaced.

· Paragraph Spacing: Your report should have no extra spacing after paragraphs.

· Figures and Tables: All figures and tables should be positioned close to where they are first mentioned in the text and numbered sequentially. Captions should be placed below figures and above tables.

· Referencing Style: The reference list must begin on a new page with the uppercase bold title, REFERENCES. All in-text citations and the reference list must be formatted in Harvard style.

4. Submission & Assessment

4.1 Submission Instructions

· Group Size: Groups must consist of 4-5 students.

· File Naming: Please name your file: APH113_FinalProject_GroupID.docx.

4.2 Assessment Criteria

This final project accounts for 65% of your total module mark. Your submission will be evaluated based on two components: the Written Report (60%) and the Programming Code (40%).

Please refer to AssessmentForm_GroupReport_AY2526.docx for the specific criteria that will be used to assess your work. You are strongly advised to use the assessment forms as both a guide and a self-assessment checklist to ensure your submission meets the highest standards of quality. A clear understanding of these criteria is the first step to success, as it makes the evaluation process fully transparent and allows you to focus your efforts effectively.