Week 6 Assessment: Code
Task
The Iris data set is a well-known data set introduced by Ronald Fisher in 1936, recording sepal and petal measurements for flowers from three species of Iris. It has gained some popularity in the fields of Data Analytics and Machine Learning, as it provides a large number of measurements across a relatively small number of categories.
The assessment task is to carry out some simple, computer-supported analysis of the Iris data set.
Task details
The task has been broken into several largely independent stages.
Stage 1: Reading and processing data
For this stage you need to complete the specification of the read_and_process(csv_filename) function. This function should do the following:
- Import the CSV file named csv_filename as a Pandas DataFrame
- Drop any rows that are missing an entry in any column
- Strip ' cm' and ' mm' from each data point, and convert the values to floats
- Divide the second column ('sepal_width') by 10
- Return the resulting DataFrame
You may assume that csv_filename is a readable CSV file with a format similar to iris.csv.
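The Stage 1 steps could be sketched roughly as follows. This is a minimal sketch, not a model answer. It assumes the four measurement columns are named sepal_length, sepal_width, petal_length and petal_width (the names used in the Stage 3 output), and it leaves every other column, such as the species, untouched. The strip_units helper and the MEASUREMENT_COLUMNS constant are hypothetical names.

```python
import pandas as pd

# Assumed measurement column names (taken from the Stage 3 sample output).
MEASUREMENT_COLUMNS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def strip_units(column):
    """Remove ' cm' / ' mm' suffixes from a column and convert it to floats."""
    text = column.astype(str)
    text = text.str.replace(" cm", "", regex=False).str.replace(" mm", "", regex=False)
    return text.astype(float)


def read_and_process(csv_filename, measurement_columns=MEASUREMENT_COLUMNS):
    """Load the CSV, drop incomplete rows, clean the units and rescale sepal_width."""
    data = pd.read_csv(csv_filename)
    # Keep only rows with an entry in every column; copy() gives an independent
    # frame so the column assignments below do not act on a slice.
    data = data.dropna().copy()
    for name in measurement_columns:
        data[name] = strip_units(data[name])
    # The specification asks for the second column (sepal_width) to be divided by 10.
    data["sepal_width"] = data["sepal_width"] / 10
    return data
```

pd.read_csv accepts a file path or any file-like object. While testing, you can therefore call the function on an in-memory string wrapped in io.StringIO instead of a real file.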
Stage 2: User menu
For this stage you need to implement the initial interactions. When your program is run:
- Prompt the user to enter a CSV file name with: Enter csv file:
- Read and process the user-entered file using the function from Stage 1
- Display the menu:
  1. Create textual analysis
  2. Create graphical analysis
  3. Exit
- Prompt the user to select an option with: Please select an option:
- Process the user's choice:
  - If they select '1', proceed to Stage 3
  - If they select '2', proceed to Stage 4
  - If they select '3', exit the program with the exit() function
You may assume that only valid options are selected.
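One way to lay out the Stage 2 loop is sketched below. The sketch is hypothetical in several respects:

- main, MENU_TEXT and the handlers dictionary are illustrative names.
- The Stage 1 function is passed in as load_data.
- read_input defaults to input() but can be swapped for a fake when testing.
- It calls sys.exit(), which raises SystemExit just as the built-in exit() named in the specification does. The built-in is only available when Python's site module is loaded.

```python
import sys

# Menu text printed before every selection (as in the sample interactions).
MENU_TEXT = "1. Create textual analysis\n2. Create graphical analysis\n3. Exit"


def main(load_data, handlers, read_input=input):
    """Load the user's CSV file, then keep showing the menu until option 3 is chosen.

    load_data: function that takes a file name and returns a DataFrame
    handlers: dict mapping '1' and '2' to functions taking (data, read_input)
    read_input: function used to prompt the user; input() by default
    """
    data = load_data(read_input("Enter csv file: "))
    while True:
        print(MENU_TEXT)
        choice = read_input("Please select an option: ")
        if choice == "3":
            # Raises SystemExit, the same effect as the built-in exit().
            sys.exit()
        # Only valid options are assumed, so '1' and '2' are always in handlers.
        handlers[choice](data, read_input)
```

The loader, the handlers and the input function are all passed in as parameters. This keeps main free of global variables, in line with the modularity criterion below.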
Stage 3: Text-based analysis
For this stage you will output some simple statistics based on the DataFrame loaded in Stage 2. Upon entering this stage, the program should:
- Prompt the user for a species with: Select species (all, setosa, versicolor, virginica):
  To obtain full marks, the available species should be extracted from the DataFrame, so they may differ from those listed above. They should be listed in alphabetical order after the 'all' option.
- Display the following statistics for each of the characteristics (sepal_length, sepal_width, petal_length, petal_width) of the species selected by the user: mean, 25th percentile, median, 75th percentile and standard deviation. If the user chose all, the table should summarise all of the data.
  The output should be the result of printing a DataFrame with index ['sepal_length', 'sepal_width', 'petal_length', 'petal_width'] and column headings Mean, 25%, Median, 75%, Std (see Sample interactions).
- Return to the main menu (Stage 2)
The output resulting from pandas function calls is sufficient. You do not need to manually round any results.
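A possible shape for Stage 3 is sketched below. It assumes the species labels are stored in a column named species (the specification never names that column), and that the four characteristics are the columns listed above. species_prompt, summary_table and text_analysis are hypothetical helper names. Each statistic is a standard pandas reduction, so no manual rounding is involved.

```python
import pandas as pd

# Assumed column names; the species column name in particular is a guess.
CHARACTERISTICS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def species_prompt(data, species_col="species"):
    """Build the species prompt from the species actually present in the data."""
    species = sorted(data[species_col].unique())
    return "Select species (" + ", ".join(["all"] + species) + "): "


def summary_table(data, species, species_col="species", characteristics=CHARACTERISTICS):
    """Return Mean/25%/Median/75%/Std for each characteristic of one species (or all)."""
    if species != "all":
        data = data[data[species_col] == species]
    values = data[list(characteristics)]
    return pd.DataFrame({
        "Mean": values.mean(),
        "25%": values.quantile(0.25),
        "Median": values.median(),
        "75%": values.quantile(0.75),
        "Std": values.std(),  # sample standard deviation (ddof=1), pandas' default
    })


def text_analysis(data, read_input=input):
    """Stage 3: ask for a species and print its summary table."""
    choice = read_input(species_prompt(data))
    print(summary_table(data, choice))
```

The table is built from a dictionary of Series. This puts the characteristics in the index and the statistic names in the columns, which is the layout shown in the sample interactions.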
Stage 4: Graphics-based analysis
For this stage you will output some simple graphical plots based on the DataFrame loaded in Stage 2. Upon entering this stage, your program should:
- Prompt the user for a characteristic for the x-axis with: Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width):
  The available characteristics do not need to be extracted from the DataFrame.
- If the user does not select all:
  - Prompt the user for a characteristic for the y-axis with: Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width):
  - Create a scatter plot of the two chosen characteristics (it does not have to be displayed)
- If the user does select all:
  - Using a scatter matrix (for example, pandas' scatter_matrix or seaborn's pairplot), plot the relationships between all pairs of characteristics
- In both cases, prompt the user for a file name with: Enter save file: and then save the plot to that file.
- Return to the main menu (Stage 2)
To obtain full marks, the outputs should differentiate the different species by colouring the data points based on their species.
In addition to the automarked test cases, the output of this stage will be inspected by your OL, and up to 5 marks will be awarded for it, based on the following criteria:
- Scatter plots of the correct characteristics (3 marks)
- Differentiation of species by colour (2 marks)
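Stage 4 could be sketched with matplotlib and pandas' scatter_matrix as below. The helper names, the colour palette and the species column name are all assumptions; the prompt strings are read-only module constants. The two plot types colour by species in different ways:

- The single-pair plot makes one ax.scatter call per species group. Each call takes the next colour in the cycle and also adds a legend entry.
- The matrix builds a per-point colour list. scatter_matrix forwards this list through its keyword arguments to the underlying scatter calls.

The Agg backend is selected so figures are written to files without opening a window.

```python
import matplotlib

matplotlib.use("Agg")  # draw straight to files; no display window is needed
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import scatter_matrix

CHARACTERISTICS = ("sepal_length", "sepal_width", "petal_length", "petal_width")
X_PROMPT = ("Choose the x-axis characteristic "
            "(all, sepal_length, sepal_width, petal_length, petal_width): ")
Y_PROMPT = ("Choose the y-axis characteristic "
            "(sepal_length, sepal_width, petal_length, petal_width): ")


def species_colours(species, palette=("tab:blue", "tab:orange", "tab:green", "tab:red")):
    """Map each species name to a colour (colours repeat beyond len(palette) species)."""
    names = sorted(species.unique())
    lookup = {name: palette[i % len(palette)] for i, name in enumerate(names)}
    return species.map(lookup).tolist()


def save_scatter(data, x_col, y_col, filename, species_col="species"):
    """Scatter plot of two characteristics, one colour and legend entry per species."""
    fig, ax = plt.subplots()
    for name, group in data.groupby(species_col):
        ax.scatter(group[x_col], group[y_col], label=name)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    ax.legend()
    fig.savefig(filename)
    plt.close(fig)


def save_scatter_matrix(data, filename, species_col="species", characteristics=CHARACTERISTICS):
    """Scatter matrix of every pair of characteristics, coloured by species."""
    axes = scatter_matrix(data[list(characteristics)],
                          c=species_colours(data[species_col]), figsize=(10, 10))
    fig = axes[0, 0].get_figure()
    fig.savefig(filename)
    plt.close(fig)


def graphics_analysis(data, read_input=input):
    """Stage 4: ask which plot to draw and save it to the requested file."""
    x_col = read_input(X_PROMPT)
    if x_col == "all":
        save_scatter_matrix(data, read_input("Enter save file: "))
    else:
        y_col = read_input(Y_PROMPT)
        save_scatter(data, x_col, y_col, read_input("Enter save file: "))
```

Closing each figure after saving stops figures from accumulating in memory when the user returns to the menu and plots again.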
Stage 5: Conclusion
For this Stage, you are required to complete the provided function conclusion(). Your function should return a tuple containing the two (non-species) characteristics you believe answer the following question:
In iris.csv, which pair of characteristics is best for separating the species?
In other words, which pair of characteristics has the most significant impact on determining which species a plant belongs to?
The two characteristics should be ordered alphabetically within the tuple, and should be two of 'sepal_length', 'sepal_width', 'petal_length' or 'petal_width'.
The return value should be hard-coded into the function (i.e., no calculations are required) based on your own analysis of the data (using the program you just created, if appropriate).
If you are failing the last (hidden) test case but passing the second-last test case, add a comment explaining the reason for your choice. No justification is needed if you pass the last test case.
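conclusion() itself must return a hard-coded tuple. Even so, a small helper could support your own analysis and any justification comment. The sketch below scores each characteristic by how far apart the species means are compared with the spread inside each species. This is a simplified between-to-within variance ratio, similar in spirit to a one-way ANOVA F statistic. It is a swapped-in heuristic, not a method the assessment prescribes, and it scores characteristics one at a time rather than as pairs, so treat it as a guide alongside the Stage 4 plots. separation_scores, top_pair and the species column name are hypothetical.

```python
import pandas as pd

CHARACTERISTICS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def separation_scores(data, species_col="species", characteristics=CHARACTERISTICS):
    """Score each characteristic by between-species spread / within-species spread.

    A higher score means the species means sit far apart relative to the
    variation inside each species, i.e. the characteristic separates them well.
    """
    grouped = data.groupby(species_col)[list(characteristics)]
    between = grouped.mean().var()  # variance of the per-species means
    within = grouped.var().mean()   # average variance inside each species
    return (between / within).sort_values(ascending=False)


def top_pair(scores):
    """Return the two highest-scoring characteristics, ordered alphabetically."""
    return tuple(sorted(scores.index[:2]))
```

For example, printing separation_scores(read_and_process("iris.csv")) would show the ranking for the provided data. The answer in conclusion() should still be typed in by hand.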
Subjective component
In addition to the above tasks, your code will be inspected by your OL and evaluated on its adherence to good coding practices. Particular attention will be on the following aspects of your code:
- Documentation: appropriate use of comments
- Modularity: appropriate use of functions (where appropriate, you should define your own functions in addition to those outlined above). All functions should "stand alone"; that is, they should not depend on global variables
- Readability: appropriate use of variable names
- Structure: appropriate code layout, so that the program flow is clear
Sample interactions
Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3
Enter csv file: iris_test.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, versicolor, virginica): versicolor
                  Mean  25%  Median  75%       Std
sepal_length  5.955102  5.6     5.9  6.3  0.503348
sepal_width   2.785714  2.6     2.8  3.0  0.296507
petal_length  4.275510  4.0     4.4  4.6  0.461668
petal_width   1.332653  1.2     1.3  1.5  0.194066
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3
Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): all
Enter save file: iris_all.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3
After the above interaction, an example of iris_all.png would be either of the following:


Enter csv file: iris.csv
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 2
Choose the x-axis characteristic (all, sepal_length, sepal_width, petal_length, petal_width): sepal_width
Choose the y-axis characteristic (sepal_length, sepal_width, petal_length, petal_width): sepal_width
Enter save file: sw_vs_sw.png
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 1
Select species (all, setosa, versicolor, virginica): all
                  Mean  25%  Median  75%       Std
sepal_length  5.843333  5.1    5.80  6.4  0.828066
sepal_width   3.054000  2.8    3.00  3.3  0.433594
petal_length  3.758667  1.6    4.35  5.1  1.764420
petal_width   1.198667  0.3    1.30  1.8  0.763161
1. Create textual analysis
2. Create graphical analysis
3. Exit
Please select an option: 3
After the above interaction, an example of sw_vs_sw.png would be:

Note: Your plots do not have to use the same style options (e.g., colours, fonts) as the ones presented here. They will be assessed on whether they plot the correct data with the correct chart type (i.e., a scatter plot).
Submission and feedback
You can click on the mark button, also used to submit your work, as many times as you like. We will assess your last submission only.
You can see where your code differs from the expected output by examining the feedback from the non-hidden test cases. The hidden test cases will test your code more rigorously, but with suppressed input/output to limit dishonest attempts.
You are encouraged to test your code yourself and not rely on the provided test cases. Two files suitable for input have been provided as part of your scaffold.
Marking
This assessment is marked out of 35 but is worth 30%.
There are twenty assessed test cases worth 1 mark each for a total of 20 marks:
- Stage 1: 2 visible test cases, 1 hidden test case
- Stage 2: 1 visible test case
- Stage 3: 5 visible test cases, 3 hidden test cases
- Stage 4: 3 visible test cases, 3 hidden test cases
- Stage 5: 1 visible test case, 1 hidden test case
Up to 5 marks will be awarded for the output of Stage 4.
The subjective component will be graded out of 10 marks as detailed below:

Commenting (3):
- Good use of comments +3
- Used comments but could do better +1.5
- No comments 0

Variable/function names (3):
- Good choice of names +3
- Some good choices +1.5
- Poor choices all around 0

Use of functions (2):
- Well chosen auxiliary function usage: +2
- Attempted to use additional functions: +1
- No additional functions: 0

Layout (2):
- Easy to follow: +2
- One or two poor layout choices (e.g., imports in the middle of the code): +1
- No clear structure: 0
