# Classifier and Cluster Analysis in Data Science

University of Canberra

Faculty of Science and Technology

Programming for Data Science G (11521)

Assignment 1

Classifier and Cluster Analysis in Data Science

Due date: 23:59 Sunday (Week 8)

Type: Individual assignment

Mark for assessment: 20

Submission: Submit a .zip file containing all Python files (.py) in your project via the Canvas site.

Late submission: 5% of the total mark per day (1 mark per day). Information on how to apply for an extension can be found in the unit outline on Canvas.

Remarks: As per the unit outline, you will need an aggregate of at least 40 over all assignments to pass the unit.

[6 marks] Question 1: Implement a Python program for a Nearest Neighbour Classifier that can classify an unknown data sample into one of the given classes.

For example, there are 2 classes, Red and Blue, and x is an unknown data sample (i.e., we do not know whether x is red or blue). After calculating the distances between x and every data sample in the 2 classes, we find that the data sample with the shortest distance to x belongs to the Red class, so x is classified as a red data sample.
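The example above can be sketched in a few lines. This is only an illustration under stated assumptions: the brief does not name a distance measure, so Euclidean distance is assumed, and the function names `euclidean_distance` and `classify_nearest` as well as the sample values are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two samples of equal dimension (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def classify_nearest(unknown, classes):
    """Return the label of the class that holds the sample closest to `unknown`.

    `classes` maps a label to a list of samples, each sample being a tuple.
    """
    best_label, best_distance = None, math.inf
    for label, samples in classes.items():
        for sample in samples:
            d = euclidean_distance(unknown, sample)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label


# Made-up 2-D samples for illustration only.
red = [(1.0, 1.0), (2.0, 1.5)]
blue = [(6.0, 7.0), (7.0, 8.0)]
label = classify_nearest((1.5, 2.0), {"Red": red, "Blue": blue})
```

Because the distance is computed with `zip`, the same functions work unchanged for samples of any dimension, as long as all samples have the same number of values.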

Requirements: Your program reads data samples from 2 text files for the 2 classes and unknown data samples from another text file, runs the Nearest Neighbour Classifier algorithm as demonstrated in the screenshots below, and outputs all unknown data samples and their classified labels to the screen and to another text file. Your program should work with any data dimension D > 1 and any number of unknown data samples > 0. For Python programming, use a tuple to store a data sample, a list to store all data samples, and modules to store functions. The main program includes only function calls and does not include any function implementations. Please do not use other versions of the Nearest Neighbour Classifier that you can find on websites or in research articles, and do not import any external packages (except tkinter) into this project.
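A minimal sketch of the file handling described above follows. The brief does not specify the file layout, so this assumes one data sample per line with values separated by whitespace; the function names and the `->` output format are hypothetical choices, not part of the assignment.

```python
def parse_samples(lines):
    """Turn text lines into a list of tuples, skipping blank lines.

    Assumes one sample per line with whitespace-separated numbers.
    """
    samples = []
    for line in lines:
        fields = line.split()
        if fields:
            samples.append(tuple(float(field) for field in fields))
    return samples


def read_samples(path):
    """Read all data samples from a text file into a list of tuples."""
    with open(path, encoding="utf-8") as handle:
        return parse_samples(handle)


def format_result(sample, label):
    """Build one output line, e.g. the sample's values followed by its label."""
    return " ".join(str(value) for value in sample) + " -> " + label


def write_results(path, results):
    """Print each (sample, label) pair to the screen and write it to a text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for sample, label in results:
            line = format_result(sample, label)
            print(line)
            handle.write(line + "\n")
```

In a project laid out as the requirements ask, functions like these would live in a module, and the main program would only call them.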


[14 marks] Question 2: Implement a Python program for K-Means Clustering that can group data samples into clusters.

For example, you are given a set of data samples to group into 2 clusters. The K-means clustering algorithm generates 2 cluster centres at random, then groups the data samples that are nearest to the first cluster centre into one cluster and those nearest to the second cluster centre into another. The algorithm then generates new cluster centres by averaging the data samples in each cluster. If the difference between the 2 old cluster centres and the 2 new cluster centres is not significant, the algorithm stops; otherwise it discards the old cluster centres and re-groups the data samples around the new cluster centres, as described above, to form new clusters. The process repeats until the difference between the old and new cluster centres is not significant.
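The loop described above can be sketched as follows. This is a hedged illustration, not the unit's reference solution: Euclidean distance is assumed, the random initial centres are drawn from the data samples themselves, "not significant" is interpreted as every centre moving by at most a hypothetical `tolerance`, and an empty cluster simply keeps its old centre. All function and parameter names are made up for the example.

```python
import math
import random


def distance(a, b):
    """Euclidean distance between two tuples of equal dimension (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def assign_clusters(samples, centres):
    """Group every sample with the centre it is nearest to."""
    clusters = [[] for _ in centres]
    for sample in samples:
        nearest = min(range(len(centres)), key=lambda i: distance(sample, centres[i]))
        clusters[nearest].append(sample)
    return clusters


def mean_centre(cluster, fallback):
    """Average the samples coordinate by coordinate; keep the old centre if the cluster is empty."""
    if not cluster:
        return fallback
    return tuple(sum(values) / len(cluster) for values in zip(*cluster))


def k_means(samples, k, tolerance=1e-6, seed=None):
    """Return (centres, clusters) once no centre moves by more than `tolerance`."""
    rng = random.Random(seed)
    centres = rng.sample(samples, k)  # assumption: initial centres are random samples
    while True:
        clusters = assign_clusters(samples, centres)
        new_centres = [mean_centre(c, old) for c, old in zip(clusters, centres)]
        shift = max(distance(old, new) for old, new in zip(centres, new_centres))
        centres = new_centres
        if shift <= tolerance:
            return centres, clusters


# Made-up 2-D samples forming two well-separated groups.
samples = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centres, clusters = k_means(samples, 2, seed=0)
```

Passing a `seed` makes a run repeatable, which helps when comparing output against the lecture demos; leaving it as `None` gives different starting centres on each run.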

Requirements: Your program reads data samples from a text file, runs the K-means Clustering algorithm as demonstrated in the screenshots below, and outputs all data samples with cluster centres to the screen as shown below. Your program should work with any data dimension D > 1 and any number of clusters K > 1. For Python programming, use tkinter to display data samples and cluster centres on a canvas, a tuple to store a data sample or a cluster centre, a list to store all data samples or all cluster centres, and modules to store functions. The main program includes only function calls and does not include any function implementations. Please do not use other versions of K-Means Clustering that you can find on websites or in research articles to implement this project. Please do not import any external packages (except tkinter) into this project.
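One possible way to put samples and centres on a tkinter canvas is sketched below. The brief does not say how data with D > 2 should be drawn, so this sketch assumes only the first two coordinates are plotted; the names `to_canvas` and `show`, the colours, the marker sizes, and the canvas size are all hypothetical.

```python
def to_canvas(point, bounds, size, margin=20):
    """Map the first two coordinates of `point` to canvas pixel coordinates.

    `bounds` is ((min_x, max_x), (min_y, max_y)) and `size` is (width, height).
    """
    (min_x, max_x), (min_y, max_y) = bounds
    width, height = size
    span_x = (max_x - min_x) or 1.0  # avoid dividing by zero for a flat range
    span_y = (max_y - min_y) or 1.0
    x = margin + (point[0] - min_x) / span_x * (width - 2 * margin)
    # Canvas y grows downwards, so the vertical axis is flipped.
    y = height - margin - (point[1] - min_y) / span_y * (height - 2 * margin)
    return x, y


def show(samples, centres, size=(400, 400)):
    """Open a window drawing samples as small blue dots and centres as larger red dots."""
    import tkinter  # imported here so to_canvas can be used without a display

    everything = samples + centres
    xs = [p[0] for p in everything]
    ys = [p[1] for p in everything]
    bounds = ((min(xs), max(xs)), (min(ys), max(ys)))

    root = tkinter.Tk()
    canvas = tkinter.Canvas(root, width=size[0], height=size[1], bg="white")
    canvas.pack()
    markers = [(p, "blue", 3) for p in samples] + [(c, "red", 6) for c in centres]
    for point, colour, radius in markers:
        x, y = to_canvas(point, bounds, size)
        canvas.create_oval(x - radius, y - radius, x + radius, y + radius,
                           fill=colour, outline=colour)
    root.mainloop()
```

Keeping the coordinate mapping in its own function means it can be checked without opening a window, while `show` only deals with drawing.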

The screenshots below explain how the K-means Clustering algorithm works.

[Figure: an unknown data sample shown among blue data samples and red data samples. How to classify this unknown data sample? Step 1. Calculate all distances between this unknown data sample and all other data samples. Step 2. Find the minimum distance. Step 3. Output result: the unknown data sample is red.]


Below is an example of data samples drawn on screen before and after applying K-means clustering.


More details of the above algorithms and demos will be given in lectures and tutorials from Week 2 to Week 7.

-- END --