EE5434 homework 1

Out: Friday, September 13, 2019

Due: midnight (12AM) of Monday, September 23, 2019. Canvas will not accept any submission after this deadline. No late submission will be graded.

Total points: 100

What to hand in: the source code, a readme file, and the report.

Where to hand in: Canvas

Implement the PLA learning algorithm (using Python) and analyze its performance using different training sets. You need to implement two programs using Python. You may call vector operations, but not any modules that implement learning algorithms.

The first program generates the training data. It takes a three-dimensional vector <w0,w1,w2> as input and generates data points x=<x1,x2> with (w1x1+w2x2+w0)>0 or (w1x1+w2x2+w0)<0. If the sign is negative, the data point has label “-”. Otherwise it has label “+”.

Name this program “DataEmit”. It must be run using the following format:

DataEmit <w0,w1,w2> m n

<w0,w1,w2> specifies the line. m is the number of points with label “+”. n is the number of points with label “-”. w0,w1,w2 are separated by just “,”. No extra space is allowed.

The program must output a file named “train.txt”, which contains all the data points with labels. Each point takes one line with the format: x1 x2 label.
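The generation step described above could be sketched roughly as follows. This is only one possible approach, not a reference solution: it rejection-samples points uniformly from a square and keeps them until both label counts are met. The sampling range [-10, 10], the `seed` and `max_tries` parameters, and the helper names `parse_weights` and `emit_points` are assumptions made for illustration; the assignment does not specify them.

```python
import random
import sys


def parse_weights(arg):
    """Parse a string such as '<0,-1,1>' into a list of three floats."""
    parts = arg.strip().strip("<>").split(",")
    if len(parts) != 3:
        raise ValueError("expected <w0,w1,w2>")
    return [float(p) for p in parts]


def emit_points(w, m, n, lo=-10.0, hi=10.0, seed=None, max_tries=1_000_000):
    """Rejection-sample m '+' points and n '-' points from [lo, hi]^2.

    The sampling box is an assumption; the assignment gives no range.
    """
    w0, w1, w2 = w
    rng = random.Random(seed)
    pos, neg = [], []
    tries = 0
    while len(pos) < m or len(neg) < n:
        tries += 1
        if tries > max_tries:
            # Happens when the line does not cross the sampling box.
            raise RuntimeError("could not sample both labels in the box")
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        s = w1 * x1 + w2 * x2 + w0
        if s > 0 and len(pos) < m:
            pos.append((x1, x2, "+"))
        elif s < 0 and len(neg) < n:
            neg.append((x1, x2, "-"))
    points = pos + neg
    rng.shuffle(points)  # avoid writing all '+' points before all '-' points
    return points


def main(argv):
    w = parse_weights(argv[1])
    m, n = int(argv[2]), int(argv[3])
    with open("train.txt", "w") as f:
        for x1, x2, label in emit_points(w, m, n):
            f.write(f"{x1} {x2} {label}\n")


if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

Note that in most command shells the characters `<` and `>` are redirection operators, so the weight argument will likely need to be quoted when the script is invoked from a terminal.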

For example, if we test your program using the following command:

DataEmit <0,-1,1> 5 4

The output file “train.txt” may look like the following:

Program 2 takes “train.txt” as input and outputs the learned weights using PLA. It must be named “PLA”. To run it, we will use:

PLA train.txt
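A minimal sketch of the learning step, assuming the standard perceptron update rule (add y·x to w for each misclassified point); the course note on PLA may specify a different variant. The zero initial weights, the `max_epochs` cap, and the function names `load_points` and `pla` are assumptions for illustration only.

```python
import sys


def load_points(path):
    """Read 'x1 x2 label' lines; map '+' to +1 and '-' to -1."""
    data = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            x1, x2, label = fields
            # Prepend a constant 1 so that w0 acts as the bias term.
            x = (1.0, float(x1), float(x2))
            data.append((x, 1 if label == "+" else -1))
    return data


def pla(data, max_epochs=10000):
    """Sweep over the data, updating w on each misclassified point."""
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            s = sum(wi * xi for wi, xi in zip(w, x))
            if y * s <= 0:  # wrong side of the line, or exactly on it
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break  # every point is classified correctly
    return w


if __name__ == "__main__" and len(sys.argv) == 2:
    learned = pla(load_points(sys.argv[1]))
    print("<" + ",".join(str(wi) for wi in learned) + ">")
```

The epoch cap is only a safeguard: for linearly separable data, which is what DataEmit produces, the loop stops as soon as a full pass makes no mistakes.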

The output should be a weight vector <w0,w1,w2> and a plot that contains all the training data points and the line represented by <w0,w1,w2>. The detailed format of the plot: blue circles represent positively labeled data points and red crosses represent negatively labeled ones. The line can be black (refer to the note about the PLA). No need to show the vector w. Just plot the line w1x1+w2x2+w0=0. Show the axis names (x1 and x2).
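The plot format above might be produced along these lines. This sketch assumes matplotlib is available and acceptable as a plotting library, and it saves the figure to a file rather than opening a window; the output file name `pla.png` and the helper names `line_endpoints` and `plot_result` are hypothetical.

```python
def line_endpoints(w, lo, hi):
    """Return two (x1, x2) points on the line w1*x1 + w2*x2 + w0 = 0."""
    w0, w1, w2 = w
    if w2 != 0:
        # Solve for x2 at the two ends of the x1 range.
        return [(x1, -(w0 + w1 * x1) / w2) for x1 in (lo, hi)]
    # Vertical line: x1 is fixed at -w0/w1; reuse [lo, hi] as the x2 span.
    x1 = -w0 / w1
    return [(x1, lo), (x1, hi)]


def plot_result(points, w, out_path="pla.png"):
    """Scatter the labeled (x1, x2, label) points and draw the line."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    pos = [(x1, x2) for x1, x2, label in points if label == "+"]
    neg = [(x1, x2) for x1, x2, label in points if label == "-"]
    fig, ax = plt.subplots()
    if pos:
        ax.scatter(*zip(*pos), marker="o", color="blue")  # blue circles
    if neg:
        ax.scatter(*zip(*neg), marker="x", color="red")  # red crosses
    xs = [p[0] for p in points]
    (a1, a2), (b1, b2) = line_endpoints(w, min(xs), max(xs))
    ax.plot([a1, b1], [a2, b2], color="black")
    ax.set_xlabel("x1")
    ax.set_ylabel("x2")
    fig.savefig(out_path)
    plt.close(fig)
```

Computing the endpoints separately keeps the special case of a vertical line (w2 = 0) out of the drawing code.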

Each program will be tested using three input cases. Each case is worth 12 pts. In total: 72 points for testing the two programs.

Once you finish the programs, do the following experiments and record the results in the report.

DataEmit <5,2,3> 10 10

DataEmit <5,2,3> 50 50

DataEmit <5,2,3> 100 100

DataEmit <5,2,3> 150 150

DataEmit <5,2,3> 200 200

For each training set, run the PLA program and compare what you learned with the known “line”. Analyze how the size of the training data affects the output of PLA.
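One possible way to quantify the comparison is sketched below. Because a weight vector describes the same line after multiplication by any positive constant, the learned and known vectors are normalized before being compared. Using the angle between the two vectors as the error measure is an assumption of this sketch, not something the assignment prescribes, and the function names are illustrative.

```python
import math


def normalize(w):
    """Scale w to unit length so that only its direction matters."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def angle_between(w_true, w_learned):
    """Angle in degrees between the known and the learned weight vectors."""
    a, b = normalize(w_true), normalize(w_learned)
    cos = sum(x * y for x, y in zip(a, b))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.degrees(math.acos(cos))
```

An angle near 0 means the learned line nearly coincides with the known one. An angle near 180 would mean the same line with the two labels swapped.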

Then, choose your own vector w and repeat the above experiment. Test PLA’s performance with respect to 1) the size of the training data; 2) the ratio of the two labels in the training data (balanced to unbalanced). Clearly describe your experimental design. Use tables or figures to summarize the results.

20 points for the report containing the above analysis.

8 pts for following all the instructions and providing a readme file containing any specific instructions for running your programs, for example, the Python version and the running environment.