# CSE111语言辅导、辅导Python/Java编程

- 首页 >> Web CSE111 Assignment 4

Background information

To be truly parallel, sorting a single list when multiple CPU cores are available should show a significant speedup over a single threaded approach. Radix sorting lends itself to truly parallel implementations; consult the literature for approaches you might consider taking.

Remember that MSD is a sorting outcome, not a sorting algorithm so investigate sorting algorithms that lend themselves to parallel implementation.

Setup

SSH into any of the CSE111 teaching servers using your CruzID Blue credentials:

$ ssh@.soe.ucsc.edu

Create a suitable place to work: ( only do this the first time you log in )

$ mkdir –p ~/CSE111/Assignment4

$ cd ~/ CSE111/Assignment4

Install the lab environment: ( only do this once )

$ tar xvf /var/classes/CSE111/Assignment4.tar.gz

Build, test, and check code coverage of the system:

$ make test

Check your implementation for memory leaks:

$ make valgrind

Calculate your expected grade:

$ make grade

( this will take some time )

Reset the system if you get into a mess:

$ make clean

Note that using shared machines like the CSE111 teaching servers leads to variable results when another user suddenly starts executing their tests whilst yours are running. On the automated grading system, your code will have exclusive access to all 24 CPUs, so will performance far more predictably.

Requirements

Your completed radix sorter must do two things:

Correctly MSD radix sort large vectors of unsigned integers

Exhibit a significant speedup as more CPU cores are used

In addition, you must:

Have no compiler warnings or memory leaks

Demonstrate 100% function, line, and branch coverage

The first being a functional requirement, the second be a non-functional (performance) requirement.

These requirements are NOT equally weighted – see Grading Scheme below. However, it is recommended you work on the functional requirement first, and only then work on the non-functional requirement.

Remember: “make it work” always comes before “make it fast”, whilst always striving to “make it right”.

What you need to do

The class ParallelRadixSort is provided, but unimplemented. You need to implement it without changing the existing signature. You can add member variables and functions, but do not change existing signatures or overide the default constructor.

Basic steps are as follows:

Investigate how a truly parallel MSD radix sort can be implemented

Implement a multi-threaded truly parallel version of ParallelRadixSort::msd()

To execute your parallel MSD radix sorter after running make:

$ ./radix 10000 1 9 -d

Where “10000” is the number of random unsigned integers that the test harness will generate, “1” is the number of times it will generate that many random unsigned integers, and “9” is the maximum number of CPU cores to use when sorting the single list.

The -d flag requests a sampled dump of the sorted vectors to demonstrate (hopefully) correct ordering.

A more strenuous test might be:

$ ./perf 500000 1 4

Which will indicate the speedup achieved (or not) by your multi-threaded implementation. 100% indicates you archived no speedup, 200% indicates you doubled the performance, and so on. Speedups around 300% are easily achievable with simple implementations, anything over 400% will require significant effort and a sophisticated implementation.

As in previous assignments if there’s something you don’t understand, do this, in this order:

Google it

Post a discussion on Slack

Come along to office hours and ask questions

Background information

To be truly parallel, sorting a single list when multiple CPU cores are available should show a significant speedup over a single threaded approach. Radix sorting lends itself to truly parallel implementations; consult the literature for approaches you might consider taking.

Remember that MSD is a sorting outcome, not a sorting algorithm so investigate sorting algorithms that lend themselves to parallel implementation.

Setup

SSH into any of the CSE111 teaching servers using your CruzID Blue credentials:

$ ssh

Create a suitable place to work: ( only do this the first time you log in )

$ mkdir –p ~/CSE111/Assignment4

$ cd ~/ CSE111/Assignment4

Install the lab environment: ( only do this once )

$ tar xvf /var/classes/CSE111/Assignment4.tar.gz

Build, test, and check code coverage of the system:

$ make test

Check your implementation for memory leaks:

$ make valgrind

Calculate your expected grade:

$ make grade

( this will take some time )

Reset the system if you get into a mess:

$ make clean

Note that using shared machines like the CSE111 teaching servers leads to variable results when another user suddenly starts executing their tests whilst yours are running. On the automated grading system, your code will have exclusive access to all 24 CPUs, so will performance far more predictably.

Requirements

Your completed radix sorter must do two things:

Correctly MSD radix sort large vectors of unsigned integers

Exhibit a significant speedup as more CPU cores are used

In addition, you must:

Have no compiler warnings or memory leaks

Demonstrate 100% function, line, and branch coverage

The first being a functional requirement, the second be a non-functional (performance) requirement.

These requirements are NOT equally weighted – see Grading Scheme below. However, it is recommended you work on the functional requirement first, and only then work on the non-functional requirement.

Remember: “make it work” always comes before “make it fast”, whilst always striving to “make it right”.

What you need to do

The class ParallelRadixSort is provided, but unimplemented. You need to implement it without changing the existing signature. You can add member variables and functions, but do not change existing signatures or overide the default constructor.

Basic steps are as follows:

Investigate how a truly parallel MSD radix sort can be implemented

Implement a multi-threaded truly parallel version of ParallelRadixSort::msd()

To execute your parallel MSD radix sorter after running make:

$ ./radix 10000 1 9 -d

Where “10000” is the number of random unsigned integers that the test harness will generate, “1” is the number of times it will generate that many random unsigned integers, and “9” is the maximum number of CPU cores to use when sorting the single list.

The -d flag requests a sampled dump of the sorted vectors to demonstrate (hopefully) correct ordering.

A more strenuous test might be:

$ ./perf 500000 1 4

Which will indicate the speedup achieved (or not) by your multi-threaded implementation. 100% indicates you archived no speedup, 200% indicates you doubled the performance, and so on. Speedups around 300% are easily achievable with simple implementations, anything over 400% will require significant effort and a sophisticated implementation.

As in previous assignments if there’s something you don’t understand, do this, in this order:

Google it

Post a discussion on Slack

Come along to office hours and ask questions