# 辅导COMP 528、辅导c/c++程序

- 首页 >> Web Laplace equation solver

In this assignment, we will build a solver for the Laplace equation, which is a second-order

Partial Differential Equation that appears in many areas of science and Engineering including

electricity, fluid flow, and steady state heat conduction.

Here we will use the equation to model the steady state heat distribution in a room with

a radiator. Although a real room is three dimensional, we will limit this exercise to two

dimensions for simplicity. The room is 10 by 10m with a radiator along one wall as shown

in Figure 1. The radiator is 4m wide and is centred along the wall, i.e. it takes up 40%

of the wall, with 30% on either side. The radiator is held at a constant temperature of

100?C. The walls are all at 10?C. Together this specifies the temperature at every point at

the boundaries of our domain (the room). Boundary conditions specified in this way are

called Dirichlet boundary conditions.

100?C

4m10m

10m

10?C

Figure 1: 10m × 10m room with a radiator

The steady-state heat distribution of this two-dimensional room can be found by solving the

Laplace equation which, in its continuous form can be written as follows:

2021-2022 1

University of Liverpool Continuous Assessment 2 COMP 528

?2t

?x2

+

?2t

?y2

= 0, (1)

where t(x, y) is a continuous, twice-differentiable function describing the steady-state tem-

perature at any point in the room, and ?

2

?x2

, is the second partial derivative with respect to

x, and similarly for y.

There are many ways to solving this partial-differential equation. Here we will solve it by

dividing the area of the room in Figure 1 into a grid of points ti,j, to represent the temperature

at the point (i, j) in the room. We can use the finite-difference method (and some simplifying

assumptions) to express Equation 1 over this grid of points as:

ti,j =

ti?1,j + ti+1,j + ti,j?1 + ti,j+1

4

, (2)

which is really just another way of saying that the temperature at any point is the average

of the four points immediately around it.

ti,j

ti,j+1

ti?1,j ti+1,j

ti,j?1

Figure 2: This problem is a stencil computation. The temperature value at every point can

be calculated by averaging the temperature of its four neighbouring points.

This might remind you of something we studied in class - this is the famous stencil pattern (or

a gather access), where every value in an array is calculated as a weighted sum of the points

2021-2022 2

University of Liverpool Continuous Assessment 2 COMP 528

in its vicinity. This stencil is a simple one where the weights are the same. The example we

saw in class was a one-dimensional stencil so each point had two neighbours. However, since

this is a two-dimensional physical problem, instead of two neighbours, every point now has

four neighbours. If we were to extend this problem to three dimensions (we will NOT do

that here), every point would have six neighbours (two for each dimension).

To simplify things, we will consider the points along the edge of our domain, i.e. where i = 0,

j = 0, i = n, or j = n, to represent the walls, or the radiator. We will refer to these as the

boundary points. All the points in the domain that are not boundary points will be called

interior points, i.e. for 0 < i < n, 0 < j < n. In total there are (n + 1)2 points, of which

(n? 1)2 are interior points while the rest are boundary points.

The temperature of the boundary points will be considered fixed. The temperature of the

interior points will be calculated by using the formula given in Equation 2 iteratively until a

given maximum number of iterations or when the temperature between successive iterations

does not change by at least a given amount (whichever happens first).

Starter Code

We can store the temperature at each point in our domain in a two-dimensional array t [ i ][ j ] .

The values in this array can be initialised as follows:

? The values of the points along the length of the radiator can be initialised as 100. These

will be a subset of the boundary points on one of the boundaries (walls). If the radiator

appears to end between two grid points, include the outer grid point in the radiator

initialisation as well.

? All other points can be initialised at 10.

The rest of the program as a sequential code is given here:

for ( i t e r = 0 ; i t e r < l i m i t ; i t e r ++) {

for ( i = 1 ; i < n ; i++)

for ( j = 1 ; j < n ; j++)

r [ i ] [ j ] = 0 .25 ? ( t [ i ?1] [ j ] + t [ i +1] [ j ] + t [ i ] [ j ?1] + t [ i ] [ j +1 ] ) ;

for ( i = 1 ; i < n ; i++)

for ( j = 1 ; j < n ; j++)

t [ i ] [ j ] = r [ i ] [ j ] ; // data copying

}

Note that this code will always run for a fixed number of iterations. It is up to you to

implement an early termination criteria. Instead of dividing by 4, we multiply by 0.25 because

computer systems usually implement floating-point multiplication as a faster operation than

division. There are some inefficiencies in the given program - for example, we copy the

2021-2022 3

University of Liverpool Continuous Assessment 2 COMP 528

results from r[ i ][ j ] back to t [ i ][ j ] at the end of each iteration. If you remember our

discussion about the memory bottleneck, you will know that data movement is the enemy

of program performance and should be minimised. Most things you can do to improve the

serial performance of this code will also improve the performance of the parallel version so

take your time to find inefficiencies in this code and improve it as you see fit while making

sure that the numerical results don’t change.

All computations should be in double precision.

Task 1: Sequential Program

The given code can be improved by avoiding the data copying loop at the end of each iteration.

This can be achieved by defining t as a three-dimensional array of size 2× (N + 1)× (N + 1),

i.e. t [x ][ i ][ j ] . Every iteration can then swap between reading from x=0 and writing to

x=1, and vice-versa. For example, the first iteration could read from t [0][ i ][ j ] and write to

t [1][ i ][ j ] , while the second iteration could read from t [1][ i ][ j ] (which was just written in

the previous iteration) and write to t [0][ i ][ j ] (overwriting the initial values since they are

no longer required), and the third iteration would repeat what the first one did.

Rewrite the given code to implement the above change to remove the extra data copy. Write a

complete C program that models the temperature distribution of the room given in Figure 1.

Your domain should consist of N ×N points and you should solve to K iterations where N

and K are defined constants in your code (so you can change them easily). Instrument the

code to track the total time taken and display it in the end of the execution. Also display

the final temperature at every N/8×N/8 points in your domain, i.e. 8× 8 points, regardless

of the value of N .

Use smaller values of N and K while developing and testing your code, but your submission

should have N = 1024 and K = 100000.

Your submission for this task should include:

? Your sequential C program to solve the temperature distribution problem (as a .c file).

? A document containing:

– Screenshot of compiling the program on your computer

– Screenshot of the complete output of the program on your computer (could be for

a smaller K and N .)

– Screenshot of compiling the program on Barkla.

– Complete program output (could be a screenshot) of the program on Barkla.

2021-2022 4

University of Liverpool Continuous Assessment 2 COMP 528

Task 2: OpenMP program

Modify the sequential program from Task 1 to be an OpenMP program. This program should

start by solving the problem in serial (using the same code from Task 1), then it should solve

the problem in parallel using OpenMP. It should also have code to check that the results

do not change between the two invocations. Finally, measure the time for both versions,

calculate and report both the times and the speedup achieved.

Your submission for this task should include:

? Your complete OpenMP program to solve the temperature distribution problem (as a

.c file).

? A document containing:

– Screenshot of compiling the program on your computer

– Screenshot of the complete output of the program on your computer (could be for

a smaller K and N .)

– Screenshot of compiling the program on Barkla.

– Complete program output (could be a screenshot) of the program on Barkla.

– A table showing the run times for 1, 2, 4, 8, 16 and 32 threads.

– A plot showing speedup vs number of threads (see Lecture05.pdf, slides 31/32 for

an example).

Hints

You might get a segmentation fault trying to allocate an array like double t[2][N+1][N+1] on

your computer or even Barkla for large values of N . This is because the previous statement

allocates the memory on the stack, and there is a very tight limit on the size of variables you

can allocate on the stack. Instead, you should allocate this on the heap using the following

statement:

double (? t ) [N+1] [N+1] = mal loc ( s izeof (double [ 2 ] [N+1] [N+ 1 ] ) ) ;

In general, if you get segmentation faults when running your program, use a tool called gdb

to help debug. Read its manual to understand how to use it.

OpenMP has some pragmas to help the compiler spot vectorisation opportunities. Read

about them in the OpenMP manual.

My reference solution (which is not particularly optimised) took about 160 seconds on Barkla

in serial execution, and a best time of about 12 seconds in parallel execution. Use these

timings to guide your efforts.

2021-2022 5

University of Liverpool Continuous Assessment 2 COMP 528

You can consider printing the state of the domain every 500 iterations to follow the progress

of your simulation. However, make sure to remove all print statements from inside the

timed section when measuring runtime - the print statements will immensely slow down your

program! The timings I gave above are with the prints removed.

Marking scheme and submission

Your submission should be a zip file containing two documents, and two .c files - one for

each task above. Also include your SLURM submission scripts. This should be submitted

on Canvas, as the submission for CA2 in COMP 528.

1 Task 1: Code that compiles without errors or warnings 5%

2 Task 1: Same numerical results as the reference version 10%

3 Task 1: Successfully reduced data movement 10%

4 Task 2: Code that compiles without errors or warnings 5%

5 Task 2: Same numerical results as the reference version 10%

6 Task 2: Speedup plot 10%

7 Task 2: Maximum number of cores before speedup curve flattens out 20%

8 Task 2: Minimum runtime 20%

9 Task 1 and 2: Clean code and comments 10%

Table 1: Marking scheme

In this assignment, we will build a solver for the Laplace equation, which is a second-order

Partial Differential Equation that appears in many areas of science and Engineering including

electricity, fluid flow, and steady state heat conduction.

Here we will use the equation to model the steady state heat distribution in a room with

a radiator. Although a real room is three dimensional, we will limit this exercise to two

dimensions for simplicity. The room is 10 by 10m with a radiator along one wall as shown

in Figure 1. The radiator is 4m wide and is centred along the wall, i.e. it takes up 40%

of the wall, with 30% on either side. The radiator is held at a constant temperature of

100?C. The walls are all at 10?C. Together this specifies the temperature at every point at

the boundaries of our domain (the room). Boundary conditions specified in this way are

called Dirichlet boundary conditions.

100?C

4m10m

10m

10?C

Figure 1: 10m × 10m room with a radiator

The steady-state heat distribution of this two-dimensional room can be found by solving the

Laplace equation which, in its continuous form can be written as follows:

2021-2022 1

University of Liverpool Continuous Assessment 2 COMP 528

?2t

?x2

+

?2t

?y2

= 0, (1)

where t(x, y) is a continuous, twice-differentiable function describing the steady-state tem-

perature at any point in the room, and ?

2

?x2

, is the second partial derivative with respect to

x, and similarly for y.

There are many ways to solving this partial-differential equation. Here we will solve it by

dividing the area of the room in Figure 1 into a grid of points ti,j, to represent the temperature

at the point (i, j) in the room. We can use the finite-difference method (and some simplifying

assumptions) to express Equation 1 over this grid of points as:

ti,j =

ti?1,j + ti+1,j + ti,j?1 + ti,j+1

4

, (2)

which is really just another way of saying that the temperature at any point is the average

of the four points immediately around it.

ti,j

ti,j+1

ti?1,j ti+1,j

ti,j?1

Figure 2: This problem is a stencil computation. The temperature value at every point can

be calculated by averaging the temperature of its four neighbouring points.

This might remind you of something we studied in class - this is the famous stencil pattern (or

a gather access), where every value in an array is calculated as a weighted sum of the points

2021-2022 2

University of Liverpool Continuous Assessment 2 COMP 528

in its vicinity. This stencil is a simple one where the weights are the same. The example we

saw in class was a one-dimensional stencil so each point had two neighbours. However, since

this is a two-dimensional physical problem, instead of two neighbours, every point now has

four neighbours. If we were to extend this problem to three dimensions (we will NOT do

that here), every point would have six neighbours (two for each dimension).

To simplify things, we will consider the points along the edge of our domain, i.e. where i = 0,

j = 0, i = n, or j = n, to represent the walls, or the radiator. We will refer to these as the

boundary points. All the points in the domain that are not boundary points will be called

interior points, i.e. for 0 < i < n, 0 < j < n. In total there are (n + 1)2 points, of which

(n? 1)2 are interior points while the rest are boundary points.

The temperature of the boundary points will be considered fixed. The temperature of the

interior points will be calculated by using the formula given in Equation 2 iteratively until a

given maximum number of iterations or when the temperature between successive iterations

does not change by at least a given amount (whichever happens first).

Starter Code

We can store the temperature at each point in our domain in a two-dimensional array t [ i ][ j ] .

The values in this array can be initialised as follows:

? The values of the points along the length of the radiator can be initialised as 100. These

will be a subset of the boundary points on one of the boundaries (walls). If the radiator

appears to end between two grid points, include the outer grid point in the radiator

initialisation as well.

? All other points can be initialised at 10.

The rest of the program as a sequential code is given here:

for ( i t e r = 0 ; i t e r < l i m i t ; i t e r ++) {

for ( i = 1 ; i < n ; i++)

for ( j = 1 ; j < n ; j++)

r [ i ] [ j ] = 0 .25 ? ( t [ i ?1] [ j ] + t [ i +1] [ j ] + t [ i ] [ j ?1] + t [ i ] [ j +1 ] ) ;

for ( i = 1 ; i < n ; i++)

for ( j = 1 ; j < n ; j++)

t [ i ] [ j ] = r [ i ] [ j ] ; // data copying

}

Note that this code will always run for a fixed number of iterations. It is up to you to

implement an early termination criteria. Instead of dividing by 4, we multiply by 0.25 because

computer systems usually implement floating-point multiplication as a faster operation than

division. There are some inefficiencies in the given program - for example, we copy the

2021-2022 3

University of Liverpool Continuous Assessment 2 COMP 528

results from r[ i ][ j ] back to t [ i ][ j ] at the end of each iteration. If you remember our

discussion about the memory bottleneck, you will know that data movement is the enemy

of program performance and should be minimised. Most things you can do to improve the

serial performance of this code will also improve the performance of the parallel version so

take your time to find inefficiencies in this code and improve it as you see fit while making

sure that the numerical results don’t change.

All computations should be in double precision.

Task 1: Sequential Program

The given code can be improved by avoiding the data copying loop at the end of each iteration.

This can be achieved by defining t as a three-dimensional array of size 2× (N + 1)× (N + 1),

i.e. t [x ][ i ][ j ] . Every iteration can then swap between reading from x=0 and writing to

x=1, and vice-versa. For example, the first iteration could read from t [0][ i ][ j ] and write to

t [1][ i ][ j ] , while the second iteration could read from t [1][ i ][ j ] (which was just written in

the previous iteration) and write to t [0][ i ][ j ] (overwriting the initial values since they are

no longer required), and the third iteration would repeat what the first one did.

Rewrite the given code to implement the above change to remove the extra data copy. Write a

complete C program that models the temperature distribution of the room given in Figure 1.

Your domain should consist of N ×N points and you should solve to K iterations where N

and K are defined constants in your code (so you can change them easily). Instrument the

code to track the total time taken and display it in the end of the execution. Also display

the final temperature at every N/8×N/8 points in your domain, i.e. 8× 8 points, regardless

of the value of N .

Use smaller values of N and K while developing and testing your code, but your submission

should have N = 1024 and K = 100000.

Your submission for this task should include:

? Your sequential C program to solve the temperature distribution problem (as a .c file).

? A document containing:

– Screenshot of compiling the program on your computer

– Screenshot of the complete output of the program on your computer (could be for

a smaller K and N .)

– Screenshot of compiling the program on Barkla.

– Complete program output (could be a screenshot) of the program on Barkla.

2021-2022 4

University of Liverpool Continuous Assessment 2 COMP 528

Task 2: OpenMP program

Modify the sequential program from Task 1 to be an OpenMP program. This program should

start by solving the problem in serial (using the same code from Task 1), then it should solve

the problem in parallel using OpenMP. It should also have code to check that the results

do not change between the two invocations. Finally, measure the time for both versions,

calculate and report both the times and the speedup achieved.

Your submission for this task should include:

? Your complete OpenMP program to solve the temperature distribution problem (as a

.c file).

? A document containing:

– Screenshot of compiling the program on your computer

– Screenshot of the complete output of the program on your computer (could be for

a smaller K and N .)

– Screenshot of compiling the program on Barkla.

– Complete program output (could be a screenshot) of the program on Barkla.

– A table showing the run times for 1, 2, 4, 8, 16 and 32 threads.

– A plot showing speedup vs number of threads (see Lecture05.pdf, slides 31/32 for

an example).

Hints

You might get a segmentation fault trying to allocate an array like double t[2][N+1][N+1] on

your computer or even Barkla for large values of N . This is because the previous statement

allocates the memory on the stack, and there is a very tight limit on the size of variables you

can allocate on the stack. Instead, you should allocate this on the heap using the following

statement:

double (? t ) [N+1] [N+1] = mal loc ( s izeof (double [ 2 ] [N+1] [N+ 1 ] ) ) ;

In general, if you get segmentation faults when running your program, use a tool called gdb

to help debug. Read its manual to understand how to use it.

OpenMP has some pragmas to help the compiler spot vectorisation opportunities. Read

about them in the OpenMP manual.

My reference solution (which is not particularly optimised) took about 160 seconds on Barkla

in serial execution, and a best time of about 12 seconds in parallel execution. Use these

timings to guide your efforts.

2021-2022 5

University of Liverpool Continuous Assessment 2 COMP 528

You can consider printing the state of the domain every 500 iterations to follow the progress

of your simulation. However, make sure to remove all print statements from inside the

timed section when measuring runtime - the print statements will immensely slow down your

program! The timings I gave above are with the prints removed.

Marking scheme and submission

Your submission should be a zip file containing two documents, and two .c files - one for

each task above. Also include your SLURM submission scripts. This should be submitted

on Canvas, as the submission for CA2 in COMP 528.

1 Task 1: Code that compiles without errors or warnings 5%

2 Task 1: Same numerical results as the reference version 10%

3 Task 1: Successfully reduced data movement 10%

4 Task 2: Code that compiles without errors or warnings 5%

5 Task 2: Same numerical results as the reference version 10%

6 Task 2: Speedup plot 10%

7 Task 2: Maximum number of cores before speedup curve flattens out 20%

8 Task 2: Minimum runtime 20%

9 Task 1 and 2: Clean code and comments 10%

Table 1: Marking scheme