# COSC 1285编程代做、代写Java程序设计、代做Java程序 调试Web开发|代做R语言编程

- 首页 >> Database作业 Algorithms and Analysis

COSC 1285/2123

Assignment 1

Assessment Type Group assignment. Groups as allocated and notified on Canvas.

Submit online via Canvas → Assignments → Assignment

1. Marks awarded for meeting requirements as closely as

possible. Clarifications/updates may be made via announcements/relevant

discussion forums.

Due Date Friday 16th April 2021, 11:59pm

Marks 30

1 Overview

One of the frequent phrases we hear during the COVID-19 panademic are epidemic

modelling and contract tracing. As the virus is transmitted human to human, one of

the effective methods to stop an outbreak is to identify the close contacts of an infected

individual, and isolate all of them for a period of time (typically 14 days). To estimate

the impact of outbreaks, epidemic modelling is an extremely important tool. A graph

is a natural representation for these close contacts, tracing outbreaks and for epidemic

modelling.

When we represent these networks as a graph, the vertices in such a graph represent

people and edges represents close contact relationship.

In class, we studied three methods to represent the graph, the adjacency list, adjacency

matrix and vertex/edge list representations. There is a fourth type of representation

called incident matrix (see below for details). The performance of each representation

varies depending on the characteristics of the graph. In this assignment, we will implement

the adjacency list, adjacency matrix and incident matrix representations for the

relational parts, and evaluate on how well they perform when modelling close contacts

and for epidemic modelling. This will assist with your understanding of the tradeoffs

between data structure representations and its effect when implementing operations or

solving problems using algorithms and data structures.

2 Learning Outcomes

This assessment relates to all of the learning outcomes of the course which are:

CLO 1: Compare, contrast, and apply the key algorithmic design paradigms: brute

force, divide and conquer, decrease and conquer, transform and conquer, greedy,

dynamic programming and iterative improvement;

CLO 2: Compare, contrast, and apply key data structures: trees, lists, stacks,

queues, hash tables and graph representations;

CLO 3: Define, compare, analyse, and solve general algorithmic problem types:

sorting, searching, graphs and geometric;

CLO 4: Theoretically compare and analyse the time complexities of algorithms and

data structures; and

CLO 5: Implement, empirically compare, and apply fundamental algorithms and

data structures to real-world problems.

3 Background

3.1 Networked SIR Model

There are several well-known epidemic models, but the one that we will focus on in

this assignment is the (Networked) SIR model. The SIR epidemic model assumes each

individual has three states - Susceptible (S) to the disease, Infected (I) by the disease and

Recovered (R) to the disease. The progression of states is assumed to be susceptible (S)

to Infected (I) to Recovered (R). Initially some in the initial population are infected and

all others are susceptible. Once recovered, an individual has immunity and will not be

further infected. There are also other models, like SIS (Susceptible, Infected, Susceptible),

but for the purpose of this assignment, we will stick with the most fundamental model,

the SIR model, as others are really variations on this.

In the traditional SIR model, there is assumption of full mixing between individuals

- i.e., any individual can potentially infect another. The model then derives differential

equations that estimates the changes in the total number of susceptible, infected and

recovered individuals over time, and these are proportional to the numbers of the other

two states - e.g., change in number of infected individuals is dependent on the number

of susceptible and the number of recovered individuals. It is able to model some diseases

relatively well, but is inaccurate for others and misses out on local spreading and effects.

Hence a networked SIR model was proposed, where instead of full mixing between individuals,

infected individuals can now only infect its neighbours in the graph/network,

which typically represents some form of “close contact”.

The networked SIR model works as follows.

At each timestep, each susceptible individual can be infected by one of its infected

neighbours (doesn’t matter which one, as an individual is infected regardless of who

caused it). Each infected neighbour has a chance to infect the susceptible individual.

This chance is typically modelled as an infection parameter α - for each chance, there

is α probability of infection. In addition, at each timestep each infected individual have

a chance of recovering, either naturally or from medicine and other means (this model

doesn’t care which means). Let the recovery parameter β represent this, which means

there is β probability that an infected individual will recover and change to recovered

state. There is no transition for susceptible to recovered (which could be interesting, e.g.,

to model vaccination, but left for future assignments).

As you can probably infer, the more neighbours one has, the higher potential for

infection. Hence one reason for the general advice from the health departments to reduce

the number of close contacts. Also wearing masks reduces the infection probability, again

reducing the spread of an outbreak. In this assignment, you will get an opportunity to

experiment with these parameters and see what differences they make.

2

For more information about SIR and epidemic modelling, have an initial read here

https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology. We will

also release some recordings talking more about the model on Canvas, particularly providing

more details about how to run the SIR model to simulate an epidemic, please look

at those also when they become available.

3.2 Incidence Matrix Representation

The incidence matrix represents a graph as a set of vertices and list of edges incident to

each vertex as a 2D array/matrix. More formally, let the incidence matrix of a undirected

graph be an n x m matrix A with n and m are the number of vertices and edges of the

graph respectively. The values in the matrix A follows the following rules:

Each edge ek is represented by a column in A.

Each vertex is represented by a row in A.

Let the two incident vertices to an edge ek be vi and vj

. Then the matrix entries

Ai,k and Aj,k are both = 1.

Ai,k = 0 for all other non-incident edges.

For example, the following graph:

has its incidence matrix as below:

AB AC AD BC CD

A 1 1 1 0 0

B 1 0 0 1 0

C 0 1 0 1 1

D 0 0 1 0 1

For further readings about this graph representation, please see https://en.wikipedia.

org/wiki/Incidence_matrix.

4 Assignment Details

The assignment is broken up into a number of tasks, to help you progressively complete

the project.

3

Task A: Implement the Graph Representations, their Operations

and the Networked SIR model (10 marks)

In this task, you will implement data structures for representing undirected, unweighted

graphs using the adjacency list, adjacency matrix and incidence matrix representations.

Your implementation should also model vertices that have SIR model states associated

with them. Your implementation should support the following operations:

Create an empty undirected graph (implemented as a constructor that takes zero

arguments).

Add a vertex to the graph.

Add an edge to the graph.

Toggle the SIR state on vertices.

Delete a vertex from the graph.

Delete an edge from the graph.

Compute the k hop neighbours of a vertex in the graph.

Print out the set of vertices and their SIR states.

Print out the set of edges.

In addition, we want you to implement the (networked) SIR epidemic model to simulate

how a virus can spread through a population.

Note that you are welcome to implement additional functionality. When constructing

our solutions to the assignment, we have found that adding some methods was helpful,

particularly for implementing the SIR model. But the above functionalities are ones you

should complete and ones you will be assessed on.

Data Structure Details

Graphs can be implemented using a number of data structures. You are to implement

the graph abstract data type using the following data structures:

Adjacency list, using an array of linked lists.

Adjacency matrix, using a 2D array (an array of arrays).

Incidence matrix, using a 2D array (an array of arrays).

4

For the above data structures, you must program your own implementation, and not

use the LinkedList or Matrix type of data structures in java.utils or any other libraries.

You must implement your own nodes and methods to handle the operations. If you use

java.utils or other implementation from libraries, this will be considered as an invalid

implementation and attract 0 marks for that data structure. The only exceptions are:

if you choose to implement a map of vertex labels to a row or column index for the

adjacency matrix or incidence matrix, you may use one of the existing Map classes

to do this.

if you choose to implement a map of vertex labels to its associated SIR state, you

may use of the existing Map classes to do so.

Operations Details

Operations to perform on the implemented graph abstract data type are specified on the

command line. They are in the following format:

[arguments]

where operation is one of {AV, AE, TV, DV, KN, PV, PE, SIR, Q} and arguments is for

optional arguments of some of the operations. The operations take the following form:

AV – add a vertex with label ’vertLabel’ into the graph. Has default

SIR state of susceptible (S).

AE – add an edge with source vertex ’srcLabel’, target

vertex ’tarLabel’ into the graph.

TV – toggle the SIR state of vertex ’vertLabel’. Togglng means go to

the next state, i.e., from S to I, from I to R (if in R, remain in R).

DV – delete vertex ’vertLabel’ from the graph.

DE – remove edge with source vertex ’srcLabel’ and target

vertex ’tarLabel’ from the graph.

KN – Return the set of all neighbours for vertex ’vertLabel’ that

are up to k-hops away. The ordering of the neighbours does not matter. See below

for the required format.

PV – prints the vertex set of the graph. See below for the required format. The

vertices can be printed in any order.

PE – prints the edge set of the graph. See below for the required format. The edges

can be printed in any order.

SIR probability> – Run the networked SIR model simulation, using the current graph

as the network, the specified seed vertices as additional set of infected vertices (in

addition to any existing in the current graph) and the two infection and recover

5

probabilities. Runs the model simulation to completion, which means two things

a) there are no more vertices with Infected (I) state and there are no changes in the

number of infected or recovered in the latest iteration of the model; or b) if condition

there are still infected vertices but there has been no changes in the number of

infected or recovered for 10 iterations, then can stop the simulation. Outputs the

list of nodes infected at each iteration, see below for format.

Q – quits the program.

k-hop Neighbour operation format details The format of the output of the neighbour

operation for vertex ’A’ should take the form:

A: neighbour1 neighbour2 ...

If a vertex has no neighbours, then the list of neighbours should be empty. The node

’A’ should not be in the list of neighbours, and the graphs can be assumed to have no

self-loops (self loops don’t necessarily make sense for epidemic modelling).

Print vertex operation format details The print vertex operation output the vertices

and associated SIR state in the graph in a single line. The line should specifies all

the valid vertex (indices) in the graph.

(vertex1,state1) (vertex2,state2) (vertex3,state3) ...

where state1 is the SIR state of vertex 1 etc.

Print edge operation format details The print edge operation output the edges in

the graph in over a number of lines. Each line specifies an edge in the graph, and should

be in the following format:

srcVertex tarVertex

SIR operation format details The networked SIR model file output format is each

line represents one iteration of the process, and each line will record the number of newly

infected and recovered vertices (newly infected or recovered nodes are ones that become

infected or recovered in this iteration). The format is as follows:

: [vertices>] : []

Example of all operations As an example of the operations, consider the output

from the following list of operations:

The output from the four neighbour operations (‘KN 1 A’, ‘KN 2 A’ ‘KN 1 F’, ‘KN 1

H’) should be (remember that the order these neighbourhoods do not matter so if your

output differs from this ordering but has the same set of vertices, that is fine):

A: B E F

A: B C D F E G

F: A B G

H:

The output from the print vertices operation (PV) could be (remember that the order

doesn’t matter):

(A, S ) (B,R) (D, S ) (E, I ) (F, S ) (G, S ) (H, S )

The output from the print edges operation (P E) could be (remember that the order

doesn’t matter):

The output from the ‘SIR A;F 0.8 0.5’ operation could be (note that it is a stochastic

model, so you can get different results for different runs, so use the following more for

what the output format should look like):

1 : [G] : [ ]

2 : [ ] : [A]

3 : [ ] : [ ]

4 : [ ] : [ ]

5 : [ ] : [ F G]

6 : [ ] : [ ]

Testing Framework

We provide Java skeleton code (see Table 1 for the important files) to help you get

started and automate the correctness testing. You may add your own Java files to your

final submission, but please ensure that they compile and can be run on the core teaching

servers.

Notes

Consider the specifications here carefully. If you correctly implement the “Implement

me!” parts and follow the output formating for operations like PV, PE and

SIR, you in fact do not need to do anything else to get the correct output formatting.

The main class in the provided skeleton code will handle this.

The implementation details are up to you, but you must implement your own data

structures and algorithms, i.e., do not use the built-in structures in Java unless for

the purposes outlined in the box above.

If you develop on your own machines, please ensure your code compiles and runs

on the university’s core teaching servers. You don’t want to find out last minute

that your code doesn’t compile on these machines. If your code doesn’t run on

these machines, we unfortunately do not have the resources to debug each one and

cannot award marks for testing.

❼ All submissions should compile with no warnings on Oracle Java 1.8 - this is the

version on the core teaching servers.

❼ Any clarifications on the Canvas FAQ or Assignment 1 Update page will override

the specifications here if they are conflicting.

8

file description

RmitCovidModelling.java Main class. Contains code that reads in operation commands

from stdin then executes those on the selected graph implementation.

Also will format the output as required. No need

to modify this file.

ContactsGraph.java Interface class for the graph representations. It contains the

common interface/methods that you’ll need to implement for

the various representations. Generally no need to modify this

file, but you might want to add helper functionality, and here

would be one place to do so for common ones across all representations..

AdjacencyList.java Code that implements the adjacency list implementation of

a graph. Complete the implementation (implement parts labelled

“Implement me!”).

AdjacencyMatrix.java Code that implements the adjacency matrix implementation

of a graph. Complete the implementation (implement parts

labelled “Implement me!”).

IncidenceMatrix.java Code that implements the incidence matrix implementation

of a graph. Complete the implementation (implement parts

labelled “Implement me!”).

SIRState.java Code that implements the three vertex states of a SIR model.

No need to modify this file.

SIRModel.java Code that implements the networked SIR model. Complete

the implementation (implement parts labelled “Implement

me!”).

Table 1: Table of Java files.

Task B: Evaluate your Data Structures and SIR model (20 marks)

In this second task, you will evaluate your implemented structures in terms of their time

complexities for the different operations and different use case scenarios. Scenarios arise

from the possible use cases of contract tracing and epidemic modelling on a graph. In

addition, run some epidemic modelling simulations to evaluate your structures within an

application (epidemic modelling and simulation) and to explore more about the effect of

the model parameters on epidemic spreading.

Write a report on your analysis and evaluation of the different implementations. Consider

and recommend in which scenarios each type of implementation would be most

appropriate. Report the outcomes of your SIR epidemic model simulations. The report

should be 10 pages or less, in font size 12. See the assessment rubric (Appendix A) for

the criteria we are seeking in the report.

9

Graph generators

To evaluate the data structures, we will generate a number of graphs, then apply a

number of operations on them, see the Use Case Scenarios section next. In this section,

we describe more about how to generate the graphs, and see on Canvas for further details

and videos.

There are many ways to generate graphs, but we will focus on two:

Random (Erdos-Renyi) – there is a probability for generating an edge between any

pair of vertices.

Scale-free – graphs whose degree distribution follow a power law (see https://en.

wikipedia.org/wiki/Scale-free_network).

We suggest to use Pajek1

, which is available on multiple platforms and has visualisation,

which allows one to look at the generated graphs. But it also has basic graph

generation functionality and doesn’t require programming to use, unlike more powerful

graph libraries2

.

Use Case Scenarios

Typically, you use real usage data to evaluate your data structures. However, for this assignment,

you will write data generators to enable testing over different scenarios of

interest. We are also interested in the effect of the average degree of the graph3 on these

scenarios. There are many possibilities, but for this assignment, consider the following

scenarios:

Scenario 1 k-hop Neighbourhoods: When doing contract tracing and/or epidemic

modelling, computing neighbourhoods is an important function. In this scenario, the

graph is not changing, but important operations such as k-hop neighbourhood (recall

this is all neighbours up to k-hops away) are requested.

You are to evaluate the performance of the the k-hop neighbourhood implementations,

as the average degree of the evaluated graph and k are varied.

Scenario 2 Dynamic Contact Conditions: In the real world, the contact conditions

between people are likely not static and there will be need to update these over time. In

this scenario, the contacts are changing (been added and deleted). In this scenario, you

are to evaluate the performance of your implementations in terms of:

edge additions

edge deletions

You are to evaluate the performance the edge operations as the average degree and

size (number of vertices) of the initial graph (before edge changes) are varied.

1http://mrvar.fdv.uni-lj.si/pajek/ the website looks a bit dated, but this is one of the well

known social network analysis tools.

2Networkx https://networkx.org/ – if you know Python, you are more than welcome to use this

instead.

3Average number of neighbours per node, which both graph generator models allows to be specified

10

Scenario 3 Dynamic People Tracing: As time goes on, the people been traced will

change. Although it is possible to have a large graph that tries to capture every possible

person, it is computationally difficult to do so and run simulations on it. In this scenario,

you are to evaluate the performance of your implementations in terms of:

vertex addition

vertex deletion

You are to evaluate the performance the vertex operations as the average degree and size

(number of vertices) of the initial graph (before vertex changes) are varied.

SIR Model Epidemic Simulation

In addition to evaluating how well the graph representations perform for the contract

tracing and epidemic modelling simulations, we also want to experiment with how the

networked SIR model works and performs.

Run epidemic simulation for different graph types (Erdos-Renyi and Scale-free), different

seed initialisations and different infection and recover probabilities. For performance

evaluation, evaluate each of the three data structures on a subset of the possible simulation

parameters and compare how they perform (in terms of timing efficiency).

We don’t expect a comprehensive analysis, but want you to explore and gain a better

understanding of epidemic modelling and what it implies for how to limit the spread of

epidemics.

Consider how you might present the results, but at a minimum plot the following:

Number of susceptible, infected and recovered after every iteration.

Data Generation

When generating the vertices and edges to add or remove and find neighbourhoods for,

the distribution of these elements, compared to what is in the graph already, will have

an effect on the timing performance. However, without the usage and query data, it is

difficult to specify what this distributions might be. Instead, in this assignment, uniformly

sample from a fixed range, e.g., 0 to max vertex label of your graph when generating the

vertices and edges for removing and adding and k-hop neighbourhoods, and a different

range (e.g., greater than the largest vertex label when adding vertices (we do not want

to repeatingly add vertices that are in the graph already).

For generating graphs with different initial average degrees and sizes, you can either

use Pajek or your favourite graph generator, we will not prescribe one particular approach.

Whichever method you decide to use, remember to generate graphs of different average

degrees to evaluate on. Due to the randomness of the data, you may wish to generate a

few graphs with the same average degrees and sizes and take the average across a number

of runs when performing the performance evaluation and analysis.

Analysis

In your analysis, you should evaluate each of your representations and data structures in

terms of the different scenarios outlined above.

Note, you may be generating and evaluating a significant number of graphs and evaluation

datasets, hence we advise you to get started on this part relatively early.

11

5 Report Structure

As a guide, the report could contain the following sections:

Explain your data and experimental setup. Things to include are (brief) explanations

of the data scenarios you decide to evaluate on (e.g., how many additions/removals

operations did you evaluate and why these), size of the graphs, the range of

average degrees tested (add some brief explanation of why this range selection), describe

how the scenarios were generated (a paragraph and perhaps a figure or high

level pseudo code suffice), which approach(es) you decide to use for measuring the

timing results, and briefly describe the fixed set(s) you sampled from to generate

the elements for addition, removal of vertices and edges and k-hop neighbourhood

of the graphs.

Evaluation of the data structures using the generated data (different scenarios,

average degree, graph size). Analyse, compare and discuss your results. Provide

your explanation on why you think the results are as you observed. You may

consider using the known theoretical time complexities of the operations of each

data structure to help in your explanation if useful.

Summarise your analysis as recommendations, e.g., for this certain data scenario

of this average degree (and size), I recommend to use this data structure because...

We suggest you refer to your previous analysis to help.

Evaluate the different parameters of the SIR model and its effect on the number

of susceptible, infected and recovered individuals over time. Which combination

of parameters has the fastest infection spread, which one has the most extensive

number of infected? How does the average timing compare between your three

graph representation? Can you explain why?

6 Submission

The final submission will consist of three parts:

Your Java source code of your implementations. Include only source code

and no class files, we will compile your code on the core teaching serers. Make

sure that your assignment can run only with the code included! That is, it should

be self contained, including all the skeleton code needed to run it.

Your source code should be placed into in a flat structure, i.e., all the files should

be in the same directory/folder, and that directory/folder should be named as

Assign1-. Specifically, if your student number is s12345,

when unzip Assign1-s12345.zip is executed then all the source code files should

be in directory Assign1-s12345.

Your data generation code. Create a sub-directory/sub-folder called “generation”

within the Java source file directory/folder. Place your generation code

within that folder. We will not run the code, but may examine their contents.

Your written report for part B in PDF format, called “assign1.pdf”. Submit

this separately to a Turnitin submission.

12

Note: submission of the report and code will be done via Canvas. We will

provide details closer to the submission deadline. We will also provide details on how

the setup we use to test the correctness of your code will be like, so you can ensure your

submitted structure will conform to the automated testing we will perform.

7 Assessment

The project will be marked out of 30. Late submissions will incur a deduction of 3 marks

per day, and no submissions will be accepted 5 days beyond the due date and will attract

0 marks for the assignment.

The assessment in this project will be broken down into two parts. The following

criteria will be considered when allocating marks.

Implementation (10/30):

You implementation will be assessed based on the number of tests it passes in our

automated testing.

While the emphasis of this project is not programming, we would like you to maintain

decent coding design, readability and commenting, hence commenting and

coding style will make up a portion of your marks.

Report (20/30):

The marking sheet in Appendix A outlines the criteria that will be used to guide the

marking of your evaluation report4

. Use the criteria and the suggested report structure

(Section 5) to inform you of how to write the report.

8 Team Structure

This project should be done in pairs (group of two). If you have difficulty in finding a

partner, post on the discussion forum or contact your lecturer. If you want to do the

assignment individually, please contact one of your lecturers first and obtain his

approval.

In addition, please submit what percentage each partner made to the assignment (a

contribution sheet will be made available for you to fill in), and submit this sheet in your

submission. The contributions of your group should add up to 100%. If the contribution

percentages are not 50-50, the partner with less than 50% will have their marks reduced.

Let student A has contribution X%, and student B has contribution Y%, and X > Y .

The group is given a group mark of M. Student A will get M for assignment 1, but student

B will get M

X

Y

.

This semester we will also institute some additional group management initiatives,

including registration and not allowing further group changes in the week before the due

date. We will detail more about this on Canvas.

4Note for the marking guide, if one of the criteria is not demonstrated a tall, then 0 marks will be

awarded for that criteria.

13

9 Academic integrity and plagiarism (standard warning)

Academic integrity is about honest presentation of your academic work. It means acknowledging

the work of others while developing your own insights, knowledge and ideas.

You should take extreme care that you have:

Acknowledged words, data, diagrams, models, frameworks and/or ideas of others

you have quoted (i.e. directly copied), summarised, paraphrased, discussed or mentioned

in your assessment through the appropriate referencing methods

❼ Provided a reference list of the publication details so your reader can locate the

source if necessary. This includes material taken from Internet sites. If you do not

acknowledge the sources of your material, you may be accused of plagiarism because

you have passed off the work and ideas of another person without appropriate

referencing, as if they were your own.

RMIT University treats plagiarism as a very serious offence constituting misconduct.

Plagiarism covers a variety of inappropriate behaviours, including:

Failure to properly document a source

Copyright material from the internet or databases

Collusion between students

For further information on our policies and procedures, please refer to the following:

https://www.rmit.edu.au/students/student-essentials/rights-and-responsibilities/

academic-integrity.

We will run both code and report similarity checks.

10 Getting Help

There are multiple venues to get help. First point of call should be Canvas, assignment

FAQ (on Canvas), recordings about it and the discussion forum. There will also be weekly

online consultation hours for the assignment (see Canvas for times). In addition, you are

encouraged to discuss any issues you have with your Tutor or Lab Demonstrator. Please

refrain from posting solutions to the discussion forum.

14

A Marking Guide for the Report

Design of Evaluation Analysis of Results Report Clarity and Structure

(Maximum = 5 marks) (Maximum = 10 marks) (Maximum = 5 marks)

5 marks 10 marks 5 marks

Data generation and

experimental setup are

well designed, systematic

and well explained. All

suggested scenarios, data

structures and a

reasonable range of

parameters were

evaluated. Each type of

test was run over a

number of runs and

results were averaged.

Analysis is thorough and demonstrates

understanding and critical

analysis. Well-reasoned explanations

and comparisons are provided

for all the data structures, scenarios

and parameters. All analysis,

comparisons and conclusions

are supported by empirical evidence

and possibly theoretical complexities.

Well reasoned recommendations

are given.

Very clear, well structured

and accessible report,

an undergraduate

student can pick up the

report and understand it

with no difficulty.

3.75 marks 7.5 marks 3.75 marks

Data generation and

experimental setup are

reasonably designed,

systematic and explained.

There are at least one

obvious missing suggested

scenarios, data structur

COSC 1285/2123

Assignment 1

Assessment Type Group assignment. Groups as allocated and notified on Canvas.

Submit online via Canvas → Assignments → Assignment

1. Marks awarded for meeting requirements as closely as

possible. Clarifications/updates may be made via announcements/relevant

discussion forums.

Due Date Friday 16th April 2021, 11:59pm

Marks 30

1 Overview

One of the frequent phrases we hear during the COVID-19 panademic are epidemic

modelling and contract tracing. As the virus is transmitted human to human, one of

the effective methods to stop an outbreak is to identify the close contacts of an infected

individual, and isolate all of them for a period of time (typically 14 days). To estimate

the impact of outbreaks, epidemic modelling is an extremely important tool. A graph

is a natural representation for these close contacts, tracing outbreaks and for epidemic

modelling.

When we represent these networks as a graph, the vertices in such a graph represent

people and edges represents close contact relationship.

In class, we studied three methods to represent the graph, the adjacency list, adjacency

matrix and vertex/edge list representations. There is a fourth type of representation

called incident matrix (see below for details). The performance of each representation

varies depending on the characteristics of the graph. In this assignment, we will implement

the adjacency list, adjacency matrix and incident matrix representations for the

relational parts, and evaluate on how well they perform when modelling close contacts

and for epidemic modelling. This will assist with your understanding of the tradeoffs

between data structure representations and its effect when implementing operations or

solving problems using algorithms and data structures.

2 Learning Outcomes

This assessment relates to all of the learning outcomes of the course which are:

CLO 1: Compare, contrast, and apply the key algorithmic design paradigms: brute

force, divide and conquer, decrease and conquer, transform and conquer, greedy,

dynamic programming and iterative improvement;

CLO 2: Compare, contrast, and apply key data structures: trees, lists, stacks,

queues, hash tables and graph representations;

CLO 3: Define, compare, analyse, and solve general algorithmic problem types:

sorting, searching, graphs and geometric;

CLO 4: Theoretically compare and analyse the time complexities of algorithms and

data structures; and

CLO 5: Implement, empirically compare, and apply fundamental algorithms and

data structures to real-world problems.

3 Background

3.1 Networked SIR Model

There are several well-known epidemic models, but the one that we will focus on in

this assignment is the (Networked) SIR model. The SIR epidemic model assumes each

individual has three states - Susceptible (S) to the disease, Infected (I) by the disease and

Recovered (R) to the disease. The progression of states is assumed to be susceptible (S)

to Infected (I) to Recovered (R). Initially some in the initial population are infected and

all others are susceptible. Once recovered, an individual has immunity and will not be

further infected. There are also other models, like SIS (Susceptible, Infected, Susceptible),

but for the purpose of this assignment, we will stick with the most fundamental model,

the SIR model, as others are really variations on this.

In the traditional SIR model, there is assumption of full mixing between individuals

- i.e., any individual can potentially infect another. The model then derives differential

equations that estimates the changes in the total number of susceptible, infected and

recovered individuals over time, and these are proportional to the numbers of the other

two states - e.g., change in number of infected individuals is dependent on the number

of susceptible and the number of recovered individuals. It is able to model some diseases

relatively well, but is inaccurate for others and misses out on local spreading and effects.

Hence a networked SIR model was proposed, where instead of full mixing between individuals,

infected individuals can now only infect its neighbours in the graph/network,

which typically represents some form of “close contact”.

The networked SIR model works as follows.

At each timestep, each susceptible individual can be infected by one of its infected

neighbours (doesn’t matter which one, as an individual is infected regardless of who

caused it). Each infected neighbour has a chance to infect the susceptible individual.

This chance is typically modelled as an infection parameter α - for each chance, there

is α probability of infection. In addition, at each timestep each infected individual have

a chance of recovering, either naturally or from medicine and other means (this model

doesn’t care which means). Let the recovery parameter β represent this, which means

there is β probability that an infected individual will recover and change to recovered

state. There is no transition for susceptible to recovered (which could be interesting, e.g.,

to model vaccination, but left for future assignments).

As you can probably infer, the more neighbours one has, the higher potential for

infection. Hence one reason for the general advice from the health departments to reduce

the number of close contacts. Also wearing masks reduces the infection probability, again

reducing the spread of an outbreak. In this assignment, you will get an opportunity to

experiment with these parameters and see what differences they make.

2

For more information about SIR and epidemic modelling, have an initial read here

https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology. We will

also release some recordings talking more about the model on Canvas, particularly providing

more details about how to run the SIR model to simulate an epidemic, please look

at those also when they become available.

3.2 Incidence Matrix Representation

The incidence matrix represents a graph as a set of vertices and list of edges incident to

each vertex as a 2D array/matrix. More formally, let the incidence matrix of a undirected

graph be an n x m matrix A with n and m are the number of vertices and edges of the

graph respectively. The values in the matrix A follows the following rules:

Each edge ek is represented by a column in A.

Each vertex is represented by a row in A.

Let the two incident vertices to an edge ek be vi and vj

. Then the matrix entries

Ai,k and Aj,k are both = 1.

Ai,k = 0 for all other non-incident edges.

For example, the following graph:

has its incidence matrix as below:

AB AC AD BC CD

A 1 1 1 0 0

B 1 0 0 1 0

C 0 1 0 1 1

D 0 0 1 0 1

For further readings about this graph representation, please see https://en.wikipedia.

org/wiki/Incidence_matrix.

4 Assignment Details

The assignment is broken up into a number of tasks, to help you progressively complete

the project.

3

Task A: Implement the Graph Representations, their Operations

and the Networked SIR model (10 marks)

In this task, you will implement data structures for representing undirected, unweighted

graphs using the adjacency list, adjacency matrix and incidence matrix representations.

Your implementation should also model vertices that have SIR model states associated

with them. Your implementation should support the following operations:

Create an empty undirected graph (implemented as a constructor that takes zero

arguments).

Add a vertex to the graph.

Add an edge to the graph.

Toggle the SIR state on vertices.

Delete a vertex from the graph.

Delete an edge from the graph.

Compute the k hop neighbours of a vertex in the graph.

Print out the set of vertices and their SIR states.

Print out the set of edges.

In addition, we want you to implement the (networked) SIR epidemic model to simulate

how a virus can spread through a population.

Note that you are welcome to implement additional functionality. When constructing

our solutions to the assignment, we have found that adding some methods was helpful,

particularly for implementing the SIR model. But the above functionalities are ones you

should complete and ones you will be assessed on.

Data Structure Details

Graphs can be implemented using a number of data structures. You are to implement

the graph abstract data type using the following data structures:

Adjacency list, using an array of linked lists.

Adjacency matrix, using a 2D array (an array of arrays).

Incidence matrix, using a 2D array (an array of arrays).

4

For the above data structures, you must program your own implementation, and not

use the LinkedList or Matrix type of data structures in java.utils or any other libraries.

You must implement your own nodes and methods to handle the operations. If you use

java.utils or other implementation from libraries, this will be considered as an invalid

implementation and attract 0 marks for that data structure. The only exceptions are:

if you choose to implement a map of vertex labels to a row or column index for the

adjacency matrix or incidence matrix, you may use one of the existing Map classes

to do this.

if you choose to implement a map of vertex labels to its associated SIR state, you

may use of the existing Map classes to do so.

Operations Details

Operations to perform on the implemented graph abstract data type are specified on the

command line. They are in the following format:

where operation is one of {AV, AE, TV, DV, KN, PV, PE, SIR, Q} and arguments is for

optional arguments of some of the operations. The operations take the following form:

AV

SIR state of susceptible (S).

AE

vertex ’tarLabel’ into the graph.

TV

the next state, i.e., from S to I, from I to R (if in R, remain in R).

DV

DE

vertex ’tarLabel’ from the graph.

KN

are up to k-hops away. The ordering of the neighbours does not matter. See below

for the required format.

PV – prints the vertex set of the graph. See below for the required format. The

vertices can be printed in any order.

PE – prints the edge set of the graph. See below for the required format. The edges

can be printed in any order.

SIR

as the network, the specified seed vertices as additional set of infected vertices (in

addition to any existing in the current graph) and the two infection and recover

5

probabilities. Runs the model simulation to completion, which means two things

a) there are no more vertices with Infected (I) state and there are no changes in the

number of infected or recovered in the latest iteration of the model; or b) if condition

there are still infected vertices but there has been no changes in the number of

infected or recovered for 10 iterations, then can stop the simulation. Outputs the

list of nodes infected at each iteration, see below for format.

Q – quits the program.

k-hop Neighbour operation format details The format of the output of the neighbour

operation for vertex ’A’ should take the form:

A: neighbour1 neighbour2 ...

If a vertex has no neighbours, then the list of neighbours should be empty. The node

’A’ should not be in the list of neighbours, and the graphs can be assumed to have no

self-loops (self loops don’t necessarily make sense for epidemic modelling).

Print vertex operation format details The print vertex operation output the vertices

and associated SIR state in the graph in a single line. The line should specifies all

the valid vertex (indices) in the graph.

(vertex1,state1) (vertex2,state2) (vertex3,state3) ...

where state1 is the SIR state of vertex 1 etc.

Print edge operation format details The print edge operation output the edges in

the graph in over a number of lines. Each line specifies an edge in the graph, and should

be in the following format:

srcVertex tarVertex

SIR operation format details The networked SIR model file output format is each

line represents one iteration of the process, and each line will record the number of newly

infected and recovered vertices (newly infected or recovered nodes are ones that become

infected or recovered in this iteration). The format is as follows:

Example of all operations As an example of the operations, consider the output

from the following list of operations:

The output from the four neighbour operations (‘KN 1 A’, ‘KN 2 A’ ‘KN 1 F’, ‘KN 1

H’) should be (remember that the order these neighbourhoods do not matter so if your

output differs from this ordering but has the same set of vertices, that is fine):

A: B E F

A: B C D F E G

F: A B G

H:

The output from the print vertices operation (PV) could be (remember that the order

doesn’t matter):

(A, S ) (B,R) (D, S ) (E, I ) (F, S ) (G, S ) (H, S )

The output from the print edges operation (P E) could be (remember that the order

doesn’t matter):

The output from the ‘SIR A;F 0.8 0.5’ operation could be (note that it is a stochastic

model, so you can get different results for different runs, so use the following more for

what the output format should look like):

1 : [G] : [ ]

2 : [ ] : [A]

3 : [ ] : [ ]

4 : [ ] : [ ]

5 : [ ] : [ F G]

6 : [ ] : [ ]

Testing Framework

We provide Java skeleton code (see Table 1 for the important files) to help you get

started and automate the correctness testing. You may add your own Java files to your

final submission, but please ensure that they compile and can be run on the core teaching

servers.

Notes

Consider the specifications here carefully. If you correctly implement the “Implement

me!” parts and follow the output formating for operations like PV, PE and

SIR, you in fact do not need to do anything else to get the correct output formatting.

The main class in the provided skeleton code will handle this.

The implementation details are up to you, but you must implement your own data

structures and algorithms, i.e., do not use the built-in structures in Java unless for

the purposes outlined in the box above.

If you develop on your own machines, please ensure your code compiles and runs

on the university’s core teaching servers. You don’t want to find out last minute

that your code doesn’t compile on these machines. If your code doesn’t run on

these machines, we unfortunately do not have the resources to debug each one and

cannot award marks for testing.

❼ All submissions should compile with no warnings on Oracle Java 1.8 - this is the

version on the core teaching servers.

❼ Any clarifications on the Canvas FAQ or Assignment 1 Update page will override

the specifications here if they are conflicting.

8

file description

RmitCovidModelling.java Main class. Contains code that reads in operation commands

from stdin then executes those on the selected graph implementation.

Also will format the output as required. No need

to modify this file.

ContactsGraph.java Interface class for the graph representations. It contains the

common interface/methods that you’ll need to implement for

the various representations. Generally no need to modify this

file, but you might want to add helper functionality, and here

would be one place to do so for common ones across all representations..

AdjacencyList.java Code that implements the adjacency list implementation of

a graph. Complete the implementation (implement parts labelled

“Implement me!”).

AdjacencyMatrix.java Code that implements the adjacency matrix implementation

of a graph. Complete the implementation (implement parts

labelled “Implement me!”).

IncidenceMatrix.java Code that implements the incidence matrix implementation

of a graph. Complete the implementation (implement parts

labelled “Implement me!”).

SIRState.java Code that implements the three vertex states of a SIR model.

No need to modify this file.

SIRModel.java Code that implements the networked SIR model. Complete

the implementation (implement parts labelled “Implement

me!”).

Table 1: Table of Java files.

Task B: Evaluate your Data Structures and SIR model (20 marks)

In this second task, you will evaluate your implemented structures in terms of their time

complexities for the different operations and different use case scenarios. Scenarios arise

from the possible use cases of contract tracing and epidemic modelling on a graph. In

addition, run some epidemic modelling simulations to evaluate your structures within an

application (epidemic modelling and simulation) and to explore more about the effect of

the model parameters on epidemic spreading.

Write a report on your analysis and evaluation of the different implementations. Consider

and recommend in which scenarios each type of implementation would be most

appropriate. Report the outcomes of your SIR epidemic model simulations. The report

should be 10 pages or less, in font size 12. See the assessment rubric (Appendix A) for

the criteria we are seeking in the report.

9

Graph generators

To evaluate the data structures, we will generate a number of graphs, then apply a

number of operations on them, see the Use Case Scenarios section next. In this section,

we describe more about how to generate the graphs, and see on Canvas for further details

and videos.

There are many ways to generate graphs, but we will focus on two:

Random (Erdos-Renyi) – there is a probability for generating an edge between any

pair of vertices.

Scale-free – graphs whose degree distribution follow a power law (see https://en.

wikipedia.org/wiki/Scale-free_network).

We suggest to use Pajek1

, which is available on multiple platforms and has visualisation,

which allows one to look at the generated graphs. But it also has basic graph

generation functionality and doesn’t require programming to use, unlike more powerful

graph libraries2

.

Use Case Scenarios

Typically, you use real usage data to evaluate your data structures. However, for this assignment,

you will write data generators to enable testing over different scenarios of

interest. We are also interested in the effect of the average degree of the graph3 on these

scenarios. There are many possibilities, but for this assignment, consider the following

scenarios:

Scenario 1 k-hop Neighbourhoods: When doing contract tracing and/or epidemic

modelling, computing neighbourhoods is an important function. In this scenario, the

graph is not changing, but important operations such as k-hop neighbourhood (recall

this is all neighbours up to k-hops away) are requested.

You are to evaluate the performance of the the k-hop neighbourhood implementations,

as the average degree of the evaluated graph and k are varied.

Scenario 2 Dynamic Contact Conditions: In the real world, the contact conditions

between people are likely not static and there will be need to update these over time. In

this scenario, the contacts are changing (been added and deleted). In this scenario, you

are to evaluate the performance of your implementations in terms of:

edge additions

edge deletions

You are to evaluate the performance the edge operations as the average degree and

size (number of vertices) of the initial graph (before edge changes) are varied.

1http://mrvar.fdv.uni-lj.si/pajek/ the website looks a bit dated, but this is one of the well

known social network analysis tools.

2Networkx https://networkx.org/ – if you know Python, you are more than welcome to use this

instead.

3Average number of neighbours per node, which both graph generator models allows to be specified

10

Scenario 3 Dynamic People Tracing: As time goes on, the people been traced will

change. Although it is possible to have a large graph that tries to capture every possible

person, it is computationally difficult to do so and run simulations on it. In this scenario,

you are to evaluate the performance of your implementations in terms of:

vertex addition

vertex deletion

You are to evaluate the performance the vertex operations as the average degree and size

(number of vertices) of the initial graph (before vertex changes) are varied.

SIR Model Epidemic Simulation

In addition to evaluating how well the graph representations perform for the contract

tracing and epidemic modelling simulations, we also want to experiment with how the

networked SIR model works and performs.

Run epidemic simulation for different graph types (Erdos-Renyi and Scale-free), different

seed initialisations and different infection and recover probabilities. For performance

evaluation, evaluate each of the three data structures on a subset of the possible simulation

parameters and compare how they perform (in terms of timing efficiency).

We don’t expect a comprehensive analysis, but want you to explore and gain a better

understanding of epidemic modelling and what it implies for how to limit the spread of

epidemics.

Consider how you might present the results, but at a minimum plot the following:

Number of susceptible, infected and recovered after every iteration.

Data Generation

When generating the vertices and edges to add or remove and find neighbourhoods for,

the distribution of these elements, compared to what is in the graph already, will have

an effect on the timing performance. However, without the usage and query data, it is

difficult to specify what this distributions might be. Instead, in this assignment, uniformly

sample from a fixed range, e.g., 0 to max vertex label of your graph when generating the

vertices and edges for removing and adding and k-hop neighbourhoods, and a different

range (e.g., greater than the largest vertex label when adding vertices (we do not want

to repeatingly add vertices that are in the graph already).

For generating graphs with different initial average degrees and sizes, you can either

use Pajek or your favourite graph generator, we will not prescribe one particular approach.

Whichever method you decide to use, remember to generate graphs of different average

degrees to evaluate on. Due to the randomness of the data, you may wish to generate a

few graphs with the same average degrees and sizes and take the average across a number

of runs when performing the performance evaluation and analysis.

Analysis

In your analysis, you should evaluate each of your representations and data structures in

terms of the different scenarios outlined above.

Note, you may be generating and evaluating a significant number of graphs and evaluation

datasets, hence we advise you to get started on this part relatively early.

11

5 Report Structure

As a guide, the report could contain the following sections:

Explain your data and experimental setup. Things to include are (brief) explanations

of the data scenarios you decide to evaluate on (e.g., how many additions/removals

operations did you evaluate and why these), size of the graphs, the range of

average degrees tested (add some brief explanation of why this range selection), describe

how the scenarios were generated (a paragraph and perhaps a figure or high

level pseudo code suffice), which approach(es) you decide to use for measuring the

timing results, and briefly describe the fixed set(s) you sampled from to generate

the elements for addition, removal of vertices and edges and k-hop neighbourhood

of the graphs.

Evaluation of the data structures using the generated data (different scenarios,

average degree, graph size). Analyse, compare and discuss your results. Provide

your explanation on why you think the results are as you observed. You may

consider using the known theoretical time complexities of the operations of each

data structure to help in your explanation if useful.

Summarise your analysis as recommendations, e.g., for this certain data scenario

of this average degree (and size), I recommend to use this data structure because...

We suggest you refer to your previous analysis to help.

Evaluate the different parameters of the SIR model and its effect on the number

of susceptible, infected and recovered individuals over time. Which combination

of parameters has the fastest infection spread, which one has the most extensive

number of infected? How does the average timing compare between your three

graph representation? Can you explain why?

6 Submission

The final submission will consist of three parts:

Your Java source code of your implementations. Include only source code

and no class files, we will compile your code on the core teaching serers. Make

sure that your assignment can run only with the code included! That is, it should

be self contained, including all the skeleton code needed to run it.

Your source code should be placed into in a flat structure, i.e., all the files should

be in the same directory/folder, and that directory/folder should be named as

Assign1-

when unzip Assign1-s12345.zip is executed then all the source code files should

be in directory Assign1-s12345.

Your data generation code. Create a sub-directory/sub-folder called “generation”

within the Java source file directory/folder. Place your generation code

within that folder. We will not run the code, but may examine their contents.

Your written report for part B in PDF format, called “assign1.pdf”. Submit

this separately to a Turnitin submission.

12

Note: submission of the report and code will be done via Canvas. We will

provide details closer to the submission deadline. We will also provide details on how

the setup we use to test the correctness of your code will be like, so you can ensure your

submitted structure will conform to the automated testing we will perform.

7 Assessment

The project will be marked out of 30. Late submissions will incur a deduction of 3 marks

per day, and no submissions will be accepted 5 days beyond the due date and will attract

0 marks for the assignment.

The assessment in this project will be broken down into two parts. The following

criteria will be considered when allocating marks.

Implementation (10/30):

You implementation will be assessed based on the number of tests it passes in our

automated testing.

While the emphasis of this project is not programming, we would like you to maintain

decent coding design, readability and commenting, hence commenting and

coding style will make up a portion of your marks.

Report (20/30):

The marking sheet in Appendix A outlines the criteria that will be used to guide the

marking of your evaluation report4

. Use the criteria and the suggested report structure

(Section 5) to inform you of how to write the report.

8 Team Structure

This project should be done in pairs (group of two). If you have difficulty in finding a

partner, post on the discussion forum or contact your lecturer. If you want to do the

assignment individually, please contact one of your lecturers first and obtain his

approval.

In addition, please submit what percentage each partner made to the assignment (a

contribution sheet will be made available for you to fill in), and submit this sheet in your

submission. The contributions of your group should add up to 100%. If the contribution

percentages are not 50-50, the partner with less than 50% will have their marks reduced.

Let student A has contribution X%, and student B has contribution Y%, and X > Y .

The group is given a group mark of M. Student A will get M for assignment 1, but student

B will get M

X

Y

.

This semester we will also institute some additional group management initiatives,

including registration and not allowing further group changes in the week before the due

date. We will detail more about this on Canvas.

4Note for the marking guide, if one of the criteria is not demonstrated a tall, then 0 marks will be

awarded for that criteria.

13

9 Academic integrity and plagiarism (standard warning)

Academic integrity is about honest presentation of your academic work. It means acknowledging

the work of others while developing your own insights, knowledge and ideas.

You should take extreme care that you have:

Acknowledged words, data, diagrams, models, frameworks and/or ideas of others

you have quoted (i.e. directly copied), summarised, paraphrased, discussed or mentioned

in your assessment through the appropriate referencing methods

❼ Provided a reference list of the publication details so your reader can locate the

source if necessary. This includes material taken from Internet sites. If you do not

acknowledge the sources of your material, you may be accused of plagiarism because

you have passed off the work and ideas of another person without appropriate

referencing, as if they were your own.

RMIT University treats plagiarism as a very serious offence constituting misconduct.

Plagiarism covers a variety of inappropriate behaviours, including:

Failure to properly document a source

Copyright material from the internet or databases

Collusion between students

For further information on our policies and procedures, please refer to the following:

https://www.rmit.edu.au/students/student-essentials/rights-and-responsibilities/

academic-integrity.

We will run both code and report similarity checks.

10 Getting Help

There are multiple venues to get help. First point of call should be Canvas, assignment

FAQ (on Canvas), recordings about it and the discussion forum. There will also be weekly

online consultation hours for the assignment (see Canvas for times). In addition, you are

encouraged to discuss any issues you have with your Tutor or Lab Demonstrator. Please

refrain from posting solutions to the discussion forum.

14

A Marking Guide for the Report

Design of Evaluation Analysis of Results Report Clarity and Structure

(Maximum = 5 marks) (Maximum = 10 marks) (Maximum = 5 marks)

5 marks 10 marks 5 marks

Data generation and

experimental setup are

well designed, systematic

and well explained. All

suggested scenarios, data

structures and a

reasonable range of

parameters were

evaluated. Each type of

test was run over a

number of runs and

results were averaged.

Analysis is thorough and demonstrates

understanding and critical

analysis. Well-reasoned explanations

and comparisons are provided

for all the data structures, scenarios

and parameters. All analysis,

comparisons and conclusions

are supported by empirical evidence

and possibly theoretical complexities.

Well reasoned recommendations

are given.

Very clear, well structured

and accessible report,

an undergraduate

student can pick up the

report and understand it

with no difficulty.

3.75 marks 7.5 marks 3.75 marks

Data generation and

experimental setup are

reasonably designed,

systematic and explained.

There are at least one

obvious missing suggested

scenarios, data structur