辅导 COMP20003 Algorithms 讲解C++编程数据结构

2019.09.07 - 首页 >> Algorithm 算法

COMP20003 Algorithms and Data Structures

Second (Spring) Semester 2019

[Assignment 1]

Taxi & For-Hire Vehicle Trip Dataset

Information Retrieval using Binary Search Trees

Handed out: Friday, 23 of August

Due: 8:00 AM, Monday, 9 of September

Purpose

The purpose of this assignment is for you to:

• Increase your proficiency in C programming, your dexterity with dynamic memory allocation

and your understanding of linked data structures, through programming a dictionary.

• Increase your understanding of how computational complexity can affect the performance of an

algorithm by conducting orderly experiments with your program and comparing the results of

your experimentation with theory.

• Increase your proficiency in using UNIX utilities.

Background

A dictionary is an abstract data type that stores and supports lookup of key, value pairs. For example,

in a telephone directory, the (string) key is a person or company name, and the value is the phone

number. In a student record lookup, the key would be a student ID number and the value would be a

complex structure containing all the other information about the student.

A dictionary can be implemented in C using a number of underlying data structures. Any implementation

must support the operations: makedict a new dictionary; insert a new item (key, value

pair) into a dictionary; search for a key in the dictionary, and return the associated value. Most

dictionaries will also support the operation delete an item.

Your task

In this assignment, you will create a simple instance of a dictionary, and we’ll use it to look up information

about for-hire vehicle trips in New York City.

There are two stages in this project. In the first stage you will code a dictionary in the C programming

language, using a binary search tree as the underlying data structure. You will insert records into this

dictionary from a file, and look up records by key. In the second stage, you will code additional functions

to retrieve information from this dictionary. You will use a Makefile to direct the compilation

of two separate executable programs, one for Stage 1 and one for Stage 2.

In both stages of the assignment, you will report on the number of key comparisons used for search

and analyse what would have been expected theoretically. The report should cover each file used to

initialize the dictionary.

You are not required to implement the delete functionality.

Stage 1 (7 marks)

In Stage 1 of this assignment, your Makefile will direct the compilation to produce an executable

program called dict1. The program dict1 takes two command line arguments: the first argument is

the name of the data file used to build the dictionary; the second argument is the name of the output

file, containing the data located in the searches. The data file consists of an unspecified number of

records, one per line, with the following information:

VendorID - Code to indicate the vendor which produced the record

passenger count - Number of passengers

trip distance - Length of the trip in miles

RatecodeID - Code to represent the fare rate for the trip

store and fwd flag - Indicates whether trip records were stored locally

PULocationID - TLC Taxi Zone where passengers were picked up

DOLocationID - TLC Taxi Zone where passengers were dropped off

payment type - Code to indicate payment type (e.g., cash)

fare amount - Fare for the trip in USD

extra - Extra charges in USD

mta tax - MTA tax in USD

tip amount - Tip in USD

tolls amount - Tolls in USD

improvement surcharge - Improvement surcharge in USD

total amount - Total cost of the trip in USD

PUdatetime - Date/time passengers were picked up

DOdatetime - Date/time passengers were dropped off

trip duration - Duration of the trip in minutes

This data comes from a publicly-available dataset released by the New York City Taxi & Limousine

Commission. More information about the dataset can be found at:

https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page

The field <PUdatetime> is an alphabetic string representing the date and time of the taxi trip in

the format YYYY-MM-DD HH:mm:ss (year-month-day hour:minute:second). The other columns can be

treated simply as the associated <data> field. Build a data structure of strings to save the associated

data collected about each taxi trip. The maximum size that any string can be is 128 characters. Each

string is separated by a comma “,”. This is a standard csv format where the delimiter used is a comma.

The <PUdatetime> field will serve as the dictionary key, so the records will be sorted in temporal

order. Note that because the datetime information is stored in lexicographical order, the values can be

compared as strings (e.g., with strcmp()) to determine which trip is earlier/later. The <data> is the

information sought during lookup.

In this assignment the search keys are not guaranteed to be unique – there are instances where multiple

taxis pick up passengers at exactly the same day and time. You should handle duplicates by

implementing a linked list for items with same key.

For the purposes of this assignment, you may assume that the input data is well-formatted, that the

input file is not empty, and that the maximum length of an input record (a single full line of the csv

file) is 256 characters. This number could help you to determine the buffer size to use when reading

the file.

In this first stage of the assignment, you will:

• Construct a binary search tree to store the information contained in the file specified in the

command line argument. Each record should be stored in a separate Node.

• Search the binary search tree for records, based on their keys. The keys are read in from stdin,

i.e. from the screen.

For testing, it is often convenient to create a file of keys to be searched, one per line, and redirect

the input from this file. Use the UNIX operator < to redirect input from a file.

• Examples of use:

– dict1 datafile outputfile then type in keys; or

– dict1 datafile outputfile < keyfile

• Your program will look up each key and output the information (the data found) to the output

file specified by the second command line parameter. If the key is not found in the tree, you

must output the word NOTFOUND.

The number of key comparisons performed during both successful and unsuccessful lookups

should be written to stdout.

• Remember that the entries in the file do not necessarily have unique keys. Your search must

locate and output all the data found for a matching key.

• Example output:

– output file (information):

2018-12-15 01:49:13 --> VendorID: 1 || passenger count: 1 || trip distance: 1.9

|| RatecodeID: 1.0 || store and fwd flag: 0 || PULocationID: 79 || DOLocationID: 234 ||

payment type: 1 || fare amount: 9.5 || extra: 0.5 || mta tax: 0.5 || tip amount: 2.15

|| tolls amount: 0.0 || improvement surcharge: 0.3 || total amount: 12.95 || DOdatetime:

2018-12-15 02:00:00 || trip duration: 10 ||

2018-12-15 01:49:13 --> VendorID: 1 || passenger count: 1 || trip distance: 0.6

|| RatecodeID: 1.0 || store and fwd flag: 0 || PULocationID: 79 || DOLocationID: 114 ||

payment type: 1 || fare amount: 5.0 || extra: 0.5 || mta tax: 0.5 || tip amount: 1.00

|| tolls amount: 0.0 || improvement surcharge: 0.3 || total amount: 7.35 || DOdatetime:

2018-12-02 01:53:38 || trip duration: 4 ||

1901-11-06 12:03:14 --> NOTFOUND

– stdout (comparisons):

2018-12-15 01:49:13 --> 423

1901-11-06 12:03:14 --> 401

Note that the key is output to both the file and to stdout, for identification purposes. Also note that

the number of comparisons is only output at the end of the search, so there is only one number for

key comparisons per key, even when multiple records have been located for that key.

The format need not be exactly as above. Variations in whitespace/tabs are permitted. The number of

comparisons shown above was made up; do not take it as an example of a correct result.

Stage 2 (2 marks)

In Stage 2, you will code a function which takes a taxi zone ID number as as input and returns to the

output file all of the <PUdatetime> keys from records which match the <PUlocationID>, using

in-order tree traversal. The keys should be output in sorted temporal order (that is, earlier records

should be printed first). If no records with the requested <PUlocationID> exist in the database,

this function should write the string NOTFOUND to the output file. As in Stage 1, the the number of

comparisons made during the search should be written to stdout.

The <PUlocationID> is an unsigned integer between 1 and 265 which indicates where the taxi

picked up passengers. You can find maps of the zones at the dataset website linked above, but you

do not need these maps for the assignment – you can treat the zone as simply an integer. You may

store the <PUlocationID> as a separate field in your struct, or you can check for the matching

<PUlocationID> inside the <data> field. As in Stage 1, you should handle duplicate keys by implementing

a linked list for items with same key. Note that this means there may be more than one

matching <PUlocationID> for a single key. If this is the case, the key should be output multiple

times to reflect the number of matches.

You should compile your code using a Makefile to produce an executable program called dict2.

The program dict2 takes two command line arguments: the first argument is the name of the data

file used to build the dictionary; the second argument is the name of the output file, containing the

data located in the searches. You may reuse your record insertion code from Stage 1 to build the

dictionary from the datafile in Stage 2.

• Examples of use:

– dict2 datafile outputfile then type in location IDs; or

– dict2 datafile outputfile < idsfile

• Example output:

– output file (information):

79 --> 2018-12-08 19:36:57

79 --> 2018-12-08 21:22:08

79 --> 2018-12-15 01:49:13

79 --> 2018-12-23 17:26:42

– stdout (comparisons):

79 --> 1528

The number of comparisons shown above was made up; do not take it as an example of a correct result.

Experimentation (4 marks)

You will run various files through your program to test its accuracy and also to examine the number of

key comparisons used when searching different files. You will report on the key comparisons used by

your Stage 1 dictionary dict1 for various data inputs and the key comparisons used by your Stage 2

dictionary dict2 for various data inputs too. You will compare these results with what you expected

based on theory (big-O) for these algorithms and data structure.

Your experimentation should be systematic, varying the size and characteristics of the dataset files you

use (e.g. sorted or random), and observing how the number of key comparisons varies. Repeating a

test case with different keys and taking the average can be useful.

Some useful UNIX commands for creating test files with different characteristics include sort, sort

-R (man sort for more information on the -R option), and shuf. You can randomize your input

data and pick the first x keys as the lookup keywords.

If you use only keyboard input for searches, it is unlikely that you will be able to generate enough

data to analyze your results. You should familiarize yourself with the powerful UNIX facilities for

redirecting standard input (stdin) and standard output (stdout). You might also find it useful to

familiarize yourself with UNIX pipes ‘|’ and possibly also the UNIX program awk for processing

structured output. For example, if you pipe your output into echo ‘‘abc:def’’ | awk -F ’:’

’{print $1}’, you will output only the first column (abc). In the example, -F specifies the delimiter.

Instead of using echo you can use cat filename.csv | awk -F ’;’ ’{print $1}’

which will print only the first column of the filename.csv file. You can build up a file of numbers of

key comparisons using the shell append operator >>, e.x. your command >> file to append to.

You will write up your findings and submit your results separately through the Turnitin system. You will

describe your results from each stage and also compare these to what you know about the theory of

binary search trees.

Tables and graphs are useful presentation methods. Select only informative data; more is not always

better.

You should present your findings clearly, in light of what you know about the data structures used in

your programs and in light of their known computational complexity. You may find that your results

are what you expected, based on theory. Alternatively, you may find your results do not agree with

theory. In either case, you should state what you expected from the theory, and if there is a discrepancy

you should suggest possible reasons. You might want to discuss space-time trade-offs, if this is

appropriate to your code and data.

You are not constrained to any particular structure in this report, but a useful way to present your

findings might be:

• Introduction: Summary of data structures and inputs.

• Stage 1:

– Data (number of key comparisons)

– Comparison with theory

• Stage 2:

– Data (number of key comparisons)

– Comparison with theory

• Discussion

Implementation Requirements

The following implementation requirements must be adhered to:

• You must code your dictionary in the C programming language.

• You must code your dictionary in a modular way, so that your dictionary implementation could be

used in another program without extensive rewriting or copying. This means that the dictionary

operations are kept together in a separate .c file, with its own header (.h) file, separate from the

main program.

• Your code should be easily extensible to allow for multiple dictionaries. This means that the functions

for insertion, search, and deletion take as arguments not only the item being inserted or a

key for searching and deleting, but also a pointer to a particular dictionary, e.g. insert(dict,

item).

• Your program should store strings in a space-efficient manner. If you are using malloc() to

create the space for a string, remember to allow space for the final end of string ‘\0’ (NULL).

• A Makefile is not provided for you. The Makefile should direct the compilation of two

separate programs: dict1 and dict2. To use the Makefile, make sure it is in the same directory

of your code, and type make dict1 to make the dictionary for Stage 1 and make dict2 to

make the dictionary for Stage 2. You must submit your makefile with your assignment. Hint: If

you havent used make before, try it on simple programs first. If it doesn’t work, read the error

messages carefully. A common problem in compiling multifile executables is in the included

header files. Note also that the whitespace before the command is a tab, and not multiple

spaces. It is not a good idea to code your program as a single file and then try to break it down

into multiple files. Start by using multiple files, with minimal content, and make sure they are

communicating with each other before starting more serious coding.

Data

The data files are provided at /home/shared/assg1/datafiles/ on JupyterHub. The data format

is as specified above in Stage 1.

No attempt has been made to remove or prevent duplicate keys in the original files, so you should

expect duplicate keys. Our script only formatted the data correctly making sure it complies with a csv

standard specification.

Resources: Programming Style (2 Marks)

Two locally-written papers containing useful guidelines on coding style and structure can be found

on the LMS Resources → Project Coding Guidelines, by Peter Schachte, and below and adapted version

of the LMS Resources → C Programming Style, written for Engineering Computation COMP20005 by

Aidan Nagorcka-Smith. Be aware that your programming style will be judged with 2 marks.

1 /** ***********************

2 * C Programming S t yl e f o r En ginee rin g Computation

3 * C rea ted by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au) 13/03/2011

4 * D e f i n i t i o n s and i n cl u d e s

5 * D e f i n i t i o n s a re i n UPPER CASE

6 * I n cl u d e s go be f o re d e f i n i t i o n s

7 * Space between i n cl u de s , d e f i n i t i o n s and the main f u n c ti o n .

8 * Use d e f i n i t i o n s f o r any c o n s t a n t s i n your program , do no t j u s t w ri t e them

9 * i n .

10 *

11 * Tabs may be s e t t o 4−s p a ce s or 8−space s , depending on your e d i t o r . The code

12 * Below i s ``gnu ' ' s t y l e . I f your e d i t o r has ``bsd ' ' i t w i l l f oll ow the 8−space

13 * s t y l e . Both a re ve ry s t and a rd .

20 #i n cl u d e <s t d i o . h>

21 #i n cl u d e <s t d l i b . h>

22 #d e fi n e MAX STRING SIZE 1000

23 #d e fi n e DEBUG 0

24 i n t main( i n t argc , cha r **argv) {

31 /* D e f i n i t i o n s and i n cl u d e s a re mixed up */

32 #i n cl u d e <s t d l i b . h>

33 #d e fi n e MAX STING SIZE 1000

34 /* D e f i n i t i o n s a re given names l i k e v a r i a b l e s */

35 #d e fi n e debug 0

36 #i n cl u d e <s t d i o . h>

37 /* No s p a ci n g between i n cl u de s , d e f i n i t i o n s and main f u n c ti o n */

38 i n t main( i n t argc , cha r **argv) {

41 /** *****************************

42 * V a ri a bl e s

43 * Give them u s e f ul l owe r c a se names or camelCase . Ei t h e r i s fi n e ,

44 * as long as you a re c o n s i s t e n t and apply always the same s t y l e .

45 * I n i t i a l i s e them t o some thing t h a t makes sen se .

46 */

48 /**

49 * GOOD: l owe r c a se

50 */

52 i n t main( i n t argc , cha r **argv) {

54 i n t i = 0 ;

55 i n t num_fifties = 0 ;

56 i n t num_twenties = 0 ;

57 i n t num_tens = 0 ;

59 . . .

60 /**

61 * GOOD: camelCase

62 */

64 i n t main( i n t argc , cha r **argv) {

66 i n t i = 0 ;

67 i n t numFifties = 0 ;

68 i n t numTwenties = 0 ;

69 i n t numTens = 0 ;

71 . . .

72 /**

73 * BAD:

74 */

76 i n t main( i n t argc , cha r **argv) {

78 /* V a ri a bl e no t i n i t i a l i s e d − c au se s a bug because we didn ' t remember t o

79 * s e t i t be f o re the loop */

80 i n t i;

81 /* V a ri a bl e i n a l l caps − we ' l l ge t con fu sed between t h i s and c o n s t a n t s

82 */

83 i n t NUM_FIFTIES = 0 ;

84 /* Ove rly a b b revi a te d v a r i a b l e names make t hi n g s hard . */

85 i n t nt = 0

87 while (i < 10) {

88 . . .

89 i++;

90 }

92 . . .

94 /** ********************

95 * Spacing :

96 * Space i n t e l l i g e n t l y , v e r t i c a l l y t o group bl o c k s o f code t h a t a re doing a

97 * s p e c i f i c ope r a ti on , or t o s e p a r a t e v a r i a b l e d e c l a r a t i o n s from o t he r code .

98 * One t ab o f i n d e n t a ti o n wi t hi n e i t h e r a f u n c ti o n or a loop .

99 * Spaces a f t e r commas.

100 * Space between ) and { .

101 * No space between the ** and the a rgv i n the d e f i n i t i o n o f the main

102 * f u n c ti o n .

103 * When d e cl a ri n g a p oi n t e r v a r i a b l e or argument , you may pl a ce the a s t e r i s k

104 * a dj a c e n t t o e i t h e r the type or t o the v a r i a b l e name .

105 * Li n e s a t most 80 c h a r a c t e r s long .

106 * Cl o si n g b r ace goes on i t s own l i n e

107 */

108

109 /**

110 * GOOD:

111 */

112

113 i n t main( i n t argc , cha r **argv) {

114

115 i n t i = 0 ;

116

117 f o r (i = 100 ; i >= 0 ; i−−) {

118 i f (i > 0) {

119 printf( ”%d b o t t l e s o f beer , t a ke one down and p a s s i t around , ”

120 ” %d b o t t l e s o f bee r . \ n” , i , i − 1) ;

121 } e l s e {

122 printf( ”%d b o t t l e s o f beer , t a ke one down and p a s s i t around . ”

123 ” We ' re empty . \ n” , i) ;

134 /* No space a f t e r commas

135 * Space between the ** and a rgv i n the main f u n c ti o n d e f i n i t i o n

136 * No space between the ) and { a t the s t a r t o f a f u n c ti o n */

137 i n t main( i n t argc , cha r ** argv) {

138 i n t i = 0 ;

139 /* No space between v a r i a b l e d e c l a r a t i o n s and the r e s t o f the f u n c ti o n .

140 * No s p a ce s around the boolean o p e r a t o r s */

141 f o r (i=100;i>=0;i−−) {

142 /* No i n d e n t a ti o n */

143 i f (i > 0) {

144 /* Line too long */

145 printf( ”%d b o t t l e s o f beer , t a ke one down and p a s s i t around , %d

146 b o t t l e s o f bee r . \ n” , i , i − 1) ;

147 } e l s e {

148 /* Spacing f o r no good re a son . */

149

150 printf( ”%d b o t t l e s o f beer , t a ke one down and p a s s i t around . ”

151 ” We ' re empty . \ n” , i) ;

152

153 }

154 }

155 /* Cl o si n g b r ace no t on i t s own l i n e */

156 r e t u r n 0 ;}

157

158 /** ****************

159 * B r ace s :

160 * Opening b r a ce s go on the same l i n e as the loop or f u n c ti o n name

161 * Cl o si n g b r a ce s go on t h e i r own l i n e

162 * Cl o si n g b r a ce s go a t the same i n d e n t a ti o n l e v e l as the t hi n g they a re

163 * c l o s i n g

164 */

165

166 /**

167 * GOOD:

168 */

169

170 i n t main( i n t argc , cha r **argv) {

171

172 . . .

173

174 f o r ( . . . ) {

175 . . .

176 }

177

178 r e t u r n 0 ;

179 }

180

181 /**

182 * BAD:

183 */

184

185 i n t main( i n t argc , cha r **argv) {

186

187 . . .

188

189 /* Opening b r ace on a d i f f e r e n t l i n e t o the f o r loop open */

190 f o r ( . . . )

191 {

192 . . .

193 /* Cl o si n g b r ace a t a d i f f e r e n t i n d e n t a ti o n t o the t hi n g i t ' s

194 c l o s i n g

195 */

196 }

197

198 /* Cl o si n g b r ace no t on i t s own l i n e . */

199 r e t u r n 0 ;}

200

201 /** **************

202 * Commenting :

203 * Each program should have a comment e x pl ai ni n g what i t does and who c r e a t e d

204 * i t .

205 * Al s o comment how t o run the program , i n cl u di n g o p ti o n al command l i n e

206 * pa rame te r s .

207 * Any i n t e r e s t i n g code should have a comment t o e x pl ai n i t s e l f .

208 * We should no t comment obviou s t hi n g s − w ri t e code t h a t documents i t s e l f

209 */

210

211 /**

212 * GOOD:

213 */

214

215 /* change . c

216 *

217 * C rea ted by Aidan Nagorcka−Smith ( aidann@student . unimelb . edu . au)

218 13/03/2011

219 *

220 * P r i n t the number o f each c oin t h a t would be needed t o make up some

221 change

222 * t h a t i s i n p u t by the u se r

223 *

224 * To run the program type :

225 * . / c oi n s −−num coins 5 −−s h a p e c oi n s t r a p e z oi d −−ou tpu t bl a bl a . t x t

226 *

227 * To see a l l the i n p u t parame ters , type :

228 * . / c oi n s −−help

229 * Op tion s : :

230 * −−help Show help message

231 * −−num coins a rg Inpu t number o f c oi n s

232 * −−s h a p e c oi n s a rg Inpu t c oi n s shape

233 * −−bound a rg (=1) Max bound on xxx , d e f a u l t v alue 1

234 * −−ou tpu t a rg Output s o l u t i o n f i l e

235 *

236 */

237

238 i n t main( i n t argc , cha r **argv) {

239

240 i n t input_change = 0 ;

241

242 printf( ” Pl e a s e i n p u t the v alue o f the change (0−99 c e n t s

243 i n c l u s i v e ) :\n” ) ;

244 scanf( ”%d” , &input_change) ;

245 printf( ” \n” ) ;

246

247 // V ali d change v al u e s a re 0−99 i n c l u s i v e .

248 i f (input_change < 0 | | input_change > 99) {

249 printf( ” Inpu t no t i n the range 0−99.\n” )

250 }

251

252 . . .

253

254 /**

255 * BAD:

256 */

257

258 /* No e x pl a n a ti o n o f what the program i s doing */

259 i n t main( i n t argc , cha r **argv) {

260

261 /* Commenting obviou s t hi n g s */

262 /* C re a te a i n t v a r i a b l e c a l l e d inpu t ch an ge t o s t o r e the i n p u t from

263 the

264 * u se r . */

265 i n t input_change;

266

267 . . .

268

269 /** ****************

270 * Code s t r u c t u r e :

271 * F a i l f a s t − i n p u t chec k s should happen f i r s t , then do the compu ta tion .

272 * S t r u c t u r e the code so t h a t a l l e r r o r h andlin g happens i n an ea sy t o read

273 * l o c a t i o n

274 */

275

276 /**

277 * GOOD:

278 */

279 i f (input_is_bad) {

280 printf( ” E r r o r : Inpu t was no t v a l i d . E x i t i n g . \ n” ) ;

281 exit(EXIT_FAILURE) ;

282 }

283

284 /* Do compu ta tion s here */

285 . . .

286

287 /**

288 * BAD:

289 */

290

291 i f (input_is_good) {

292 /* l o t s o f compu ta tion here , pushing the e l s e p a r t o f f the s c ree n .

293 */

294 . . .

295 } e l s e {

296 fprintf(stderr , ” E r r o r : Inpu t was no t v a l i d . E x i t i n g . \ n” ) ;

297 exit(EXIT_FAILURE) ;

298 }

Additional Support

Your tutors will be available to help with your assignment during the scheduled workshop times. Questions

related to the assignment may be posted on the Piazza forum, using the folder tag assignment1

for new posts. You should feel free to answer other students’ questions if you are confident of your

skills.

A tutor will check the Discussion Forum regularly, and answer some questions, but be aware that for

some questions you will just need to use your judgment and document your thinking. For example, a

question like, How much data should I use for the experiments?, will not be answered; you must try

out different data and see what makes sense.

In this subject, we’ll be supporting the shared JupyterHub system, its terminal and file editor. Your

final program must compile and run on the shared JupyterHub instance.

Submission

You will need to make two submissions for this assignment:

• Your C code files (including your Makefile) will be submitted through the LMS page for this

subject: Assignments → Assignment 1 → Assignment 1: Code.

• Your experiments report file will be submitted through the LMS page for this subject: Assignments

→ Assignment 1 → Assignment 1: Experimentation. This file can be of any format, e.g. .pdf, text

or other.

Program files submitted (Code)

Submit the program files for your assignment and your Makefile.

If you wish to submit any scripts or code used to generate input data, you may, although this is not

required. Just be sure to submit all your files at the same time.

Your programs must compile and run correctly on the shared JupyterHub instance. You may have developed

your program in another environment, but it still must run on the JupyterHub at submission

time. For this reason, and because there are often small, but significant, differences between compilers,

it is suggested that if you are working in a different environment, you upload and test your code

on the shared JupyterHub instance at reasonably frequent intervals.

A common reason for programs not to compile is that a file has been inadvertently omitted from the

submission. Please check your submission, and resubmit all files if necessary.

Experiment file submitted using Turnitin

As noted above, your experimental work will be submitted through the LMS, via the Turnitin system.

Go to the LMS page for this subject: Assignments → Assignment 1 → Assignment 1: Experimentation

and follow the prompts.

Your file can be in any format. Plain text or .pdf are recommended, but other formats will be accepted.

It is expected that your experimental work will be in a single file, but multiple files can be

accepted. Add your username to the top of your experiments file.

Please do not submit large data files. There is no need to query every key on the dictionary.

Assessment

There are a total of 15 marks given for this assignment, 7 marks for Stage 1, 2 marks for Stage 2, and

4 marks for the separately submitted Experimentation Stage. 2 marks will be given based on your

C programming style.

Your C program will be marked on the basis of accuracy, readability, and good C programming structure,

safety and style, including documentation. Safety refers to checking whether opening a file

returns something, whether mallocs do their job, etc. The documentation should explain all major

design decisions, and should be formatted so that it does not interfere with reading the code. As much

as possible, try to make your code self-documenting, by choosing descriptive variable names.

Your experimentation will be marked on the basis of orderliness and thoroughness of experimentation,

comparison of your results with theory, and thoughtful discussion.

Plagiarism

This is an individual assignment. The work must be your own.

While you may discuss your program development, coding problems and experimentation with your

classmates, you must not share files, as this is considered plagiarism.

If you refer to published work in the discussion of your experiments, be sure to include a citation to

the publication or the web link.

Borrowing of someone elses code without acknowledgment is plagiarism. Plagiarism is considered

a serious offense at the University of Melbourne. You should read the University code on Academic

honesty and details on plagiarism. Make sure you are not plagiarizing, intentionally or unintentionally.

You are also advised that there will be a C programming component (on paper, not on a computer) in

the final examination. Students who do not program their own assignments will be at a disadvantage

for this part of the examination.

Administrative issues

When is late? What do I do if I am late? The due date and time are printed on the front of this

document. The lateness policy is on the handout provided at the first lecture and also available on the

subject LMS page. If you decide to make a late submission, you should send an email directly to the

lecturer as soon as possible and he will provide instructions for making a late submission.

What are the marks and the marking criteria Recall that this project is worth 15% of your final

score. There is also a hurdle requirement: you must earn at least 15 marks out of a subtotal of 30 for

the projects to pass this subject.

Finally Despite all these stern words, we are here to help! There is information about getting help in

this subject on the LMS pages. Frequently asked questions about the project will be answered in the

LMS discussion group.