Data File Lab – Assignment 1
COMP202
This lab is the first of two assignments in COMP202.
Commences: Week 1.
Progress: 5pm, Mondays of weeks 4, 5, 6, 7.
Due: 5pm, first Monday of the break (16 September 2019).
Value: 20% (16% for task, 4% weekly progress)
1. Overview of the Lab
Data files exist in various formats. In Unix, text files are common for simple data, but large data files are stored as binary data. In this lab, you will be developing programs to read, write, modify, and reformat the data in binary data files. The lab consists of a sequence of stages which build on each other. The first three stages develop your skills. In the final stage you will reverse engineer a data file using your knowledge of data representations. You have the choice between an easier final stage (stage 4) that can earn you at most 3 marks for the stage, or a more difficult final stage (stage 5) worth up to 4 marks for the stage. You may attempt both stages 4 and 5, but only the maximum of the two marks will count towards your total.
The marking outline is:
Section | Value |
Stage 1 | 3 |
Stage 2 | 3 |
Stage 3 | 3 |
Stage 4 (max 3 marks) or stage 5 (max 4 marks) | 4 |
Code style | 3 |
Progress | 4 |
Total | 20 |
Learning Outcomes
This lab will involve you in developing the following specific skills and capabilities.
· Able to write programs that use C data structures, pointers and arrays.
· Able to read and write binary data files, and write data in text format.
· Able to convert between different data representations.
· Able to use malloc and free to construct data structures using the heap.
· Able to implement simple command line parameters.
· Able to interpret and recognise binary data representations.
· Able to research Unix library and system calls.
2. Fetching your lab
The lab files are accessed through the lab command which can be found at
/home/unit/group/comp202/lab
There is no Unix man page for the lab command (it is not a Unix system command) but there is documentation on iLearn, and if you don’t give it any command-line parameters or options then it will print out some brief documentation itself (a similar feature is common in many Unix programs). To see how this works, try the following command (where the $ symbol represents the Unix command-line prompt; type only the command that follows it).
$ /home/unit/group/comp202/lab
The option -g is used to get a lab stage. For example, to get lab 1 stage 1, do:
$ /home/unit/group/comp202/lab -g 1.1
For stage 2, the option would be -g 1.2 instead. Please see the Lab Command Manual in iLearn for more information about the lab command, including options for submitting assignments, getting marking reports, checking due dates and claiming your free extension days. Also, you can set up your Unix account so that you can abbreviate the command and just type “lab” instead of the full path name “/home/unit/group/comp202/lab”. For the rest of this document, we will use the abbreviated name.
The lab get command downloads your lab data as a tar file. For stage 1, the tar file is stage1.tar.
Tar is an archive utility (like zip): it stores many files packed into one file. Use tar to extract the contents of this file. You can read all about tar in the Unix man page:
$ man 1 tar
Here is the command to extract the contents of stage1.tar.
$ tar xvf stage1.tar
This will create a directory called stage1 and put the downloaded files in that directory.
3. Feedback during the assignment, submission and marking
In this assignment, you can submit your code as often as you like, and receive immediate feedback and marks. There are rewards (progress marks) for working consistently throughout the assignment period and achieving stages of work by their due dates. Most of your final mark will be computed from the results of the automarker. A small number of marks are awarded for code style which is manually marked after the assignment closes.
The maximum mark for the assignment is 20 marks. Of those marks, 13 marks are for achievements in the various stages, 4 marks are for progress and 3 marks are for code style.
A. Introduction to COMP202 automarking
This is the first of two lab assignments in COMP202. In both lab assignments, an automarker will track your progress and provide feedback to you. This is more than just telling you your mark: it is a feedback mechanism designed to help you as you work through the assignment. Firstly, the feedback is immediate, so when you think you have solved a problem you can submit your revised solution and see immediately whether it has enabled you to pass the automarker tests. Secondly, the automarker provides a detailed breakdown of your mark, which can help you isolate specific problems (such as a memory leak when using malloc and free). Thirdly, the automarker sometimes provides specific hints to help you understand what you need to address, such as identifying which columns of your data file are incorrect.
You cannot rely only on the automarker, however. In this assignment, you will be provided with test data files, and you can and should compile your program yourself and run it on the test data files, examining the output yourself and identifying errors in your code. The automarker does not replace good old-fashioned debugging, one of the essential skills for all programmers.
B. Individual work and information resources
The stages of the lab are based on data files that will be provided to you. Each student will have their own specific data files to work with, with their own unique data format.
This lab must be your own work. However, you may use resources on the Internet to obtain general information including information about the C language and libraries, information about binary and text data formats, and information about the operating system. If you obtain useful information from the Internet, you must include comments at the relevant points in your code acknowledging the source of the information (URL) and briefly describing the key idea(s) that you are using. (Exception: information from the Unix manual pages does not require citation in your program.)
The Unix manual pages are available online on ash and iceberg: use the man command. You can also find Unix manual pages online through Google. For example, to find out about the printf library call, use the command “man 3 printf” or Google “man printf”, and to find out about the directory listing command ‘ls’, use the command “man ls” or Google “man ls”. However, you should be cautious about using information found online because sometimes there are differences between different Unix systems and our systems may not behave exactly the same as described in some online documentation.
The manual pages on the system (man command) are divided into sections:
1. System commands such as ls, wc, etc.
2. Unix system calls such as read(), open(), etc.
3. Unix library such as printf(), fopen(), etc.
4. Sections 4-8 contain other information.
For more information on the man command, use the command “man man” to read the manual pages about the man command.
C. Submitting your lab solution – achievement marks [13 marks]
Your lab solution can be submitted using the lab command. The option -s is used to submit a solution to a lab stage. After the option, list all the files that you want to submit. Each time you submit, it is treated as a fresh submission, so you must list all the files that you want to submit every time. (If you find that tedious, learn about wildcards in the bash shell.) For example:
$ lab -s 1.1 stage1.c sub.c defs.h
The lab utility sends your submitted files to a server which compiles the C files together into a program, runs it, and tests that it works correctly for your particular lab assignment. The server records information about your submission and also sends back information to you through the lab command.
You can submit as many times as you like. As a matter of personal achievement, you should aim to achieve a really good score on your initial submit, having checked that your program compiles without errors and performs correctly on the provided sample data files. However, if there are problems identified by the automarker, you can resubmit without penalty.
You must download each stage before you attempt to submit a solution to that stage. Further, you need to download each stage because the download provides you with the input and output data files that you need in order to test your program yourself.
The marks awarded by the automarker for each stage of the assignment are called the achievement marks for that stage.
D. Progress marks [4 marks]
Each lab assignment includes marks that are awarded for progress on the task each week. The lab assignments are to be done both during lab sessions (with the assistance of lab supervisors) and in your own time. Each week that the lab is out, you earn a progress mark if you achieve the specified milestone by 5pm on the specified date. Each milestone is achieving a mark of at least 2.0/3.0 in a specific assignment stage. You can earn the progress marks early, but you cannot earn them late.
If you do not achieve the milestone for a progress mark by the specified date then you lose that week’s progress mark and the milestone “slips” and becomes due on the next progress date. All the later milestones also slip back by one week, but the last milestone is lost. If you achieve the slipped milestone by the new progress date then you receive the progress mark for that date, but you have lost the progress mark for the missed date and you cannot make it up later.
The Milestones
· Monday of Week 4: Stage 1 achievement mark of at least 2.0/3.0
· Monday of Week 5: Stage 2 achievement mark of at least 2.0/3.0
· Monday of Week 6: Stage 3 achievement mark of at least 2.0/3.0
· Monday of Week 7: Stage 4 or 5 achievement mark of at least 2.0/3.0 or 2.0/4.0
· Monday of the first week of the break: Lab closes
E. Code Style [3 marks]
We will mark one of your submitted programs for code style. We recommend that you adhere to code style guidelines for all your programs. See the documents Some Important Comments on Code Style and Systems Programming Style.
Each time you submit a lab 1 solution using the lab command, you will be notified which version of your program will be marked for style. The decision is made by an algorithm (see below). We prefer to mark later stages of the lab where your programs will be more sophisticated. However, we prefer not to mark programs where you have not yet solved the stage.
The stage selected for style marking is the latest stage for which you achieved a mark of at least 2.0. For example, if you earn 3.0 marks in stage 1, 2.8 marks in stage 2, 2.1 marks in stage 3 and 1.9 marks in stage 4, we will mark your stage 3 submission for style. We will mark the last successful or forced submission to that stage. Whenever you submit, the lab command clearly tells you what stage will be marked for style, and whether it is the program that you just submitted that will be marked for style.
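The selection rule can be pictured with a short sketch. This is only an illustration of the rule as described here, not the server's actual code; the function name style_stage and the choice to hold marks in tenths are assumptions made for the example.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the lab command). Marks are held in
 * tenths of a mark (2.0 marks == 20) so the comparison is exact.
 * Returns the 1-based number of the latest stage that reached 2.0,
 * or 0 if no stage has reached it yet. */
int style_stage(const int marks_tenths[], size_t stage_count)
{
    const int threshold_tenths = 20;   /* "at least 2.0" */
    int selected = 0;

    for (size_t i = 0; i < stage_count; i++) {
        if (marks_tenths[i] >= threshold_tenths) {
            selected = (int)(i + 1);   /* a later qualifying stage wins */
        }
    }
    return selected;
}
```

With the marks from the example above (3.0, 2.8, 2.1 and 1.9, passed as 30, 28, 21 and 19), the function selects stage 3.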
All your programs should be written with good style. If you write with consistently good style then you won’t be caught out with a poor style mark if (for example) you manage to achieve 2.0 marks in stage 4 at the last minute and have no time left to improve the style of your program!
4. Detailed information about marking
The lab command computes your marks and records them on the server. Normally, the mark recorded at the end of the assignment will be your final mark for the achievement and progress parts of the assignment. The code style will be manually marked later, and that mark will be uploaded to iLearn. Once the assignment has closed for all students, the automarker marks can also be uploaded to iLearn. All marks are computed to 1 decimal place as displayed in the marking reports that you receive from the lab command.
A. Detailed marking guides for each stage
When you download and extract the files for a stage you will find a file called marking-guide.txt in the extracted files. This text file contains a detailed marking rubric for the stage. The automarker uses this rubric to mark your submission for the stage. The marking guide includes detailed notes that describe how each mark is calculated and what is being marked. In particular, the marking guide will tell you whether each item is marked proportionally, by error count, or as a Boolean (see “Types of Achievement Marks”, below).
In later stages, some automarker checks are thresholds. Threshold conditions may not contribute marks to your total, but are required for your program to be eligible to earn other marks. The marking report will display if any threshold has failed, and it will indicate which marks are suppressed due to the failed threshold. Thresholds and marks that require thresholds are indicated in the marking guide marking-guide.txt. The idea behind threshold marks is that your program needs to meet the basic requirements before you are awarded marks for its more sophisticated behaviour.
The marking-guide.txt file is generated by the server from configuration information that is part of the automarking process. The marking guide itself is the same for all students. However, generating it in the server and delivering it to you in this way ensures that the marking guide is consistent with the server’s marking system.
B. Types of Achievement Marks
There are three types of achievement marks, as explained in the marking-guide.txt files.
· Ordinary marks are proportional, computed as a percentage and scaled to the maximum mark. For example, there is a mark awarded for the correctness of your output file, that is computed from the proportion of correct rows and the proportion of correct columns in the output file. After scaling according to the maximum mark, the mark is rounded down to a multiple of 0.1. For example, if the percentage mark is 98% and the mark is scaled to a maximum of 1.0, then the rounded mark would be 0.9 (not 1.0). Rounding down ensures that full marks are only awarded for perfect scores of 100% on the particular marking item.
· Error count marks deduct a fixed amount (usually 0.1 or 0.15) from the maximum mark for each error that is counted, until the mark reaches 0.0. Error count marks are typically used for error checking such as checking your structure definition: a fixed amount is deducted for each error found in the definition, and the automarker gives you a hint identifying the errors.
· Boolean marks are used for test conditions which are either success or failure. The mark is awarded either as the full mark or as 0.0. The full mark is awarded when the test condition is satisfied, and 0.0 is awarded when the test fails. Boolean marks typically have small values (such as 0.1 or 0.2) and are awarded for specific tests such as ensuring that your program exits without an error status in normal operation, or that there are no memory leaks.
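The arithmetic behind the three mark types can be sketched as follows. This is an illustration of the rules as stated above, not the automarker's implementation: the function names are made up, whole-number percentages are assumed, and values are held in hundredths of a mark to keep the arithmetic exact.

```c
/* All values are in hundredths of a mark (1.0 mark == 100).
 * These helpers are hypothetical and exist only to show the arithmetic. */

/* Proportional mark: scale by the percentage, then round DOWN to a
 * multiple of 0.1 (10 hundredths). */
int proportional_mark(int percent, int max_hundredths)
{
    int scaled = percent * max_hundredths / 100;
    return scaled - scaled % 10;
}

/* Error count mark: a fixed deduction per error, never below 0.0. */
int error_count_mark(int max_hundredths, int deduction_hundredths, int errors)
{
    int mark = max_hundredths - deduction_hundredths * errors;
    return mark > 0 ? mark : 0;
}

/* Boolean mark: the full mark or nothing. */
int boolean_mark(int max_hundredths, int passed)
{
    return passed ? max_hundredths : 0;
}
```

For the example above, proportional_mark(98, 100) gives 90, that is 0.9 marks, and only proportional_mark(100, 100) gives the full 1.0.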
C. Maximising Your Mark
Here are some hints to get the most marks in this assignment.
1. Work on this assignment every week until the deadline. Don’t wait until you’ve finished the assignments for your other units before you start this assignment. This assignment is intended to be worked on over a period of 5 weeks and almost certainly cannot be completed in a few days.
2. Achieve at least 2.0 marks in each stage of this assignment by the progress mark deadline. Progress marks reward you for consistently working on the assignment. You show that you are working consistently by achieving a mark of at least 2.0/3.0 for the next stage of the assignment each week.
3. Start thinking about the next stage, and start working on it, once you have a reasonably good mark (at least 2.0) for the earlier stages. You may have an obscure bug that costs you 0.1 or 0.2 marks in the current stage, but you can earn more marks by working on the next stage than by spending all your time trying to perfect your current stage score.
4. Do your own testing as well as using the hints provided by the automarker. The automarker can give you a general idea of your problems, but running your program yourself allows you to examine the particular mistakes that you are making.
The Stages of Assignment 1
Stage 1: Initialising a C struct and printing it out as text [3 marks]
In this stage you will declare a C data structure, create an instance of it and statically initialise it (declare it as a static or global variable and initialise it in one statement using braces). You will then print out the instance. This stage develops the following specific skills:
· Declaring a C struct.
· Initialising a C struct
· Printing various data types using printf
Note: Do not use bit fields in your struct. All the data types that are specified correspond to ordinary C data types.
Note: The automarker checks your struct definition against expected ways of writing it and awards marks for correctness. Field names must be exactly correct. Types should be the common C language data types as defined in ANSI C.
Resources
The following documents on iLearn may be helpful:
· Compile, Run, Make C Programs on Linux
· C Programming Notes for Data File Lab
Your downloaded stage1.tar file contains the following files:
· filestruct-description.txt: A simple description of the fields that are in your struct (their names and type description).
· initialisation-specification.txt: Specifies the initial value for each field of your struct. The initial value has to be formatted in a specific way in your source code, which may mean that you have to convert one representation to another. See the lab note Decimal, Binary, Octal and Hex. Note: It makes no difference to the data that is stored inside the computer whether you initialise the field with decimal or the equivalent hexadecimal or octal. However, as an exercise, we require you to make the appropriate type conversions and the automarker will check your code.
· expected-output.txt: Stage 1 expected output file. Use the example in this file to work out what formatting options to use in printf.
Useful Unix commands
You might find the following Unix system commands helpful.
· cat
· diff
Task
Write a C program that declares your particular data structure as described in the C structure description file. Statically initialise (see footnote 1) an instance of the data structure to the initial values as specified in the file, using the data formats specified in the file such as hexadecimal, decimal or octal constants. In the main program, print out the data structure using printf formatting to make it exactly match the provided sample output file. Note that you may need to use various formatting options with printf to control the appearance of the output. You are expected to read about printf and work out how to format the data so that it exactly matches the expected output.
Footnote 1: Static initialisation means to initialise the whole data structure as part of its declaration, where the field values are listed inside curly braces. Don’t write separate lines of code that initialise each member of the struct. The automarker looks specifically for the required type of initialisation.
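As a rough illustration of static initialisation and formatted printing, here is a minimal sketch. The struct name sample, its four fields, the initial values and the output format are all invented for this example; your own struct, values and format come from your downloaded files and will differ.

```c
#include <stdio.h>

/* Hypothetical struct: the real field names and types are listed in
 * your filestruct-description.txt. */
struct sample {
    short         count;
    unsigned char flags;
    double        ratio;
    char          label[6];
};

/* Static initialisation: every field is given inside one pair of braces
 * as part of the declaration. The hexadecimal and octal constants are
 * just different spellings of the values 31 and 15. */
static const struct sample initial = { 0x1f, 017, 2.5, "abc" };

/* Prints one record on one line. The format string is an assumption;
 * the required format is whatever matches your expected-output.txt. */
void print_sample(FILE *out, const struct sample *s)
{
    fprintf(out, "%d, 0x%02x, %.2f, %s\n",
            s->count, (unsigned)s->flags, s->ratio, s->label);
}
```

A main function would call print_sample(stdout, &initial). With the values above, the line printed is "31, 0x0f, 2.50, abc".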
Submit your program for automatic assessment using the lab command. Your program style may be assessed according to the coding standards in the documents Some Important Comments on Code Style and Systems Programming Style which are available on iLearn.
Stage 2: Reading a binary data file and printing it out [3 marks]
In this stage you will read a binary data file in a known format, storing the information into instances of a C data structure which you will then print out. This stage develops the following specific skills:
· Reading binary data
· Opening and closing files
· Printing various data types using printf.
· Using a simple command-line parameter.
Resources
· filestruct-description.txt: Describes the members of the C data struct which correspond to fields in the records of the data file.
· input-*.bin: Sample binary input files.
· output-*.txt: Sample text output files corresponding to the input files.
Useful Unix commands
You might find the following Unix system commands helpful.
· “more” or “less”
· diff
· od
Task
Write a C program that reads a file of binary data records as described in the structure description file. The program will obtain the file name as a command line parameter (see below). The program will read and print all the records in a binary data file where each record has the format described in filestruct-description.txt. You already developed code to print out a single record in stage 1, so the focus of this stage is reading a binary data file into memory.
The output formatting requirements for this stage are the same as in stage 1. However, it is possible that you may need to modify your record printing code: your printf call may have worked correctly for the single initialised record in stage 1 but may not be correct for all the data records in the files. You should check the output against the expected output using diff, and improve your printf statement in whatever way is needed to get the correct output.
Your program must accept one command-line parameter which is the name of the input file. The automarker will run your program many times, each time with a different input file name as the parameter, and it will compare the output of each run with the expected output. You should do the same thing for your own testing.
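One possible way to handle the single command-line parameter is sketched below. The helper name open_input and the wording of the messages are assumptions for the example; the lab does not prescribe them.

```c
#include <stdio.h>

/* Hypothetical helper: checks that exactly one parameter was given and
 * opens it for binary reading. Returns the open file, or NULL after
 * printing a message to stderr. */
FILE *open_input(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s input-file\n",
                argc > 0 ? argv[0] : "stage2");
        return NULL;
    }

    FILE *in = fopen(argv[1], "rb");   /* "rb": read, binary mode */
    if (in == NULL) {
        perror(argv[1]);               /* e.g. "No such file or directory" */
    }
    return in;
}
```

A main function declared as int main(int argc, char *argv[]) would pass its own argc and argv straight through, exit with a non-zero status when NULL comes back, and call fclose on the file when it has finished reading.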
The fields of the records are stored using the types specified in the data file description. The fields are stored packed next to each other in the data file. You cannot read the entire record directly into a C struct in one call because the C compiler may insert additional unused space between some of the fields in the struct (this is called alignment padding; we will discuss it later in COMP202 lectures; see footnote 2). You must read the data record one field at a time. It is suggested to use fread to read each field.
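Reading one field at a time might look like the sketch below. The three-field record is invented for the example, and the code assumes the file was written with the same field sizes and byte order as the machine reading it.

```c
#include <stdio.h>

/* Hypothetical record layout; use the fields from your own
 * filestruct-description.txt. */
struct record {
    char   tag;
    double weight;
    short  id;
};

/* Bytes per record in the file. The fields are packed with no padding,
 * so this is usually smaller than sizeof(struct record). */
#define PACKED_SIZE (sizeof(char) + sizeof(double) + sizeof(short))

/* Reads one record, one field at a time. Returns 1 on success and 0 at
 * end of file or on a short read. */
int read_record(FILE *in, struct record *r)
{
    if (fread(&r->tag, sizeof r->tag, 1, in) != 1) return 0;
    if (fread(&r->weight, sizeof r->weight, 1, in) != 1) return 0;
    if (fread(&r->id, sizeof r->id, 1, in) != 1) return 0;
    return 1;
}
```

A reading loop can then be written as while (read_record(in, &r)) { ... }. Printing both PACKED_SIZE and sizeof(struct record) on your system is a quick way to see the padding for yourself.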
Each record that you read should be printed out as text. Your output should exactly match the sample output files.
Remember that coding style is important: use good modularisation, and use header files appropriately. Your program’s style may be assessed according to the coding standards in the documents Some Important Comments on Code Style and Systems Programming Style.
Submit your program for marking using the lab command. We may use additional data files for testing, including files that are larger than the samples provided to you.
Stage 3: Sorting a binary data file [3 marks]
In this stage you will sort files of binary data in a known format. This stage develops the following specific skills:
· Reading and writing binary data files.
· Opening and closing files.
· Working with pointers to structures.
· Memory allocation, dynamically sizing an array.
· Using system library routines (specifically, asystem library sort routine).
· Writing code to compare structures with a lexicalsort order.
· Using a function pointer in C.
Resources
· filestruct-description.txt: Describes the members of the C data structure which correspond to fields in the records of the data file.
· filestruct-sort.txt: Specifies the sorting order.
· input-*.bin: Sample binary input files.
· output-*.bin: Sample binary output files corresponding to the input files. The output files contain the same data as the input files, but the records are sorted.
Useful Unix commands
You might find the following Unix system commands helpful.
· od
· cmp
Task
Modify your program from stage 2 so that it reads the input file (parameter 1), storing all the records into a dynamic array in memory. The program should then sort the data records and write the output file (parameter 2) in sorted order. The automarker will test your program by running it many times, each time with a different input file name and an output file name, and it will then compare your output file with the expected output file.
Footnote 2: The C compiler has a special way of creating structs that are packed, but this is a non-standard extension and the automarker does not accept programs that use it.
Use the Linux library sort routine qsort to perform the sorting. Use the Unix manual (section 3) to find out how to call the qsort library routine. Hint: you must write a comparison routine that can compare two structures according to the sort order specified for your lab.
Your program will need to store all the records in memory in order to sort them. The program will allocate a dynamic array of structs (or some other data structure), and read the data file into the array. You do not know how large the file may be, so you must accommodate different file sizes. Here are two possible approaches (there are others).
1. Dynamic sized array: Allocate an initial array of some size (e.g. 100 records) and then if (while reading the file) you find that the array is not large enough then use realloc to increase (e.g. double) the size of it. When it cannot extend the existing block in place, realloc allocates a new larger array in memory and copies the data from the existing array to the new larger array, before freeing the original array. Repeatedly doubling the size allows you to accommodate arbitrarily large data files without copying the data too many times. See the Unix manual pages for malloc and realloc.
2. Compute the number of records from the file size: This is a systems approach that will require some reading to find out how to achieve. There is a system call stat that can tell you the total number of bytes in a file. There are also other ways to find out how many bytes are in a file but you should NOT read the entire file just to find out how big it is! Your file description gives you the information about how long each record is, so you can compute the number of records in the file from the number of bytes. You can then allocate an array of struct to the exact correct size using malloc. See the Unix manual pages for stat and malloc.
After sorting the records, write them out in binary form. It is suggested to use fwrite to write each field individually.
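Writing field by field mirrors the reading side. The sketch below uses an invented three-field record; the fields must be written in the order given in your own file description.

```c
#include <stdio.h>

/* Hypothetical record layout for illustration. */
struct record {
    char   tag;
    double weight;
    short  id;
};

/* Writes one record field by field, so that no padding bytes from the
 * struct reach the file. Returns 1 on success and 0 on a write error. */
int write_record(FILE *out, const struct record *r)
{
    return fwrite(&r->tag, sizeof r->tag, 1, out) == 1
        && fwrite(&r->weight, sizeof r->weight, 1, out) == 1
        && fwrite(&r->id, sizeof r->id, 1, out) == 1;
}
```

Each call adds exactly sizeof(char) + sizeof(double) + sizeof(short) bytes to the file, which you can confirm with od or by comparing file sizes. The output file should be opened with mode "wb".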
Students aiming for D or HD grade: It is more efficient to sort an array of pointers to the structs than to sort the structs themselves, because it is cheaper to move pointers than to move entire records. Therefore, top marks are awarded for sorting pointers. However, it is suggested to first sort the array itself and then implement pointer sorting if you have time.
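The main trap in pointer sorting is the extra level of indirection in the comparison routine. The sketch below shows it for an invented record sorted by a single field, id, in ascending order; the names compare_by_id and sorted_view are assumptions for the example.

```c
#include <stdlib.h>

struct record { short id; double weight; };   /* hypothetical layout */

/* qsort passes pointers to the array elements. The elements here are
 * themselves pointers, so each argument is a pointer to a pointer. */
static int compare_by_id(const void *a, const void *b)
{
    const struct record *ra = *(const struct record *const *)a;
    const struct record *rb = *(const struct record *const *)b;

    return (ra->id > rb->id) - (ra->id < rb->id);
}

/* Builds an array of pointers into `records` and sorts the pointers;
 * the records themselves are not moved. The caller frees the returned
 * array. Returns NULL if the allocation fails. */
struct record **sorted_view(struct record *records, size_t count)
{
    struct record **view = malloc((count ? count : 1) * sizeof *view);

    if (view == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < count; i++) {
        view[i] = &records[i];
    }
    qsort(view, count, sizeof *view, compare_by_id);
    return view;
}
```

After the call, view[0] points at the record with the smallest id, while the records array is still in its original file order. Output is then written by walking the pointer array.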
The output files must exactly match the sample output files provided.
Remember that coding style is important: use good modularisation, and use header files appropriately. Your program’s style may be assessed according to the coding standards in the documents Some Important Comments on Code Style and Systems Programming Style.
Submit your program for automarking. We will test your program on additional sample files that have not been provided to you.
Lexical sorting
The records are to be sorted according to the values in various fields of the records. The sort order specification lists the fields that should be considered, and for each field it specifies whether that field is sorted in ascending or descending order. If you are familiar with sorting in Excel, this works similarly.
For example, consider the following simple text file, shown with line numbers. The first line is the header line that gives the name of each member of the record structure.
1. horse, cat, paper, train
2. 3, word, 1.4, 1
3. 4, wood, 1.4, 1
4. 1, word, 1.7, 0
5. 2, word, 1.5, 0
Suppose that this file is sorted in the following way: First, by train in ascending order, then by cat in descending order, then by paper ascending and finally by horse descending. The data to be sorted is lines 2 through 5. Examining the last column (train), lines 4 and 5 have the value 0 whereas lines 2 and 3 have the value 1. Therefore, lines 4 and 5 will be sorted before lines 2 and 3. Now, comparing lines 2 and 3, which have the same value for train, the values for cat are different. Sorting these records by the field cat in descending order, “wood” should come after “word” because the sort is reverse of alphabetical, so line 3 is to be sorted after line 2. Finally, comparing lines 4 and 5, they are the same for fields train and cat, but differ in the field paper which is to be sorted ascending. Line 4 is therefore sorted after line 5.
The final sorted text file is:
horse, cat, paper, train
2, word, 1.5, 0
1, word, 1.7, 0
3, word, 1.4, 1
4, wood, 1.4, 1
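The worked example above can be expressed as a qsort comparison routine. The sketch below is one way to write it, not the only one: the struct row, the 8-byte size of cat and the helper names are assumptions, while the field names and sort order come from the example.

```c
#include <stdlib.h>
#include <string.h>

/* Record matching the example file; the array size of cat is assumed. */
struct row {
    int    horse;
    char   cat[8];
    double paper;
    int    train;
};

/* Three-way comparisons that cannot overflow (unlike a - b). */
static int cmp_int(int a, int b)          { return (a > b) - (a < b); }
static int cmp_double(double a, double b) { return (a > b) - (a < b); }

/* Sort order: train ascending, cat descending, paper ascending,
 * horse descending. A later field is consulted only when every earlier
 * field compares equal. Swapping the arguments reverses the order. */
static int compare_rows(const void *pa, const void *pb)
{
    const struct row *a = pa;
    const struct row *b = pb;
    int c;

    if ((c = cmp_int(a->train, b->train)) != 0) return c;
    if ((c = strcmp(b->cat, a->cat)) != 0) return c;        /* descending */
    if ((c = cmp_double(a->paper, b->paper)) != 0) return c;
    return cmp_int(b->horse, a->horse);                     /* descending */
}

/* Sorts the rows in place using the library routine. */
void sort_rows(struct row *rows, size_t count)
{
    qsort(rows, count, sizeof rows[0], compare_rows);
}
```

Sorting the four data lines of the example with sort_rows puts the horse values in the order 2, 1, 3, 4, which matches the final sorted file shown above.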
Note: You are expected to use a system library sorting routine, not to write your own sorting algorithm.
Stages 4 and/or 5: Reading and sorting an unknown data format [3 or 4 marks]
In the final stage(s) you will reverse engineer an unknown file format containing the same data fields that you are familiar with but stored using different representations. These stage(s) develop the following specific skills:
· Recognising binary data formats
· Interpreting and converting binary data formats
· Exploring binary files with dumping tools
· Reading and writing binary data files
· Converting data from one format to another
Options for your final stage
You have the choice of which stage(s) to attempt to complete this assignment. The following are suggested guidelines, but the choice is entirely yours.
· Most students should complete stage 4 as the last stage of this assignment. This option is the easiest option. You can earn at most 3.0 marks for stage 4. You can still achieve a very good total mark for the lab.
· Students aiming for HD grades may choose to skip stage 4 and complete stage 5 as the last stage of this assignment. This option may be the most difficult option. You can earn at most 4.0 marks for stage 5.
· Students aiming for D or HD grades may first complete stage 4 and then attempt stage 5 as the last stage of this assignment. This option is the most work because the input files for stages 4 and 5 are completely different. Stage 4 is marked out of 3.0 and stage 5 is marked out of 4.0, but your final mark will only include either your stage 4 mark or your stage 5 mark, whichever is greater. For example, if you complete stage 4 and earn 2.9 marks and also earn 3.5 marks for stage 5, your final mark will include the 3.5 marks for stage 5 but not the 2.9 marks for stage 4. On the other hand, if your stage 4 mark was 2.9 and your stage 5 mark was only 2.2 then your final mark would include the 2.9 marks earned for stage 4 but not the 2.2 marks for stage 5.
You can download both stages using the lab command and then decide which stage you want to attempt first. You can change your decision at any time, but the structures of stages 4 and 5 input files are completely different so there will be additional work involved if you work on both stages.
Resources
· C structure description
· Data file description
· Sort order specification
· Stage 4 or 5 sample input and output files. The output files contain the same data as the input files, but the output files are converted to the original file format. For stage 4, the output files are not sorted, which makes it possible to award partial marks if your program correctly converts only part of each input record. For stage 5, the output files are sorted, and correct sorting requires correctly converting all the fields of the input records.
· Debug output files to assist with stage 4. These files contain the text version of the output files. You should be able to produce the same files by running your stage 2 program over the binary output files, so these files are provided only as a convenience.
Task
Study the provided sample data files. The input files are binary files in a new file format, while the output files are in your known file format (and, for stage 5, the output files are sorted). Your first task is to identify the input file format by comparing the information contained in it with the information contained in the sample output files. Fortunately, there are some small files provided which will make it a lot easier to work out what information in the input file corresponds with what information in the output file.
You will need to spend some time examining hex dumps (and possibly other dump formats) of the sample input files and comparing the information in the bytes with what you might expect to find. Feel free to use whatever tools you can find to examine the bytes in your files. The input files and output files contain the same data values, but the binary formats of individual fields are different, so you are looking for the correspondences between the data values in two files. For example, a 16-bit signed integer in your original file format might be represented as 32-bit unsigned or as a floating-point in the new file format.
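Alongside od, a small throwaway helper of your own can show how the same bytes read under several candidate types. The sketch below is one such helper; the name show_interpretations is made up, and it assumes a 32-bit float and the byte order of the machine it runs on.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Prints several interpretations of the bytes starting at `p`.
 * At least 4 bytes must be readable at `p`. memcpy is used rather than
 * pointer casts so that unaligned offsets are safe. */
void show_interpretations(const unsigned char *p, FILE *out)
{
    int16_t  s16;
    uint32_t u32;
    float    f32;   /* assumed to be 4 bytes on this system */

    memcpy(&s16, p, sizeof s16);
    memcpy(&u32, p, sizeof u32);
    memcpy(&f32, p, sizeof f32);

    fprintf(out, "bytes: %02x %02x %02x %02x\n",
            (unsigned)p[0], (unsigned)p[1], (unsigned)p[2], (unsigned)p[3]);
    fprintf(out, "int16: %d  uint32: %lu  float: %g\n",
            (int)s16, (unsigned long)u32, (double)f32);
}
```

Calling this at successive offsets of a small input file, and comparing against the known values in the matching output file, can help narrow down which representation each field uses.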
Hint: On iLearn,there is a separate documentHints on Reverse Engineering a Data Filewhich provides helpful hints andsuggestions for completing stage 4 and/or stage 5 of this lab assignment.
Once you have identified your input file data record’s format, modify your program from stage 3 so that it can read a binary file in the new file format. As each record is read, convert the data into the correct form to store in your C structure. For stage 4, your program will then output the data as an unsorted binary file in the original file format. For stage 5, your program will sort the records and output a new sorted binary file in the original file format. For full marks, the output files must exactly match the sample output files provided, and your program must work correctly on additional test data that we do not provide to you. For stage 4, if you cannot work out how to convert a small number of fields of the input file, you can still earn partial marks by correctly converting the other fields, and simply outputting zeros in the unconverted fields; however, this approach would not work for stage 5 because the files are sorted.
As in stage 3, use the command line to obtain the file names for the input and output files. However, in this stage, the input file is a binary file in the new data file format, while your program must write the converted data to the output file named on the command line.
Remember that coding style is important: use good modularisation, and use header files appropriately. Your program’s style may be assessed according to the coding standards in the documents Some Important Comments on Code Style and Systems Programming Style.
Submit your program for auto marking. We will test your program on additional sample files that have not been provided to you.
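The read, convert and write cycle described above might be organised along the following lines. This is only a sketch under stated assumptions: the Record structure, its three fields, the 12-byte new-format layout and the function names are all hypothetical stand-ins, not the real lab formats, which you must work out from the sample files. The third field is deliberately left undecoded to show the stage 4 fallback of writing zeros.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical new-format record: an 8-byte double, then a 4-byte unsigned. */
#define NEW_RECORD_SIZE 12

/* Hypothetical original-format structure (not the real lab structure). */
typedef struct {
    int16_t count;   /* stored as a double in the hypothetical new format */
    uint8_t level;   /* stored as a 32-bit unsigned in the new format */
    float   ratio;   /* not yet decoded: will be written as zero */
} Record;

/* Convert one raw new-format record into the original-format structure. */
void convert_record(const unsigned char *raw, Record *rec)
{
    double count_as_double;
    uint32_t level_as_u32;

    assert(raw != NULL && rec != NULL);
    memset(rec, 0, sizeof *rec);            /* undecoded fields stay zero */
    memcpy(&count_as_double, raw, sizeof count_as_double);
    memcpy(&level_as_u32, raw + 8, sizeof level_as_u32);
    /* The casts rely on the values fitting in the narrower types. */
    rec->count = (int16_t)count_as_double;
    rec->level = (uint8_t)level_as_u32;
}

/* Convert every record from in to out; returns the number written. */
size_t convert_stream(FILE *in, FILE *out)
{
    unsigned char raw[NEW_RECORD_SIZE];
    size_t written = 0;

    while (fread(raw, sizeof raw, 1, in) == 1) {
        Record rec;
        convert_record(raw, &rec);
        /* Write field by field so struct padding never reaches the file. */
        if (fwrite(&rec.count, sizeof rec.count, 1, out) != 1 ||
            fwrite(&rec.level, sizeof rec.level, 1, out) != 1 ||
            fwrite(&rec.ratio, sizeof rec.ratio, 1, out) != 1)
            break;
        written++;
    }
    return written;
}
```

Using memcpy to pull each field out of a byte buffer avoids alignment problems and keeps the file layout independent of how the compiler pads the structure. For stage 5, the records would be collected and sorted before any output is written.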
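A minimal sketch of the command-line handling follows. The function name check_args is hypothetical, and the argument order (input file name first, output file name second) is an assumption; follow whatever order your stage 3 specification requires.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Check that exactly two file names were given and hand them back.
 * Assumption: argv[1] is the input file and argv[2] is the output file.
 * Returns 0 on success, -1 (after printing a usage message) otherwise. */
int check_args(int argc, char *argv[],
               const char **in_name, const char **out_name)
{
    assert(in_name != NULL && out_name != NULL);
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <input-file> <output-file>\n",
                argc > 0 ? argv[0] : "program");
        return -1;
    }
    *in_name = argv[1];
    *out_name = argv[2];
    return 0;
}
```

In main, the two names would then be passed to fopen with the modes "rb" and "wb" respectively, since both files hold binary data, and each fopen result should be checked for NULL before use.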
About the New Data File Format
The new format input data files will contain the same information as the original format output data files, but the records of the input files are not sorted. If you are attempting stage 4, the order of the fields in each record will be the same as in the original format. On the other hand, if you are attempting stage 5, the order of the fields in the records will be different from the original format. This makes stage 5 more difficult than stage 4 because you have less knowledge of the unknown file format. Also, stage 5 is made more difficult because the output files are sorted.
The data formats of individual fields can be different in the new data format compared to the original data format. In particular:
· Numeric fields can be represented as any of the numeric types: signed or unsigned integers in 8, 16 or 32 bits; float or double. Note carefully: This means that an integer type in the original file format may be represented as floating point in the new file format, and also that a float in the original file format may be represented as an integer in the new file format.
· Booleans can be represented in 8, 16, or 32 bits, or several Boolean fields can be packed as specific bits in a field of 8 bits.
· Strings can have a different length and characters can be converted to strings in the new format.
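The Boolean and string cases in the list above can be handled with small helper functions such as the following sketch. The function names are hypothetical, and the bit numbering (bit 0 is the least significant bit) is an assumption that you would need to confirm against a hex dump of the sample files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Extract one Boolean (0 or 1) from a packed byte.
 * Assumption: bit 0 is the least significant bit of the byte. */
int unpack_flag(uint8_t packed, unsigned bit)
{
    assert(bit < 8);
    return (packed >> bit) & 1u;
}

/* Copy a fixed-width string field of src_len bytes (which may lack a
 * terminating NUL) into dst, truncating if needed. The rest of dst is
 * filled with NUL bytes, so the result is always terminated. */
void copy_field_string(char *dst, size_t dst_size,
                       const char *src, size_t src_len)
{
    size_t n = 0;
    assert(dst != NULL && src != NULL && dst_size > 0);
    while (n < src_len && n + 1 < dst_size && src[n] != '\0') {
        dst[n] = src[n];
        n++;
    }
    memset(dst + n, 0, dst_size - n);
}

/* A character that the new format stores as a short string:
 * the original character is taken to be the first byte of the field. */
char string_to_char(const char *field)
{
    assert(field != NULL);
    return field[0];
}
```

Filling the unused tail of a string field with NUL bytes matters here because the output files must match the sample output files exactly, so any leftover bytes in the buffer would show up as differences.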
In cases where a field has a different numeric type in the new format, we guarantee that the sample data values present in the input and output files will semantically correspond (i.e. they will have the same meaning). For example, if the input file has 16-bit signed integers but the output file has 8-bit unsigned integers, then the sample data values for that field will all be non-negative values in the range 0 to 255, since these values can be represented in both 16-bit signed integers and 8-bit unsigned integers. As another example, if the input file has 8-bit signed integers and the output file has 32-bit unsigned integers, then the data values for that field will all be non-negative values in the range 0 to 127, since these values can be represented as both 8-bit signed integers and 32-bit unsigned integers.
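While you are still guessing at field types, it can help to check the range guarantee explicitly rather than casting blindly: a value that falls outside the target range is a strong sign that the guess is wrong. The following sketch shows two such checked conversions; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a 16-bit signed value to 8-bit unsigned.
 * Returns 0 on success, -1 if the value is outside 0 to 255. */
int int16_to_uint8(int16_t in, uint8_t *out)
{
    assert(out != NULL);
    if (in < 0 || in > UINT8_MAX)
        return -1;
    *out = (uint8_t)in;
    return 0;
}

/* Convert a double to a 32-bit signed integer.
 * Returns 0 on success, -1 if the value is out of range, is NaN,
 * or has a fractional part that an integer cannot hold. */
int double_to_int32(double in, int32_t *out)
{
    assert(out != NULL);
    if (!(in >= INT32_MIN && in <= INT32_MAX))   /* also rejects NaN */
        return -1;
    if (in != (double)(int32_t)in)               /* fractional part */
        return -1;
    *out = (int32_t)in;
    return 0;
}
```

If a conversion like this fails on the sample data, the field is probably not the type you assumed, or the bytes have been read from the wrong offset in the record.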
Useful Unix commands
You might find the following Unix system commands helpful.
· od
· cmp