代做CSCI 3280 Introduction to Multimedia Systems Spring 2024, Assignment 2代写留学生Python程序
- 首页 >> CSCSCI 3280 Introduction to Multimedia Systems
Spring 2024, Assignment 2 - LZW Compression
Due Date: Mar. 28th, 2024 (11:59pm) Submission via Blackboard
Late submission penalty: 10% deduction per day (maximum 30%)
PLAGIARISM penalty: Whole Course Failed
Introduction
LZW is a dictionary-based compression algorithm and does not perform any analysis on the input text. LZW is widely known for its application in GIF and in the V.42
communication standard. The idea behind the LZW algorithm is simple but there are implementation details that should betaken care of. It is lossless, meaning no data is lost when compressing. The algorithm is simple to implement and has the potential for very high throughput in hardware implementations. In this assignment, you are required to write an LZW algorithm (both the compression and decompression). The following sections explain the details.
Code Dictionary
Code dictionary is a dictionary holding the correspondence between strings and
codes. By using the LZW algorithm, we can construct the dictionary on the run, and represent a string with its corresponding code in the dictionary so that less number of bits can be used to store the same information.
In this assignment, we will be using 12-bit code sequence to represent strings of 8- bit characters in the original file. Because a 12 bit can have values from 0 to 4095, the dictionary can hold 4095 different strings (except code 4095 which reserved as EOF code). The first 256 entries of the dictionary need to be initialized as the 256 ASCII characters to represent all single-character strings.
An interesting aspect of this algorithm is that we don’t need to store the dictionary
in the compressed file; instead, we discard the dictionary once the files are compressed, and reconstruct on the run when westart to decompress.
Once you dictionary is full, you need to delete the dictionary and start a new dictionary.
You can refer to the example below. The first 256 and the last entries have
predefined values, while the remaining entries can be used to hold multi-character strings encountered during the compression.
Algorithm for Compression
You can follow the procedure below to compress data using LZW algorithm.
1) Initialize DICT with the first 256 entries
2) Define 2 variables: STRING, CHAR
3) Get first byte from file and store in STRING
4) While there is more byte in file, read 1 byte and store in CHAR
5) If (STRING+CHAR) is in DICT, let STRING = (STRING+CHAR)
6) Else, write code of STRING to compressed file, add (STRING+CHAR) to DICT and let STRING = CHAR
7) Write code of STRING
In this way, we will be scanning through input character stream, and constantly adding incoming characters into a variable STRING. Once the STRING + its following character CHAR forms a new string which is not currently in the dictionary, we will record this STRING+CHAR into the DICT, then output the code corresponding to STRING, and finally, start the new STRING from the current CHAR (we will not be using the newly added code to represent the first occurrence of this unseen combination, but to use it when we see this combination again).
Make sure you start a new dictionary when the current one has no empty entries.
On the other hand, you should not start new dictionaries when you finish compressing one file and start with another, unless the dictionary is full.
Algorithm for Decompression
1) Initialize DICT with the first 256 entries
2) Define 4 variables: CURRENT, NEXT, STRING, CHAR
3) Get first code from file and store in CURRENT
4) While there is more code in input, read 1 code and store in NEXT
5) Translate CURRENT and store in STRING, and write STRING to output
6) If entry NEXT is non-empty in DICT, translate NEXT and store the first character in CHAR
7) Else, let CHAR = first char of STRING
8) Add (STRING+CHAR) to DICT and let CURRENT = NEXT
9) Output translation of CURRENT
Following this procedure, wescan through the compressed code sequence, each time load a code into CURRENT and the following one into NEXT.
Remember that we output the code of STRING and save (STRING+CHAR) into the
DICT if and only if the current STRING is interrupted by the incoming CHAR during the compression. To do the inverse process of the Compression, for each CURRENT, NEXT pair in the sequence, we need to first decode CURRENT back to STRING and write STRING to the decompressed file, and then save (STRING + the following
CHAR) into DICT.
Because NEXT is representing the string following the one represented by CURRENT, in most cases we can use the first character of the translation of NEXT as CHAR. But the tricky part is that sometimes NEXT does not exist in DICT, waiting for the newest (STRING+CHAR) to be inserted first to know its value. In this case, we can reason
that CHAR==(STRING+CHAR)[0], and CHAR==STRING[0] for any STRING that has length >= 1. As a result, we can instead use STRING[0] as the value of CHAR.
If you implemented the algorithm correctly, the DICT in decompression should be
exactly the same with the DICT in compression. Similar to the compression part, you need to start new dictionaries on demand.
Basic Part (80%)
1) Write a Python program based on the given skeleton code to perform compression and decompression of files using LZW.
2) Program has to support compressing multiple files into one file and decompress
back to multiple files. The code sequences of different files should be separated by a <EOF> code. There is no need to extract filenames from input file paths; code for reading and writing file headers has been given in the skeleton code. An example of compressed file (the header is in plaintext) is given as:
.\File1.txt\n
.\File2.txt\n
\n
<Code sequence for File1.txt> <EOF(4095)> <Code sequence for File2.txt> <EOF(4095)>
3) The code size has to be 12-bit (dictionary size = 4096).
4) The decompressed file has to be exactly the same with the input file.
5) Program will be tested with both ASCII files and binary files.
6) The input file can have arbitrary length. (You should ensure your program to work well with several mega-bytes of input data)
7) The compressed file will not be compared with a sample compressed output,
however, there will be penalty if your compressed file is more than 10% larger than the sample compressed file (significantly over-sized output indicate you are
implementing the dictionary incorrectly).
8) The program should run as a console program and accepts the following arguments:
For compression:
python lzw.py –c <compressed file name> <file 1> <file 2> …
(Example) python lzw.py –c .\output.lzw .\text.txt .\image.png
For decompression:
python lzw.py –d <compressed file name> -o <output dir>
(Example) python lzw.py –d .\output.lzw -o .\output
Enhancement Part (20%)
In this part, you can implement features related to LZW. Here are some examples:
• Try implementing encryption of the compressed file. You can use your
password to manipulate the dictionary or the initial state of the compressed file to avoid decompression of meaningful content. You can choose any
simple encryption algorithms suitable to be used in the compression process.
• Try implement binary-to-text encoding (e.g. base64, hex) to convert the
compressed file from binary file into text characters. Such method is widely used by systems like E-mailto provide compatibility for non-ASCII files.
• Try implement variable-width code which use 8-bit code in the beginning and increase the code size by 1 each time all codes in the current dictionary have been used up. By this means, shorter code in the early stage can be stored in a more compact way.
You can also implement other interesting features about LZW compression.
Please write a report to describe your enhancement features. For each additional feature:
1) Detail of the features
2) Necessary code explanation related to the features
3) Sample input and output (if applicable)
4) The techniques used (if applicable)
5) References (if applicable)
The report is important for the identification and understanding of your
implemented feature. We will do the grading and test the correctness of the source code based on the features you mentioned in the report.
If the features claimed in the report can only be barely used (e.g., with a lot of bugs) or even do not exist in the source code, marks will be deducted.
General Specifications
1. Program should be coded in Python and uses libraries listed in the skeleton code only.
2. You are not allowed to use extra libraries that are not imported in the skeleton code for your assignment. If you are proposing enhancement features that are not mentioned in the specification, and have good reason to use extra libraries, please consult the TAs.
3. The following files are provided for you to write and test your program. a. lzw_skeleton.py: The skeleton code for LZW program.
i. read_file_header, write_file_header functions for helping you to read and write file headers in the required format.
ii. read_code, write_code functions for reading and writing N-bit code from and to files.
b. Three test files: Windows.txt, CSE.txt, web.bmp
c. lzw.exe & compressed.lzw: A sample program for compression and decompression, and a sample compressed file containing all three test files. (The sample program does not support “-o” option to change the output directory, the decompressed file will overwrite the input file if you decompress in the same folder)
(You don’t need to have exact same output with the sample program)
4. You are required to submit source codes only. We will test your code using a Python3 interpreter.
Please make sure that your source code is fully operational, failing to execute will receives 10-point deduction for each failed source.
Submission ( Deadline: Mar. 28th, 2024 11:59pm )
We expect the following files zipped into a file named by your student ID (e.g., 1155xxxxxx.zip) and have it uploaded to the course’s Blackboard system.
1. report.pdf (Tell us anything that we should pay attention to, especially about the enhancement part)
2. lzw.py (Basic requirement source code)
3. lzw_enhancement.py (Enhancement part source code)
Other files that are provided are not required to be submitted.