DESIGN AND IMPLEMENTATION OF A DATA COMPRESSION SOFTWARE
CHAPTER ONE
GENERAL INTRODUCTION
1.1.0 INTRODUCTION
In recent times, there has been a growing need to maximize data transfer between communication terminals while making efficient use of network bandwidth and disk space. Compression is the process used to reduce the physical size of a block of information. Data encoding is the general term for algorithms that transform data from one representation to another, and data compression is one type of data encoding. Doyle and Carlson (2000) write that data compression “has one of the most simple and elegant design theories in all engineering”. A simple characterization of data compression is that it involves transforming a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Data compression squeezes data so that it requires less disk space for storage and less bandwidth on a data transmission channel. Communication equipment such as modems, bridges and routers use compression schemes to improve throughput over standard leased lines or phone lines.
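The characterization above (a string transformed into a shorter one that carries the same information) can be illustrated with a minimal sketch. It assumes Python and its standard zlib module, neither of which is prescribed by this chapter, and the sample text is a made-up input chosen for its repetition.

```python
import zlib

# Hypothetical sample text with heavy repetition (an assumption for illustration).
original = ("data compression reduces redundancy. " * 50).encode("ascii")

compressed = zlib.compress(original)    # encode into a shorter byte string
restored = zlib.decompress(compressed)  # decode back to the original bytes

print(len(original), "bytes before,", len(compressed), "bytes after")
print("same information recovered:", restored == original)
```

The exact compressed size depends on the input and on the library version, so no figure is quoted here; the point is only that the output is shorter and the original is recovered exactly.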
File compression can be employed at various levels: a user can choose to compress individual files, a whole folder or an entire drive. Most compression schemes take advantage of the fact that data contains a lot of repetition. For example, alphanumeric characters are normally represented by a 7-bit ASCII code, but a compression scheme can use a shorter code, such as a 3-bit code, to represent the most common letters. Compressed files are often stored in archives, which can contain more than one file. Archive files are manipulated with utilities such as WinZip or IZArc.
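The idea of giving the most common letters shorter codes can be sketched with Huffman coding. This is one well-known variable-length coding scheme, chosen here as an assumption; the text above does not name a particular scheme. The function names and the sample message are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return a dict mapping each symbol in text to a prefix-free bit string."""
    freq = Counter(text)
    if len(freq) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # two least frequent groups
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

def encode(text, code):
    return "".join(code[ch] for ch in text)

def decode(bits, code):
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[current])
            current = ""
    return "".join(out)

message = "this is an example of a message with repeated letters"
code = huffman_code(message)
bits = encode(message, code)
print("fixed 7-bit size:", 7 * len(message), "bits")
print("variable-length size:", len(bits), "bits")
print("decoded correctly:", decode(bits, code) == message)
```

Frequent symbols such as the space and the letter "e" end up with the shortest codes, which is why the encoded message needs fewer bits than a fixed 7-bit representation.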
1.2.0 PROBLEM DEFINITION
Many sources of information contain redundant data or data that adds little to the stored information. This results in a tremendous amount of data being transferred between client and server applications. Large volumes of information often have to be transferred over a communication channel, and if this information is not compressed it also requires a lot of disk space for storage. Similarly, large amounts of information require large bandwidth on a transmission channel. Bandwidth is measured in bits per second, and higher bandwidth is more costly. A large chunk of information also requires more transmission time than a small one. All these factors are the problems that gave rise to the need for compression.
1.3.0 OBJECTIVE OF THE STUDY
There are many reasons for data compression; the main aim is to reduce storage requirements by reducing redundancy. When the amount of data to be transmitted is reduced, the effect is that of increased capacity of the communication channel. Similarly, compressing a file to half its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, and thus faster, level of the storage hierarchy and to reduce the load on the input/output channels of the computer system.
One objective of this project is to achieve faster file transfer while using less bandwidth on a data channel. In data communication, the transfer of compressed data over a medium results in an increase in the rate of information transfer. This is another aim of file compression.
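The capacity and transfer-time claims above reduce to simple arithmetic, sketched below. The function names, the file size, the link speed and the 2:1 compression ratio are all hypothetical values chosen for illustration, and protocol overhead and latency are ignored.

```python
def transfer_time_seconds(size_bytes, bandwidth_bits_per_second):
    """Time to send size_bytes over a link, ignoring overhead and latency."""
    return size_bytes * 8 / bandwidth_bits_per_second

def effective_capacity(capacity_bytes, compression_ratio):
    """Amount of original data that fits when it is stored compressed.

    compression_ratio = original size / compressed size (2.0 means half size).
    """
    return capacity_bytes * compression_ratio

# Hypothetical figures, for illustration only.
file_size = 10_000_000  # a 10 MB file
link = 1_000_000        # a 1 Mbit/s link
ratio = 2.0             # the file compresses to half its size

print(transfer_time_seconds(file_size, link))          # uncompressed
print(transfer_time_seconds(file_size / ratio, link))  # compressed
print(effective_capacity(500_000_000, ratio))          # a 500 MB disk
```

Under these assumptions the uncompressed file takes 80 seconds to send and the compressed one 40 seconds, while the 500 MB disk holds the equivalent of 1 GB of original data.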
Basically, source coding for data compression is a method used in data systems to reduce the volume of digital data in order to achieve benefits in areas including, but not limited to:
(a) Reduction of the transmission channel bandwidth
(b) Reduction of the buffering and storage requirements
(c) Reduction of data transmission time at a given rate
Thus, at the end of this project, I should be able to develop a data/file compression and decompression utility that aids easy transfer of data.
1.4.0 RESEARCH JUSTIFICATION
Bandwidth is used as a synonym for data transfer rate (DTR), which is the amount of digital data that is moved from one place to another in a given time. It can be viewed as the speed of travel of a given amount of data from one point to another. In general, the greater the bandwidth of a given path, the higher the transfer rate. Bandwidth, together with other resources such as disk space, time and money that are essential in networking, forms the motivation for this project.
This project is relevant to all users of the internet, and indeed all users of computer systems, as compression will allow more work to be done.
1.5.0 RESEARCH METHODOLOGY
Data compression can be implemented in software on existing hardware or through special hardware devices that incorporate compression techniques. The efficiency of a compression utility also depends on the specific algorithm used by the compression program. While it is possible to compress and decompress data using tools such as WinZip, Gzip and the Java Archive tool (jar), these are used as standalone applications. The WinZip tool is used to create a compressed archive and to extract files from a compressed archive on Windows. On UNIX, tar is used to create an archive file and the gzip command is then used to compress it. Compression techniques are also classified as lossy or lossless. Lossless data compression is able to restore the decompressed data exactly to its original form. In lossy compression, on the other hand, the decompressed data may differ from the original data. The ZIP format used by WinZip is an example of lossless compression, and JPEG is an example of lossy compression. Lossy compression methods typically offer a three-way trade-off between compression speed, compressed data size and quality.
In this project, a lossless algorithm shall be used to create a compression utility like WinZip and Gzip, in order to reduce the high use of internet bandwidth and the problem of low disk space, thereby increasing the effective capacity of the storage medium and aiding easy file transfer.
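The tar-then-gzip workflow and the lossless property described above can be sketched from within a program rather than as standalone commands. The sketch below assumes Python and its standard tarfile module; the file name report.txt and its contents are hypothetical.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # Create a hypothetical input file (name and contents are assumptions).
    source_path = os.path.join(workdir, "report.txt")
    original = b"line of sample text\n" * 1000
    with open(source_path, "wb") as f:
        f.write(original)

    # Similar in spirit to running tar and then gzip: a gzip-compressed tar archive.
    archive_path = os.path.join(workdir, "report.tar.gz")
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_path, arcname="report.txt")

    # Read the member back and confirm that nothing was lost.
    with tarfile.open(archive_path, "r:gz") as archive:
        restored = archive.extractfile("report.txt").read()

    archive_size = os.path.getsize(archive_path)

print("original:", len(original), "bytes; archive:", archive_size, "bytes")
print("lossless:", restored == original)
```

A lossy method such as JPEG would fail the final equality check by design, since it discards detail in exchange for a smaller output.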
1.6.0 SCOPE AND LIMITATION OF STUDY
The scope of the study, the design and implementation of a file compression utility, is based on the study of existing compression utilities and compression algorithms. This shall lead to the introduction of a new compression utility that conserves resources such as storage space, bandwidth, time and money, and improves the data transfer rate.
The limitations that will hinder effective implementation of this project include:
(a) The scope is centered on basic implementation of compression and decompression processes.
(b) The project implementation will not take into consideration the low-level details and technicalities involved in creating a compression utility, but will focus on employing various pre-existing APIs (Application Programming Interfaces) and libraries in order to create such a utility.
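As an illustration of building a utility on pre-existing libraries rather than low-level code, the following is a minimal sketch using Python's standard zipfile module. The choice of Python and of the ZIP format is an assumption, and the function names compress_files and decompress_archive, as well as the sample file, are hypothetical and not part of the project's actual design.

```python
import os
import tempfile
import zipfile

def compress_files(paths, archive_path):
    """Write the given files into a DEFLATE-compressed ZIP archive."""
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in paths:
            zf.write(path, arcname=os.path.basename(path))

def decompress_archive(archive_path, target_dir):
    """Extract every member into target_dir and return the member names."""
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()

with tempfile.TemporaryDirectory() as workdir:
    sample = os.path.join(workdir, "notes.txt")  # hypothetical input file
    with open(sample, "w") as f:
        f.write("hello compression\n" * 200)

    archive = os.path.join(workdir, "notes.zip")
    compress_files([sample], archive)

    out_dir = os.path.join(workdir, "out")
    names = decompress_archive(archive, out_dir)
    with open(os.path.join(out_dir, "notes.txt")) as f:
        round_trip_ok = f.read() == "hello compression\n" * 200

print(names, round_trip_ok)
```

The library handles the archive format and the compression algorithm, so the utility code only decides which files go in and where they come out, which matches the limitation stated in (b).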
1.7.0 DEFINITION OF TERMS
ALGORITHM: a set of instructions followed in a fixed order and used to solve a problem or perform a computation.
THROUGHPUT: the amount of work, goods or people that are dealt with in a particular period of time.
STUBS: small pieces of placeholder code that stand in for functionality that is implemented elsewhere or not yet implemented.
UTILITY: a piece of computer software that has a particular use.
PROLIFERATION: a sudden increase in the amount or number of something.
ENSEMBLE: a set of things that go together to form a whole.
PERMUTES: subjects something to a process of alteration, rearrangement or permutation.
PREMISE: a previous statement from which another is inferred.
PATENT: a special document that gives you the right to make or sell a new invention or product that no one is allowed to copy.
ENCRYPTION: a process of securing information on the computer using special codes that only some people can read.
METADATA: information that describes what is contained in large computer databases.
INNOCUOUS: not likely to cause harm or trouble to anyone.