WJEC Database Systems

Sequential File

A sequential file is a serial file which is physically stored in order of the key field: for example, an employee file would be stored in employee number order. Records are read one at a time from the logical start of the file in key value order. Direct access to a particular record is not possible, as the file must be read from start to finish.

A new record is added to or deleted from a sequential file by:

• Copying the old file to a new file until the point of insertion/deletion is reached.

• Inserting or deleting the record at that point.

• Copying the remaining records over to the new file.

If multiple records are to be added or deleted, they should be sorted into key order first, so that the whole copying process only needs to be carried out once.
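The steps above can be sketched in Python. This is a minimal illustration, assuming each record is a single "key,data" line kept in key order (the file names and record format are hypothetical):

```python
# Sketch: insert a batch of new records into a sequential file by copying
# the old file to a new one, merging the sorted additions in key order.

def insert_records(old_path, new_path, new_records):
    # Sort the additions first, so the old file is copied only once.
    pending = sorted(new_records)          # list of (key, data) tuples
    with open(old_path) as old, open(new_path, "w") as new:
        for line in old:
            key = int(line.split(",", 1)[0])
            # Write any pending records whose key precedes the current one.
            while pending and pending[0][0] < key:
                k, d = pending.pop(0)
                new.write(f"{k},{d}\n")
            new.write(line)
        # Append any additions beyond the last existing key.
        for k, d in pending:
            new.write(f"{k},{d}\n")
```

Deletion works the same way: the record at the point of deletion is simply not copied to the new file.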

Indexed Sequential File Access

Records are stored in key sequence order, and an index is used to allow the data to be accessed directly. Each record in the file has a key field, which uniquely identifies that record. The file consists of two parts:

• The index.

• The home area.

The index is a table with one entry per record. Each entry contains the record key and the disk address of that record.

The home area contains all the records, stored in sequential order (i.e. in order of the key field). It is therefore possible to access records in this file directly through the index, or sequentially, by reading through the whole file. To allow for this type of access, the file and index must be stored on a direct access medium, e.g. disk, CD or memory stick.

Indexed sequential files are used as master files in large systems, e.g. the employee file. During the month, the file would be accessed randomly to apply amendments such as changes of name and address, as only individual records need to be accessed. At the end of the month, when every record has to be accessed to process the payroll, the file would be accessed sequentially.

The advantage of indexed sequential access over standard sequential access is that the index allows direct access to individual records, without having to read all previous records, which means that access times are reduced.
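One way to picture this is an index built by a single sequential pass, mapping each key to its byte address; a record can then be fetched directly with a seek. A minimal sketch, assuming the same hypothetical "key,data" line format as a record:

```python
# Sketch: build an in-memory index of key -> byte offset, then use it to
# jump straight to a record without reading the preceding ones.

def build_index(path):
    index = {}
    with open(path, "rb") as f:
        offset = 0
        for line in f:
            key = int(line.split(b",", 1)[0])
            index[key] = offset          # record the "disk address"
            offset += len(line)
    return index

def read_record(path, index, key):
    # Direct access: seek to the record's address instead of scanning.
    with open(path, "rb") as f:
        f.seek(index[key])
        return f.readline().decode().rstrip("\n")
```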

Multi-level Indexes

If an indexed sequential file is large, then the corresponding index will also be large. This may slow down the processing of the file considerably. To overcome this problem, the index is split into a number of separate indexes, or levels. Each index is linked via addresses or pointers to the next level of index. For example, consider locating the record with key 2202:

To search for record 2202, the first level index is read, and directed to an address, where the second level index is stored. The second level index is read, and directed to an address, where the third level index is stored. This third index then points directly to the address of the record.
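The three-level search described above can be sketched as follows. All keys and addresses here are hypothetical; each index level is modelled as a sorted list of (highest key, child) pairs, where a child is either the next index level or, at the bottom, a record address:

```python
import bisect

# Sketch: searching a multi-level index, following the key 2202 example.
def lookup(level, key):
    highs = [high for high, _ in level]
    i = bisect.bisect_left(highs, key)   # first block that covers the key
    child = level[i][1]
    if isinstance(child, list):
        return lookup(child, key)        # descend to the next index level
    return child                         # bottom level points at the record
```

Each call reads one (much smaller) index, so a large file can be searched with only a few index reads instead of one scan of a single huge index.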

Overflow in an indexed sequential file

The home area of a file will be allocated when the file is first created, and it is allocated to be large enough to accommodate the typical number and size of records. However, after a period of time, the home area will become full, and it is necessary to store the records elsewhere on the disk. This area is referred to as the overflow area. When there is no room in the home area, a pointer is left at the position in the home area, which points to the address in overflow where the record can be found. An indexed sequential file therefore consists of 3 parts:

• A home area where the records are initially stored.

• One or more index areas to hold the indexes.

• An overflow area to hold records that will not fit into the home area.

File re-organisation

After a period of time, if records are continually being added to and deleted from an indexed sequential file, a large proportion of records will end up in the overflow area. This means that the overall time to access records will increase dramatically. Sooner or later it becomes necessary to re-organise the file. This means placing all the records back in the home area, and re-writing the indexes. This is done by reading the file sequentially and writing the records out to a new file.

Direct (Random) Files

This type of file access is used when direct access to records is required, i.e. there will never be a need to read through the file sequentially. The records are not stored in sequence: the position of each record is determined by a calculation called a hashing algorithm. When a record is to be stored in the file, a hashing algorithm is applied to the record key, and this determines the address of the block (physical location) where the record is to be stored. A typical algorithm will use the division/remainder method. An example algorithm would be:

The algorithm converts a six-digit record key as follows: the 6 digits are added together, the result is divided by 29, and the remainder is used as the output of the algorithm, i.e. the address.
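This division/remainder method can be sketched directly (a toy illustration; in practice the divisor is chosen to match the number of available blocks):

```python
# Sketch: hash a six-digit record key to a block address by summing its
# digits and taking the remainder after dividing by 29.

def hash_address(key):
    digit_sum = sum(int(d) for d in f"{key:06d}")  # add the 6 digits
    return digit_sum % 29                          # remainder = block address
```

For example, key 123456 has digit sum 21, so it is stored at block address 21.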

Some algorithms are more complex mathematical functions.

Overflow in Direct (Random) files

Occasionally, the address provided by the hashing algorithm may already be in use, and it is necessary to store the record elsewhere on the disk. This area is referred to as the overflow area. A pointer is left at the home address, which points to the address in overflow where the record can be found. Data is normally stored and searched serially in this overflow area. When there are many records in the overflow area, access to the records may become slower. To resolve this, the file will need to be re-organised, which may mean creating a new hashing algorithm and allocating a larger area for the file.
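The store-then-overflow behaviour can be sketched as follows. This is an illustrative in-memory model rather than a real disk file, carrying over the 29-block home area and the digit-sum hashing function from the earlier example:

```python
# Sketch: a direct file with a fixed home area and a serial overflow area.
# A collision sends the record to overflow, which is searched serially.

class DirectFile:
    def __init__(self, size=29):
        self.home = [None] * size      # one record slot per block address
        self.overflow = []             # collided records, kept serially

    def _address(self, key):
        return sum(int(d) for d in f"{key:06d}") % len(self.home)

    def store(self, key, data):
        a = self._address(key)
        if self.home[a] is None:
            self.home[a] = (key, data)
        else:
            self.overflow.append((key, data))   # home block in use: overflow

    def fetch(self, key):
        a = self._address(key)
        if self.home[a] and self.home[a][0] == key:
            return self.home[a][1]
        for k, d in self.overflow:              # serial search of overflow
            if k == key:
                return d
        return None
```

As the overflow list grows, every fetch of an overflowed record costs a serial search, which is why a heavily overflowed file needs re-organising.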

Advantages of Sequential files over Random files

If only sequential access is required, e.g. to read the whole file, then the random file would be slow because of the extra overheads of hashing. Also, less space is wasted, since a random file must leave spare empty blocks to keep hashing collisions low.

Advantages of Indexed Sequential over Random files

Indexed sequential access will allow for faster sequential processing of the whole file, since the records are already stored in key order.

Advantages of Random files over Sequential

There is no need to access every record, as each required record can be accessed directly, without reading the others first.

File Security

Data security involves the use of various methods to make sure that data is correct, and that it is kept confidential and safe. Data and information are valuable and need to be protected against all threats – accidental or deliberate. This can be achieved through the use of backup copies, keeping generations of files, and logs of all transactions made, which help prevent the loss or destruction of data.

Backup Copies

There are many types of backup; which is chosen will depend on the money available and how critical the loss of data would be. After a back-up has been performed, it is important to move the copy to a safe location, usually off site, although a fireproof cabinet is often used. This will allow the system to be recovered in the event of a disaster. Different types of back-up are:

• File copy – simply copy the file to another medium.

• Incremental back up – make a full copy at first, then only copy data that has changed since the last full back up.

• Transaction log (used with on-line updating) – all data used to update the master file are stored in the transaction log. If there is a failure, all transactions stored on the transaction log are applied to the most recent backup of the master file in order to restore it to the same point as immediately before the failure. This is a costly system, but results in minimal data loss.

• Mirror – this is where a complete system is run in parallel. All changes are made to both systems at the same time. This is very costly.
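The transaction-log recovery described above can be sketched as follows. This is an illustrative model with hypothetical action names, where the master file is represented as a dictionary of key to record:

```python
# Sketch: restore a master file by re-applying the transaction log, in
# order, to the most recent backup copy.

def restore(backup, log):
    master = dict(backup)              # start from the last backup
    for action, key, value in log:     # replay every logged transaction
        if action in ("insert", "update"):
            master[key] = value
        elif action == "delete":
            master.pop(key, None)
    return master                      # master file as it was before failure
```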

With all of these backup methods, previous generations of the file may be kept to increase security. The general practice is to keep the three most recent versions of the file as generations, called grandfather, father and son, so that if one version of the file becomes corrupt, there will be previous versions (generations) available.


An archive file is one which contains data that is no longer in current use, but which must be kept securely in long term storage as it may be required at some future date, e.g. for audit or legal purposes. Archives are usually kept away from the computer system, in a fireproof safe or off site. Archiving is necessary to reduce file sizes, which will increase access speeds to the current data and free up storage space.

File Privacy

Data Privacy is the requirement that data is only accessed or disclosed to authorised persons. The Data Protection Act requires that systems have safeguards (both physical and software) built into them to reduce the risk of unauthorised access. This can be achieved by methods such as passwords, access levels, and encryption.

User Passwords and Levels of Access

People are often given user names and passwords, which they need to log on to the system. Once logged on, they are given a level of access by being granted access rights to files and resources by the system administrator. Different users can be given different access rights e.g. read only, read and write, read, write and update.


Encryption

Encryption is a method of safeguarding data by making the data in a computer system unintelligible. Files can be encrypted so that they can be read but not understood by unauthorised users, who will not have access to the decryption process. Decryption is converting the unintelligible data back into an understandable form. An encryption key is a word or code selected by the user to govern the encryption process, and a decryption key is needed before the data can be understood. All authorised users must have a copy of the key before they can carry out the encryption and decryption processes.
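As a toy illustration of a shared key governing both encryption and decryption, the sketch below uses a simple XOR cipher. This is not secure and is purely to show that the same key reverses the process; real systems use vetted algorithms such as AES:

```python
# Sketch: a toy symmetric cipher. XORing the data with the key scrambles
# it; XORing the result with the same key restores the original.

def xor_cipher(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
```

Without the key, the scrambled bytes are unintelligible; with it, decryption is the same operation as encryption.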
