Week 1 - Basic Data Structures

This week we will be looking at some basic data structures, the array, the linked list and the stack, and looking at their relevance.

The importance of data structures

We will start with a look at data structures. First of all, what is a data structure and why is it important? When it comes to programming, we frequently need to deal with collections of data, rather than a single item of data. For example, we might write a program to manage student records. This program would need to deal with not just one student, but many students. Similarly, you might want to write a program for a live music venue, to allow the venue to store concerts and allow users to book tickets. Again, such a program would have to deal with many concerts and many users.

To store multiple items in a computer program, we need to choose an appropriate data structure. A data structure is a programming construct which allows us to store multiple items. When dealing with data structures, an important thing to consider is which is the most appropriate data structure for the problem we are trying to solve? Certain data structures are more suited to specific types of problem, and we will be looking at this through the module.

A basic data structure - the array

Let's start with the most basic data structure of all, the array. An array is a data structure which can hold more than one item of data, and can be thought of as a series of "boxes" which can hold data. With an array, the key thing is that each box is adjacent to its neighbours in memory which has some consequences in terms of efficiency which we will consider as we go on

The diagram below shows an example of an array which holds the names of five items of fruit.

An array

The things to note here are:

Arrays do have some disadvantages. Their simplicity makes them easy to work with but this simplicity can lead to limitations.

John Stevenson
Jane Smith
Tim Wilson
Kate Stevenson
Kate Palmer
Tom Eastman
Laura Green
Mike Watson
Sally Black
Mark Ramsey

You are writing a program to store employees for a company. It's a small company, with only 10 employees, shown above.

Advantages and disadvantages of arrays

Advantage: Fast lookup

As we saw above, the advantage of using an array is that it is very fast to index an array. This is because arrays are always sequentially stored in memory, so that in the example above, "apple", "orange", "pear", "grape" and "cherry" will be stored adjacent to each other in memory.

When indexing an array, the computer is able to work out where in memory the data is stored, because each member of an array uses up the same number of bytes. The equation is simply:

Address = Start Address + Index * Size Of One Member

So imagine, for example, that each member of an array uses up 32 bytes, and the start of the array is at memory location 24576. It's an easy calculation to work out where the member with the index of 3 is located. If we substitute the values into the equation we get:

Address = 24576 + 3 * 32

which is 24576 + 96, which is 24672.

We ourselves do not need to do this calculation. The computer does it. But we can see, because it's simple arithmetic, the computer is able to do it very quickly. So, arrays are optimised for fast look-up of data using a numerical index.

Creating a simple program making use of an array

Here is a program which makes use of an array. We are creating an array called people to hold the names of members of a team.


import numpy as np
fruits = np.array(['Apple', 'Orange', 'Cherry']) 

print("The first fruit is " + fruits[0])
print("The second fruit is " + fruits[1])
print("The third fruit is " + fruits[2])

Note a couple of things:

Coding exercise 1.1

Write a program which creates an array with the 10 employees mentioned in Exercise 1. Display "Jane Smith" and "Mike Watson" by indexing the array appropriately.

Disadvantage: not flexible, an array cannot be resized

Now, imagine you want to add two more entries to the array:

employees[10] = "Kevin Kennedy"
employees[11] = "Lisa Lord"

Now run it. Do you get the result that you expected? See how this illustrates an issue with arrays: they have fixed size. If an array is initialised with a certain number of elements, you cannot add additional elements later and the only valid indices are those indices within the capacity of the array. Here, the array was initialised with 10 elements so only indices 0-9 are valid. Furthermore attempting to append to the end of the array via an append operation would not work either:

employees.append("Kevin Kennedy")

Not enough space, or too much space!

Imagine we have a larger array storing company employees, as shown in the diagram below. (Just the initials of the employees are shown).

Employee array

Let's say, though, the company takes on an 11th employee, "Ulysses Vernon" (UV). Do we have space in the array to fit this 11th employee? No, we don't, because an array has fixed size!

So what happens if the company expands and takes on more employees? What could we do?

Adding to an array

Using a List

One solution is to use a more complex data structure, which can be extended with new data. In fact Python uses one such data structure, the list. A list is essentially a wrapper round an underlying array. With the list you can add new items of data to the end of the list using the append operation, and keep doing so until the computer runs out of memory.

Other languages have similar "extensible array" data structures, for instance C++ has the vector and Java has the ArrayList.

Creating a simple program making use of a list

Here is a program which makes use of a list. We are creating a list called people to hold the names of members of a team. The list initially has space for three members, however we can later append to it, to expand the list.


print("Please enter the three people in your team")
people = [None] * 3
people[0] = input("Please enter the first person:")
people[1] = input("Please enter the second person:")
people[2] = input("Please enter the third person:")

print("The first person is " + people[0])
print("The second person is " + people[1])
print("The third person is " + people[2])

fourth = input("Please enter the fourth person:")
fifth = input("Please enter the fourth person:")
people.append(fourth)
people.append(fifth)
print("The fourth person is " + people[3])
print("The fifth person is " + people[4])

Note a couple of things:

Coding Exercise 1.2

Rewrite your exercise 1.1 code to use a list rather than an array. Try appending to it using append(). Does it work?

How does a list work under the hood?

We have looked at how to use a list. However we are now going to look at what actually happens when a list is appended to, via an interactive exercise.

Lists contain an internal array. So what happens when we need to extend a list? We create a new array with more space when the old one runs out of space. We need to:

This is shown below. This diagram shows a simplified version of what happens when you append to a list. It is a repeat of the diagram showing array resizing, but also shows the list (in green) as a wrapper round the array. It shows how the internal array has to be recreated and the old data copied across when you append to a list.

Appending to a list

This however is inefficient because a new internal array is created with additional space to hold the new elements, and then the old data is copied across from the old array to the new array. We have to loop through each element in the old array and copy it across to the new array, which is slow

(The Python list does however include some optimisations to improve the efficiency of append and insertion operations. For instance, more memory is allocated for the internal array than is needed, meaning that a new internal array need not be created if only a small number of items are added as there will be spare space at the end of the array to hold them. See here for details.)

The stack

A stack data structure involves adding items from bottom to top, rather like a stack of plates. When we remove items from the stack, we remove from the top, again just like a stack of plates. The stack is known as a "last in first out" or "LIFO" data structure. It is called this, because the last things we add to the stack, are the first things we remove. Here is an example of a simple stack of numbers.

Simple stack

A stack can be used for any operation in which we need to navigate back to a previous state. Examples could include:

(In actual fact, each of these is now implemented in a slightly more complex way, in the sense that you can, in modern browsers, move both back and forwards along your history, but we are assuming a more simplified implementation in which you can only move back for the purposes of illustrating a stack).

Another use of stacks, which you will appreciate more when you have done more programming, is:

The two key operations of a stack, adding and removing items, have special terms.

An additional operation is:

Paper Exercise 1.3 : Stacks

We are now going to perform another paper-based exercise with stacks, to help you understand them and their operations.

Imagine you have an empty stack. Draw the stack after each operation below, and explain what, if anything is returned from each operation and any errors that might occur.

push (a), push (b), pop (), push (c), peek (), pop (), pop (), pop (), push (d), push (e), push (f), pop (), push (g), push (h), peek (), push (i), pop (), pop (), pop (), peek ().