Week 1 - Basic Data Structures

This week we will be looking at some basic data structures, the array, and the list, and looking at their relevance. We will also start some very basic Python programming using lists.

The importance of data structures

First of all, what is a data structure and why is it important? When it comes to programming, we frequently need to deal with collections of data. For example, we might write a program to manage student records. This program would need to deal with not just one student, but many students. Similarly, you might want to write a program for a live music venue, to allow the venue to store concerts and allow users to book tickets. Again, such a program would have to deal with many concerts and many users.

To store multiple items in a computer program, we need to choose an appropriate data structure. A data structure is a programming construct which allows us to store multiple items. When dealing with data structures, an important thing to consider is which is the most appropriate data structure for the problem we are trying to solve? Certain data structures are more suited to specific types of problem, and we will be looking at this through the module.

A basic data structure - the array

Let's start with the most basic data structure of all, the array. An array is a data structure which can hold more than one item of data, and can be thought of as a series of "boxes" which can hold data. With an array, the key thing is that each box is adjacent to its neighbours in memory which has some consequences in terms of efficiency which we will consider as we go on

The diagram below shows an example of an array which holds the names of five items of fruit.

An array

The things to note here are:

Arrays do have some disadvantages. Their simplicity makes them easy to work with but this simplicity can lead to limitations.

Working with arrays in Python

Note that this section is not necessary if you are studying this module over the summer, as you should be comfortable with this as you have already completed COM411. If you are on the summer instance, we will skip this section.

You are being introduced to Python programming on another module, COM411. In this section, however, wewill explore how to create and index a simple array like data structure in Python. Note that Python comes with a range of different data structures to represent collections of data. The default is the list, which is a flexible language component able to be used not just as an array but as various other data structures. However, here, we will treat the list as if it was an array.

Using Repl

If you have had your first COM411 class already, you will have been introduced to Repl and should have an account. If not, here is a quick introduction. Repl is an online environment which allows you to program without setting up software on your own computer.

You can login to Repl using either a Repl account or your GitHub account. You may well already have a GitHub account, if not, you will find it useful throughout your time at university and also when coding in general. GitHub is a collaborative platform for developing projects which allows you to version-control your code, in other words you can commit your code every time you reach a certain stage, and then "roll back" to previous commits if you are unhappy with new code. You will learn more about GitHub, and the Git version-control system, on other modules, but for now, simply create an account on GitHub.

Once you have created a Repl account or GitHub account, you can log in with it on Repl. If you have created a simple "hello world" project already in COM411 you do not need to do anything else. If not, test that it all works by creating a new project.

Repl interface

If you select the menu icon on the top left, you can select "New Repl". A Repl is a project - essentially, a single program. The idea is that you create a new Repl for each program you are going to write. You can then go back and view your previous Repls to load the code.

New Repl

Once you have done this, just enter the following Python code:

print("Hello World!")

Run the code and you should see the output appear in the console to the right:

Hello World!

Repl Console Output

Advantages and disadvantages of arrays

Advantage: Fast lookup

As we saw above, the advantage of using an array is that it is very fast to index an array. This is because arrays are always sequentially stored in memory, so that in the example above, "Linux", "Windows" and "Mac OS X" will be stored adjacent to each other in memory.

When indexing an array, the computer is able to work out where in memory the data is stored, because each member of an array uses up the same number of bytes. The equation is simply:

Address = Start Address + Index * Size Of One Member

So imagine, for example, that each member of an array uses up 32 bytes, and the start of the array is at memory location 24576. It's an easy calculation to work out where the member with the index of 3 is located. If we substitute the values into the equation we get:

Address = 24576 + 3 * 32

which is 24576 + 96, which is 24672.

We ourselves do not need to do this calculation. The computer does it. But we can see, because it's simple arithmetic, the computer is able to do it very quickly. So, arrays are optimised for fast look-up of data using a numerical index.

Exploring the problems with arrays

Disadvantage: not flexible, an array cannot be resized

Imagine we have an array storing company employees, as shown in the diagram below. (Just the initials of the employees are shown).

Employee array

Let's say, though, the company takes on an 11th employee, "Ulysses Vernon" (UV). Do we have space in the array to fit this 11th employee? No, we don't, because an array has fixed size!

So what happens if the company expands and takes on more employees? What could we do?

Adding to an array

Process needed to extend an array

If we wanted to add additional members onto an array, we would have to:

Array extension

Further disadvantage: insertion difficult

What if we wanted to add a new member at a specific position in the array? For example, imagine we had an array of contacts that we wished to keep in alphabetical order, e.g:

Alex Acland
Bonnie Black
Charley Chase
Earl Edwards
Frances Freeman

Imagine we had a new name, "Danielle Dawson" which we wanted to insert between "Charley Chase" and "Earl Edwards" in the array above. What we would have to do is:

Array Insertion

Clearly this is inefficient as we have to create a new array, which is expensive in memory usage, as we have to have the old and new arrays in memory at the same time.

More flexible data structures

We have seen, through the exercise above, the limitations of the basic array. Consequently, there are a whole range of more flexible and specialised data structures which we can use for particular scenarios.

In fact Python uses one such data structure, the list. A list can act, amongst other things, as an extensible array, which does not have a fixed initial size; you can add additional items of data to the end of the list, and keep doing so until the computer runs out of memory. However, because the list is implemented internally using an array, this may require creation of a new array with additional space to hold the new elements.

(The Python list does however include some optimisations to improve the efficiency of append and insertion operations. For instance, more memory is allocated for the internal array than is needed, meaning that a new internal array need not be created if only a small number of items are added as there will be spare space at the end of the array to hold them. See here for details.)

Other languages have similar "extensible array" data structures, for instance C++ has the vector and Java has the ArrayList.

The linked list

We will start with a look at linked lists. Linked lists are unlike arrays in that they are not stored continuously in memory Instead, data is stored as a series of linked nodes. Each node contains one item of data, and links to the memory locations of the previous and the next item of data in the linked list.

Linked list diagram

Each node has a link to the previous and the following node. When we add a new item of data, we make the previous node link to the new node, and we link the new node back to the previous node to form a two-way link.

The first node in the list links to nothing in the reverse direction (indicated in Python by the special value None) and similarly, the final node in the list links to nothing in the forward direction.

What are the consequences of this?

The stack

A stack data structure involves adding items from bottom to top, rather like a stack of plates. When we remove items from the stack, we remove from the top, again just like a stack of plates. The stack is known as a "last in first out" or "LIFO" data structure. It is called this, because the last things we add to the stack, are the first things we remove. Here is an example of a simple stack of numbers.

Simple stack

A stack can be used for any operation in which we need to navigate back to a previous state. Examples could include:

(In actual fact, each of these is now implemented in a slightly more complex way, in the sense that you can, in modern browsers, move both back and forwards along your history, but we are assuming a more simplified implementation in which you can only move back for the purposes of illustrating a stack).

Another use of stacks, which you will appreciate more when you have done more programming, is:

The two key operations of a stack, adding and removing items, have special terms.

An additional operation is:

Exercises

Exercise 1: Arrays: Paper-based

You are writing a program to store employees for a company. It's a small company, with only 10 employees. Try drawing, on paper, an array containing these 10 employees. Draw each name in each position in the array, similar to the diagram above containing an array of fruit.

John Stevenson
Jane Smith
Tim Wilson
Kate Stevenson
Kate Palmer
Tom Eastman
Laura Green
Mike Watson
Sally Black
Mark Ramsey

Answer these questions:

Exercise 2: Arrays (code): Creating a simple program making use of an array

  1. Here is a program which makes use of an array (it's actually a Python list, but we're going to treat it as an array).
operating_systems = ["Linux", "Windows", "Mac OS X"]
print(operating_systems[0])

Note a couple of things:

Try running this code but before you do so, try to predict what will be displayed.

Once you have done so, extend your code so that "Mac OS X" is displayed (by indexing the array).

Now, imagine you want to add two more entries to the array, "Android" and "iOS". Before trying it, do you think this would work?

operating_systems[3] = "Android"
operating_systems[4] = "iOS"

Now run it. Do you get the result that you expected? See how this illustrates an issue with arrays: they are not resizable.

  1. Write a program which creates an array with the 10 employees mentioned in Exercise 1. Display "Jane Smith" and "Mike Watson" by indexing the array appropriately.

Exercise 3: Adding new data to an array (paper)

Do this exercise on paper. This is a similar example to that given in the lecture, and is designed to illustrate the problems with adding new data to an array.

Exercise 4 : Linked List (paper)

  1. Think about what you would have to do to search for a particular item in a linked list using its index, starting at the beginning.

    Draw out a linked list containing the 5 items of data:

    - Linux 
    - Windows 
    - Mac OS X
    - Android
    - iOS
    

    Imagine we wish to retrieve the item with index 3 (Android). How could we do this? Draw out how you think it could be done on paper, and ask yourself: how efficient is this, particularly compared to doing the same thing with an array.

  1. Remember with an array, that we had problems if we wanted to add new data to the array. We had to create a brand new array with more space, and copy the old data into the new array before adding the new data. Think about doing the same thing with a linked list. On paper, draw a linked list containing the elements

    - Linux
    - Windows
    - Mac OS X
    

    Now try and add new element "Android" to the end of the linked list. What do you have to do? Draw out what happens on paper.

Exercise 5: Stacks

We are now going to perform another paper-based exercise with stacks, to help you understand them and their operations.

Imagine you have an empty stack. Draw the stack after each operation below, and explain what, if anything is returned from each operation and any errors that might occur.

push (a), push (b), pop (), push (c), peek (), pop (), pop (), pop (), push (d), push (e), push (f), pop (), push (g), push (h), peek (), push (i), pop (), pop (), pop (), peek ().

Exercise 6: Linked list insertion

Think about adding a new item of data into the middle of a linked list. Consider this ordered list of names:

On paper draw these as a linked list. Now try to add "Danielle Dawson" at the appropriate place between "Charley Chase" and "Earl Edwards". What operations need to be done now? Draw them on paper again. Do you think this would be more or less efficient than using an array?

Solution

Note the mixed efficiency when inserting into the middle of the linked list:

Reading for next week

Next time we will look at how we can actually implement these data structures in Python. To prepare for this, try doing a bit of reading on Classes and Objects in Python.