
Python for humanities people, part II

Continuing from part I of this series, we'll be talking about lists and the for loop. There will be no recap, so read up on the last installment if necessary.

Another way to create a list is with the range function, which creates a list of integers:
>>> range(10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

You'll notice that this list contains the first 10 integers, starting at 0 and ending at 9. You might have expected the numbers from 1 to 10, but this ties in with zero-indexing, i.e., the fact that the index of the first element in a list is 0. However, range can also produce other ranges. For example, if you want the list to start at something other than 0, then give the number you want to start at first:
>>> range(3, 10)
[3, 4, 5, 6, 7, 8, 9]
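
Here is a small sketch of the same calls in script form, which you can save to a file and run. Two things in it are my own additions rather than part of the walkthrough above: I wrap range in list(...) and put parentheses after print so that the script also runs under Python 3 (where range no longer returns a list directly), and I include an optional third argument, the step, which we haven't discussed.

```python
# list(...) makes the result a real list on Python 3 too,
# where range gives back a lazy range object instead of a list.
first_ten = list(range(10))
from_three = list(range(3, 10))

# Assumption: the third argument (the step size) is not covered in this post.
every_other = list(range(0, 10, 2))

print(first_ten)    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(from_three)   # [3, 4, 5, 6, 7, 8, 9]
print(every_other)  # [0, 2, 4, 6, 8]
```

Note that the end value (10 here) is never included in the result, whatever the start value is.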

By the way, you might be wondering what range actually is, and the answer is that it is a built-in function. We'll get back to functions, but for now let's just say that they are called by typing their name followed by parentheses, and the things inside the parentheses are called arguments.

A third way to create lists, which will come in handy when we start looking at text, is to split up a string of text into smaller pieces. For example, let's define a variable holding some text:
>>> lorem = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.'

We can now use the method called split to split this text into words:
>>> lorem.split(' ')
['Lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipisicing', 'elit,', 'sed', 'do', 'eiusmod', 'tempor', 'incididunt', 'ut', 'labore', 'et', 'dolore', 'magna', 'aliqua.']

Now you might have a few questions. First of all, why do I say that split is a method, when it looks sort of like a function? We'll get back to this as well, but briefly, a method is a function which is bound to a particular object or class of objects. In this case, the variable lorem is a string, and split is a method which works with strings. The argument to split, which here is ' ' (a single space), tells the method which character it should use to split the string into smaller parts, so here we interpret the string as words separated by spaces. Secondly, you might wonder what to do with the punctuation. For example, the fifth element in the list above is 'amet,' and not 'amet'. We'll get back to that as well. But first, we'll look at our first control construct, the for loop.
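
To make the role of the argument a bit more concrete, here is a minimal sketch that splits the same text on two different separators. The shortened version of lorem and the variable names words and clauses are just my choices for the illustration, and the print(...) form is used so it runs under Python 3 as well.

```python
# A shortened version of the lorem text, to keep the output readable.
lorem = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit'

# Splitting on a single space gives "words" (punctuation stays attached).
words = lorem.split(' ')

# Splitting on a comma followed by a space gives larger chunks instead.
clauses = lorem.split(', ')

print(words)
# ['Lorem', 'ipsum', 'dolor', 'sit', 'amet,', 'consectetur', 'adipisicing', 'elit']
print(clauses)
# ['Lorem ipsum dolor sit amet', 'consectetur adipisicing elit']
```

The separator itself is thrown away, which is why the comma disappears from the second list but not from the first.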

A control construct is so named because it allows us to control the execution of our program. In the case of the for loop, it makes it easy to do the same task a number of times without writing the same code over and over. Let's consider a simple example:
>>> for i in range(5):
...     print i

Several things are happening here. First, remember that writing range(5) is pretty much equivalent to writing [0,1,2,3,4], which is a list. The code inside the loop, in this case the statement print i, is executed once for each element in the list, and on each occasion, the value of the variable i (often called the loop variable) will be the value of the corresponding element. The print i statement simply writes the value of i to the terminal. So in this example, the value of i is set to 0, then i is printed, the result being that 0 shows up in the output, then i is set to 1, which is printed, etc. Note also that the code inside the loop is indented, and if there is more than one line inside the loop, they all have to be indented the same amount, like this:
>>> for i in range(3):
...     print 'This line is inside the loop'
...     print 'This line is also inside the loop'
This line is inside the loop
This line is also inside the loop
This line is inside the loop
This line is also inside the loop
This line is inside the loop
This line is also inside the loop

This example also illustrates that you don't actually have to use the loop variable for anything, so in this case I just used a for loop to execute the same two lines of code three times.
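
The flip side of the indentation rule is that a line which is not indented is no longer part of the loop, and runs only once, after the loop has finished. A small sketch of that, written as a script with print(...) so that it also runs under Python 3:

```python
for i in range(3):
    # Indented: this line belongs to the loop and runs three times.
    print('This line is inside the loop')

# Not indented: this line runs once, after the loop is done.
print('This line is after the loop')
```

Running it prints the first message three times and the second message once.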

We don't have to use a list of consecutive numbers to run a for loop; in fact, the elements don't have to be numbers at all. Considering the text we looked at before, which we stored in the variable lorem, we can do the following:
>>> for word in lorem.split(' '):
...     print word

Note that here, I called the loop variable word, instead of i. It can be called anything you like, and it's often a good idea to use intuitive names in order to write readable code.
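
The same thing works with a list you have typed in yourself. In this sketch the list authors and the loop variable author are made-up names for the sake of the example, chosen so that the code reads almost like a sentence:

```python
# A hypothetical list of strings, written out by hand.
authors = ['Dickens', 'Tolkien', 'Austen']

# The loop variable takes each element of the list in turn.
for author in authors:
    print(author)
```

This prints the three names, one per line, in the order they appear in the list.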

Next, we're going to try to do something more interesting than just printing words. We have at our fingertips a way to, for example, go through a large text and do something with each word. As a fairly easy example, let's try counting the number of occurrences of a particular word:
>>> text = "Fog everywhere. Fog up the river, where it flows among green aits and meadows; fog down the river, where it rolls defiled among the tiers of shipping and the waterside pollutions of a great (and dirty) city. Fog on the Essex marshes, fog on the Kentish heights. Fog creeping into the cabooses of collier-brigs; fog lying out on the yards and hovering in the rigging of great ships; fog drooping on the gunwales of barges and small boats. Fog in the eyes and throats of ancient Greenwich pensioners, wheezing by the firesides of their wards; fog in the stem and bowl of the afternoon pipe of the wrathful skipper, down in his close cabin; fog cruelly pinching the toes and fingers of his shivering little 'prentice boy on deck. Chance people on the bridges peeping over the parapets into a nether sky of fog, with fog all round them, as if they were up in a balloon and hanging in the misty clouds."
>>> n = 0
>>> for word in text.split(' '):
...     if word == 'Fog':
...         n = n + 1
>>> print n
5

Here, we first create the variable text, which holds one of the opening paragraphs of Bleak House, lifted from Project Gutenberg. Then, we create the variable n, which is set to 0. Next, we create a for loop over all the words in the paragraph, and then we run into something we haven't seen before, which is the if statement.

It's been said that Python is like executable English, and in this case I'd say it's not far from the truth. This if statement probably does what you would think by looking at it. Looping over all the elements, which in this case are words, if it finds an element which is equal to 'Fog' it sets n equal to n + 1 (which is the same as increasing the value of n by 1). Then, after the loop is finished, we print the value of n, which turns out to be 5.
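
If you want to watch the counter change as the loop runs, one option is to print n inside the if statement as well. The sketch below does that on a shortened excerpt of the paragraph (my own cut, to keep the output short), and uses print(...) so that it also runs under Python 3:

```python
# A shortened excerpt of the paragraph, just for this illustration.
text = "Fog everywhere. Fog up the river, where it flows among green aits and meadows; fog down the river"

n = 0
for word in text.split(' '):
    if word == 'Fog':
        n = n + 1
        # Indented under the if: only runs when the word matched.
        print(n)

# After the loop: the final count for this excerpt.
print(n)  # 2
```

The lowercase 'fog' in the excerpt is not counted, since it is not equal to 'Fog'.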

Two important points: First, note that the equals sign is used in two different ways here, both as == and as =. The first, ==, is called the comparison operator, and checks if two things are equal. If they are equal, the statement is true; if not, the statement is false:
>>> 5 == 5
True
>>> 5 == 4
False

The second case, =, is called the assignment operator, and is what we've been using to assign values to variables all the time. Mathematically, the statement n = n + 1 looks a bit dodgy, but in Python it means "assign the value of n + 1 to the variable n".
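
The two operators side by side, as a minimal sketch. The variable names is_five and is_five_now are only there for the illustration:

```python
n = 4                    # assignment: n now holds the value 4
is_five = (n == 5)       # comparison: is n equal to 5? No, so this is False

n = n + 1                # assignment: compute n + 1 first, then store it in n
is_five_now = (n == 5)   # comparison again: now it is True

print(is_five)      # False
print(is_five_now)  # True
```

The comparison never changes n; only the assignment does.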

The second important point: While there are indeed five occurrences of the exact word "Fog" in the paragraph above, there are also another seven occurrences of the word "fog", and one of the word "fog,". Clearly, we need to be able to deal with these things, and that will be the topic of the next installment.
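
If you'd like to check those numbers yourself, here is one way to do it with nothing more than the loop and the if statement from this post, using one counter per spelling. The counter names are my own, the text is broken over several lines only to keep it readable (Python joins adjacent string pieces inside parentheses into one string), and print(...) is used so that it also runs under Python 3.

```python
# The paragraph from Bleak House, split over several lines for readability.
# Adjacent string pieces inside the parentheses are joined into one string.
text = (
    "Fog everywhere. Fog up the river, where it flows among green aits and "
    "meadows; fog down the river, where it rolls defiled among the tiers of "
    "shipping and the waterside pollutions of a great (and dirty) city. Fog "
    "on the Essex marshes, fog on the Kentish heights. Fog creeping into the "
    "cabooses of collier-brigs; fog lying out on the yards and hovering in "
    "the rigging of great ships; fog drooping on the gunwales of barges and "
    "small boats. Fog in the eyes and throats of ancient Greenwich "
    "pensioners, wheezing by the firesides of their wards; fog in the stem "
    "and bowl of the afternoon pipe of the wrathful skipper, down in his "
    "close cabin; fog cruelly pinching the toes and fingers of his shivering "
    "little 'prentice boy on deck. Chance people on the bridges peeping over "
    "the parapets into a nether sky of fog, with fog all round them, as if "
    "they were up in a balloon and hanging in the misty clouds."
)

# One counter for each spelling we are interested in.
capitalised = 0
lowercase = 0
with_comma = 0

for word in text.split(' '):
    if word == 'Fog':
        capitalised = capitalised + 1
    if word == 'fog':
        lowercase = lowercase + 1
    if word == 'fog,':
        with_comma = with_comma + 1

print(capitalised)  # 5
print(lowercase)    # 7
print(with_comma)   # 1
```

This only confirms the counts; it is obviously not a sensible way to handle every possible spelling, which is exactly the problem we'll get to next time.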


Camilla,  17.04.13 11:22

I notice that things are turning up here that I never had explained to me. Shocking.