Strings

In this chapter we shall take a closer look at the string data type and some of the operations associated with it. The following page makes heavy reference to online notes by Dr. Andrew N. Harrington, Hands-on Python 3 Tutorial [Str1].

String Literals

A string literal simply refers to how you specify that the data you are writing is a string. In Python this is achieved by placing quotes around the string contents. For example:

str_single = 'This is a string'

You are not limited to single quotes. For single line strings you can use double quotes as well:

str_double = "This is a string"

Note that these two strings are identical.

In most cases you are free to decide which quotes you want to use. The standard for Python is to use single quotes where possible, but what’s most important is that your style choice is consistent within a project.

Sometimes it is advantages to use single or double quotes specifically. For example, if you want to use double quotes inside your string this will break a double quote string literal, but not a single quote one, and vice versa.

print('String using single quotes, " does not break the string.')
print("String using double quotes, ' doesn't break the string.")
String using single quotes, " does not break the string.
String using double quotes, ' doesn't break the string.

For strings containing line breaks, you can use either ''' (three single quotes) or """ (three double quotes) to enclose the string contents:

print(
'''String with a
line break'''
)

print(
"""Another string with a
line break"""
)
String with a
line break
Another string with a
line break

Note that white space (like indentations) will show up in these strings:

print(
    '''
    A string with
    Indented lines
    '''
)
    A string with
    Indented lines
    

This can give you trouble when you are defining strings in an indented code block, in these cases you may be better off using the \n special character, which creates new lines.

Triple quotes can be used for single line strings as well. This may come in handy when single or double quotes are no longer an option:

print('''I said: "Hello world! How's it going?" ''')
I said: "Hello world! How's it going?" 

Concatenation +

For strings the + symbol is used to concatenate two strings together. For example:

print('One string' + ' and another')
One string and another

Duplication *

The duplication * operator takes a string and an integer and repeats the string as many times as the integer value:

print('hello '*4)
print(2*'bye ')
hello hello hello hello 
bye bye 

Indexing []

Strings can be seen as a collection of characters. Each of these character has an integer index associated with it, based on it’s position in the string. For example, take the string 'computer':

character c o m p u t e r
index 0 1 2 3 4 5 6 7

You can access individual characters in the string by index using:

string[index]

for example:

computer_string = 'computer'

print('Index 3:', computer_string[3])

print('Index 7:', computer_string[7])
Index 3: p
Index 7: r

If you use an index that is too large for the given string, Python will return an error:

print('Index 11', computer_string[11])
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-10-abeba3add71f> in <module>
----> 1 print('Index 11', computer_string[11])

IndexError: string index out of range

You can find the number of characters in a string using the len() function:

print('There are', len(computer_string), 'characters in the string')
There are 8 characters in the string

Notice how the length of computer_string is one greater than its largest index. This is because Python indexes from 0.

Thus, if we don’t know how long a string is before hand (if a variable holding a string is subject to change for instance) and we want to index the last value of the string, we could use len() - 1 as the index:

print('The last character:', computer_string[len(computer_string) - 1])
The last character: r

This method works, but Python gives us a far cleaner way of doing this: using an index of -1. This won’t work for most other programming languages.

print('The last character:', computer_string[-1])
The last character: r

In general, negative indices in Python index the strings (and other objects) backwards:

print('Second last character', computer_string[-2])

print('Third last character', computer_string[-3])
Second last character e
Third last character t

Note that the index -8 corresponds to the 0 index (len(computer_string) - 8 is 0) so anything less than this would be out of bounds.

Slicing

Slicing allows us to extract segments of the string, as apposed to individual characters. The syntax for string slicing is:

string[start_index:stop_index]

where the stop_index is not included in the slice, rather the slice stops before this index. For example, consider the slice:

print(computer_string[2:5])
mpu

where the last character is 'u', but the character with index 5 is 't'.

If we want to take a slice from the beginning of a string we could use 0 as the start_index:

print(computer_string[0:3])
com

Alternatively if we left the start_index blank Python will interprate this as starting from the beginning of the string:

print(computer_string[:3])
com

Similarly if we wanted to take a slice up to and including the last character in the string, we can use:

print(computer_string[3:len(computer_string)])
puter

or simply leave the stop_index blank:

print(computer_string[3:])
puter

Notice the slice above is not the same as if we used -1 as the stop_index:

print(computer_string[3:-1])
pute

even though the same rules apply as with indexing, the slice always stops before the stop_index.

We can use a third index when slicing as a step size:

string[start_index: stop_index: step_size]

For example, we can get every second character from a string using a step size of 2:

print('Starting from 0:', computer_string[0:8:2])
print('Starting from 1:', computer_string[1:8:2])
Starting from 0: cmue
Starting from 1: optr

The step size can be any integer. Note that by default it is set to 1. As another example lets print out every second character from computer_string starting from the first:

print(computer_string[::3])
cpe

The step size need not be positive. If a negative step size is used the string will be sliced backwards. For example if we want to print out the whole of computer_string backwards:

print(computer_string[::-1])
retupmoc

Note, when slicing with a negative step size you must ensure that start_index is greater than stop_index, otherwise your slice will be empty.

print('Empty slice:', computer_string[0:6:-1])
print('Not empty slice:', computer_string[6:0:-1])
Empty slice: 
Not empty slice: etupmo

Also notice how, in the second slice above, the 0 index character is not present. Even when slicing with a negative step size the stop_index is not included in the slice.

References

Str1

Dr. Andrew N. Harrington. Hands-on python 3 tutorial. 2019. [Online; accessed 6-December-2019; released under the CC BY-NC-SA 4.0 license]. URL: http://anh.cs.luc.edu/python/hands-on/3.1/handsonHtml/index.html.