Python Notes (0.14.0)

4.4. Strings

See also

Print function

In short, strings are immutable sequence of characters. There are a lot of methods to ease manipulation and creation of strings as shown here below.

4.4.1. Creating a string (and special characters)

Single and double quotes are special characters. There are used to defined strings. There are actually 3 ways to define a string using either single, double or triple quotes:

text = 'The surface of the circle is 2 pi R = '
text = "The surface of the circle is 2 pi R = "
text = '''The surface of the circle is 2 pi R = '''

In fact the latest is generally written using triple double quotes:

text = """The surface of the circle is 2 pi R = """

Strings in double quotes work exactly the same as in single quotes but allow to insert single quote character inside them.

The interest of the triple quotes (‘’’ or “””) is that you can specify multi-line strings. Moreover, single quotes and double quotes can be used freely within the triple quotes:

text = """ a string with special character " and ' inside """

The ” and ‘ characters are part of the Python language; they are special characters. To insert them in a string, you have to escape them (i.e., with a \ chracter in front of them to indicate the special nature of the character). For instance:

text = " a string with escaped special character \", \' inside "

There are a few other special characters that must be escaped to be included in a string. See The print statement for more information.

To include unicode, you must precede the string with the u character:

>>> u"\u0041"
A

Note

unicode is a single character set used to represent 65536 different characters.

See also

See Encoding/Decoding/Unicode for more information about unicode.

Similarly, you may see strings preceded by the r character to indicate that the string has to be interpreted as it is without interpreting the special character \. This is useful for docstrings that contain latex code for instance:

r" \textbf{this is bold text in LaTeX} "

See also

See The print statement for more information about string format and printing.

4.4.2. Strings are immutable

You can access to any character using slicing:

text[0]
text[-1]
text[0:]

However, you cannot change any character:

text[0] = 'a' #this is incorrect.

4.4.3. Formatter

In Python, the % sign lets you produce formatted output. A quick example will illustrate how to print a formatted string:

>>> print("%s" % "some text")
"some text"

The syntax is simply:

string % values

If you have more than one value, they should be placed within brackets:

>>> print("%s %s" % ("a", "b"))

The string contains characters and conversion specifiers (here %s)

To escape the sign %, just double it:

>>> print "This is a percent sign: %%"
This is a percent sign: %

There are different ways of formatting a string with arguments. The one based on a string method called format() is more and more common:

>>> "{a}!={b}".format(a=2, b=1)
2!=1

See also

Print function

4.4.4. Operators

The mathematical operators + and * can be used to create new strings:

t = 'This is a test'
t2 = t+t
t3 = t*3

and comparison operators >, >=, ==, <=, < and != can be used to compare strings.

4.4.5. Methods

The string methods are numerous, however, many of them are similar (as you will see in this page).

4.4.5.1. Methods to query information

There are a few methods to check the type of alpha numeric characters present in a string: isdigit(), isalpha(), islower(), isupper(), istitle(), isspace(), str.isalnum():

>>> "44".isdigit()  # is the string made of digits only ?
True
>>> "44".isalpha()  # is the string made of alphabetic characters only ?
False
>>> "44".isalnum()  # is the string made of alphabetic characters or digits only ?
True
>>> "Aa".isupper()  # is it made of upper cases only ?
False
>>> "aa".islower()  # or lower cases only ?
True
>>> "Aa".istitle()  # does the string start with a capital letter ?
True
>>> text = "There are spaces but not only"
>>> text.isspace() # is the string made of spaces only ?
False

You can count the occurence of a character with count() or get the length of a string with len():

>>> mystr = "This is a string"
>>> mystr.count('i')
3
>>> len(mystr)
16

4.4.5.2. Methods that return a modified version of the string

The following methods return modified copy of the original string, which is immutable.

First, you can modify the cases using title(), capitalize(), lower(), upper() and swapcase():

>>> mystr = "this is a dummy string"
>>> mystr.title()       # return a titlecase version of the string
'This Is A Dummy String'
>>> mystr.capitalize()  # return a string with first letter capitalised only.
'This is a dummy string'
>>> mystr.upper()       # return a capitalised version of the string
'THIS IS A DUMMY STRING'
>>> mystr.lower()       # return a copy of the string converted to lower case
'this is a dummy string'
>>> mystr.swapcase()    # replace lower case by upper case and vice versa
'THIS IS A DUMMY STRING'

Second, you can add trailing characters with center() and just() methods:

>>> mystr = "this is a dummy string"
>>> mystr.center(40)              # center the string in a string of length 40
'         this is a dummy string         '
>>> mystr.ljust(30)               # justify the string to the left (width of 20)
'this is a dummy string        '
>>> mystr.rjust(30, '-')          # justify the string to the right (width of 20)
'--------this is a dummy string'

There is also a zfill() methods that adds zero to the left, which is equivalent to .rjust(width, '0'):

>>> mystr.zfill(30)
'00000000this is a dummy string'

or remove trailing spaces with the strip() methods:

>>> mystr = "  string with left and right spaces   "
>>> mystr.strip()
'string with left and right spaces'
>>> mystr.rstrip()
'  string with left and right spaces'
>>> mystr.lstrip()
'string with left and right spaces   '

or expand tabs with expandtabs():

>>> 'this is a \t tab'.expandtabs()
'this is a     tab'

You can remove some specific characters with translate() or replace words with replace():

>>> mystr = "this is a dummy string"
>>> mystr.replace('dummy', 'great', 1)  # the 1 means replace only once
'this is a great string'
>>> mystr.translate(None, 'aeiou')
ths s dmmy strng

Finally, you can separate a string with respect to a single separator with partition():

>>> mystr = "this is a dummy string"
>>> t.partition('is')
('th', 'is', ' is a line')
>>> t.rpartition('is')
('this ', 'is', ' a line')

4.4.5.3. Methods to find position of substrings

The are methods such as endswith(), startswith(), find() and index() that allow to search for substrings in a string.

>>> mystr = "This is a dummy string"
>>> mystr.endswith('ing')       # may provide optional start and end indices
True
>>> mystr.startswith('This')    # may provide start and end indices
True
>>> mystr.find('is')            # returns start index of 'is' first occurence
2
>>> mystr.find('is', 4)         # starting at index 4, returns start index of 'is' first occurence
5
>>> mystr.rfind('is')           # returns start index of 'is' last occurence
5
>>> mystr.index('is')           # like find but raises error when substring is not found
2
>>> mystr.rindex('is')          # like rfind but raises error when substring is not found
5

4.4.5.4. Methods to build or decompose a string

A useful function is the split() methods that splits a string according to a character. The inverse function exist and is called join().

>>> message = ' '.join(['this' ,'is', 'a', 'useful', 'method'])
>>> message
'this is a useful method'
>>> message.split(' ')
['this', 'is', 'a', 'useful', 'method']

The split() function can be applied to a limited number of times if needed. However, it starts from the left. If you want to start from the right, use rsplit() instead:

>>> message = ' '.join(['this' ,'is', 'a', 'useful', 'method'])
>>> message.rsplit(' ', 2)
['this is a', 'useful', 'method']

If a string is multi-lines, you can split it with splitlines():

>>> 'this is an example\n of\ndummy sentences'.splitlines()
['this is an example', ' of', 'dummy sentences']

you can keep the endline character by giving True as an optional argument.

Finally, note that split() removes the splitter:

>>> "this is an exemple".split(" is ")
['this', 'an exemple']

If you want to keep the splitter as well, use partition()

>>> "this is an exemple".partition(" is ")
('this', ' is ', 'an exemple')

4.4.5.5. Encoding/Decoding/Unicode

We’ve seen how to create a unicode by adding the letter u in front of a string:

s = u"\u0041"

The function unicode() converts a standard string to unicode string using the encoding specified as an argument (default is the default string encoding):

s = unicode("text", "ascii")

In order to figure out the default encoding, type:

>>> import sys
>>> sys.getdefaultencoding()
'ascii'

Here are some encodings:

ascii, utf-8, iso-8859-1, latin-1, utf-16, unicode-escape.

The unicode function takes also a third argument set to: ‘strict’, ‘ignore’ or ‘replace’.

Let us take another example with accents:

>>> # Let us start wil a special character.
>>> text = u"π"
>>> # to obtain its code (in utf-8), let us use the encode function
>>> encoded = text.encode("utf-8")
>>> decoded = text.decode("utf-8")

Todo

examples