Python Notes (0.14.0)

7. The os module (and sys, and path)

The os and sys modules provide numerous tools to deal with filenames, paths, and directories. The os module contains two sub-modules, os.sys (same as sys) and os.path, which are dedicated to the system and to directories, respectively.

Whenever possible, you should use the functions provided by these modules for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.

See also

The shutil, tempfile, and glob modules in the Python documentation.

7.1. Quick start

You can build a multi-platform path using the proper separator symbol:

>>> import os
>>> import os.path
>>> os.path.join(os.sep, 'home', 'user', 'work')
'/home/user/work'

>>> os.path.split('/usr/bin/python')
('/usr/bin', 'python')

7.2. Functions

The os module has lots of functions. We will not cover all of them thoroughly, but this section is a good starting point for using the module.

7.2.1. Manipulating Directories

The getcwd() function returns the current directory (getcwdu() returns it as a unicode string; in Python 3, getcwd() already returns a unicode string and getcwdu() no longer exists).

The current directory can be changed using chdir():

os.chdir(path)

The listdir() function returns the content of a directory. Note, however, that it mixes directories and files.
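Since listdir() mixes both kinds of entries, a small helper can separate them with os.path.isdir(). The following is a minimal sketch; the function name split_listing is hypothetical and not part of the os module:

```python
import os

def split_listing(path):
    """Return (directories, files) found directly inside path."""
    dirs, files = [], []
    for name in sorted(os.listdir(path)):
        # listdir() returns bare names, so rebuild the full path before testing
        if os.path.isdir(os.path.join(path, name)):
            dirs.append(name)
        else:
            files.append(name)
    return dirs, files

# usage: inspect the current directory
directories, regular_files = split_listing(os.curdir)
print("%d directories, %d files" % (len(directories), len(regular_files)))
```

Anything that is not a directory (including broken symbolic links) ends up in the second list, which is a simplification that may or may not suit your needs.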

The mkdir() function creates a directory. It raises an OSError if the parent directory does not exist. If you want to create the parent directories as well, use makedirs() instead:

>>> os.mkdir('temp') # creates temp directory inside the current directory
>>> os.makedirs("/tmp/temp/temp") # also creates /tmp/temp if needed

Once created, you can delete an empty directory with rmdir():

>>> import os
>>> os.mkdir('/tmp/temp')
>>> os.rmdir('/tmp/temp')

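You can remove a directory together with its parent directories (as long as they are empty) by using os.removedirs(): it removes the last directory of the path, then each parent in turn until it reaches one that is not empty.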

If you want to delete a non-empty directory, use shutil.rmtree() (with caution).
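To illustrate the difference between rmdir() and shutil.rmtree(), here is a small sketch that works on a throwaway tree created with tempfile.mkdtemp(), so that nothing valuable can be deleted. The directory and file names are made up for the example:

```python
import os
import shutil
import tempfile

# build a throwaway tree: <root>/a/b/data.txt
root = tempfile.mkdtemp()
nested = os.path.join(root, "a", "b")
os.makedirs(nested)
with open(os.path.join(nested, "data.txt"), "w") as handle:
    handle.write("scratch")

try:
    os.rmdir(root)              # refuses to work: the directory is not empty
    removed_by_rmdir = True
except OSError:
    removed_by_rmdir = False

shutil.rmtree(root)             # removes the directory and everything below it
print("rmdir succeeded: %s" % removed_by_rmdir)
print("still exists: %s" % os.path.exists(root))
```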

7.2.2. Removing a file

To remove a file, use os.remove(). It raises an OSError exception if the file cannot be removed. You can also use os.unlink(), which is the same function under its traditional Unix name.

7.2.3. Renaming files or directories

You can rename a file from an old name to a new one by using os.rename(). See also os.renames().
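As a quick sketch of the two functions, the example below renames a file inside a temporary directory, then uses os.renames() to move it into sub-directories that do not exist yet (renames() creates the intermediate directories it needs). All file and directory names are invented for the illustration:

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
old_name = os.path.join(workdir, "draft.txt")
new_name = os.path.join(workdir, "final.txt")

open(old_name, "w").close()          # create an empty file to rename
os.rename(old_name, new_name)        # plain rename inside the same directory

# renames() also creates the missing archive/2012 directories
archived = os.path.join(workdir, "archive", "2012", "final.txt")
os.renames(new_name, archived)

print(os.listdir(workdir))
```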

7.2.4. Permission

You can change the mode of a file using chmod(). See also chown, chroot, fchmod, fchown.

The os.access() function checks the access permission specified in the mode argument. It returns True if the access is granted, False otherwise. The mode can be:

   
os.F_OK  Value to pass as the mode parameter of access() to test the existence of path.
os.R_OK  Value to include in the mode parameter of access() to test the readability of path.
os.W_OK  Value to include in the mode parameter of access() to test the writability of path.
os.X_OK  Value to include in the mode parameter of access() to determine if path can be executed.

>>> os.access("validFile", os.F_OK)
True

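You can change the file mode creation mask of the current process using the os.umask() function. The mask is a number, usually written in octal, whose bits are removed from the permissions of newly created files. For instance, with the mask below, a file requested with mode 0666 is created with mode 0644:

os.umask(0o022)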
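The effect of the mask can be checked with a short experiment. This is only a sketch and assumes a POSIX system (on Windows the permission bits behave differently); the file name is arbitrary:

```python
import os
import stat
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "created.txt")

previous = os.umask(0o022)       # umask() returns the mask it replaces
try:
    # ask for rw-rw-rw- (0o666); the mask clears the group/other write bits
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o666)
    os.close(fd)
finally:
    os.umask(previous)           # restore the original mask

mode = stat.S_IMODE(os.stat(path).st_mode)   # keep only the permission bits
print("%o" % mode)
```

Restoring the previous mask in the finally clause keeps the experiment from affecting files created later by the same process.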

7.2.5. Using more than one process

On Unix systems, os.fork() tells the computer to copy everything about the currently running program into a newly created program that is separate, but almost entirely identical. The newly created process is the child process and gets the data and code of the parent process. The child process gets its own process number, known as the pid. In the parent, os.fork() returns the pid of the child; in the child, it returns 0. The parent and child processes are independent.

The following code works on Unix and Unix-like systems only:

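import os

try:
    pid = os.fork()
except OSError:
    # in Python, a failed fork raises OSError instead of returning a negative pid
    print("An error occurred")
else:
    if pid == 0: # the child
        print("this is the child")
    else: # the parent; pid is the process id of the child
        print("the child is pid %d" % pid)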

Here, the fork is within the executed script but, most of the time, you would require the child to run another program.

One of the most common things to do after an os.fork call is to call os.execl immediately afterward to run another program. os.execl is an instruction to replace the running program with a new program, so the calling program goes away, and a new program appears in its place:

import os
# fork and exec together
pid = os.fork()
print("second test") # printed by both the parent and the child
if pid == 0: # This is the child
    print("this is the child")
    print("I'm going to exec another program now")
    os.execl('/bin/cat', 'cat', '/etc/motd')
else:
    print("the child is pid %d" % pid)
    os.wait()

The os.wait function tells Python that you want the parent to do nothing until a child process exits. It is very useful to know how this works, but bear in mind that it works well only under Unix and Unix-like platforms such as Linux. Windows also has a mechanism for starting up new processes. To make the common task of starting a new program easier, Python offers a single family of functions that combines os.fork and the os.exec functions on Unix-like systems, and enables you to do something similar on Windows platforms. When you just want to start up a new program, you can use the os.spawn family of functions.

The spawn variants differ by the letters appended to the function name:

  • v requires a list/vector of parameters. This allows a command to be run with very different arguments from one instance to the next without needing to alter the program at all.
  • l requires the parameters to be given one by one, as separate arguments of the call.
  • e requires a dictionary containing names and values to replace the current environment.
  • p uses the value of the PATH key in the environment dictionary to find the program.

The p variants are available only on Unix-like platforms. The least of what this means is that on Windows your programs must have a completely qualified path to be usable by the os.spawn calls, or you have to search the path yourself:

import os, sys

if sys.platform == 'win32':
    print("Running on a windows platform")
    command = "C:\\winnt\\system32\\cmd.exe"
    params = []
elif sys.platform.startswith('linux'): # 'linux2' on Python 2, 'linux' on Python 3
    print("Running on a Linux system, identified by %s" % sys.platform)
    command = '/bin/uname'
    params = ['uname', '-a'] # the first parameter is the program name
else:
    sys.exit("Unsupported platform: %s" % sys.platform)
print("Running %s" % command)
os.spawnv(os.P_WAIT, command, params)
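If you only want to experiment with os.spawnv() without depending on a platform-specific program, one option is to launch the Python interpreter itself, whose full path is given by sys.executable. The sketch below assumes that this path is usable as is (for instance, that it does not need special quoting on your platform); the exit code 3 is an arbitrary value chosen for the illustration:

```python
import os
import sys

# sys.executable is the full path of the running interpreter,
# so no PATH lookup is needed
command = sys.executable
params = [command, "-c", "import sys; sys.exit(3)"]

# with P_WAIT, spawnv() blocks until the child exits and returns its exit code
exit_code = os.spawnv(os.P_WAIT, command, params)
print("child exited with code %d" % exit_code)
```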

The exec function comes in different flavours:

  • execl(path, args) or execle(path, args, env), where env is a dict of environment variables.
  • execlp(file, a1, a2, a3) or execlpe(file, a1, a2, a3, env)

todo

os.getloadavg              os.setegid
os.getlogin                os.seteuid
os.abort                   os.getpgid                 os.setgid
os.getpgrp                 os.setgroups
os.setpgid                 os.setpgrp
os.UserDict                os.getresgid               os.setregid
os.getresuid               os.setresgid               os.getsid
os.setresuid               os.setreuid
os.closerange              os.initgroups              os.setsid
os.confstr                 os.isatty                  os.setuid
os.confstr_names           os.ctermid
os.defpath                 os.devnull
os.link                    os.dup                     os.dup2
os.errno        os.major
os.error                   os.makedev                 os.stat_float_times
os.execl
os.execle                  os.minor                   os.statvfs
os.execlp                  os.statvfs_result
os.execlpe                 os.mkfifo                  os.strerror
os.execv                   os.mknod                   os.symlink
os.execve
os.execvp                  os.sysconf
os.execvpe                 os.open                    os.sysconf_names
os.extsep                  os.openpty                 os.system
os.fchdir                  os.pardir                  os.tcgetpgrp
os.tcsetpgrp    os.pathconf                os.tempnam
os.fdatasync               os.pathconf_names          os.times
os.fdopen                  os.tmpfile
os.pipe                    os.tmpnam
os.forkpty                 os.popen                   os.ttyname
os.fpathconf               os.popen2                  os.popen3
os.fstatvfs                os.popen4
os.fsync                   os.putenv                  os.unsetenv
os.ftruncate               os.read                    os.urandom
os.readlink                os.utime
os.wait                    os.wait3
os.getenv                  os.wait4
os.waitpid                os.getgroups

The os.walk() function recursively scans a directory and yields tuples of the form (dirpath, dirnames, filenames), where dirnames is the list of directories found in dirpath and filenames the list of files found in dirpath.
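A typical loop over os.walk() may look like the following sketch. It first builds a small tree in a temporary directory (the file names are invented for the example) and then collects the path of every file relative to the starting directory:

```python
import os
import tempfile

# build a small tree to scan: <root>/setup.py and <root>/docs/readme.txt
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "docs"))
for relative in ("setup.py", os.path.join("docs", "readme.txt")):
    open(os.path.join(root, relative), "w").close()

found = []
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()      # editing dirnames in place controls the visiting order
    for filename in sorted(filenames):
        # store paths relative to root so that the result is easy to read
        found.append(os.path.relpath(os.path.join(dirpath, filename), root))

print(found)
```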

Alternatively, os.path.walk can also be used, but it works in a different way (see below) and exists in Python 2 only.

7.2.6. user id and processes

7.3. Cross platform os attributes

An alternative character used by the OS to separate pathname components is provided by os.altsep.

os.curdir refers to the current directory: . for Unix and Windows and : for Mac OS.

Another multi-platform attribute that could be useful is the line separator. Indeed, the character that ends a line is coded differently under Linux, Windows and Mac OS. For instance, under Linux it is the \n character, but you may have \r or \r\n elsewhere. os.linesep gives the line-ending sequence used by the current platform.

os.uname() gives more information about your system:

>>> os.uname()
('Linux',
'localhost.localdomain',
'3.3.4-5.fc17.x86_64',
'#1 SMP Mon May 7 17:29:34 UTC 2012',
'x86_64')

os.name is the name of the OS-dependent module that has been imported (e.g., posix, nt, ...).

os.pardir refers to the parent directory (.. for Unix and Windows and :: for Mac OS).

os.pathsep (also found in os.path.pathsep) is the character that separates the entries of a search path such as the PATH environment variable (colon : under Linux and semicolon ; under Windows).

Finally, os.sep is the character that separates pathname components (/ for Unix, \ for Windows and : for Mac OS). It is also available as os.path.sep:

>>> # under linux
>>> os.path.sep
'/'

Another function related to multi-platform situations is os.path.normcase(), which is useful under Windows, where the OS ignores case. So, to compare two filenames you will need this function.
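To see what normcase() does on each platform without switching machines, you can call the platform-specific implementations directly: ntpath is the Windows flavour of os.path and posixpath the Unix one. This is only an illustrative sketch (in real code you would simply call os.path.normcase()), and the two file names are invented:

```python
import ntpath      # the Windows implementation of os.path
import posixpath   # the Unix implementation of os.path

first = "C:/Users/Guest/Notes.TXT"
second = "c:\\users\\guest\\notes.txt"

# Windows rules: convert to lower case and turn forward slashes into backslashes
same_on_windows = ntpath.normcase(first) == ntpath.normcase(second)

# Unix rules: file names are case sensitive, so normcase() changes nothing
same_on_unix = posixpath.normcase(first) == posixpath.normcase(second)

print("same on Windows: %s, same on Unix: %s" % (same_on_windows, same_on_unix))
```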

7.3.1. More about directories and files

os.path provides functions to extract information about paths and file names:

>>> os.path.curdir # the current directory ('.')
>>> os.path.isdir(dir) # returns True if dir exists and is a directory
>>> os.path.isfile(file) # returns True if file exists and is a regular file
>>> os.path.islink(link) # returns True if link exists and is a symbolic link
>>> os.path.exists(dir) # returns True if dir exists (full pathname or filename)
>>> os.path.getsize(filename) # returns size of a file without opening it.

You can get the time when a file was last modified. However, the raw value is not user friendly: it is the number of seconds elapsed since a reference date, which is Jan 1, 1970 (GMT) under Unix and Jan 1, 1904 (GMT) under Mac OS:

>>> import time
>>> mtime = os.path.getmtime(filename) # returns time when the file was last modified

Since this value is expressed in seconds, it is not really meaningful as is. You can use the time module to display it in a readable form:

>>> print(time.ctime(mtime))
Tue Jan 01 02:02:02 2000

Similarly, the function os.path.getatime() returns the last access time of a file and os.path.getctime() the metadata change time of a file.

Finally, you can get a whole set of information using os.stat(), such as the size of the file, its access time and so on. stat() returns a tuple of numbers, which give you information about a file (or directory).

>>> import os
>>> import time
>>> def dump(st):
...    mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
...    print("- size: %d bytes" % size)
...    print("- owner: %d %d" % (uid, gid))
...    # under Unix, ctime is the time of the last metadata change
...    print("- metadata changed: %s" % time.ctime(ctime))
...    print("- last accessed: %s" % time.ctime(atime))
...    print("- last modified: %s" % time.ctime(mtime))
...    print("- mode: %o" % mode)
...    print("- inode/dev: %d %d" % (ino, dev))
...
>>> dump(os.stat("todo.txt"))
- size: 0 bytes
- owner: 1000 1000
- metadata changed: Wed Dec 19 19:40:02 2012
- last accessed: Wed Dec 19 19:40:02 2012
- last modified: Wed Dec 19 19:40:02 2012
- mode: 100664
- inode/dev: 23855323 64770
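The result of os.stat() can also be read through named attributes (st_size, st_mtime, st_mode and so on), which avoids having to remember the order of the tuple. A minimal sketch, using a temporary file created just for the demonstration:

```python
import os
import tempfile
import time

# create a 5-byte file so that there is something to inspect
handle, path = tempfile.mkstemp()
os.write(handle, b"hello")
os.close(handle)

info = os.stat(path)
print("- size: %d bytes" % info.st_size)
print("- last modified: %s" % time.ctime(info.st_mtime))
print("- mode: %o" % info.st_mode)

os.remove(path)   # clean up the temporary file
```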

There are other similar functions: os.lstat() for symbolic links and os.fstat() for file descriptors.

You can determine if a path is a mount point using os.path.ismount(). Under Unix, it checks whether a path is the point where another device (e.g. an external hard disk) is mounted.

7.3.2. Splitting paths

To get the base name of a path (last component):

>>> import os
>>> os.path.basename("/home/user/temp.txt")
'temp.txt'

To get the directory name of a path, use os.path.dirname():

>>> import os
>>> os.path.dirname("/home/user/temp.txt")
'/home/user'

os.path.abspath() returns the absolute path of a file:

>>> import os
>>> os.path.abspath('temp.txt')

In summary, consider a file temp.txt in /home/user:

function    Output
basename    'temp.txt'
dirname     ''
split       ('', 'temp.txt')
splitdrive  ('', 'temp.txt')
splitext    ('temp', '.txt')
abspath     '/home/user/temp.txt'

os.path.extsep       os.path.genericpath  os.path.realpath
os.path.relpath      os.path.samefile
os.path.sameopenfile os.path.samestat
os.path.isab
os.path.commonprefix
os.path.defpath      os.path.supports_unicode_filenames
os.path.devnull      os.path.lexists
os.path.warnings     .expanduser       os.path.expandvars

You can split the basename and directory name in one function call using os.path.split(). The split function only splits off the last component of a path. In order to split off all the parts, you need to write your own function:

Note

The path should not end with '/', otherwise the name is empty.

os.path.split('/home/user') is not the same as os.path.split('/home/user/')

>>> def split_all(path):
...    parent, name = os.path.split(path)
...    if name == '':
...        return (parent, )
...    else:
...        return split_all(parent) + (name,)
>>> split_all('/home/user/Work')
('/', 'home', 'user', 'Work')

The os.path.splitext() function splits off the extension of a file:

>>> os.path.splitext('image.png')
('image', '.png')

For Windows users, os.path.splitdrive() returns a tuple of 2 strings, the first one being the drive.

Conversely, the join function joins several directory names to create a full path name:

>>> os.path.join('/home', 'user')
'/home/user'
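The splitting and joining functions combine naturally. As an illustration, here is a sketch of a helper that swaps the extension of a file name; the name replace_extension is hypothetical and not part of os.path:

```python
import os.path

def replace_extension(path, new_ext):
    """Return path with its extension replaced by new_ext (e.g. '.jpg')."""
    directory, name = os.path.split(path)
    stem, _old_ext = os.path.splitext(name)   # the extension includes its dot
    return os.path.join(directory, stem + new_ext)

print(replace_extension(os.path.join("home", "user", "image.png"), ".jpg"))
```

Because the path is rebuilt with os.path.join(), the result uses the separator of the platform on which the code runs.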

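os.path.walk() scans a directory recursively and applies a function to each directory found, passing it the names that the directory contains (see also os.walk() above). It is available in Python 2 only:

import os

def print_info(arg, dir, files):
    # called once per directory, with the list of names found in it
    for file in files:
        print(dir + '    ' + file)
os.path.walk('.', print_info, 0)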

7.4. Accessing environment variables

You can easily access the environment variables:

import os
os.environ.keys()

and if you know what you are doing, you can add or replace a variable:

os.environ[NAME] = VALUE
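Since os.environ behaves like a dictionary, the usual dictionary idioms apply. In the sketch below, DEMO_SETTING and DEMO_UNSET are made-up variable names used only for the illustration (DEMO_UNSET is assumed not to be defined on your system):

```python
import os

# DEMO_SETTING is a made-up variable name used only for this illustration
os.environ["DEMO_SETTING"] = "42"

value = os.environ.get("DEMO_SETTING")                # values are always strings
missing = os.environ.get("DEMO_UNSET", "fallback")    # default used when absent

del os.environ["DEMO_SETTING"]                        # remove the variable again
print("value=%s missing=%s" % (value, missing))
```

Using get() with a default avoids the KeyError that os.environ[NAME] raises when the variable is not defined.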

7.5. sys module

When starting a Python shell, Python provides 3 file objects called standard input, standard output and standard error. They are accessible via the sys module:

sys.stderr
sys.stdin
sys.stdout

sys.argv is used to retrieve the user arguments when your module is run as a script.

Another useful attribute is sys.path, which tells you where Python searches for modules on your system. See Module for more details.
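As a minimal sketch of how the argument list is usually handled, the script below prints each argument it receives. The function name main is only a common convention, not something required by Python:

```python
import sys

def main(argv):
    """Print the script name and each user argument; return the argument count."""
    script, args = argv[0], argv[1:]    # argv[0] is the script itself
    print("script: %s" % script)
    for position, value in enumerate(args, 1):
        print("argument %d: %s" % (position, value))
    return len(args)

if __name__ == "__main__":
    main(sys.argv)
```

Passing the list to a function, instead of reading sys.argv everywhere, makes the code easy to call with a hand-written list such as main(["demo.py", "a", "b"]).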

7.5.1. Information

  • sys.platform returns the platform identifier (e.g., linux2)
  • sys.version returns the python version
  • sys.version_info returns a named tuple
sys.exitfunc               sys.last_value             sys.pydebug
sys.flags                  sys.long_info              sys.real_prefix
sys.builtin_module_names   sys.float_info             sys.setcheckinterval
sys.byteorder              sys.float_repr_style       sys.maxsize                sys.setdlopenflags
sys.call_tracing           sys.getcheckinterval       sys.maxunicode             sys.setprofile
sys.callstats              sys.meta_path             sys.copyright
sys.getdlopenflags         sys.modules                sys.settrace
sys.displayhook            sys.getfilesystemencoding  sys.path
sys.dont_write_bytecode    sys.getprofile             sys.path_hooks
sys.exc_clear              sys.path_importer_cache
sys.exc_info               sys.getrefcount              sys.exc_type               sys.getsizeof              sys.prefix                 sys.excepthook
sys.gettrace           sys.ps1
sys.exec_prefix     sys.ps2                    sys.warnoptions
sys.executable             sys.last_traceback         sys.ps3
sys.last_type              sys.py3kwarning

The sys.modules attribute is a dictionary of all the modules that have been imported so far in your environment, indexed by module name.

7.5.2. recursion

See the Functions section to know more about recursion. You can read the maximum recursion depth with sys.getrecursionlimit() and change it with sys.setrecursionlimit().
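The following sketch shows both functions at work. It assumes that the current limit is a modest value (1000 is a common default, but this is not guaranteed); the function depth is a toy example written for the occasion:

```python
import sys

def depth(n):
    """Recurse n times, then return n (one stack frame per call)."""
    return 0 if n == 0 else 1 + depth(n - 1)

original = sys.getrecursionlimit()
target = original + 10                   # slightly deeper than currently allowed

try:
    depth(target)
    overflowed = False
except RuntimeError:                     # RecursionError is a subclass in Python 3
    overflowed = True

sys.setrecursionlimit(original + 1000)   # make room for the deeper recursion
try:
    result = depth(target)
finally:
    sys.setrecursionlimit(original)      # always restore the previous limit

print("overflowed first: %s, then reached depth %d" % (overflowed, result))
```

Raising the limit too much can crash the interpreter, since each level of recursion consumes memory, so it is usually better to rewrite a deep recursion as a loop.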

7.6. UserDict module

References:[Norton]