7. The os module (and sys, and path)¶
The os and sys modules provide numerous tools to deal with filenames, paths, directories. The os module contains two sub-modules os.sys (same as sys) and os.path that are dedicated to the system and directories; respectively.
Whenever possible, you should use the functions provided by these modules for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like os.path.split work on UNIX, Windows, Mac OS, and any other platform supported by Python.
7.1. Quick start¶
You can build multi-platform path using the proper separator symbol:
>>> import os
>>> import os.path
>>> os.path.join(os.sep, 'home', 'user', 'work')
'/home/user/work'
>>> os.path.split('/usr/bin/python')
('/usr/bin', 'python')
7.2. Functions¶
The os module has lots of functions. We will not cover all of them thoroughly but this could be a good start to use the module.
7.2.1. Manipulating Directories¶
The getcwd() function returns the current directory (in unicode format with getcwdu() ).
The current directory can be changed using chdir():
os.chdir(path)
The listdir() function returns the content of a directory. Note, however, that it mixes directories and files.
The mkdir() function creates a directory. It returns an error if the parent directory does not exist. If you want to create the parent directory as well, you should rather use makedirs():
>>> os.mkdir('temp') # creates temp directory inside the current directory
>>> os.makedirs(/tmp/temp/temp")
Once created, you can delete an empty directory with rmdir():
>>> import os
>>> os.mkdir('/tmp/temp')
>>> os.rmdir('/tmp/temp')
You can remove all directories within a directory (if there are not empty) by using os.removedirs().
If you want to delete a non-empty directory, use shutil.rmtree() (with cautious).
7.2.2. Removing a file¶
To remove a file, use os.remove(). It raise the OSError exception if the file cannot be removed. Under Linux, you can also use os.unlink().
7.2.3. Renaming files or directories¶
You can rename a file from an old name to a new one by using os.rename(). See also os.renames().
7.2.4. Permission¶
you can change the mode of a file using chmod(). See also chown, chroot, fchmod, fchown.
The os.access() verifies the access permission specified in the mode argument. Returns 1 if the access is granted, 0 otherwise. The mode can be:
os.F_OK | Value to pass as the mode parameter of access() to test the existence of path. |
os.R_OK: | Value to include in the mode parameter of access() to test the readability of path. |
os.W_OK | Value to include in the mode parameter of access() to test the writability of path. |
os.X_OK | Value to include in the mode parameter of access() to determine if path can be |
>>> os.access("validFile", os.F_OK)
True
You can change the mask of a file using the the os.umask() function. The mask is just a number that summarises the permissions of a file:
os.umask(644)
7.2.5. Using more than one process¶
On Unix systems, os.fork() tells the computer to copy everything about the currently running program into a newly created program that is separated, but almost entirely identical. The newly created process is the child process and gets the data and code of the parent process. The child process gets a process number known as pid. The parent and child processes are independent.
The following code works on Unix and Unix-like systems only:
import os
pid = os.fork()
if pid == 0: # the child
print "this is the child"
elif pid > 0:
print "the child is pid %d" % pid
else:
print("An error occured")
Here, the fork is zithin the executed script but ,ost of the time; you would require the
One of the most common things to do after an os.fork call is to call os.execl immediately afterward to run another program. os.execl is an instruction to replace the running program with a new program, so the calling program goes away, and a new program appears in its place:
import os
pid = os.fork()
# fork and exec together
print "second test"
if pid == 0: # This is the child
print "this is the child"
print "I'm going to exec another program now"
os.execl(`/bin/cat', `cat', `/etc/motd')
else:
print "the child is pid %d" % pid
os.wait()
The os.wait function instructs Python that you want the parent to not do anything until the child process returns. It is very useful to know how this works because it works well only under Unix and Unix-like platforms such as Linux. Windows also has a mechanism for starting up new processes. To make the common task of starting a new program easier, Python offers a single family of functions that combines os.fork and os.exec on Unix-like systems, and enables you to do something similar on Windows platforms. When you want to just start up a new program, you can use the os.spawn family of functions.
The different between the different spawn versions:
- v requires a list/vector os parameters. This allows a command to be run with very different commands from one instance to the next without needing to alter the program at all.
- l requires a simple list of parameters.
- e requires a dictionary containing names and values to replace the current environment.
- p requires the value of the PATH key in the environment dictionary to find the program. The
p variants are available only on Unix-like platforms. The least of what this means is that on Windows your programs must have a completely qualified path to be usable by the os.spawn calls, or you have to search the path yourself:
import os, sys
if sys.platform == `win32':
print "Running on a windows platform"
command = "C:\\winnt\\system32\\cmd.exe"
params = []
if sys.platform == `linux2':
print "Running on a Linux system, identified by %s" % sys.platform
command = `/bin/uname'
params = [`uname', `-a']
print "Running %s" % command
os.spawnv(os.P_WAIT, command, params)
The exec function comes in different flavours:
- execl(path, args) or execle(path, args, env) env is a dict with env variables.
- exexp(file; a1; a2, a3) or exexp(file; a1; a2, a3, env)
todo
os.getloadavg os.setegid
os.getlogin os.seteuid
os.abort os.getpgid os.setgid
os.getpgrp os.setgroups
os.setpgid os.setpgrp
os.UserDict os.getresgid os.setregid
os.getresuid os.setresgid os.getsid
os.setresuid os.setreuid
os.closerange os.initgroups os.setsid
os.confstr os.isatty os.setuid
os.confstr_names os.ctermid
os.defpath os.devnull
os.link os.dup os.dup2
os.errno os.major
os.error os.makedev os.stat_float_times
os.execl
os.execle os.minor os.statvfs
os.execlp os.statvfs_result
os.execlpe os.mkfifo os.strerror
os.execv os.mknod os.symlink
os.execve
os.execvp os.sysconf
os.execvpe os.open os.sysconf_names
os.extsep os.openpty os.system
os.fchdir os.pardir os.tcgetpgrp
os.tcsetpgrp os.pathconf os.tempnam
os.fdatasync os.pathconf_names os.times
os.fdopen os.tmpfile
os.pipe os.tmpnam
os.forkpty os.popen os.ttyname
os.fpathconf os.popen2 os.popen3
os.fstatvfs os.popen4
os.fsync os.putenv os.unsetenv
os.ftruncate os.read os.urandom
os.readlink os.utime
os.wait os.wait3
os.getenv os.wait4
os.waitpid os.getgroups
The os.walk() function allows to recursively scan a directory and obtain tuples containing tuples of (dirpath, dirnames, filename) where dirnames is a list of directories found in dirpath, and filenames the list of files found in dirpath.
Alternatevely, the os.path.walk can also be used but works in a different way (see below).
7.2.6. user id and processes¶
- os.getuid() returns the current process’s user id.
- os.getgid() returns the current process’s group id.
- os.geteuid() and os.getegid() returns the effective user id and effective group id
- os.getpid() returns the current process id
- os.getppid() returns the parent’s process id
7.3. Cross platform os attributes¶
An alternative character used by the OS to separate pathame components is provided by os.altsep().
The os.curdir() refers to the current directory. . for unix and windows and : for Mac OS.
Another multi-platform function that could be useful is the line separator. Indeed the final character that ends a line is coded differently under Linux, Windows and MAC. For instance under Linux, it is the n character but you may have r or rn. Using the os.linesep() guarantees to use a universal line_ending character.
The os.uname gives more information about your system:
>>> os.uname
('Linux',
'localhost.localdomain',
'3.3.4-5.fc17.x86_64',
'#1 SMP Mon May 7 17:29:34 UTC 2012',
'x86_64')
The function os.name() returns the OS-dependent module (e.g., posix, doc, mac,...)
The function os.pardir() refers to the parent directory (.. for unix and windows and :: for Mac OS).
The os.pathsep() function (also found in os.path.sep()) returns the correct path separator for your system (slash / under Linux and backslash under Windows).
Finally, the os.sep() is the character that separates pathname components (/ for Unix, for windows and : for Mac OS). It is also available in os.path.sep()
>>> # under linux
>>> os.path.sep
'/'
Another function that is related to multi-platform situations is the os.path.normcase() that is useful under Windows where the OS ignore cases. So, to compare two filenames you will need this function.
7.3.1. More about directories and files¶
os.path provides methods to extract information about path and file names:
>>> os.path.curdir # returns the current directory ('.')
>>> os.path.isdir(dir) # returns True if dir exists
>>> os.path.isfile(file) # returns True if file exists
>>> os.path.islink(link) # returns True if link exists
>>> os.path.exists(dir) # returns True if dir exists (full pathname or filename)
>>> os.path.getsize(filename) # returns size of a file without opening it.
You can access to the time when a file was last modified. Nevertheless, the output is not friendly user. Under Unix it corresponds to the time since the Jan 1, 1970 (GMT) and under Mac OS since Jan 1, 1904 (GMT)Use the time module to make it easier to read:
>>> import time
>>> mtime = os.path.getmtime(filename) # returns time when the file was last modified
The output is not really meaningful since it is expressed in seconds. You can use the time module to get a better layout of that time:
>>> print time.ctime(mtime)
Tue Jan 01 02:02:02 2000
Similarly, the function os.path.getatime() returns the last access time of a file and os.path.getctime() the metadata change time of a file.
Finally, you can get a all set of information using os.stat() such as file’s size, access time and so on. The stat() returns a tuple of numbers, which give you information about a file (or directory).
>>> import stat
>>> import time
>>> def dump(st):
... mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
... print "- size:", size, "bytes"
... print "- owner:", uid, gid
... print "- created:", time.ctime(ctime)
... print "- last accessed:", time.ctime(atime)
... print "- last modified:", time.ctime(mtime)
... print "- mode:", oct(mode)
... print "- inode/dev:", ino, dev
>>> dump(os.stat("todo.txt"))
- size: 0 bytes
- owner: 1000 1000
- created: Wed Dec 19 19:40:02 2012
- last accessed: Wed Dec 19 19:40:02 2012
- last modified: Wed Dec 19 19:40:02 2012
- mode: 0100664
- inode/dev: 23855323 64770
There are other similar function os.lstat() for symbolic links, os.fstat() for file descriptor
You can determine is a path is a mount point using os.ismount(). Under unix, it checks if a path or file is mounted on an other device (e.g. an external hard disk).
7.3.2. Splitting paths¶
To get the base name of a path (last component):
>>> import os
>>> os.path.basename("/home/user/temp.txt")
temp.txt
To get the directory name of a path, use os.path.dirname():
>>> import os
>>> os.path.dirname("/home/user/temp.txt")
/home/user
The os.path.abspath() returns the absolute path of a file:
>>> import os
>>> os.path.abspath('temp.txt')
In summary, consider a file temp.txt in /home/user:
function | Output |
---|---|
basename | ‘temp.txt’ |
dirname | ‘’ |
split | (‘’, ‘temp.txt’) |
splitdrive | (‘’, ‘temp.txt’) |
splitext | (‘temp’; ‘txt’) |
abspath | ‘/home/user/temp.txt |
os.path.extsep os.path.genericpath os.path.realpath
os.path.relpath os.path.samefile
os.path.sameopenfile os.path.samestat
os.path.isab
os.path.commonprefix
os.path.defpath os.path.supports_unicode_filenames
os.path.devnull os.path.lexists
os.path.warnings .expanduser os.path.expandvars
Split the basename and directory name in one function call using os.path.split(). The split function only splits off the last part of a component. In order to split off all parts, you need to write your own function:
Note
the path should not end with ‘/’, otherwise the name is empty.
os.path.split(‘/home/user’) is not the same as os.path.split(‘/home/user/’)
>>> def split_all(path):
... parent, name = os.path.split(path)
... if name == '':
... return (parent, )
... else:
... return split_all(parent) + (name,)
>>> split_all('/home/user/Work')
('/', 'home', 'user', 'Work')
The os.path.splitext() function splits off the extension of a file:
>>> os.path.splitext('image.png')
('image', 'png')
For windows users, you can use the os.splitdrive() that returns a tuple with 2 strings, there first one being the drive.
Conversely, the join method allows to join several directory name to create a full path name:
>>> os.path.join('/home', 'user')
'/home/user'
os.path.walk() scan a directory recursively and apply a function of each item found (see also os.walk() above):
def print_info(arg, dir, files):
for file in files:
print dir + ' ' + file
os.path.walk('.', print_info, 0)
7.4. Accessing environment variables¶
You can easily acecss to the environmental variables:
import os
os.environ.keys()
and if you know what you are doing, you can add or replace a variable:
os.environ[NAME] = VALUE
7.5. sys module¶
When starting a Python shell, Python provides 3 file objects called stadnard input, stadn output and standard error. There are accessible via the sys module:
sys.stderr
sys.stdin
sys.stdout
The sys.argv is used to retrieve user argument when your module is executable.
Another useful attribute in the sys.path that tells you where Python is searching for modules on your system. see Module for more details.
7.5.1. Information¶
- sys.platform returns the platform version (e.g., linux2)
- sys.version returns the python version
- sys.version_info returns a named tuple
sys.exitfunc sys.last_value sys.pydebug
sys.flags sys.long_info sys.real_prefix
sys.builtin_module_names sys.float_info sys.setcheckinterval
sys.byteorder sys.float_repr_style sys.maxsize sys.setdlopenflags
sys.call_tracing sys.getcheckinterval sys.maxunicode sys.setprofile
sys.callstats sys.meta_path sys.copyright
sys.getdlopenflags sys.modules sys.settrace
sys.displayhook sys.getfilesystemencoding sys.path
sys.dont_write_bytecode sys.getprofile sys.path_hooks
sys.exc_clear sys.path_importer_cache
sys.exc_info sys.getrefcount sys.exc_type sys.getsizeof sys.prefix sys.excepthook
sys.gettrace sys.ps1
sys.exec_prefix sys.ps2 sys.warnoptions
sys.executable sys.last_traceback sys.ps3
sys.last_type sys.py3kwarning
The sys.modules attribute returns list of all the modules that have been imported so far in your environment.
7.5.2. recursion¶
See the Functions section to know more about recursions. You can limit the number of recursions and know about the number itself using the sys.getrecursionlimit() and sys.setrecursionlimit() functions.