Python and the system

Python can be used to manage and communicate with the operating system. The standard library contains functions for string processing, time and date handling, input and output, interaction with the operating system, and much more.

String processing

String objects (variables) offer a variety of methods (functions) that can be used to format and process the string. The description of the built-in methods can be found in the standard documentation about string methods.

Examples

Split

Returns a list of the words in the string S, using an optional separator as a delimiter string.

    S = 'Hello string'
    S.split()

    ['Hello', 'string']
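The optional separator mentioned above can be illustrated with a short sketch. The input string and the variable names are made up for illustration only.

```python
line = 'alpha, beta,gamma'

# Split on an explicit separator; blanks around the fields are kept as-is.
raw_fields = line.split(',')

# strip() removes the leftover whitespace from each field.
fields = [field.strip() for field in raw_fields]

# The second argument (maxsplit) limits the number of splits, counted from the left.
head, rest = line.split(',', 1)

print(raw_fields)
print(fields)
print(head, '|', rest)
```

Without an argument, split() separates on any run of whitespace; with an explicit separator, every occurrence of the separator starts a new field.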

Time and date

Example: profiling

A common task is optimizing the run time of a program. First you have to identify the bottlenecks. Simple profiling with a timer can be done like this:

    import time

    time0 = time.perf_counter()  # time.clock() was removed in Python 3.8
    some_function()  # you want to know how long it takes to execute this function
    time1 = time.perf_counter()
    print('Elapsed time', time1 - time0)

More sophisticated tools for profiling are available in the standard library, for example the modules cProfile and timeit.

Once you have found a bottleneck, you can look for performance tips to optimize that part of your code.
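As one possible step beyond a manual timer, the standard-library module cProfile records how much time is spent in each function. The following is a minimal sketch; slow_sum is a hypothetical stand-in for the code you want to examine.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares computed in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()           # start collecting timing data
result = slow_sum(100000)
profiler.disable()          # stop collecting

# Write the five most expensive entries (by cumulative time) into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists call counts and times per function, which usually points to the bottleneck more directly than timing the whole program.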

Input output

This lesson deals with ways of reading and writing data in different formats.

Basic Python

The file object can be used for reading and writing plain text as well as unformatted binary data. The following code writes a message to the file out.txt, then reads the data back and prints it:

    with open('out.txt', 'w') as f:  # open() replaces the Python 2 built-in file()
        f.write('Hallo Datentraeger')
    with open('out.txt') as f:
        print(f.read())
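Binary data is handled the same way once the file is opened in a binary mode. The sketch below is only an illustration: the file name out.bin, the temporary directory, and the choice of three 32-bit integers are assumptions, not part of the lesson.

```python
import os
import struct
import tempfile

# Work in a temporary directory so no existing file is overwritten.
path = os.path.join(tempfile.mkdtemp(), 'out.bin')
values = (1, 2, 3)

# 'wb' opens the file for binary writing; struct.pack converts the three
# integers into 12 bytes (little-endian, 4 bytes each).
with open(path, 'wb') as f:
    f.write(struct.pack('<3i', *values))

# 'rb' opens the file for binary reading; read() returns a bytes object.
with open(path, 'rb') as f:
    raw = f.read()

restored = struct.unpack('<3i', raw)
print(len(raw), restored)
```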

Pickle

The pickle module implements an algorithm for serializing and de-serializing a Python object structure. Pickling is the process whereby a Python object hierarchy is converted into a byte stream, and unpickling is the inverse operation, whereby a byte stream is converted back into an object hierarchy. In Python 2, the cPickle module is a much faster implementation and should be preferred; in Python 3, pickle uses the fast C implementation automatically when it is available.

    import pickle

    a = {'A': 1}  # a Python object
    with open('test.dat', 'wb') as f:  # pickle data is binary
        pickle.dump(a, f)  # writes the object to the file
    with open('test.dat', 'rb') as f:
        b = pickle.load(f)  # reads the object from the file

Comma Separated Values

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets (Excel) and databases. The csv module enables reading and writing of CSV files.
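A minimal round trip with the csv module might look like the following sketch. The file name data.csv and the sample rows are invented for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.csv')
rows = [['name', 'value'], ['a', '1'], ['b, with comma', '2']]

# newline='' lets the csv module control the line endings itself.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# csv.reader yields each row as a list of strings.
with open(path, newline='') as f:
    read_back = list(csv.reader(f))

print(read_back)
```

The field containing a comma is quoted automatically on writing and restored on reading, which is the main reason to prefer the csv module over a plain split(',').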

Pylab

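    from pylab import loadtxt  # older pylab versions provided load() instead

    X = loadtxt('test.dat')  # text file with data in two columns
    t = X[:, 0]  # first column
    y = X[:, 1]  # second column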

Interaction with the operating system

The modules sys and os provide the basic interface to the operating system. The module os creates a portable abstraction layer which is used by high-level modules like glob, socket, threading, time, fcntl.

Module sys

The module sys provides access to system-specific parameters and functions that are maintained by the interpreter.

Example argv

    # system1.py
    import sys
    print(sys.argv)

run system1.py parameter1 parameter2
['system1.py', 'parameter1', 'parameter2']

The script prints the command line arguments that are passed to it. argv[0] is the script name.

A more sophisticated way of evaluating command line arguments is provided by the module argparse, which supersedes the older module optparse.
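For structured argument handling, the standard library offers argparse, the successor of optparse. The sketch below is an assumption about how system1.py could be extended; the option -v/--verbose and the function name build_parser are hypothetical.

```python
import argparse


def build_parser():
    """Create a parser for any number of positional parameters and one flag."""
    parser = argparse.ArgumentParser(description='Echo the given parameters.')
    parser.add_argument('parameters', nargs='*', help='positional parameters')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print extra information')
    return parser


# Passing a list makes the example reproducible;
# parse_args() without arguments reads sys.argv[1:] instead.
args = build_parser().parse_args(['-v', 'parameter1', 'parameter2'])
print(args.verbose, args.parameters)
```

The parser also generates a usage message for -h/--help and reports unknown options, which a hand-written loop over sys.argv would have to implement itself.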

Module os

The module os is a portable operating system interface.

Some examples:
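A few frequently used calls, as a minimal sketch. The directory and file names are hypothetical, and everything is created in a temporary directory that is removed again at the end.

```python
import os
import tempfile

base = tempfile.mkdtemp()

# os.path.join builds paths with the separator of the current platform.
subdir = os.path.join(base, 'data')
os.mkdir(subdir)
filename = os.path.join(subdir, 'notes.txt')
with open(filename, 'w') as f:
    f.write('hello')

entries = os.listdir(subdir)          # names of the directory entries
size = os.path.getsize(filename)      # file size in bytes
stem, extension = os.path.splitext(os.path.basename(filename))

print(os.getcwd())                    # current working directory
print(entries, size, stem, extension)
print(os.environ.get('HOME'))         # environment variable, None if unset

# Clean up: remove the file first, then the (now empty) directories.
os.remove(filename)
os.rmdir(subdir)
os.rmdir(base)
```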

Module fnmatch

The module fnmatch provides support for Unix shell-style wildcards.
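The wildcard rules can be tried out on a plain list of names, without touching the file system. The file names below are made up for illustration.

```python
import fnmatch

names = ['report.pdf', 'report.ps', 'notes.txt', 'data1.csv', 'data2.csv']

# fnmatch() tests a single name against a pattern; * matches any run of characters.
pdf_match = fnmatch.fnmatch('report.pdf', '*.pdf')

# filter() returns the names that match; ? matches exactly one character.
csv_files = fnmatch.filter(names, 'data?.csv')

# [ds] matches one character out of the given set.
ps_or_pdf = fnmatch.filter(names, 'report.p[ds]*')

print(pdf_match, csv_files, ps_or_pdf)
```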

Module glob

The module glob finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.

The following example looks for all PDF files in the current working directory and converts them into PostScript files.

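    #!/usr/bin/env python
    import glob
    import os
    import subprocess

    filelist = glob.glob('*.pdf')
    for f in filelist:
        # splitext only strips the final extension, so names with several dots stay intact
        psfilename = os.path.splitext(f)[0] + '.ps'
        cmd = ['pdftops', f, psfilename]
        print(' '.join(cmd))
        subprocess.call(cmd)  # an argument list also works for file names with spaces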

Module shutil — High-level file operations

The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
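Copying, moving and removing can be sketched as follows. All file and directory names are hypothetical, and the example works in a temporary directory that is deleted at the end.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
source = os.path.join(base, 'original.txt')
with open(source, 'w') as f:
    f.write('some data')

# copy() duplicates the file and returns the path of the copy.
copy_path = shutil.copy(source, os.path.join(base, 'copy.txt'))

# move() into an existing directory keeps the file name and returns the new path.
backup_dir = os.path.join(base, 'backup')
os.mkdir(backup_dir)
moved_path = shutil.move(copy_path, backup_dir)

with open(moved_path) as f:
    moved_content = f.read()
print(moved_path, moved_content)

# rmtree() removes a directory together with everything inside it.
shutil.rmtree(base)
```

Because rmtree() deletes without asking, it is worth double-checking the path before calling it on anything other than a scratch directory.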

Unix Specific Services

Features that are unique to the Unix operating system include shell pipelines (data streams), which pipe the output of one program into another. The pipeline symbol is |. For example, the command ls -s | sort -rg pipes the output of ls -s to the sort program. The result is a list of filenames sorted by size, largest first.

A Python pipeline to a Unix program can be established using the module subprocess. The older module pipes was deprecated in Python 3.11 and removed in Python 3.13.
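The shell pipeline ls -s | sort -rg can be reproduced from Python with the subprocess module, which is available in current Python versions. This is a minimal sketch for Unix systems; the helper name pipeline is hypothetical.

```python
import subprocess


def pipeline(first_cmd, second_cmd):
    """Run 'first_cmd | second_cmd' and return the final output as text."""
    first = subprocess.Popen(first_cmd, stdout=subprocess.PIPE)
    second = subprocess.Popen(second_cmd, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy of the pipe so first_cmd notices if second_cmd exits early.
    first.stdout.close()
    output, _ = second.communicate()
    first.wait()
    return output.decode()


# Equivalent of the shell command: ls -s | sort -rg
print(pipeline(['ls', '-s'], ['sort', '-rg']))
```

Passing each command as a list of arguments avoids going through a shell, so file names with spaces or special characters need no quoting.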

System programming: walk example

The following script walks through a directory tree and looks for all files with the matching extension:

    #!/usr/bin/env python
    import fnmatch
    import os
    import sys

    # Usage:
    # ./walkdir.py directory extension

    directory, ext = sys.argv[1], sys.argv[2]
    for root, dirs, files in os.walk(directory):
        # fnmatch.filter always returns a list (possibly empty)
        for fi in fnmatch.filter(files, '*.' + ext):
            # os.path.join inserts the path separator between directory and file name
            print(os.path.join(root, fi))

Save the file as walkdir.py and use chmod +x walkdir.py to set the execution permissions of the file. The first line (the "magic line") tells the system to start the Python interpreter. The script can be executed in the bash shell using:

./walkdir.py $HOME/subdir ps

Without the magic line, the script has to be run like this:

python walkdir.py $HOME/subdir ps

Or within IPython using the run command.
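The same search can be wrapped in a function so that it can be reused from other scripts. This is a sketch under assumptions: the function name find_by_extension and the test files are hypothetical.

```python
import fnmatch
import os
import tempfile


def find_by_extension(directory, ext):
    """Return the sorted paths of all files below directory ending in .ext."""
    matches = []
    for root, dirs, files in os.walk(directory):
        for name in fnmatch.filter(files, '*.' + ext):
            matches.append(os.path.join(root, name))
    return sorted(matches)


# Build a small directory tree to try the function on.
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, 'sub'))
for relative in ('a.ps', 'b.txt', os.path.join('sub', 'c.ps')):
    with open(os.path.join(base, relative), 'w') as f:
        f.write('')

found = find_by_extension(base, 'ps')
print(found)
```

Returning a list instead of printing keeps the search separate from the output, so the result can be filtered or sorted further.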

Exercise 1

Read in the following metadata file and extract the field ACQUISITION_DATE: L5058011_01120090712_MTL.txt

/Solution1

Exercise 2

Dear User,

according to our records you use too much disk space in your home directory.
Please delete at least 862312312.29 Megabytes in the next 1.2 days.

If you have any questions do not hesitate to send a mail to
help-it@zmaw.de.

best regards

Central IT Services - ZMAW

Have you ever received such an email? If not, perfect! Otherwise you should at least look for the largest files.

/Solution2

LehreWiki: OpenSource2010/Lesson4 (last edited 2010-11-08 11:24:58 by anonymous)