#acl AdminGroup:read,write,delete,revert  EditorGroup:read,write,delete,revert All:read
#format wiki
#language en
#pragma section-numbers off


<<TableOfContents(3)>>

= Python and the system =

Python can be used to manage and to communicate with the operating system. The [[http://docs.python.org/library/|standard library]] contains functions for 

 * string processing
 * time and date
 * working with paths, directories and files
 * networking, email
 * interprocess and thread communication
 * graphical user interface
 * Unix, MS-Windows, Mac, SGI, and SunOS specific services

and much more...

= String processing  =

String objects (variables) offer a variety of methods (functions) that can be used to format and process the string. The description of the built-in methods can be found in the standard documentation about [[http://docs.python.org/library/stdtypes.html#string-methods|string methods]].

== Examples ==

=== Split ===
    
Returns a list of the words in the string S, using an optional separator as a delimiter string.

{{{#!python
S = 'Hello string'
print(S.split())  # prints ['Hello', 'string']
}}}
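
A few further string methods can be sketched in the same way. The sample string below is made up for illustration and is not part of the original lesson:

```python
line = '  temperature;pressure;humidity \n'

# strip() removes leading and trailing whitespace, including the newline
clean = line.strip()

# split() with an explicit separator splits only at that separator
fields = clean.split(';')

# join() is the inverse of split(): it glues a list of strings together
joined = ', '.join(fields)

# upper() and replace() return new strings; the original is unchanged
shout = fields[0].upper()
renamed = clean.replace(';', ' ')

print(fields)
print(joined)
```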



= Time and date =

 * [[http://docs.python.org/library/time.html|Time access and conversions]]
 * [[http://docs.python.org/library/datetime.html#module-datetime|Basic date and time types]]
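
As a minimal sketch of what the {{{datetime}}} module linked above offers, the following lines format, parse and shift a date. The date itself is arbitrary and chosen only for illustration:

```python
from datetime import datetime, timedelta

# An arbitrary fixed date, chosen only for illustration
start = datetime(2009, 7, 12, 10, 30)

# Format a datetime as text
stamp = start.strftime('%Y-%m-%d %H:%M')

# Parse text back into a datetime
parsed = datetime.strptime('2009-07-12', '%Y-%m-%d')

# Date arithmetic with timedelta
later = start + timedelta(days=1, hours=2)
elapsed = later - start

print(stamp)
print(later.isoformat())
print(elapsed.total_seconds())
```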

== Example: profiling ==

A common task is optimizing the run time of a program. The first step is to identify the bottlenecks.
Simple profiling with a timer can be done like this:
{{{#!python
import time

time0 = time.time()  # wall-clock time in seconds
some_function()  # you want to know how long this function takes to execute
time1 = time.time()
print('Elapsed time: %.3f s' % (time1 - time0))
}}}

More sophisticated functions for profiling can be found in the [[http://docs.python.org/library/profile.html|profile module]].
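
A minimal sketch of function-level profiling with {{{cProfile}}} and {{{pstats}}}, assuming Python 3. The function {{{slow_sum}}} is a hypothetical workload invented for this example:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum of squares with an explicit loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(100000)
profiler.disable()

# Collect the statistics in a string buffer and sort by cumulative time
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists the number of calls and the time spent per function, which is usually enough to spot the most expensive function.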

Once you have found a bottleneck, these [[http://wiki.python.org/moin/PythonSpeed/PerformanceTips|performance tips]] can help you optimize your code.


= Input output =
This lesson deals with ways of reading and writing data in different formats.

== Basic Python ==
The [[http://docs.python.org/lib/bltin-file-objects.html|file object]] can be used for reading and writing plain text as well as unformatted binary data. The following code writes a message to the file named ''out.txt'', then reads the data back and prints it:

{{{
#!python

with open('out.txt', 'w') as f:  # the file is closed when the block ends
    f.write('Hallo Datentraeger')
with open('out.txt') as f:
    print(f.read())
}}}
 * {{{write()}}} writes a string to the file
 * {{{read()}}} reads the complete file
 * {{{read(N)}}} reads at most N bytes
 * {{{readlines()}}} reads all lines into a list, keeping the line breaks
 * {{{readline()}}} reads only the next line
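
The methods listed above can be tried out with a small sketch like the following. The file name and its content are made up, and a temporary directory is used so that no existing file is overwritten:

```python
import os
import tempfile

# Work in a temporary directory so no existing file is overwritten
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

with open(path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

with open(path) as f:
    head = f.read(5)           # the first 5 characters
    rest = f.readline()        # the remainder of the current line
    remaining = f.readlines()  # all following lines as a list

with open(path) as f:
    # Iterating over the file object yields one line at a time
    stripped = [line.rstrip('\n') for line in f]

print(head)
print(remaining)
print(stripped)
```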

== Pickle ==
The [[http://docs.python.org/lib/module-pickle.html|pickle module]] implements an algorithm for serializing and de-serializing a Python object structure. ''Pickling'' is the process whereby a Python object hierarchy is converted into a byte stream, and ''unpickling'' is the inverse operation, whereby a byte stream is converted back into an object hierarchy. In Python 2, the [[http://docs.python.org/lib/module-cPickle.html|cPickle module]] is a much faster implementation and should be preferred; in Python 3, pickle uses the fast implementation automatically.

{{{
#!python

import pickle

a = {'A': 1}  # a Python object
pickle.dump(a, open('test.dat', 'wb'))  # writes the object to a file (binary mode)

b = pickle.load(open('test.dat', 'rb'))  # reads the object back from the file
}}}
== Comma Separated Values ==
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets (such as Excel) and databases. The [[http://docs.python.org/lib/module-csv.html|csv module]] enables reading and writing of CSV files.
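
A minimal sketch of writing and reading a CSV file, assuming Python 3 (in Python 2 the files would be opened in binary mode instead of with {{{newline=''}}}). The file name and the sample rows are hypothetical:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'stations.csv')

# Hypothetical sample data: a header row followed by two records
rows = [
    ['station', 'temperature'],
    ['Hamburg', '17.5'],
    ['Bremen', '16.0'],
]

# newline='' lets the csv module control line endings itself (Python 3)
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

with open(path, newline='') as f:
    read_back = list(csv.reader(f))

with open(path, newline='') as f:
    # DictReader uses the first row as keys for each record
    records = list(csv.DictReader(f))

print(read_back)
print(records[0]['station'])
```

Note that the reader returns every field as a string; numeric columns have to be converted explicitly, for example with {{{float()}}}.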

== Pylab ==

{{{
from pylab import loadtxt  # reads whitespace-separated text columns

X = loadtxt('test.dat')  # data in two columns
t = X[:,0]
y = X[:,1]
}}}


= Interaction with the operating system =

The modules sys and os provide the basic interface to the operating system. 
The module os creates a portable abstraction layer which is used by high-level modules like glob, socket, thread, time, fcntl.


== Module sys ==
The module [[http://docs.python.org/lib/module-sys.html|sys]] provides access to system-specific parameters
and functions that are used or maintained by the interpreter.

=== Example argv ===
{{{#!python
#system1.py
import sys
print(sys.argv)
}}}

{{{
run system1.py parameter1 parameter2
['system1.py', 'parameter1', 'parameter2']
}}}
The script prints the command line arguments that are passed to it. {{{argv[0]}}} is the script name.

A more sophisticated way of evaluating command line arguments is provided by the module
[[http://docs.python.org/lib/module-optparse.html|optparse]] and, since Python 2.7, by its successor argparse.
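
As a hedged sketch of such argument handling, the following example uses {{{argparse}}}, the standard-library successor of {{{optparse}}}. The argument names ({{{directory}}}, {{{extension}}}, {{{--verbose}}}) and the helper {{{parse_arguments}}} are hypothetical and chosen for illustration:

```python
import argparse


def parse_arguments(argv=None):
    """Parse a directory, an extension and an optional --verbose flag.

    The argument names are hypothetical and chosen for illustration.
    """
    parser = argparse.ArgumentParser(description='Example argument parser')
    parser.add_argument('directory', help='directory to search')
    parser.add_argument('extension', help='file extension, e.g. ps')
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='print extra information')
    # With argv=None, argparse reads sys.argv[1:]
    return parser.parse_args(argv)


args = parse_arguments(['/tmp', 'ps', '--verbose'])
print(args.directory, args.extension, args.verbose)
```

In a real script the function would be called without arguments so that the values come from the command line; {{{--help}}} is generated automatically.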

== Module os ==
The module [[http://docs.python.org/lib/module-os.html|os]] is a portable
operating system interface.

Some examples:

 * {{{os.system()}}} Executes a command (a string) in a subshell
 * {{{os.mkdir()}}} Creates a directory
 * {{{os.remove()}}} Deletes a file
 * {{{os.path.isdir()}}} Tests whether a path is a directory
 * {{{os.path.isfile()}}} Tests whether a path is a regular file
 * {{{os.path.exists()}}} Tests whether a file or directory exists
 * {{{os.path.getsize()}}} Returns the size of a file in bytes
 * {{{os.path.basename()}}} Returns the base name of a pathname
 * {{{os.walk()}}} Generates the file names in a directory tree
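
Several of the functions listed above are combined in the following sketch. It works inside a temporary directory, and the names {{{data}}} and {{{notes.txt}}} are made up for the example:

```python
import os
import tempfile

# Use a temporary directory so that nothing important is touched
base = tempfile.mkdtemp()
subdir = os.path.join(base, 'data')
filename = os.path.join(subdir, 'notes.txt')

os.mkdir(subdir)
with open(filename, 'w') as f:
    f.write('12345')

is_dir = os.path.isdir(subdir)
is_file = os.path.isfile(filename)
size = os.path.getsize(filename)
name = os.path.basename(filename)

os.remove(filename)
still_exists = os.path.exists(filename)

print(is_dir, is_file, size, name, still_exists)
```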
 
== Module fnmatch ==

The module [[http://docs.python.org/library/fnmatch.html|fnmatch]] provides support for Unix shell-style wildcards.
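
The wildcard rules can be illustrated with a short sketch. The file names are hypothetical and do not need to exist, because {{{fnmatch}}} only compares strings:

```python
import fnmatch

# Hypothetical file names used only for illustration
names = ['report.pdf', 'notes.txt', 'data1.csv', 'data2.csv', 'data10.csv']

pdfs = fnmatch.filter(names, '*.pdf')            # * matches any characters
single = fnmatch.filter(names, 'data?.csv')      # ? matches one character
digits = fnmatch.filter(names, 'data[0-9].csv')  # [...] matches a character set
no_csv = [n for n in names if not fnmatch.fnmatch(n, '*.csv')]

print(pdfs)
print(single)
print(no_csv)
```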

== Module glob ==

The module [[http://docs.python.org/lib/module-glob.html|glob]] finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.

The following example looks for all PDF files in the current working directory and converts them into PostScript files:
{{{#!python
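#!/usr/bin/env python
import glob
import os
import subprocess

filelist = glob.glob('*.pdf')
for f in filelist:
    # splitext keeps dots inside the base name, e.g. 'a.b.pdf' -> 'a.b.ps'
    psfilename = os.path.splitext(f)[0] + '.ps'
    cmd = ['pdftops', f, psfilename]
    print(' '.join(cmd))
    # passing a list avoids shell quoting problems with spaces in file names
    subprocess.call(cmd)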
}}}
== Module shutil — High-level file operations ==
The [[http://docs.python.org/library/shutil.html|shutil]] module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
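
A minimal sketch of copying, moving and removing with {{{shutil}}}. All file names are made up, and everything happens inside a temporary directory that is deleted at the end:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
source = os.path.join(base, 'original.txt')
with open(source, 'w') as f:
    f.write('some data')

# copy() copies the file content and the permission bits
copy_path = os.path.join(base, 'copy.txt')
shutil.copy(source, copy_path)

# move() renames a file or moves it to another location
moved_path = os.path.join(base, 'moved.txt')
shutil.move(copy_path, moved_path)

files_before = sorted(os.listdir(base))

# rmtree() deletes a directory together with everything inside it
shutil.rmtree(base)

print(files_before)
print(os.path.exists(base))
```

Because {{{shutil.rmtree()}}} deletes without asking, it is worth double-checking the path before calling it.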

== Unix Specific Services ==
Features typical of the Unix operating system include shell pipelines (data streams), which ''pipe'' the output of one program into another. The pipeline symbol is {{{|}}}. For example, the command {{{ls -s | sort -rg}}} ''pipes'' the output of {{{ls -s}}} to the {{{sort}}} program. The result is a list of file names sorted by size.

A Python pipeline to a Unix program can be established using the module [[http://docs.python.org/lib/module-pipes.html|pipes]]. Note that pipes was removed in Python 3.13; the subprocess module offers the same functionality in current versions.
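
A pipeline of two programs can also be sketched with the {{{subprocess}}} module, assuming Python 3. The helper name {{{run_pipeline}}} is hypothetical and not part of the standard library:

```python
import subprocess


def run_pipeline(first, second):
    """Run `first | second` and return the output of `second` as text.

    Both commands are given as argument lists, so no shell is involved.
    """
    p1 = subprocess.Popen(first, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(second, stdin=p1.stdout, stdout=subprocess.PIPE)
    p1.stdout.close()  # lets the first program stop if the second exits early
    output = p2.communicate()[0]
    p1.wait()
    return output.decode()
```

On a Unix system, {{{run_pipeline(['ls', '-s'], ['sort', '-rg'])}}} would correspond to the shell command shown above.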

= System programming: walk example =

The following script walks through a directory tree and looks for all files with the matching extension:
{{{#!python
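#!/usr/bin/env python
import fnmatch
import os
import sys

# Usage:
# ./walkdir.py directory extension

top, ext = sys.argv[1], sys.argv[2]
for root, dirs, files in os.walk(top):
    # fnmatch.filter always returns a list (possibly empty)
    for name in fnmatch.filter(files, '*.' + ext):
        # os.path.join inserts the path separator between directory and file
        print(os.path.join(root, name))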
}}}

Save the file as {{{walkdir.py}}} and use {{{chmod +x walkdir.py}}} to set the execution permissions of the file. The first line, the so-called magic line, tells the shell to run the script with the Python interpreter. The script can be executed in the bash shell using:

{{{
./walkdir.py $HOME/subdir ps
}}}

Without the magic line, the script has to be run like this:
{{{
python walkdir.py $HOME/sync/ ps
}}}
Or within {{{ipython}}} using {{{run}}}.


= Exercise 1 =

Read in the following meta-data and extract the field ACQUISITION_DATE:
[[attachment:L5058011_01120090712_MTL.txt]]

[[/Solution1]]

= Exercise 2 =

{{{
Dear User,

according to our records you use too much disk space in your home directory.
Please delete at least 862312312.29 Megabytes in the next 1.2 days.

If you have any questions do not hesitate to send a mail to
help-it@zmaw.de.

best regards

Central IT Services - ZMAW
}}}

Have you ever received such an email? If not, perfect! Otherwise you should at least look for the largest files.

 * Write a function that walks through a directory tree and collects the pathnames and sizes of all files.
 * Write a function that sorts the results of the first function according to the size of the files.
 * Combine both functions in a script which can be called from the bash shell.
 * Optional: limit the resulting list to the ten largest files.
 * Optional: automatically compress the largest files of a certain type.

[[/Solution2]]