Reading different data formats

This lesson covers ways of reading and writing data in different formats.

Basic Python

The file object can be used for reading and writing plain text as well as unformatted binary data. The following code writes a message to a file named out.txt, then reads the data back and prints it.

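    with open('out.txt', 'w') as f:  # open for writing; closed automatically
        f.write('Hallo Datentraeger')
    with open('out.txt') as f:       # open for reading (the default mode)
        print(f.read())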

write(s) writes the string s to the file
read() reads the complete file
read(N) reads at most N bytes
readlines() reads all lines into a list, keeping the line breaks
readline() reads only the next line
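The difference between these methods can be seen in a minimal sketch. The file name lines.txt and its contents are assumptions made for illustration; the file is created in a temporary directory so that nothing existing is overwritten.

```python
import os
import tempfile

# Hypothetical example file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as f:
    f.write('first\nsecond\nthird\n')

with open(path) as f:
    head = f.read(3)           # at most 3 characters from the start
    rest = f.readline()        # remainder of the current line, with line break
    remaining = f.readlines()  # list of all lines that are left

with open(path) as f:
    everything = f.read()      # the complete file as one string

print(repr(head), repr(rest), remaining)
```

Each call continues where the previous one stopped, so the order of the calls matters.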

Pickle

The pickle module implements an algorithm for serializing and de-serializing a Python object structure. Pickling is the process whereby a Python object hierarchy is converted into a byte stream, and unpickling is the inverse operation, whereby a byte stream is converted back into an object hierarchy. In Python 2, the cPickle module is a much faster implementation and should be preferred; in Python 3, pickle uses this faster implementation automatically.

    import pickle
    a = {'A': 1}  # a Python object
    with open('test.dat', 'wb') as f:  # pickle data is binary, hence 'wb'
        pickle.dump(a, f)              # writes object to file

    with open('test.dat', 'rb') as f:
        b = pickle.load(f)             # reads the object from file

Comma Separated Values

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets (Excel) and databases. The csv module enables reading and writing of CSV files.
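A minimal sketch of a write and read round trip with the csv module, written for Python 3. The file name table.csv and the column names t and y are assumptions chosen for illustration.

```python
import csv
import os
import tempfile

# Hypothetical table: a header row followed by two data rows.
rows = [['t', 'y'], ['0.0', '1.5'], ['0.1', '1.7']]
path = os.path.join(tempfile.mkdtemp(), 'table.csv')

# newline='' lets the csv module handle line endings itself (Python 3).
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

with open(path, newline='') as f:
    loaded = list(csv.reader(f))  # every field comes back as a string

print(loaded)
```

Note that the reader returns all fields as strings, so numeric columns have to be converted explicitly, for example with float().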

Pylab

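    from pylab import loadtxt  # older pylab versions provided load() instead

    X = loadtxt('test.dat')  # data in two columns
    t = X[:,0]               # first column
    y = X[:,1]               # second column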

Interaction with the operating system

The modules sys and os provide the basic interface to the operating system. The module os provides a portable abstraction layer that is used by high-level modules such as glob, socket, thread, time and fcntl.

Module sys

The module sys provides access to system-specific parameters and functions maintained by the interpreter.

Example argv

    # system1.py
    import sys
    print(sys.argv)  # list of command line arguments

run system1.py parameter1 parameter2
['system1.py', 'parameter1', 'parameter2']

The script prints the command line arguments that are passed to it. argv[0] is the script name.

A more sophisticated way of evaluating command line arguments is provided by the module optparse (or its successor argparse, available since Python 2.7).
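As a rough sketch of what option parsing with optparse can look like: the options -e/--ext and -v/--verbose and the helper parse_args are hypothetical, invented here for illustration. The argument list is passed in explicitly instead of being taken from sys.argv, so the example can be run without a command line.

```python
from optparse import OptionParser

def parse_args(argv):
    """Parse a list of command line arguments (without the script name)."""
    parser = OptionParser(usage='usage: %prog [options] directory')
    # Hypothetical options, chosen only for this example.
    parser.add_option('-e', '--ext', dest='ext', default='ps',
                      help='file extension to look for')
    parser.add_option('-v', '--verbose', action='store_true',
                      dest='verbose', default=False,
                      help='print additional information')
    return parser.parse_args(argv)

# In a real script this would be parse_args(sys.argv[1:]).
options, args = parse_args(['-e', 'pdf', '--verbose', 'data'])
print(options.ext, options.verbose, args)
```

Options may appear in any order, and everything that is not an option ends up in the list of positional arguments.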

Module os

The module os is a portable operating system interface.

Some examples:

os.system() Executes the command (a string) in a subshell
os.mkdir() Creates a directory
os.remove() Deletes a file
os.path.isdir() Test if directory
os.path.isfile() Test if file
os.path.exists() Test if file or directory exists
os.path.getsize() Size of a file
os.path.basename() Base name of pathname
os.walk() Directory tree generator
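Several of these functions can be tried out together in a short sketch. It works in a temporary directory; the names data and sample.txt are assumptions made for illustration.

```python
import os
import tempfile

base = tempfile.mkdtemp()                      # scratch directory for the demo
subdir = os.path.join(base, 'data')            # hypothetical directory name
os.mkdir(subdir)

filename = os.path.join(subdir, 'sample.txt')  # hypothetical file name
with open(filename, 'w') as f:
    f.write('12345')

is_dir = os.path.isdir(subdir)
is_file = os.path.isfile(filename)
size = os.path.getsize(filename)               # size in bytes
name = os.path.basename(filename)              # last component of the path

# os.walk yields (directory, subdirectories, files) for every directory.
found = []
for root, dirs, files in os.walk(base):
    for fi in files:
        found.append(os.path.join(root, fi))

os.remove(filename)
still_there = os.path.exists(filename)
print(is_dir, is_file, size, name, still_there)
```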

Module fnmatch

The module fnmatch provides support for Unix shell-style wildcards.
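A minimal sketch of the wildcard patterns, applied to a made-up list of file names (the names are assumptions, no files are touched):

```python
import fnmatch

# Hypothetical file names used only for matching.
names = ['plot.pdf', 'plot.ps', 'notes.txt', 'data1.dat', 'data2.dat']

# * matches any run of characters
pdf_match = fnmatch.fnmatch('plot.pdf', '*.pdf')
# ? matches exactly one character
dat_files = fnmatch.filter(names, 'data?.dat')
# [seq] matches one character out of seq
pn_files = fnmatch.filter(names, '[pn]*')

print(pdf_match, dat_files, pn_files)
```

fnmatch.fnmatch() tests a single name, while fnmatch.filter() returns the sublist of names that match the pattern.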

Module glob

The module glob finds all the pathnames matching a specified pattern according to the rules used by the Unix shell.

The following example looks for all PDF files in the current working directory and converts them into PostScript files.

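    #!/usr/bin/env python
    import glob
    import os

    filelist = glob.glob('*.pdf')  # all PDF files in the current directory
    for f in filelist:
        # splitext also handles names that contain more than one dot
        psfilename = os.path.splitext(f)[0] + '.ps'
        cmd = 'pdftops ' + f + ' ' + psfilename
        print(cmd)
        os.system(cmd)  # run the conversion in a subshell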

Module shutil — High-level file operations

The shutil module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal.
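A short sketch of copying, moving and removing with shutil. All paths are created inside a temporary directory; the file and directory names are assumptions made for illustration.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                  # scratch directory for the demo
src = os.path.join(base, 'original.txt')   # hypothetical file name
with open(src, 'w') as f:
    f.write('some data')

copy_path = os.path.join(base, 'copy.txt')
shutil.copy(src, copy_path)                # copy a single file

backup_dir = os.path.join(base, 'backup')
os.mkdir(backup_dir)
moved = os.path.join(backup_dir, 'copy.txt')
shutil.move(copy_path, moved)              # move the copy into backup/

with open(moved) as f:
    copied_text = f.read()

shutil.rmtree(backup_dir)                  # remove the directory and its contents
print(copied_text, os.path.exists(backup_dir))
```

Unlike os.remove(), which deletes a single file, shutil.rmtree() deletes a whole directory tree, so it should be used with care.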

Unix Specific Services

Features that are unique to the Unix operating system include shell pipelines (data streams), which pipe the output of one program to another. The pipeline symbol is |. For example, the command ls -s | sort -rg pipes the output of ls -s to the sort program. The result is a list of filenames sorted by size.

A Python pipeline to a Unix program can be established using the module pipes (removed in Python 3.13; newer code uses the module subprocess for this).
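Since the pipes module is not available in recent Python versions, the following sketch builds the same kind of pipeline with subprocess from the standard library instead. It assumes a Unix system with ls and sort on the path, uses the POSIX option -rn in place of -rg, and the helper sorted_listing and the file names are hypothetical.

```python
import os
import subprocess
import tempfile

def sorted_listing(directory):
    """Return the output of `ls -s directory | sort -rn` as text."""
    ls = subprocess.Popen(['ls', '-s', directory], stdout=subprocess.PIPE)
    sort = subprocess.Popen(['sort', '-rn'], stdin=ls.stdout,
                            stdout=subprocess.PIPE)
    ls.stdout.close()               # lets ls notice if sort exits early
    output = sort.communicate()[0]  # wait for sort and collect its output
    ls.wait()
    return output.decode()

# Demo on a scratch directory with two hypothetical files.
demo_dir = tempfile.mkdtemp()
for name, size in [('small.dat', 10), ('big.dat', 100000)]:
    with open(os.path.join(demo_dir, name), 'w') as f:
        f.write('x' * size)

listing = sorted_listing(demo_dir)
print(listing)
```

The standard output of the first process is connected directly to the standard input of the second, which is what the | symbol does in the shell.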

System programming: walk example

The following script walks through a directory tree and looks for all files with the matching extension:

    #!/usr/bin/env python
    import fnmatch
    import os
    import sys

    # Usage:
    # ./walkdir.py directory extension

    top, ext = sys.argv[1], sys.argv[2]
    for root, dirs, files in os.walk(top):
        # fnmatch.filter always returns a list (possibly empty)
        for fi in fnmatch.filter(files, '*.' + ext):
            print(os.path.join(root, fi))  # full path of the matching file

Save the file as walkdir.py and use chmod +x walkdir.py to make it executable. The first line (the magic line, also called the shebang) tells the shell to start the Python interpreter. The script can then be executed from the bash shell using:

./walkdir.py $HOME/subdir ps

Without the magic line, the script has to be run like this:

python walkdir.py $HOME/sync/ ps

Alternatively, it can be started from within IPython using the run command.

LehreWiki: OpenSource2010/Lesson4 (last edited 2010-11-08 11:24:58 by anonymous)
