Input/Output

This lesson covers the ways of reading and writing data.

Basic Python

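A file object, as returned by the built-in open() function, can be used for reading and writing plain text as well as unformatted binary data. The following code writes a message to the file named out.txt, then reads the data back and prints it:

   with open('out.txt', 'w') as f:   # 'w' opens the file for writing
       f.write('Hallo Datentraeger')
   with open('out.txt') as f:        # the default mode is reading text
       print(f.read())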

write() writes a string to the file
read() reads the complete file
read(N) reads at most N bytes (N characters in text mode)
readlines() reads all lines into a list, keeping the line breaks
readline() reads only the next line
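
The methods listed above can be tried together in a short sketch. The scratch file name and its contents are made up for illustration, and the file is placed in a temporary directory so that nothing in the working directory is overwritten.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')

with open(path, 'w') as f:
    f.write('first\nsecond\nthird\n')

with open(path) as f:
    head = f.read(3)           # the first 3 characters of the file
    rest = f.readline()        # the remainder of the current line
    remaining = f.readlines()  # all following lines, line breaks kept

print(repr(head), repr(rest), remaining)
```

Each call continues where the previous one stopped, because the file object keeps track of its current position.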

Pickle

The pickle module implements an algorithm for serializing and de-serializing a Python object structure. Pickling is the process whereby a Python object hierarchy is converted into a byte stream, and unpickling is the inverse operation, whereby a byte stream is converted back into an object hierarchy. In Python 2, the cPickle module is a much faster implementation and should be preferred; in Python 3, the pickle module uses the faster C implementation automatically when it is available.

   import pickle

   a = {'A': 1}  # a Python object
   with open('test.dat', 'wb') as f:  # pickle data is binary, hence 'wb'
       pickle.dump(a, f)              # writes the object to the file

   with open('test.dat', 'rb') as f:
       b = pickle.load(f)             # reads the object back from the file
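
To make the byte stream itself visible, the following minimal sketch pickles a small nested object in memory with pickle.dumps() and restores it with pickle.loads(). The object and its keys are invented for illustration.

```python
import pickle

# A made-up nested object hierarchy: a dict holding a list, a tuple and a dict.
nested = {'name': 'sample', 'values': [1, 2.5, (3, 4)], 'meta': {'ok': True}}

stream = pickle.dumps(nested)    # pickling: object hierarchy -> bytes
restored = pickle.loads(stream)  # unpickling: bytes -> new object hierarchy

print(type(stream).__name__, restored == nested)
```

The restored object compares equal to the original but is an independent copy. Note that unpickling data from an untrusted source is unsafe, since a crafted byte stream can execute arbitrary code.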

Comma Separated Values

The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets (Excel) and databases. The csv module supports reading and writing CSV files.
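
As a minimal sketch of the csv module, the following code writes a small table and reads it back. The file name, column names and values are assumptions chosen for illustration only.

```python
import csv
import os
import tempfile

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'stations.csv')

rows = [
    ['station', 'temperature'],
    ['Hamburg', '12.5'],
    ['Berlin', '14.0'],
]

# newline='' lets the csv module handle line endings itself.
with open(path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

with open(path, newline='') as f:
    read_back = list(csv.reader(f))  # every field comes back as a string

print(read_back)
```

The reader returns all fields as strings, so numeric columns have to be converted explicitly, for example with float().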

NumPy/SciPy

HDF

NASA's standard file format, the Hierarchical Data Format (HDF, http://hdf.ncsa.uiuc.edu/index.html), is a self-describing data format. HDF files can contain binary data and allow direct access to parts of the file without first parsing the entire contents.

The HDF versions 4 and 5 are not compatible.

Several modules are available for reading and writing HDF files.

pyhdf

pyhdf is a Python interface to the NCSA HDF4 library.

The following example demonstrates how to read level-3 data from the Multi-angle Imaging Spectral Radiometer (MISR) on the Terra satellite:

   1 from numpy import array
   2 from pylab import imshow,colorbar,title,savefig,cm
   3 from pyhdf.SD import SD
   4 
   5 f=SD('MISR_AM1_CGLS_MAY_2007_F04_0025.hdf')  # open the HDF4 file
   6 print(f.datasets().keys())                   # names of the datasets
   7 data=array(f.select('NDVI average').get())   # read one dataset
   8 data[data<0]=0                               # zero the bad/missing values
   9 
  10 imshow(data,interpolation='nearest',cmap=cm.YlGn)
  11 colorbar(shrink=0.5)
  12 title('Normalized Difference Vegetation Index')

Line 5 opens the HDF file. Line 6 prints the names of the datasets it contains, from which one can identify the name of the desired parameter. Line 7 reads the data into a NumPy array. Line 8 selects the negative (bad and missing) values and sets them to zero.
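
The masking step in line 8 can be tried without an HDF file. The sketch below uses a small made-up array in place of the NDVI grid; the values, including the negative markers for bad cells, are invented for illustration and are not taken from the MISR product.

```python
import numpy as np

# Made-up values standing in for an NDVI grid; negatives mark bad or missing cells.
data = np.array([[0.2, -9999.0, 0.5],
                 [-1.0, 0.8, 0.0]])

mask = data < 0   # boolean array, True where the value is negative
data[mask] = 0    # the assignment touches only the masked cells

print(mask.sum(), data)
```

Writing data[data<0]=0 as in the example is the same operation without the intermediate mask variable.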

netCDF

Various Satellite data formats

LehreWiki: SiaProgrammingPythonIo (last edited 2008-04-21 12:10:37 by anonymous)
