#acl AdminGroup:read,write EditorGroup:read All:read
#format wiki
#language en
#pragma section-numbers off
SiaProgrammingPython

= Input/output and data formats =
This lesson deals with the ways of reading and writing data in different formats.

= Basic Python =
The [[http://docs.python.org/lib/bltin-file-objects.html|file object]] can be used for reading and writing plain text as well as unformatted binary data. The following code writes a message to the file ''out.txt'', then reads the data back and prints it:
{{{#!python
file('out.txt','w').write('Hallo Datentraeger')
print file('out.txt').read()
}}}
 * {{{write()}}} writes a string to the file
 * {{{read()}}} reads the complete file
 * {{{read(N)}}} reads N bytes
 * {{{readlines()}}} reads all lines into a list, keeping the linebreaks
 * {{{readline()}}} reads only the next line

== Pickle ==
The [[http://docs.python.org/lib/module-pickle.html|pickle module]] implements an algorithm for serializing and de-serializing a Python object structure. ''Pickling'' is the process whereby a Python object hierarchy is converted into a byte stream, and ''unpickling'' is the inverse operation, whereby a byte stream is converted back into an object hierarchy. The [[http://docs.python.org/lib/module-cPickle.html|cPickle module]] is a much faster implementation and should be preferred.
{{{#!python
import cPickle as pickle              # the faster C implementation
a = {'A':1}                           # a Python object
pickle.dump(a,open('test.dat','w'))   # writes the object to a file
b = pickle.load(open('test.dat','r')) # reads the object from the file
}}}

== Comma Separated Values ==
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets (Excel) and databases. The [[http://docs.python.org/lib/module-csv.html|csv module]] enables reading and writing of CSV files.

= NumPy/SciPy =
Arrays can be read from a file and written to a file using the following function and method:
 * {{{fromfile(...)}}} reads an array from a file
 * {{{.tofile(...)}}} writes an array to a file

= HDF =
NASA's standard file format, the [[http://hdf.ncsa.uiuc.edu/index.html|Hierarchical Data Format (HDF)]], is a self-describing data format. HDF files can contain binary data and allow direct access to parts of the file without first parsing the entire contents. HDF versions 4 and 5 are not compatible with each other; different Python modules are available for reading and writing HDF files.

== HDF4 pyhdf ==
[[http://pysclint.sourceforge.net/pyhdf/|pyhdf]] is a Python interface to the NCSA HDF4 library. The following example demonstrates how to read [[http://eosweb.larc.nasa.gov/PRODOCS/misr/level3/download_data.html|level-3 data]] from the [[http://www-misr.jpl.nasa.gov/|Multi-angle Imaging SpectroRadiometer (MISR)]] on the [[http://terra.nasa.gov/|Terra satellite]].
{{{#!python
from scipy import array
from pylab import imshow,colorbar,title,savefig
from pylab import cm
from pyhdf.SD import SD
f=SD('MISR_AM1_CGLS_MAY_2007_F04_0025.hdf')
print f.datasets().keys()
data=array(f.select('NDVI average').get())
data[data<0]=0
imshow(data,interpolation='nearest',cmap=cm.YlGn)
colorbar(shrink=0.5)
title('Normalized Difference Vegetation Index')
}}}
Line 5 opens the HDF file object. Line 6 prints the keywords of the included datasets; from these one can identify the keyword of the desired parameter. Line 7 reads the data into a SciPy array. Line 8 selects the negative (bad and missing) values and sets them to zero.

{{attachment:ndvi.png}}

== HDF5 ==
[[http://www.pytables.org/moin|PyTables]] is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data.
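A minimal sketch of the PyTables interface (assuming the PyTables 2.x API; the file and node names used here are arbitrary): it writes a small array into an HDF5 file and reads it back.
{{{#!python
import tables
from scipy import arange

# write: create an HDF5 file with one group containing one array
h5 = tables.openFile('test.h5', mode='w', title='Test file')
grp = h5.createGroup('/', 'results', 'Example group')
h5.createArray(grp, 'ndvi', arange(10), 'Example data array')
h5.close()

# read: nodes are reachable as attributes of the file's root group
h5 = tables.openFile('test.h5', mode='r')
print h5.root.results.ndvi.read()
h5.close()
}}}
Datasets can also be read slice by slice, so files larger than the available memory can still be processed.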
= netCDF =
[[http://www.pyngl.ucar.edu/|PyNIO]] is a Python package that allows read and/or write access to a variety of data formats using an interface modelled on [[http://www.unidata.ucar.edu/software/netcdf/|netCDF]]. The following example demonstrates how to use the PyNIO and [[http://www.pyngl.ucar.edu/|PyNGL]] modules to read and display sea ice concentration data.
{{{#!python
import os
from scipy import array,arange,nan
from PyNGL import Ngl
from PyNGL import Nio

def Ngl_map(C,lat,lon,psfile):
    rlist = Ngl.Resources()
    rlist.wkColorMap = 'posneg_1'
    wks_type = "ps"
    wks = Ngl.open_wks(wks_type,psfile,rlist)
    resources = Ngl.Resources()
    resources.sfXArray = lon[:,:]
    resources.sfYArray = lat[:,:]
    resources.mpProjection = "Stereographic"
    resources.mpDataBaseVersion = "MediumRes"
    resources.mpLimitMode = "LatLon"
    resources.mpMinLonF = 0
    resources.mpMaxLonF = 360
    resources.mpMinLatF = 65
    resources.mpMaxLatF = 90
    resources.mpCenterLonF = 0.
    resources.mpFillOn = True
    igray = Ngl.new_color(wks,0.7,0.7,0.7)
    resources.mpFillColors = [0,-1,igray,-1]
    resources.cnLineDrawOrder = "Predraw"
    resources.cnFillOn = True
    resources.cnFillDrawOrder = "Predraw"
    resources.cnLineLabelsOn = False
    resources.nglSpreadColorStart = 8
    resources.nglSpreadColorEnd = -2
    resources.cnLevelSelectionMode = "ExplicitLevels" # Define own levels.
    resources.cnLevels = arange(0.,100,10)
    resources.lbTitleString = 'Concentration [%]'
    resources.lbOrientation = "Horizontal"
    resources.cnFillMode = "RasterFill"
    resources.cnLinesOn = False
    resources.tiMainString = "~F22~Arctic Sea Ice Coverage~C~~F21~September average from SSM/I"
    map = Ngl.contour_map(wks,C[:,:],resources)

grid = Nio.open_file('grid_north_12km.nc')                   # pixel coordinates
lat=array(grid.variables['latitude'])
lon=array(grid.variables['longitude'])
nc = Nio.open_file('climatology_09.nc')                      # ice concentration climatology
C=array(nc.variables['concentration'])[0,:,:].astype(float)
Ngl_map(C,lat,lon,'map')
os.system('gv map.ps &')
}}}
The data files can be downloaded from the [[ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data|ftp server]] of the [[http://cersat.ifremer.fr/|Center for Satellite Exploitation and Research (CERSAT)]], which is one of the major world data centers for oceanography. The September mean sea ice concentration values derived from the [[http://nsidc.org/data/docs/daac/ssmi_instrument.gd.html|Special Sensor Microwave Imager (SSM/I)]] are stored in the netCDF file [[ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-concentration/data/arctic/climatology/netcdf/climatology_09.nc.Z|climatology_09.nc]]. Compressed data with the extension {{{.Z}}} or {{{.gz}}} can be uncompressed using {{{uncompress}}} or {{{gunzip}}}, respectively. A description of the data and the algorithm can be found [[ftp://ftp.ifremer.fr/ifremer/cersat/products/gridded/psi-drift/documentation/ssmi.pdf|here]].

The main program that reads the data starts in line 40. First the coordinates of the corresponding pixels are read into the variables {{{lat}}} and {{{lon}}}, then the ice concentration data are read into the variable {{{C}}}. The function {{{Ngl_map}}} creates a PostScript output in the file {{{map.ps}}}, which is displayed using the command {{{gv}}}.

{{attachment:september.png}}

= Various Satellite data formats =
The [[http://www.gdal.org/|Geospatial Data Abstraction Library (GDAL)]] has a [[http://www.gdal.org/gdal_datamodel.html|single abstract data model]] for all [[http://www.gdal.org/formats_list.html|supported formats]].
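Because every supported format is mapped onto the same dataset and raster band classes, the basic structure of a file can be inspected without any format-specific code. A minimal sketch (assuming the GDAL Python bindings are installed; {{{example.tif}}} stands for any GDAL-readable file):
{{{#!python
import gdal

f = gdal.Open('example.tif')        # any GDAL-readable file
print f.GetDriver().ShortName       # format driver chosen by GDAL
print f.RasterXSize, f.RasterYSize  # raster size in pixels
print f.RasterCount                 # number of raster bands
print f.GetProjection()             # coordinate system (WKT string)
print f.GetGeoTransform()           # affine georeferencing transform
band = f.GetRasterBand(1)           # bands are numbered starting at 1
print band.DataType                 # data type code, e.g. gdal.GDT_UInt16
}}}
The Envisat ASAR example below uses the same {{{gdal.Open}}} entry point and only differs in how the band data are unpacked.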
The following code demonstrates how to read an Envisat ASAR image using the gdal module.
{{{#!python
import gdal, struct
from scipy import array,empty,uint16
from pylab import imshow

filename='/pf/u/u242023/ASA_IMM_1PNPDK20080410_095538_000001462067_00337_31954_4832.N1'
f=gdal.Open(filename)
a=f.GetRasterBand(1)                       # the image data is stored in raster band 1
img=empty((a.YSize,a.XSize),dtype=uint16)  # array for the full image
for yi in xrange(a.YSize):                 # read the image line by line
    scanline=struct.unpack("H"*a.XSize,a.ReadRaster(0,yi,a.XSize,1,a.XSize,1,gdal.GDT_UInt16))
    img[yi,:]=array(scanline).astype(uint16)
imshow(img,vmin=0,vmax=3000,interpolation='nearest',origin='lower')
}}}
{{attachment:asar_weser_elbe.png}}