Python data model

the copy module

numpy overview (numpy 1.5 currently installed at ZMAW)

Python 3 vs Python 2

http://ipython.org/IPython]]

a generic python script

Python Data Types

Python Data Model

Strings

There is no character type. Characters are strings of length 1.

In [3]: string = 'Hello wolrd! I\'m a string'

In [4]: print string
Hello wolrd! I'm a string

Numbers

In [6]: number_i = 3

In [7]: number_f = 3.141597

Lists

In [9]: list_name = ['A', 3, number_f, [12., 34.], (1.,2.), {'a':[1,2,3,4]}]

In [10]: print list_name[0]
A

In [11]: print list_name[3]
[12.0, 34.0]

In [17]: list_name.append(5.1234)

In [18]: print list_name
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234]

Get help on objects

In [13]: list_name?
Type:       list
String Form:['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}]
Length:     6
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [14]: # help(list_name)

In [15]: # list_name. "double tab"

In [16]: # these two cannot be saved for the purpose here, but please try them out - these helps are very useful

Tuples

In [20]: tuple_name = ('A', 3, number_f, [12., 34.], (1.,2.), {'a':[1,2,3,4]})

What is the difference between list and tuples?

Dictionaries

In [25]: dict_name = {'key':'value', 'Th morning':'python course'}

In [26]: print dict_name.get('key')
value

In [27]: print dict_name.keys()
['Th morning', 'key']

In [28]: print dict_name.values()
['python course', 'value']

In [29]: dict_name.update({'participants':34})

In [30]: print dict_name
{'participants': 34, 'Th morning': 'python course', 'key': 'value'}

Copies of mutable objects

By assigning one variable to another, only references are created. If the variables are mutable and the first variable is changed, then the second variable is also changed (because is actually is the same one, only with a different reference).

To see if two variables acutally reference the same object you can use the is keyword. And to check the memory location of an object there is id. These two are good to play around with and get a feeling for how Python changes references after assignments. For examples see below at Copies of numpy arrays

Copies of lists

In order to create a real copy of a mutable object one needs to deepcopy the object if it consists of mutable values, otherwise a shallow copy is sufficient (if the single values of the mutable object are immutable).

In [32]: list_name_copy = list_name

In [33]: print list_name_copy
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234]

In [34]: list_name.append('new value')

In [35]: print list_name
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234, 'new value']

In [36]: print list_name_copy # has also changed
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234, 'new value']

In [37]: import copy

In [38]: list_name_deepcopy = copy.deepcopy(list_name)

In [39]: list_name.append('next new value')

In [40]: print list_name
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234, 'new value', 'next new value']

In [41]: print list_name_copy # has changed again
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234, 'new value', 'next new value']

In [42]: print list_name_deepcopy # did not change
['A', 3, 3.141597, [12.0, 34.0], (1.0, 2.0), {'a': [1, 2, 3, 4]}, 5.1234, 'new value']

For simple lists (that do not contain other lists) instead of using the copy module, you can also use slicing:

In [19]: list = ['astring', 12, 3.14259]

In [20]: list_copy = list

In [21]: list is list_copy
Out[21]: True

In [22]: list_copy = list[:]

In [23]: list_copy is list
Out[23]: False

but

In [34]: list = ['astring', 12, 3.14259, ['another', 'list']]

In [35]: list_copy = list[:]

In [36]: list_copy is list
Out[36]: False

In [37]: list_copy[3] is list[3]
Out[37]: True

Copies of numpy arrays

Remember that numpy arrays are also mutable objects.

In [22]: arr = n.array([1,2,3,4], dtype=n.float64)

In [23]: brr = arr

In [24]: brr is arr
Out[24]: True

In [25]: id(arr)
Out[25]: 25589568

In [26]: id(brr)
Out[26]: 25589568

In [27]: id(arr[1])
Out[27]: 26177776

In [28]: id(brr[1])
Out[28]: 26177776

In [29]: brr[1] = 10.

In [30]: arr
Out[30]: array([  1.,  10.,   3.,   4.])

Assigning the result of an operation is of course safe.

In [30]: arr
Out[30]: array([  1.,  10.,   3.,   4.])

In [31]: brr = 2 * arr

In [32]: brr is arr
Out[32]: False

In [44]: brr[0]=20.

In [45]: arr
Out[45]: array([  1.,  10.,   3.,   4.])

In [49]: brr
Out[49]: array([ 20.,  20.,   6.,   8.])

To make confusion complete, slicing is not safe for arrays! Slicing an array creates a so called view. To copy an array use the copy method of the array.

In short: For numpy arrays, always use the copy method if you want a real copy!

In [65]: arr=n.array([1., 2., 3., 4.])

In [66]: brr = arr[:]

In [67]: brr is arr
Out[67]: False

In [68]: id(brr)
Out[68]: 47212608

In [69]: id(arr)
Out[69]: 52434720

In [70]: brr[1]=20.

In [71]: arr
Out[71]: array([  1.,  20.,   3.,   4.])

In [72]: brr = arr.copy()

In [73]: brr[0] = 30.

In [74]: arr
Out[74]: array([  1.,  20.,   3.,   4.])

In [75]: brr
Out[75]: array([ 30.,  20.,   3.,   4.])

Array creation

http://docs.scipy.org/doc/numpy/reference/routines.array-creation.html

Import numpy and give it a shorter name

In [7]: import numpy as n

Create an array from an existing list

The most precise data type of the list's elements will be used (ie. if there is one float, the array will contain floats) unless you specify a type.

In [9]: list = [1, 2, 3, 4]

In [10]: arr = n.array(list)

In [11]: arr
Out[11]: array([1, 2, 3, 4])

In [12]: arr_float = n.array(list, dtype=n.float64)

In [13]: arr_float
Out[13]: array([ 1.,  2.,  3.,  4.])

In [14]: arr_float = n.array([1., 2., 3., 4.])

In [15]: arr_float
Out[15]: array([ 1.,  2.,  3.,  4.])

Create particular arrays

In [16]: arr = n.ones((3,3)) 

In [17]: arr
Out[17]: 
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])

In [18]: arr = n.zeros((3,3))

In [19]: arr
Out[19]: 
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [20]: arr = n.empty((3,3))

In [21]: arr
Out[21]: 
array([[  0.00000000e+000,   6.95281578e-310,   1.62438192e-316],
       [  1.16068313e-316,   1.16068313e-316,   1.63041663e-322],
       [  6.95281481e-310,   6.95281481e-310,   1.58101007e-322]])

In [23]: arr = n.eye(3)

In [24]: arr
Out[24]: 
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])

In [26]: arr = n.eye(3,5)

In [27]: arr
Out[27]: 
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.]])

Create an array with evenly spaced numbers

In [28]: arr = n.arange(10)

In [29]: arr
Out[29]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [30]: arr = n.arange(10, 20, 2.5)

In [31]: arr
Out[31]: array([ 10. ,  12.5,  15. ,  17.5])

Create an array that looks like another one (same shape)

In [32]: brr = n.ones_like(arr)

In [33]: brr
Out[33]: array([ 1.,  1.,  1.,  1.])

Create an array that equally divides an interval

In [39]: arr = n.linspace(0, 10, 26)

In [40]: arr
Out[40]: 
array([  0. ,   0.4,   0.8,   1.2,   1.6,   2. ,   2.4,   2.8,   3.2,
         3.6,   4. ,   4.4,   4.8,   5.2,   5.6,   6. ,   6.4,   6.8,
         7.2,   7.6,   8. ,   8.4,   8.8,   9.2,   9.6,  10. ])

Manipulate/reshape arrays

http://docs.scipy.org/doc/numpy/reference/routines.array-manipulation.html

In [54]: arr = n.arange(24)

In [55]: arr
Out[55]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

In [56]: arr.reshape((4,6)) # default is ordering of elements like in C
Out[56]: 
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23]])

This does not change arr, to do this, we need to assign the result of reshape to arr.

Here we also change the order of the elements from C (row major) to Fortran (column major) order.

In [58]: arr = arr.reshape((4,6), order='F')

In [59]: arr
Out[59]: 
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

In [60]: arr.reshape((12,2))
Out[60]: 
array([[ 0,  4],
       [ 8, 12],
       [16, 20],
       [ 1,  5],
       [ 9, 13],
       [17, 21],
       [ 2,  6],
       [10, 14],
       [18, 22],
       [ 3,  7],
       [11, 15],
       [19, 23]])

Make a 1D array

In [61]: arr.flatten()
Out[61]: 
array([ 0,  4,  8, 12, 16, 20,  1,  5,  9, 13, 17, 21,  2,  6, 10, 14, 18,
       22,  3,  7, 11, 15, 19, 23])

Transpose

In [63]: arr
Out[63]: 
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

In [64]: n.transpose(arr)
Out[64]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [65]: arr.T
Out[65]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

Combine two arrays into a new one

In [89]: a = n.array([1,2,3])

In [90]: b = n.array([4,5,6])

In [91]: n.hstack((a,b))
Out[91]: array([1, 2, 3, 4, 5, 6])

In [92]: n.vstack((a,b))
Out[92]: 
array([[1, 2, 3],
       [4, 5, 6]])

In [93]: a = n.array([[1,2,3],[4,5,6]])

In [94]: b = n.array([[7,8,9],[10,11,12]])

In [95]: n.hstack((a,b))
Out[95]: 
array([[ 1,  2,  3,  7,  8,  9],
       [ 4,  5,  6, 10, 11, 12]])

In [96]: n.vstack((a,b))
Out[96]: 
array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

Read netCDF file and ASCII file

Use the pyNIO class for reading binary data in many common file formats like netCDF, HDF, grib, etc (Mikhail will introduce that package).

Use the loadtxt function of the numpy module to read ASCII files.

In [48]: import numpy as np

In [49]: import Nio

In [50]: filename_netcdf = 'CEIP_EC_L3_DEHai_ebc.nc'

In [51]: filename_ascii = 'DE-Hai.2005.synth.hourly.allvars.csv'

In [96]: # read netcdf with pyNIO package

In [97]: F = Nio.open_file(filename_netcdf, mode='r')
n
In [98]: temperature = F.variables['Ta'][:]

In [99]: F.close()

In [100]: # read ascii file with numpy.loadtxt

In [101]: data = np.loadtxt(filename_ascii, delimiter=',', skiprows=1) # comma separated file; first line is header

Get information about arrays

In [102]: type(data)
Out[102]: numpy.ndarray

In [103]: type(temperature)
Out[103]: numpy.ma.core.MaskedArray

In [104]: data.dtype
Out[104]: dtype('float64')

In [105]: temperature.dtype
Out[105]: dtype('float32')

In [106]: data.shape
Out[106]: (17520, 103)

In [107]: temperature.shape
Out[107]: (140256, 1, 1)

Array statistics

http://docs.scipy.org/doc/numpy/reference/routines.statistics.html

In [126]: n.mean(temperature)
Out[126]: 8.3047627793191499

In [127]: n.std(temperature)
Out[127]: 7.7843985224597105

In [128]: n.min(temperature)
Out[128]: -16.67

In [137]: t1=temperature[:10000].flatten()

In [138]: t2=temperature2[:10000].flatten()

In [139]: n.corrcoef([t1,t2])
Out[139]: 
array([[ 1.        ,  0.00285879],
       [ 0.00285879,  1.        ]])

Indexing and slicing an array

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

In [113]: print temperature[100:222]
[ 0.41999999  0.50999999  0.56999999  0.64999998  0.81        0.94999999
  1.09000003  1.17999995  1.35000002  1.41999996  1.41999996  1.57000005
  1.66999996  1.79999995  1.86000001  1.90999997  1.95000005  1.99000001
  2.0999999   2.31999993  2.52999997  2.48000002  2.3900001   2.51999998
  2.68000007  2.70000005  2.70000005  2.6099999   2.45000005  2.3599999
  2.32999992  2.22000003  2.06999993  1.96000004  2.00999999  2.00999999
  2.05999994  2.1400001   2.1099999   2.0999999   2.06999993  2.08999991
  2.1099999   2.0999999   2.0999999   2.00999999  2.04999995  2.13000011
  2.0999999   2.19000006  2.26999998  2.31999993  2.33999991  2.24000001
  2.          1.94000006  2.06999993  2.03999996  2.02999997  2.06999993
  2.07999992  1.94000006  1.91999996  1.96000004  2.01999998  2.05999994
  2.1099999   2.25999999  2.31999993  2.33999991  2.30999994  2.3599999
  2.45000005  2.53999996  2.58999991  2.70000005  2.9000001   3.03999996
  3.16000009  3.32999992  3.3900001   3.69000006  4.05000019  4.21999979
  4.19000006  4.21999979  4.36000013  4.48999977  4.63999987  4.73000002
  4.82000017  4.82000017  4.67000008  4.59000015  4.44999981  4.55000019
  4.73999977  4.75        4.76000023  4.61999989  4.51000023  4.11000013
  3.76999998  3.3599999   2.9000001   2.6400001   2.44000006  2.02999997
  1.61000001  1.24000001  1.10000002  1.05999994  0.94999999  1.07000005
  1.44000006  1.83000004  2.05999994  2.27999997  2.57999992  2.61999989
  2.82999992  2.92000008]

In [158]: temperature[:10] # first 10 elements
Out[158]: 
array([-3.04999995, -0.60000002, -0.64999998, -0.57999998, -0.50999999,
       -0.49000001, -0.40000001, -0.36000001, -0.34999999, -0.28      ], dtype=float32)

In [159]: temperature[-10:] # last 10 elements, temperature[-1] gives you the last one
Out[159]: 
array([-0.14      , -0.23999999, -0.33000001, -0.34      , -0.43000001,
       -0.47      , -0.40000001, -0.47999999, -0.74000001, -0.81      ], dtype=float32)

In [160]: temperature[0:10:2] # every 2nd of the first 10 elements
Out[160]: array([-3.04999995, -0.64999998, -0.50999999, -0.40000001, -0.34999999], dtype=float32)

Where function: very helpful and powerful

http://docs.scipy.org/doc/numpy/reference/routines.statistics.html Set all missing values (-999.) to NaN and everything else to the value from temperature:

In [115]: print np.where(temperature == -9999.)
(array([ 9960, 10010, 10293, 10969, 11302, 12649, 16227, 26818, 28628,
       31754, 34538, 42551, 50044, 67411, 67420, 67421, 67422, 69171,
       69172, 69173, 69174, 69175, 69176, 69177, 69178, 69179, 69180,
       69181, 69182, 69183, 69184, 69185, 69395, 69396, 69397, 69398,
       69406, 69914, 69915, 69916, 69917, 75428, 75429, 75430, 75431,
       75432, 75433, 75434, 75435]),)

In [116]: print type(np.where(temperature == -9999.))
<type 'tuple'>

In [117]: print type(np.where(temperature == -9999.)[0])
<type 'numpy.ndarray'>

1 But: The much better way is to use masked arrays

In [118]: temp_nan = np.where(temperature == -9999., np.nan, temperature)

In [119]: print temp_nan[9960]
nan

In [122]: print np.min(temp_nan) # doesn't work
nan

Write out into ASCII file format

In [132]: temp_ascii = data[:,20]

In [133]: np.savetxt('temp_ascii_out.csv', temp_ascii, delimiter=',')

A quick plot

Alex will introduce plotting, but since you asked: here it is.

In [129]: from matplotlib import pyplot as mpp

In [130]: mpp.plot(data[:,20])
Out[130]: [<matplotlib.lines.Line2D at 0xb1c1c38c>]

In [131]: mpp.show()

Elementwise functions

Many common functions are available as numpy routines.

Always use numpy routines when working on an array to profit from the performance of the underlying C implementation.

If a desired operation cannot be achieved by combining several numpy techniques (clever indexing, slicing, functions), write your own function that operates on a single element and use numpy.vectorize

In [97]: arr = n.arange(1, 10, 0.4)

In [98]: arr
Out[98]: 
array([ 1. ,  1.4,  1.8,  2.2,  2.6,  3. ,  3.4,  3.8,  4.2,  4.6,  5. ,
        5.4,  5.8,  6.2,  6.6,  7. ,  7.4,  7.8,  8.2,  8.6,  9. ,  9.4,
        9.8])

In [99]: n.log(arr)
Out[99]: 
array([ 0.        ,  0.33647224,  0.58778666,  0.78845736,  0.95551145,
        1.09861229,  1.22377543,  1.33500107,  1.43508453,  1.5260563 ,
        1.60943791,  1.68639895,  1.75785792,  1.82454929,  1.88706965,
        1.94591015,  2.00148   ,  2.05412373,  2.10413415,  2.1517622 ,
        2.19722458,  2.24070969,  2.28238239])

In [100]: arr**2
Out[100]: 
array([  1.  ,   1.96,   3.24,   4.84,   6.76,   9.  ,  11.56,  14.44,
        17.64,  21.16,  25.  ,  29.16,  33.64,  38.44,  43.56,  49.  ,
        54.76,  60.84,  67.24,  73.96,  81.  ,  88.36,  96.04])

In [101]: n.sqrt(arr)
Out[101]: 
array([ 1.        ,  1.18321596,  1.34164079,  1.4832397 ,  1.61245155,
        1.73205081,  1.84390889,  1.94935887,  2.04939015,  2.14476106,
        2.23606798,  2.32379001,  2.40831892,  2.48997992,  2.56904652,
        2.64575131,  2.7202941 ,  2.79284801,  2.86356421,  2.93257566,
        3.        ,  3.06594194,  3.13049517])

Array arithmetic

Add a constant

In [102]: arr + 100.0 # add constant
Out[102]: 
array([ 101. ,  101.4,  101.8,  102.2,  102.6,  103. ,  103.4,  103.8,
        104.2,  104.6,  105. ,  105.4,  105.8,  106.2,  106.6,  107. ,
        107.4,  107.8,  108.2,  108.6,  109. ,  109.4,  109.8])

Add a constant row/column to all rows/columns

In [103]: arr = n.ones((4,4))

In [104]: arr
Out[104]: 
array([[ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.]])

In [105]: brr = n.arange(1, 5)

In [106]: brr
Out[106]: array([1, 2, 3, 4])

In [107]: arr + brr
Out[107]: 
array([[ 2.,  3.,  4.,  5.],
       [ 2.,  3.,  4.,  5.],
       [ 2.,  3.,  4.,  5.],
       [ 2.,  3.,  4.,  5.]])

In [110]: brr = brr.reshape((4,1))

In [111]: brr
Out[111]: 
array([[1],
       [2],
       [3],
       [4]])

In [112]: arr + brr
Out[112]: 
array([[ 2.,  2.,  2.,  2.],
       [ 3.,  3.,  3.,  3.],
       [ 4.,  4.,  4.,  4.],
       [ 5.,  5.,  5.,  5.]])

Add/multiply elementwise

In [114]: arr = n.arange(25).reshape((5,5))

In [115]: arr
Out[115]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

In [120]: brr = n.arange(10, 260, 10).reshape((5,5))

In [121]: brr
Out[121]: 
array([[ 10,  20,  30,  40,  50],
       [ 60,  70,  80,  90, 100],
       [110, 120, 130, 140, 150],
       [160, 170, 180, 190, 200],
       [210, 220, 230, 240, 250]])

In [122]: arr + brr 
Out[122]: 
array([[ 10,  21,  32,  43,  54],
       [ 65,  76,  87,  98, 109],
       [120, 131, 142, 153, 164],
       [175, 186, 197, 208, 219],
       [230, 241, 252, 263, 274]])

In [123]: arr * brr
Out[123]: 
array([[   0,   20,   60,  120,  200],
       [ 300,  420,  560,  720,  900],
       [1100, 1320, 1560, 1820, 2100],
       [2400, 2720, 3060, 3420, 3800],
       [4200, 4620, 5060, 5520, 6000]])

Linear algebra

http://docs.scipy.org/doc/numpy/reference/routines.linalg.html

Dot product

In [124]: arr = n.random.rand(5,5)

In [125]: symm_arr = n.do
n.dot     n.double  

In [125]: symm_arr = n.dot(arr, n.transpose(arr))

In [126]: symm_arr
Out[126]: 
array([[ 1.58587206,  0.84716432,  1.14919924,  1.28189122,  1.19506914],
       [ 0.84716432,  0.65720114,  1.00722336,  1.0351238 ,  0.59206034],
       [ 1.14919924,  1.00722336,  1.62035768,  1.59009221,  0.79137279],
       [ 1.28189122,  1.0351238 ,  1.59009221,  2.14853304,  1.39284491],
       [ 1.19506914,  0.59206034,  0.79137279,  1.39284491,  1.68863531]])

Eigenvectors, eigenvalues

In [127]: eval, evec = n.linalg.eig(symm_arr)

In [128]: eval
Out[128]: 
array([  6.08869334e+00,   9.75455082e-01,   5.38114193e-01,
         9.38976138e-02,   4.43900059e-03])

In [129]: evec
Out[129]: 
array([[-0.44727346, -0.19413401, -0.76916546,  0.36518519,  0.19308733],
       [-0.30802641,  0.25639165, -0.14501538, -0.06950575, -0.90195484],
       [-0.45918029,  0.55669694, -0.05749307, -0.58276662,  0.36921484],
       [-0.56134928,  0.12795817,  0.58475995,  0.56423789,  0.09058196],
       [-0.42320992, -0.75517583,  0.20519597, -0.45147248, -0.06833777]])

In [130]: b = n.dot(symm_arr, evec[:, 0])

In [131]: b/eval[0]
Out[131]: array([-0.44727346, -0.30802641, -0.45918029, -0.56134928, -0.42320992])

Masked arrays

http://docs.scipy.org/doc/numpy/reference/routines.ma.html

Creation from existing array

Set the data of the masked array and then manipulate the mask.

In [132]: arr = n.array([[4,3,1],[3,1,4],[4,4,2]])

In [133]: marr = n.ma.array(arr)

In [134]: marr
Out[134]: 
masked_array(data =
 [[4 3 1]
 [3 1 4]
 [4 4 2]],
             mask =
 False,
       fill_value = 999999)


In [135]: marr.mask = [[True, True, False], [False, False, False], [True, False, True]]

In [136]: marr
Out[136]: 
masked_array(data =
 [[-- -- 1]
 [3 1 4]
 [-- 4 --]],
             mask =
 [[ True  True False]
 [False False False]
 [ True False  True]],
       fill_value = 999999)

Creation with mask set everywhere

In [138]: marrall = n.ma.masked_all((4,4))

In [139]: marrall
Out[139]: 
masked_array(data =
 [[-- -- -- --]
 [-- -- -- --]
 [-- -- -- --]
 [-- -- -- --]],
             mask =
 [[ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]],
       fill_value = 1e+20)


In [140]: marrall[3,2] = 2

In [141]: marrall[0:2,2:] = 5

In [142]: marrall
Out[142]: 
masked_array(data =
 [[-- -- 5.0 5.0]
 [-- -- 5.0 5.0]
 [-- -- -- --]
 [-- -- 2.0 --]],
             mask =
 [[ True  True False False]
 [ True  True False False]
 [ True  True  True  True]
 [ True  True False  True]],
       fill_value = 1e+20)

Creation from existing array and mask set by a condition

In [143]: arr = n.random.normal(0, 1, 25).reshape((5,5))

In [144]: arr
Out[144]: 
array([[ 0.17329858,  0.74527996,  0.26280218,  0.72841021, -0.42252451],
       [-0.82555426,  0.34412862, -1.62284074,  0.70672411,  0.40375032],
       [ 0.86688535,  0.39196647, -0.46354791, -0.75160129,  0.20977493],
       [-1.5837415 , -0.58776249, -0.22477697, -0.52870726, -1.47657261],
       [ 0.71056709,  1.48120984, -0.8241472 ,  0.01575747,  0.0033042 ]])

In [145]: marr = n.ma.masked_less(arr, 0)

In [146]: marr
Out[146]: 
masked_array(data =
 [[0.173298577831 0.745279962312 0.262802177587 0.728410213258 --]
 [-- 0.34412861582 -- 0.706724108565 0.403750315142]
 [0.86688535478 0.391966467504 -- -- 0.209774932895]
 [-- -- -- -- --]
 [0.710567094906 1.48120984326 -- 0.0157574720741 0.00330419672626]],
             mask =
 [[False False False False  True]
 [ True False  True False False]
 [False False  True  True False]
 [ True  True  True  True  True]
 [False False  True False False]],
       fill_value = 1e+20)

In [151]: n.ma.count(marr)
Out[151]: 14

In [152]: n.ma.count_masked(marr)
Out[152]: 11

In [159]: arr = n.array([[1, 2, n.nan],[n.inf, 3, 4]])

In [160]: arr
Out[160]: 
array([[  1.,   2.,  nan],
       [ inf,   3.,   4.]])

In [162]: marr = n.ma.fix_invalid(arr)

In [163]: marr
Out[163]: 
masked_array(data =
 [[1.0 2.0 --]
 [-- 3.0 4.0]],
             mask =
 [[False False  True]
 [ True False False]],
       fill_value = 1e+20)

Fill values

In [164]: marr.fill_value = -9999

In [165]: marr.filled()
Out[165]: 
array([[  1.00000000e+00,   2.00000000e+00,  -9.99900000e+03],
       [ -9.99900000e+03,   3.00000000e+00,   4.00000000e+00]])

Compression to 1D array

In [166]: marr.compressed()
Out[166]: array([ 1.,  2.,  3.,  4.])

Example with JSBACH init file

To run this example, change into a directory with a netcdf file that contains a variable 'slm' (like a JSBACH init file) or adapt the file name below. In that directory, start ipython and copy the code into the ipython shell. To run this code as a script, add

 #!/usr/bin/env python

in the beginning and save it to a file. Boolean indexing is used here.

import Nio
import matplotlib.pyplot as p
import numpy as n
f = Nio.open_file('jsbach_T63GR15_11tiles_2005.nc')
slm = f.variables['slm'][:].copy()
vegratiomax = f.variables['veg_ratio_max'][:].copy()
elevation = f.variables['elevation'][:].copy()
f.close()

### unpack vector with land points to map
land = n.where(slm>0, True, False)  # land mask is True where slm>0, else False
data1d = n.random.rand(6222)        # a random 1D vector representing some data
data2d = n.ma.masked_all((96, 192)) 
data2d[land] = data1d               # the boolean array land can be used to index another 2D array, the result is a 1D array with the points where land==True
p.imshow(data2d)
p.show()

p.imshow(vegratiomax)
p.show()
maskedvrm = n.ma.masked_equal(vegratiomax, 0)
p.imshow(maskedvrm)
p.show()
maskedvrm = n.ma.masked_less(vegratiomax, 0.5)
p.imshow(maskedvrm)
p.show()
maskedvrm = n.ma.masked_outside(vegratiomax, 0.2, 0.4)
p.imshow(maskedvrm)
p.show()

print vegratiomax.mean()
print maskedvrm.mean()

highpoints = n.where(elevation>3000)
highpointsmap = n.ma.masked_all((96, 192))
highpointsmap.mask[highpoints] = False

highpointsmap = n.ma.masked_all((96, 192))
highpointsmap[highpoints] = elevation[highpoints]
highpointsmapfilled = highpointsmap.filled(0)
p.imshow(highpointsmap)
p.show()

The plotting is kept simple. Wait for the next seminar...

LehreWiki: PythonCourse/PythonLES/SeminarLog1 (last edited 2012-07-20 13:12:25 by GernotGeppert)