Attachment 'TestSpearman.py'


# -*- coding: utf-8 -*-
# <nbformat>2</nbformat>

# <markdowncell>

# How to compute a Spearman rank correlation manually in Python
# =================================================================
# 
# This small example shows how to compute a Spearman rank correlation coefficient by hand in Python. For the definition of the Spearman rank correlation and the example data used here, see: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
# 
# The implementation is largely based on http://stackoverflow.com/questions/5284646/rank-items-in-an-array-using-python-numpy

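The definition linked above also gives a closed form that holds when there are no tied values: ρ = 1 − 6Σd_i² / (n(n² − 1)), where d_i is the difference between the two ranks of pair i. The sketch below is a cross-check that is not part of the original script. It applies the formula to the same ten Wikipedia (x, y) pairs. The helper `ranks_no_ties` is hypothetical and, as its name says, assumes distinct values.

```python
import numpy as np

# The ten (x, y) pairs from the Wikipedia example (same values as the array X)
x = np.array([106, 86, 100, 101, 99, 103, 97, 113, 112, 110], dtype=float)
y = np.array([7, 0, 27, 50, 28, 29, 20, 12, 6, 17], dtype=float)

def ranks_no_ties(a):
    """Hypothetical helper: 1-based ranks, valid only if all values are distinct."""
    return a.argsort().argsort() + 1

d = ranks_no_ties(x) - ranks_no_ties(y)  # rank difference for each pair
n = len(x)
rho = 1 - 6 * np.sum(d ** 2) / (n * (n ** 2 - 1))
print(np.sum(d ** 2), rho)  # sum of d^2 is 194, rho is about -0.175758
```

This reproduces ρ = 1 − 6·194/990 = −29/165. In this example the closed form and the Pearson correlation of the ranks agree because no values are tied.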
# <codecell>

# imports
import numpy as np
from scipy import stats

# generate the sample data from the Wikipedia example: 10 (x, y) pairs
X = np.asarray([106,7,86,0,100,27,101,50,99,28,103,29,97,20,113,12,112,6,110,17]).astype('float')
X.shape = (10,2)

# <markdowncell>

# Next we calculate the Spearman correlation coefficient, both manually and with the stats.spearmanr function.
# 
# The reference solution is ρ = −0.175757575... (= −29/165) with a two-sided p-value of about 0.627188 (using the t distribution).

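The t-distribution p-value can be reproduced with the usual approximation t = ρ·√((n − 2)/(1 − ρ²)) and n − 2 degrees of freedom. The sketch below assumes a two-sided test. As far as I know, the default p-value of stats.spearmanr is computed the same way.

```python
import math
from scipy import stats

rho = -29.0 / 165.0  # Spearman's rho for the Wikipedia example
n = 10               # number of (x, y) pairs

# t statistic with n - 2 degrees of freedom
t = rho * math.sqrt((n - 2) / (1.0 - rho ** 2))
# two-sided p-value from the survival function of Student's t distribution
p = 2 * stats.t.sf(abs(t), n - 2)
print(t, p)  # t is about -0.505, p is about 0.627
```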
# <codecell>

# assign the two data columns to x and y
x = X[:,0]
y = X[:,1]

# sort the x-data
ox = x.argsort()         # indices that sort x
rx = np.arange(len(ox))  # after sorting, the i-th x-value has rank i (ranks 0 ... n-1)

# reorder the y-data in the same way, so each y-value stays paired with its x-value
yn = y[ox]

# ... and now the trick: argsort() on yn gives the indices that would sort yn;
# applying argsort() again to those indices gives the rank of each element of yn
oy = yn.argsort()  # sort order of the y-data: THIS IS THE KEY!
ry = oy.argsort()  # rank of each y-value (0 ... n-1)

print('*** Ranked ordered data ***')
print(x[ox])
print(yn[oy])

print('*** RANKS ***')
print(rx)
print(ry)

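The double-argsort trick used above can be checked by hand on a tiny array. The sketch uses made-up values. It also compares the result with scipy.stats.rankdata, whose default method='average' gives tied values the mean of their ranks. argsort-based ranks instead give tied values distinct ranks in an arbitrary order.

```python
import numpy as np
from scipy import stats

a = np.array([30.0, 10.0, 20.0, 40.0])
order = a.argsort()      # indices that would sort a: 1, 2, 0, 3
ranks = order.argsort()  # position of each element in sorted order: 2, 0, 1, 3
print(order, ranks)

# without ties, this matches rankdata (which is 1-based, hence the -1)
print(stats.rankdata(a) - 1)

# with ties, argsort gives the two 20s distinct ranks (1 and 2, in some order),
# while rankdata gives both the average rank 1.5
b = np.array([10.0, 20.0, 20.0, 30.0])
print(b.argsort().argsort())
print(stats.rankdata(b) - 1)
```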
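# <markdowncell>

# These ranks correspond to the ranks in the reference solution, except that NumPy ranks start at 0 instead of 1. Shifting both rank vectors by the same amount does not change their correlation. We can therefore compute the correlation between the two rank vectors and check it against the reference. Note that argsort() does not assign average ranks to tied values, so this manual approach is only exact when there are no ties, as in this example.

# <codecell>

# Spearman's rho is the Pearson correlation coefficient of the ranks
print('Solution by manually calculating Spearman rank correlation: {0}'.format(np.corrcoef(rx, ry)[0, 1]))

print('Results from stats.spearmanr                              : {0}'.format(stats.spearmanr(X[:,0], X[:,1])))
print('Results from stats.mstats.spearmanr                       : {0}'.format(stats.mstats.spearmanr(X[:,0], X[:,1])))

# <markdowncell>

# All three methods give the same coefficient, ρ = −0.1757..., in agreement with the reference solution. *q.e.d.*
# 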
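The argsort-based ranking above assumes distinct values. A tie-aware variant can be sketched by ranking with scipy.stats.rankdata (average ranks) and then taking the Pearson correlation of the ranks, which is how Spearman's ρ is normally defined for tied data. The function name `spearman_manual` and the tied sample data are hypothetical and serve only as an illustration.

```python
import numpy as np
from scipy import stats

def spearman_manual(x, y):
    """Hypothetical helper: Spearman's rho as the Pearson correlation of average ranks."""
    rx = stats.rankdata(x)  # tied values share the mean of their ranks
    ry = stats.rankdata(y)
    return np.corrcoef(rx, ry)[0, 1]

# Wikipedia example data (no ties)
x = [106, 86, 100, 101, 99, 103, 97, 113, 112, 110]
y = [7, 0, 27, 50, 28, 29, 20, 12, 6, 17]
print(spearman_manual(x, y))  # about -0.1758

# made-up data with ties in both variables
xt = [1, 2, 2, 3, 4, 4, 5]
yt = [2, 1, 3, 3, 5, 4, 4]
rho_scipy, _ = stats.spearmanr(xt, yt)
print(spearman_manual(xt, yt), rho_scipy)  # the two values should agree
```

With ties, this version still agrees with stats.spearmanr. The double-argsort ranks would generally give a slightly different value.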

Attached Files

  • ParallelPython_pdf (2012-11-09 19:15:27, 206.0 KB)
  • ParallelPython_ppt (2012-11-09 19:06:05, 590.0 KB)
  • Spearman.pdf (2012-12-19 13:11:36, 71.9 KB)
  • TestSpearman.py (2012-12-19 13:11:55, 2.3 KB)
  • map.ipynb (2012-12-05 23:43:40, 99.6 KB)