# -*- coding: utf-8 -*-
# <nbformat>2</nbformat>

# <markdowncell>

# How to do a proper Spearman Rank correlation in python manually ?
# =================================================================
# 
# This little example shows how to make a proper Sparman Rank correlation calculation in python. For definition of Spearman Rank correlation and the example used, please look here: http://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient
# 
# The actual implementation is basically taken from http://stackoverflow.com/questions/5284646/rank-items-in-an-array-using-python-numpy

# <codecell>

#imports
from scipy import stats

#generate sample data from Wikipedia example
X=np.asarray([106,7,86,0,100,27,101,50,99,28,103,29,97,20,113,12,112,6,110,17]).astype('float')
X.shape = (10,2)

# <markdowncell>

# Now we caluculate the Spearman correlation coefficient either manually or by using the stats.spearmanr function.
# 
# The reference solution should be: ρ = −0.175757575... with a P-value = 0.6864058 (using the t distribution)

# <codecell>

#assign data to x/y
x = X[:,0]; y=X[:,1]

#sort the x-data
ox = x.argsort() #order
rx = np.arange(len(ox)) #assign ranks 0 ... n-1 to x-data

#now sort the y-data in accordance to the x-data sorting which gives a new y-data array
yn = y[ox]

#... and now (this is the trick) we identify the rank of the y-data by first estimating the indices of the order using argsort() and then estimating the original index by using argsort again on the indices
oy = yn.argsort() #order THIS IS THE KEY!!!!
ry = oy.argsort()

print '*** Ranked ordered data ***'
print x[ox]
print yn[oy]

print '*** RANKS ***'
print rx
print ry

# <markdowncell>

# This result looks now similar to the the result in the reference solution and we can now go ahead with calculating the correlation between the ranks and validate the accuracy of the calculations.

# <codecell>

print 'Solution by manually calculating Spearman Ranke correlation: ', np.corrcoef(rx,ry)[0,1]

print 'Results from stats.spearmanr                               : ', stats.spearmanr(X[:,0],X[:,1])
print 'Results from stats.mstats.spearmanr                        : ', stats.mstats.spearmanr(X[:,0],X[:,1])

# <markdowncell>

# Thus results of all three methods are consistent. *q.e.d*
#  

