Pearson Correlation

For more on metric algorithms, see: http://zh.wikipedia.org/wiki/%E5%BA%A6%E9%87%8F

The formula for the Pearson correlation coefficient is as follows:

\begin{equation} \rho_{xy} = \frac{\sum_{i=1}^{n} x_{i}y_{i} - \frac{\sum_{i=1}^{n}x_{i} \sum_{i=1}^{n}y_{i}}{n}} { \sqrt{\sum_{i=1}^{n}x_{i}^2 - \frac{(\sum_{i=1}^{n}x_{i})^2}{n}} \sqrt{\sum_{i=1}^{n}y_{i}^2 - \frac{(\sum_{i=1}^{n}y_{i})^2}{n}} } \end{equation}
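
As a short sketch of where this form comes from: it is algebraically the same as the mean-centered definition of the coefficient. The sample means $\bar{x}$ and $\bar{y}$ are notation introduced here for illustration and do not appear in the original formula.

\begin{equation} \rho_{xy} = \frac{\sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y})} { \sqrt{\sum_{i=1}^{n}(x_{i}-\bar{x})^2} \sqrt{\sum_{i=1}^{n}(y_{i}-\bar{y})^2} }, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_{i}, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_{i} \end{equation}

Expanding the numerator gives the sums-only version:

\begin{equation} \sum_{i=1}^{n} (x_{i}-\bar{x})(y_{i}-\bar{y}) = \sum_{i=1}^{n} x_{i}y_{i} - n\bar{x}\bar{y} = \sum_{i=1}^{n} x_{i}y_{i} - \frac{\sum_{i=1}^{n}x_{i} \sum_{i=1}^{n}y_{i}}{n} \end{equation}

Setting $y = x$ in the same expansion yields each factor under the square roots, for example $\sum_{i=1}^{n}(x_{i}-\bar{x})^2 = \sum_{i=1}^{n}x_{i}^2 - \frac{(\sum_{i=1}^{n}x_{i})^2}{n}$.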

Based on the formula, the corresponding Python code is:

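#!/usr/bin/env python3
# coding=utf-8

from math import sqrt

# Sample data: two perfectly correlated series
x = [1, 2, 3]
y = [1, 2, 3]

# Number of paired observations
n = len(x)

# Sum of each series
sum1 = sum(x)
sum2 = sum(y)

# Sum of squares of each series
sum1Sq = sum(v ** 2 for v in x)
sum2Sq = sum(v ** 2 for v in y)

# Sum of the products of paired values
pSum = sum(a * b for a, b in zip(x, y))

# Numerator and denominator of the formula
num = pSum - sum1 * sum2 / n
den = sqrt((sum1Sq - sum1 ** 2 / n) * (sum2Sq - sum2 ** 2 / n))

# A zero denominator means one series is constant; report 0 instead of dividing
r = 0 if den == 0 else num / den

print(r)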
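
The same steps can be wrapped in a reusable function. The following is a minimal sketch, assuming Python 3; the name `pearson`, the length checks, and the clamping of rounding error are illustrative choices, not part of the original script. Returning 0.0 for a constant input mirrors the script's zero-denominator case and is a convention rather than a mathematical result.

```python
from math import sqrt


def pearson(x, y):
    """Return the Pearson correlation of two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n == 0:
        raise ValueError("x and y must not be empty")

    # Sums, sums of squares, and sum of products, as in the formula
    sum_x, sum_y = sum(x), sum(y)
    sum_x_sq = sum(v * v for v in x)
    sum_y_sq = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    num = sum_xy - sum_x * sum_y / n
    # max(..., 0.0) guards against tiny negative values caused by rounding
    var_x = max(sum_x_sq - sum_x ** 2 / n, 0.0)
    var_y = max(sum_y_sq - sum_y ** 2 / n, 0.0)
    den = sqrt(var_x * var_y)

    # Assumption: return 0.0 when either input is constant (zero denominator)
    return 0.0 if den == 0 else num / den


if __name__ == "__main__":
    print(pearson([1, 2, 3], [1, 2, 3]))  # 1.0
    print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0
    print(pearson([1, 1, 1], [1, 2, 3]))  # 0.0
```

Identical series give 1.0, reversed series give -1.0, and a constant series falls into the zero-denominator branch.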