Pearson Correlation
For more on metrics and how to compute them, see: http://zh.wikipedia.org/wiki/%E5%BA%A6%E9%87%8F
For two series x and y of n paired values each, the Pearson correlation coefficient is given by:
\begin{equation} \rho_{xy} = \frac{\sum_{i=1}^{n} x_{i}y_{i} - \frac{\sum_{i=1}^{n}x_{i} \sum_{i=1}^{n}y_{i}}{n}} { \sqrt{\sum_{i=1}^{n}x_{i}^2 - \frac{(\sum_{i=1}^{n}x_{i})^2}{n}} \sqrt{\sum_{i=1}^{n}y_{i}^2 - \frac{(\sum_{i=1}^{n}y_{i})^2}{n}} } \end{equation}
Based on this formula, the corresponding Python code is:
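```python
#!/usr/bin/env python3
from math import sqrt

x = [1, 2, 3]
y = [1, 2, 3]
n = len(x)

sum1 = sum(x)                               # sum of x_i
sum2 = sum(y)                               # sum of y_i
sum1Sq = sum(v ** 2 for v in x)             # sum of x_i^2
sum2Sq = sum(v ** 2 for v in y)             # sum of y_i^2
pSum = sum(x[i] * y[i] for i in range(n))   # sum of x_i * y_i

# Numerator and denominator of the formula (true division in Python 3)
num = pSum - sum1 * sum2 / n
den = sqrt((sum1Sq - sum1 ** 2 / n) * (sum2Sq - sum2 ** 2 / n))

# A zero denominator means one series is constant: report 0 instead of dividing
r = 0 if den == 0 else num / den
print(r)
```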
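The script above works on two hard-coded lists. The sketch below wraps the same raw-sum formula in a reusable function and cross-checks it against the mean-centred definition, sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)), which is algebraically equivalent and tends to be more numerically stable when the values are large. The function names `pearson` and `pearson_centered` are hypothetical, and returning 0.0 for a constant series only follows the script's convention; mathematically the coefficient is undefined in that case.

```python
from math import isclose, sqrt


def pearson(x, y):
    """Pearson correlation via the raw-sum formula (hypothetical helper).

    Returns 0.0 when either series is constant (zero denominator),
    following the convention of the script above.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x_sq = sum(v * v for v in x)
    sum_y_sq = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))

    num = sum_xy - sum_x * sum_y / n
    den = sqrt((sum_x_sq - sum_x ** 2 / n) * (sum_y_sq - sum_y ** 2 / n))
    return 0.0 if den == 0 else num / den


def pearson_centered(x, y):
    """Equivalent mean-centred form (hypothetical helper, for cross-checking)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [v - mx for v in x]  # deviations from the mean of x
    dy = [v - my for v in y]  # deviations from the mean of y
    den = sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    return 0.0 if den == 0 else sum(a * b for a, b in zip(dx, dy)) / den


if __name__ == "__main__":
    print(pearson([1, 2, 3], [1, 2, 3]))  # 1.0 (perfectly correlated)
    print(pearson([1, 2, 3], [3, 2, 1]))  # -1.0 (perfectly anti-correlated)
    x, y = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
    assert isclose(pearson(x, y), pearson_centered(x, y))
    print(round(pearson(x, y), 4))        # 0.7746
```

For x = [1, 2, 3, 4, 5] and y = [2, 4, 5, 4, 5], both forms reduce to 6 / sqrt(60), about 0.7746.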