SciPy spatial distance submodule rejects NumPy array



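I have a DataFrame named "df" with 4 columns. Three of them are independent variables: x1, x2 and x3. The fourth, y, is the dependent variable.

I want to compute the "pdist" distance between the dependent variable and each independent variable, so I first converted each column to a NumPy array, as follows: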

y = df[["y"]].values
x1 = df[["x1"]].values
x2 = df[["x2"]].values
x3 = df[["x3"]].values

When I feed these arrays into the following code, which I got from GitHub:

import copy

import numpy as np
from scipy.spatial.distance import pdist, squareform


def distance_correlation(Xval, Yval, pval=True, nruns=500):
    X, Y = np.atleast_1d(Xval), np.atleast_1d(Yval)
    # Add a trailing axis when the element count equals the length
    if np.prod(X.shape) == len(X):
        X = X[:, None]
    if np.prod(Y.shape) == len(Y):
        Y = Y[:, None]
    X, Y = np.atleast_2d(X), np.atleast_2d(Y)
    n = X.shape[0]
    if Y.shape[0] != X.shape[0]:
        raise ValueError('Number of samples must match')
    # Pairwise distance matrices, then double centering
    a, b = squareform(pdist(X)), squareform(pdist(Y))
    A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()
    dcov2_xy = (A * B).sum() / float(n * n)
    dcov2_xx = (A * A).sum() / float(n * n)
    dcov2_yy = (B * B).sum() / float(n * n)
    dcor = np.sqrt(dcov2_xy) / np.sqrt(np.sqrt(dcov2_xx) * np.sqrt(dcov2_yy))
    if pval:
        # Permutation test: shuffle Y and count how often dcor is exceeded
        greater = 0
        for i in range(nruns):
            Y_r = copy.copy(Yval)
            np.random.shuffle(Y_r)
            if distance_correlation(Xval, Y_r, pval=False) > dcor:
                greater += 1
        return (dcor, greater / float(nruns))
    else:
        return dcor


distance_correlation(x1, y, pval=True, nruns=500)

I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-c720c9df4e97> in <module>
----> 1 distance_correlation(bop_sp500, price, pval=True, nruns=500)
<ipython-input-17-e0b3aea12c32> in distance_correlation(Xval, Yval, pval, nruns)
9     n = X.shape[0]
10     if Y.shape[0] != X.shape[0]:raise ValueError('Number of samples must match')
---> 11     a, b = squareform(pdist(X)),squareform(pdist(Y))
12     A = a - a.mean(axis=0)[None, :] - a.mean(axis=1)[:, None] + a.mean()
13     B = b - b.mean(axis=0)[None, :] - b.mean(axis=1)[:, None] + b.mean()
~\Anaconda3\lib\site-packages\scipy\spatial\distance.py in pdist(X, metric, *args, **kwargs)
1997     s = X.shape
1998     if len(s) != 2:
-> 1999         raise ValueError('A 2-dimensional array must be passed.')
2000 
2001     m, n = s
ValueError: A 2-dimensional array must be passed.

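Can anyone spot where I went wrong? I know the error stems from the way I create the NumPy arrays, but I have no clue how to fix it.

Please illustrate using my variable definitions. I am new to Python.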

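OK, so I finally found the cause of the problem I was facing:

The NumPy array being fed into the helper function was a 2-D array.

The helper function, however, expects a "NumPy vector", i.e. a 1-D NumPy array.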
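
To see why a 2-D column trips the function, here is a minimal sketch that repeats only its shape handling. The helper name promote and the sample values are made up for illustration. An (n, 1) array passes the np.prod(X.shape) == len(X) check, so X[:, None] adds a third axis and pdist then rejects the result, while an (n,) array ends up as the (n, 1) array pdist accepts.

```python
import numpy as np
from scipy.spatial.distance import pdist


def promote(arr):
    """Repeat the shape handling from distance_correlation (hypothetical helper)."""
    X = np.atleast_1d(arr)
    if np.prod(X.shape) == len(X):
        X = X[:, None]
    return np.atleast_2d(X)


col_2d = np.arange(5.0).reshape(-1, 1)  # shape (5, 1), like df[["x1"]].values
col_1d = col_2d.ravel()                 # shape (5,)

shape_from_2d = promote(col_2d).shape   # (5, 1, 1): three dimensions
shape_from_1d = promote(col_1d).shape   # (5, 1): two dimensions

# pdist only accepts 2-D input, so the 3-D array raises ValueError
try:
    pdist(promote(col_2d))
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# The 1-D input works: 5 points give 5 * 4 / 2 = 10 pairwise distances
n_pairs = pdist(promote(col_1d)).shape[0]

print(shape_from_2d, shape_from_1d, error_message, n_pairs)
```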

A simple way to create one is the numpy.ravel() function. So, for my dataset, the code is as follows (I have broken the steps down for simplicity):

# Create Arrays
y = df[["y"]].values
x1 = df[["x1"]].values
x2 = df[["x2"]].values
x3 = df[["x3"]].values
# Ravel Them
y = y.ravel()
x1 = x1.ravel()
x2 = x2.ravel()
x3 = x3.ravel()
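
As a quick check of the shapes involved, the sketch below uses a small made-up DataFrame in place of the real df. It also shows an alternative that is not part of the original answer: selecting the column with single brackets returns a Series, whose array is already 1-D, so no ravel step is needed.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's df; the values are made up
df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0], "x1": [2.0, 1.0, 4.0, 3.0]})

y_2d = df[["y"]].values        # list of column names -> DataFrame -> shape (4, 1)
y_raveled = y_2d.ravel()       # flattened -> shape (4,)
y_direct = df["y"].to_numpy()  # single column name -> Series -> shape (4,)

same_values = np.array_equal(y_raveled, y_direct)
print(y_2d.shape, y_raveled.shape, y_direct.shape, same_values)
```

Either 1-D form can be passed to distance_correlation in place of the original two-dimensional arrays.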
