Multiplying DataFrame rows by a matrix

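I am trying to multiply each row of a DataFrame by a matrix built from the items in that row.

I can solve this with a for loop, but for a large DataFrame it takes a long time.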

import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8],
                   "C": [9, 10, 11, 12],
                   "D": [1, 1, 1, 1]})
l = []
# For each row, build a 4x4 matrix from that row's values and multiply the row by it
for index, row in df.iterrows():
    l.append(df.loc[index].dot(np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
                                         [0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
                                         [np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
                                         [1, 2, 0, np.tan(df["C"].loc[index])]])))
# Store the four products per row as new columns
df[["U", "V", "W", "X"]] = l
print(df)

Thanks for your help.

It is probably easier to work with arrays than with DataFrames. The indexing becomes much simpler.

The NumPy values of the frame:

In [46]: df.values
Out[46]: 
array([[ 1,  5,  9,  1],
       [ 2,  6, 10,  1],
       [ 3,  7, 11,  1],
       [ 4,  8, 12,  1]], dtype=int64)

For one row, the 2d array is:

In [47]: index = 0
In [48]: np.array([[np.sin(df["A"].loc[index]), 0, 0, np.sin(df["A"].loc[index])],
    ...:           [0, np.sign(df["B"].loc[index]), 0, np.abs(df["C"].loc[index])],
    ...:           [np.sign(df["C"].loc[index]), 0, np.sign(df["C"].loc[index]), 0],
    ...:           [1, 2, 0, np.tan(df["C"].loc[index])]])
Out[48]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])

In [52]: Out[46][0].dot(Out[48])
Out[52]: array([10.84147098,  7.        ,  9.        , 45.38915533])

Compare that with the result of your loop:

In [51]: l
Out[51]: 
[array([10.84147098,  7.        ,  9.        , 45.38915533]),
 array([12.81859485,  8.        , 10.        , 62.46695568]),
 array([  12.42336002,    9.        ,   11.        , -148.52748643]),
 array([ 9.97279002, 10.        , 12.        , 92.33693009])]

In array terms, the 2d array is:

In [53]: x = df.values
In [56]: index = 0
In [57]: np.array([[np.sin(x[index,0]), 0, 0, np.sin(x[index,0])],
    ...:           [0, np.sign(x[index,1]), 0, np.abs(x[index,2])],
    ...:           [np.sign(x[index,2]), 0, np.sign(x[index,2]), 0],
    ...:           [1, 2, 0, np.tan(x[index,2])]])
Out[57]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])

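To do this faster, we need to build such an array for all rows of x at once.

In einsum terms, the matrix multiplication for a single row (x being one 1d row and A its 2d array) is: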

np.einsum('j,jk->k',x,A)

Generalizing to all rows, we need a 3d array A such that

np.einsum('ij,ijk->ik',x,A)

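We could build the 3d A by iterating over index. But we can't simply replace the scalar index with a slice or arange: the constant entries 0, 1 and 2 would stay scalars while the computed entries become 1d arrays, so np.array would not receive a regular nested list.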
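As a reference point, here is a minimal sketch of the iterative construction mentioned above. It assumes a hypothetical helper, `row_matrix`, that wraps the question's 4x4 expression, and it hard-codes the example values of `df.values` as `x`. It still loops in Python, so it is useful for checking results rather than for speed.

```python
import numpy as np

# Same values as df.values in the example
x = np.array([[1, 5, 9, 1],
              [2, 6, 10, 1],
              [3, 7, 11, 1],
              [4, 8, 12, 1]])


def row_matrix(r):
    # Hypothetical helper: the question's 4x4 matrix for one row r = (A, B, C, D)
    a, b, c = r[0], r[1], r[2]
    return np.array([[np.sin(a), 0, 0, np.sin(a)],
                     [0, np.sign(b), 0, np.abs(c)],
                     [np.sign(c), 0, np.sign(c), 0],
                     [1, 2, 0, np.tan(c)]])


# One matrix per row, stacked along a new leading axis: shape (n, 4, 4)
A_loop = np.stack([row_matrix(r) for r in x])

# Row i of x multiplied by matrix i of A_loop
res_loop = np.einsum('ij,ijk->ik', x, A_loop)
```

Because `np.stack` puts the row index first, the `'ij,ijk->ik'` subscripts apply as written.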

By defining a couple of variables, we can construct the 3d A:

In [64]: Z = np.zeros(4); index = np.arange(4)
In [65]: A = np.array([[np.sin(x[index,0]), Z, Z, np.sin(x[index,0])],
    ...:               [Z, np.sign(x[index,1]), Z, np.abs(x[index,2])],
    ...:               [np.sign(x[index,2]), Z, np.sign(x[index,2]), Z],
    ...:               [Z+1, Z+2, Z, np.tan(x[index,2])]])
In [66]: A.shape
Out[66]: (4, 4, 4)

This puts the index dimension last.

In [67]: A[:,:,0]
Out[67]: 
array([[ 0.84147098,  0.        ,  0.        ,  0.84147098],
       [ 0.        ,  1.        ,  0.        ,  9.        ],
       [ 1.        ,  0.        ,  1.        ,  0.        ],
       [ 1.        ,  2.        ,  0.        , -0.45231566]])

So the einsum needs to be:

In [68]: res = np.einsum('ij,jki->ik', x, A)
In [69]: res
Out[69]: 
array([[  10.84147098,    7.        ,    9.        ,   45.38915533],
       [  12.81859485,    8.        ,   10.        ,   62.46695568],
       [  12.42336002,    9.        ,   11.        , -148.52748643],
       [   9.97279002,   10.        ,   12.        ,   92.33693009]])

This matches your l values.
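Putting the steps together, the following sketch starts from the question's DataFrame and writes the result back to the U, V, W, X columns, as the original loop does. The short names `a`, `b`, `c` are introduced here only for readability, and `to_numpy()` is used in place of `df.values`; both are choices made for this sketch, not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4],
                   "B": [5, 6, 7, 8],
                   "C": [9, 10, 11, 12],
                   "D": [1, 1, 1, 1]})

# Work on a plain 2d array; a, b, c are the column vectors used in the matrix
x = df[["A", "B", "C", "D"]].to_numpy()
a, b, c = x[:, 0], x[:, 1], x[:, 2]
Z = np.zeros(len(x))

# Each entry is a length-n vector, so A has shape (4, 4, n): row index last
A = np.array([[np.sin(a), Z, Z, np.sin(a)],
              [Z, np.sign(b), Z, np.abs(c)],
              [np.sign(c), Z, np.sign(c), Z],
              [Z + 1, Z + 2, Z, np.tan(c)]])

# Sum over j for each row i; the trailing axis of A is matched to the rows of x
res = np.einsum('ij,jki->ik', x, A)

# Store the (n, 4) result as four new columns
df[["U", "V", "W", "X"]] = res
```

Using `len(x)` for `Z` instead of a literal 4 keeps the sketch valid for frames with any number of rows.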

The 3d A could be built in other ways, but I chose this one because it required the least editing.
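One such alternative, sketched here as an illustration rather than as the approach used above, is to preallocate a zero array with the row index first and fill in only the nonzero entries. The name `A2` and the example values of `x` are assumptions for this sketch. With this layout the `'ij,ijk->ik'` subscripts apply directly.

```python
import numpy as np

# Same values as df.values in the example, as floats
x = np.array([[1, 5, 9, 1],
              [2, 6, 10, 1],
              [3, 7, 11, 1],
              [4, 8, 12, 1]], dtype=float)
n = len(x)
a, b, c = x[:, 0], x[:, 1], x[:, 2]

# Shape (n, 4, 4): one 4x4 matrix per row, zeros everywhere by default
A2 = np.zeros((n, 4, 4))
A2[:, 0, 0] = A2[:, 0, 3] = np.sin(a)
A2[:, 1, 1] = np.sign(b)
A2[:, 1, 3] = np.abs(c)
A2[:, 2, 0] = A2[:, 2, 2] = np.sign(c)
A2[:, 3, 0] = 1
A2[:, 3, 1] = 2
A2[:, 3, 3] = np.tan(c)

# Row i of x multiplied by matrix i of A2
res2 = np.einsum('ij,ijk->ik', x, A2)

# The same product written as a batched matmul: (n, 1, 4) @ (n, 4, 4) -> (n, 1, 4)
res2_matmul = (x[:, None, :] @ A2)[:, 0, :]
```

This avoids the helper vector `Z`, at the cost of spelling out each nonzero position by index.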
