Pandas apply on DataFrame columns to return multiple columns with suffixes



I need to apply sin and cos transforms to each column, returning two columns per input column: col_sin and col_cos.

import numpy as np
import pandas as pd

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return pd.Series([sin_, cos_], index=['sin', 'cos'])

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
df = df.apply(transform, axis=0, var=0)

This returns the following (the numbers are off, because the columns differ from the ones actually passed):

+-----+----------------------------+----------------------------+
|     | col1                       | col2                       |
|-----+----------------------------+----------------------------|
| sin | 0    0.000000e+00          | 0    0.000000e+00          |
|     | 1    1.000000e+00          | 1   -1.133108e-15          |
|     | 2    5.665539e-16          | 2   -7.347881e-16          |
|     | 3   -1.000000e+00          | 3   -4.532431e-15          |
|     | 4   -1.133108e-15          | 4   -1.224647e-15          |
|     | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
| cos | 0    1.000000e+00          | 0    1.0                   |
|     | 1    2.832769e-16          | 1    1.0                   |
|     | 2   -1.000000e+00          | 2    1.0                   |
|     | 3   -1.836970e-16          | 3    1.0                   |
|     | 4    1.000000e+00          | 4    1.0                   |
|     | Name: col1, dtype: float64 | Name: col2, dtype: float64 |
+-----+----------------------------+----------------------------+

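The expected output should contain 4 columns: col1_sin, col1_cos, col2_sin and col2_cos.

How can I do this?

Also, is there a way to pass var as a list/tuple, where var[0] is used for col1 and var[1] for col2? Something like this: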

df = df.apply(transform, axis=0, var=[0, 60])

Is there a way to speed this up with raw=True? Something like this does not work:

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    return np.column_stack((sin_, cos_))

Thanks!

Use DataFrame.pipe to pass the whole DataFrame to the function. If var is a list with one entry per column, it can be subtracted directly. Then stack the sin and cos arrays side by side and return a DataFrame with the new column names:

def transform(data, var):
    sin_ = np.sin(data - var)
    cos_ = np.cos(data - var)
    arr = np.column_stack((sin_, cos_))
    c = (data.columns + '_sin').tolist() + (data.columns + '_cos').tolist()
    # use the index of the frame that was passed in, not the global df
    return pd.DataFrame(arr, index=data.index, columns=c)

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)
df = df.pipe(transform, var=[0, 60])
print(df)
col1_sin  col2_sin  col1_cos  col2_cos
0  0.000000  0.304811  1.000000 -0.952413
1  0.650288  0.000000 -0.759688  1.000000
2 -0.988032  0.580611  0.154251  0.814181
3  0.850904 -0.801153  0.525322 -0.598460
4 -0.304811  0.945445 -0.952413  0.325781
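The output above groups all sin columns before all cos columns, while the question asked for col1_sin, col1_cos, col2_sin, col2_cos. A minimal sketch of one way to get that order, assuming the same example data; the name transform_interleaved is hypothetical and not part of the original answer:

```python
import numpy as np
import pandas as pd

def transform_interleaved(data, var):
    # Hypothetical variant of transform(): same math, different column order.
    shifted = (data - var).to_numpy()
    arr = np.column_stack((np.sin(shifted), np.cos(shifted)))
    cols = [f"{c}_sin" for c in data.columns] + [f"{c}_cos" for c in data.columns]
    out = pd.DataFrame(arr, index=data.index, columns=cols)
    # Reorder so each source column is followed by its own sin/cos pair.
    order = [f"{c}_{fn}" for c in data.columns for fn in ("sin", "cos")]
    return out[order]

df = pd.DataFrame({'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]})
result = df.pipe(transform_interleaved, var=[0, 60])
print(result.columns.tolist())
```

Selecting with a list of labels only rearranges the columns; the values are the same as in the pipe answer.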

There is no need for apply here; pass the whole DataFrame instead. We can use concat and add_suffix to name the columns properly. With np.broadcast_to we can handle either a single offset or a list/array of the right shape:

import pandas as pd
import numpy as np

def transform(data, var, degrees=True):
    """
    data : pd.DataFrame
    var : numeric, or list/array of numerics. Should be
          broadcastable to data.shape
    """
    data = data - np.broadcast_to(var, data.shape)
    # data = data - var  # also works for compatible shapes
    if degrees:
        data = np.radians(data)
    return pd.concat([np.sin(data).add_suffix('_sin'),
                      np.cos(data).add_suffix('_cos')],
                     axis=1)

transform(df, var=[45, 0], degrees=True)
col1_sin      col2_sin  col1_cos  col2_cos
0 -0.707107  0.000000e+00  0.707107       1.0
1 -0.500000  8.660254e-01  0.866025       0.5
2 -0.258819  1.224647e-16  0.965926      -1.0
3  0.000000 -8.660254e-01  1.000000      -0.5
4  0.258819 -8.660254e-01  0.965926       0.5
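To make the role of np.broadcast_to concrete, here is a small sketch of how it treats a scalar offset, a per-column list, and a list of the wrong length. The shape (5, 2) is an assumption matching the example frame, and the variable names are illustrative only:

```python
import numpy as np

shape = (5, 2)  # five rows, two columns, as in the example frame

# A scalar offset is repeated for every cell.
scalar_offsets = np.broadcast_to(10, shape)

# A list with one entry per column is repeated down the rows.
per_column = np.broadcast_to([45, 0], shape)

# A list whose length does not match the column count is rejected.
try:
    np.broadcast_to([45, 0, 1], shape)
    rejected = False
except ValueError:
    rejected = True

print(scalar_offsets.shape, per_column[0], rejected)
```

The ValueError in the last case is what makes the broadcast version fail early when var does not line up with the columns.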

Simple for loop

The result can be obtained with a simple for loop over the column names that adds the sin/cos columns. I tested it on a million rows and it finished in under a second.

df = pd.DataFrame(np.random.uniform(low=0, high=3.14, size=(1000000, 2)), columns=['column1', 'column2'])
var = [0, .5]
for idx, column in enumerate(df.columns):
    df[column + '_sin'] = np.sin(df[column] - var[idx])
    df[column + '_cos'] = np.cos(df[column] - var[idx])
df.head()

It gives you output like this:

column1     column2     column1_sin     column1_cos     column2_sin     column2_cos
0   1.977094    0.705613    0.918590    -0.395211   0.648500    0.761214
1   2.138289    2.246560    0.843252    -0.537519   0.780229    -0.625493
2   2.947415    1.716964    0.192960    -0.981207   0.989336    -0.145648
3   1.738969    0.748142    0.985892    -0.167381   0.680278    0.732954
4   1.136741    1.190389    0.907268    0.420554    0.928513    0.371299
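A variant of the same loop, sketched under the assumption that the frame may have many columns: it collects the new columns in a dict and joins them in a single pd.concat, instead of inserting them into df one at a time. The tiny frame and the names new_cols and result are illustrative, not from the answer above:

```python
import numpy as np
import pandas as pd

# Small illustrative frame; the answer above uses random data instead.
df = pd.DataFrame({'column1': [0.0, 1.0, 2.0], 'column2': [0.5, 1.5, 2.5]})
var = [0, 0.5]

new_cols = {}
# zip pairs each column name with its own offset.
for column, offset in zip(df.columns, var):
    shifted = df[column] - offset
    new_cols[f"{column}_sin"] = np.sin(shifted)
    new_cols[f"{column}_cos"] = np.cos(shifted)

# Attach all derived columns in one step.
result = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(result.columns.tolist())
```

For two columns the difference is negligible; the single concat mainly helps when the loop would otherwise add a large number of columns one by one.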

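Another option

Switch to axis=1 and return a pd.Series. Sample code:

d = {'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]}
df = pd.DataFrame(data=d)

def transform(data, var):
    # Series.append was removed in pandas 2.0; pd.concat joins the two Series instead
    return pd.concat([np.sin(data - var).add_suffix('_sin'),
                      np.cos(data - var).add_suffix('_cos')])

df.apply(transform, axis=1, var=[10, 20])

This gives you the output: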

col1_sin    col2_sin    col1_cos    col2_cos
0   0.544021    -0.912945   -0.839072   0.408082
1   -0.958924   0.745113    0.283662    -0.666938
2   0.912945    0.219425    0.408082    -0.975629
3   -0.428183   0.088399    -0.903692   0.996085
4   -0.262375   -0.387809   0.964966    -0.921740
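With axis=1 the function is called once per row, so on large frames this is likely to be slower than operating on the whole DataFrame at once. As a sanity check, the sketch below compares the row-wise result with a whole-frame computation on the same example data; the names transform_row, row_wise and vectorized are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [0, 15, 30, 45, 60], 'col2': [0, 60, 180, 240, 300]})
var = [10, 20]

def transform_row(row, var):
    # row is a Series indexed by column name; suffix its labels.
    shifted = row - var
    return pd.concat([np.sin(shifted).add_suffix('_sin'),
                      np.cos(shifted).add_suffix('_cos')])

row_wise = df.apply(transform_row, axis=1, var=var)

# Same computation on the whole frame in one pass.
shifted = df - var
vectorized = pd.concat([np.sin(shifted).add_suffix('_sin'),
                        np.cos(shifted).add_suffix('_cos')], axis=1)

same_columns = row_wise.columns.tolist() == vectorized.columns.tolist()
same_values = np.allclose(row_wise.to_numpy(), vectorized.to_numpy())
print(same_columns, same_values)
```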
