How to apply a row-wise function to a pandas DataFrame and a shifted version of itself



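I have a pandas DataFrame, and I want to apply a simple sign-and-multiply operation to each row together with the row two indices back (a shift of 2). For example, if we have: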

import numpy as np
row_a = np.array([0.45, -0.78, 0.92])
row_b = np.array([1.2, -0.73, -0.46])
sgn_row_a = np.sign(row_a)
sgn_row_b = np.sign(row_b)
result = sgn_row_a * sgn_row_b
result
>>> array([1., 1., -1.])

What I've tried:

import pandas as pd
import numpy as np
np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])
def kernel(row_a, row_b):
    """Take the sign of both rows and multiply them"""
    sgn_a = np.sign(row_a)
    sgn_b = np.sign(row_b)
    return sgn_a * sgn_b

def func(data):
    """Apply 'kernel' to the dataframe row-wise, axis=1"""
    out = data.apply(lambda x: kernel(x, x.shift(2)), axis=1)
    return out

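But when I run the function I get the output below, which is incorrect: it seems to be shifting the columns instead of the rows. When I tried a different axis in the shift call, I only got an error (ValueError: No axis named 1 for object type Series).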

out = func(df)
out
>>>
      a   b    c    d    e
0   NaN NaN  1.0 -1.0 -1.0
1   NaN NaN -1.0 -1.0  1.0
2   NaN NaN -1.0  1.0 -1.0
3   NaN NaN -1.0  1.0 -1.0
4   NaN NaN  1.0  1.0 -1.0
..   ..  ..  ...  ...  ...
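Why this happens: inside DataFrame.apply(..., axis=1), each x handed to the lambda is a single row as a Series indexed by the column labels, so x.shift(2) moves values along that row's columns, never to another row. A Series has only one axis, which is why passing axis=1 to its shift raises the ValueError above. A small illustration, using a made-up 3-column frame purely for demonstration:

```python
import pandas as pd

# Hypothetical tiny frame, only for illustration
df = pd.DataFrame([[1.0, -2.0, 3.0],
                   [-4.0, 5.0, -6.0]], columns=["a", "b", "c"])

# This is what apply(axis=1) passes to the function: one row as a Series
row = df.iloc[0]

# Shifting the row moves values across its index (the column labels):
# a -> NaN, b -> NaN, c -> 1.0 (the value that was in column a)
shifted = row.shift(2)
print(shifted)

# A Series has a single axis, so asking for axis=1 fails
try:
    row.shift(2, axis=1)
    axis1_failed = False
except ValueError as err:
    axis1_failed = True
    print("ValueError:", err)
```

So the shift has to be done on the whole DataFrame (along its rows) before any element-wise combination, rather than inside a per-row apply.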

What I expected:

out = func(df)
out
>>>
       a   b    c    d    e
0    -1.  1.   1.  -1.   1.
1     1. -1.   1.   1.  -1.
2    -1.  1.   1.   1.   1.
3    -1.  1.   1.   1.   1.
4    -1. -1.  -1.   1.  -1.
..   ..  ..  ...  ...  ...

How can I achieve the shifted-row operation outlined above?

The simplest way seems to be:

df.apply(np.sign) * df.shift(2).apply(np.sign)
>>>
       a    b    c    d    e
0    NaN  NaN  NaN  NaN  NaN
1    NaN  NaN  NaN  NaN  NaN
2   -1.0  1.0  1.0 -1.0  1.0
3    1.0 -1.0  1.0  1.0 -1.0
4   -1.0  1.0  1.0  1.0  1.0
..   ...  ...  ...  ...  ...

To shift in the other direction, just put a minus sign on the shift amount, e.g. df.shift(-2).
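For example, with shift(-2) each row is paired with the row two positions later, so row 0 holds sign(row 0) * sign(row 2) and the last two rows become NaN instead of the first two. Comparing the printed outputs, this appears to be the alignment the expected output in the question uses. A minimal sketch on the same seeded random data as the question:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

# Negative shift: row i is paired with row i + 2
out = np.sign(df) * np.sign(df.shift(-2))

print(out.head())
# The last two rows are NaN because there is no row i + 2 for them
print(out.tail(3))
```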

apply loops over the columns one by one; here you can instead pass the whole DataFrame to np.sign directly:

df = np.sign(df) * np.sign(df.shift(2))
print (df)
      a    b    c    d    e
0   NaN  NaN  NaN  NaN  NaN
1   NaN  NaN  NaN  NaN  NaN
2  -1.0  1.0  1.0 -1.0  1.0
3   1.0 -1.0  1.0  1.0 -1.0
4  -1.0  1.0  1.0  1.0  1.0
..  ...  ...  ...  ...  ...
95  1.0  1.0  1.0 -1.0 -1.0
96  1.0  1.0  1.0  1.0 -1.0
97  1.0 -1.0 -1.0  1.0  1.0
98  1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0  1.0  1.0 -1.0 -1.0
[100 rows x 5 columns]
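Because kernel from the question only uses np.sign and *, both of which operate element-wise on whole DataFrames, it can also be called on the frame and its shifted copy directly, with no apply at all. The sketch below does this and checks it against the column-wise apply version from the first answer; the periods parameter is a hypothetical addition for convenience, not part of the original code.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])

def kernel(row_a, row_b):
    """Take the sign of both inputs and multiply them (works on rows or whole frames)."""
    return np.sign(row_a) * np.sign(row_b)

def func(data, periods=2):
    """Pair each row with the row `periods` positions earlier (negative = later).

    `periods` is a hypothetical parameter added for this sketch.
    """
    return kernel(data, data.shift(periods))

vectorized = func(df)
via_apply = df.apply(np.sign) * df.shift(2).apply(np.sign)

# DataFrame.equals treats NaNs in the same positions as equal
print(vectorized.equals(via_apply))
```

Shifting the whole frame first keeps the pairing between rows, and the element-wise operations then combine matching cells of the two aligned frames.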

Then, if you need to remove the first rows, which contain NaN:

#df = df.dropna()
df = df.iloc[2:]
print (df)
      a    b    c    d    e
2  -1.0  1.0  1.0 -1.0  1.0
3   1.0 -1.0  1.0  1.0 -1.0
4  -1.0  1.0  1.0  1.0  1.0
5  -1.0  1.0  1.0  1.0  1.0
6  -1.0 -1.0 -1.0  1.0 -1.0
..  ...  ...  ...  ...  ...
95  1.0  1.0  1.0 -1.0 -1.0
96  1.0  1.0  1.0  1.0 -1.0
97  1.0 -1.0 -1.0  1.0  1.0
98  1.0 -1.0 -1.0 -1.0 -1.0
99 -1.0  1.0  1.0 -1.0 -1.0
[98 rows x 5 columns]
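With this data the two options shown (the commented-out dropna() and iloc[2:]) give the same result. The difference is that iloc[2:] drops exactly the first two rows by position, while dropna() drops every row containing any NaN, which would remove more rows if the original data already had missing values. A quick check, assuming the same seeded data as above:

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.normal(0, 1, (100, 5)), columns=["a", "b", "c", "d", "e"])
out = np.sign(df) * np.sign(df.shift(2))

by_position = out.iloc[2:]   # drop exactly the first two rows
by_nan = out.dropna()        # drop any row that contains a NaN

print(by_position.shape, by_nan.shape)   # both (98, 5) for this data
print(by_position.equals(by_nan))
```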
