Pandas: apply a function to each row based on the previous rows



I have a DataFrame like this:

date  compound_score  negativity_score  positive_score  
0   2017-12-10        0.361400          0.339500        0.311000   
1   2017-12-11        0.639950          0.216000        0.476000   
2   2017-12-12        0.554286          0.262000        0.464000   
3   2017-12-13        0.715275          0.232250        0.423750   
4   2017-12-14        0.760940          0.221600        0.476200   
5   2017-12-15        0.503886          0.241429        0.391000   
6   2017-12-16        0.372300          0.345333        0.356667   
7   2017-12-17        0.700900          0.163000        0.458000   
8   2017-12-18        0.369733          0.220667        0.364222   
9   2017-12-19        0.176000          0.304000        0.362000   
10  2017-12-20        0.474322          0.262222        0.426778   
11  2017-12-21        0.623620          0.224000        0.435200   
12  2017-12-22        0.488125          0.211375        0.438000   
13  2017-12-23        0.226900          0.121500        0.341500   
14  2017-12-24        0.461800          0.233000        0.545000   
15  2017-12-25        0.686040          0.270800        0.458600   
16  2017-12-26        0.760525          0.212750        0.527250   
17  2017-12-27        0.627575          0.122250        0.463500   
18  2017-12-28        0.579173          0.210182        0.381909   
19  2017-12-29        0.378815          0.239000        0.339846   
20  2017-12-30        0.428200          0.328000        0.349000   
21  2017-12-31       -0.116800          0.507000        0.295000   
22  2018-01-01        0.515433          0.315000        0.417000   
23  2018-01-02        0.380250          0.298250        0.366250   
24  2018-01-03        0.609657          0.277000        0.458714   
25  2018-01-04        0.751067          0.251667        0.465000   
26  2018-01-05        0.207000          0.255750        0.324500   
27  2018-01-06        0.853200          0.127000        0.253000   
28  2018-01-07        0.506800          0.284500        0.350500   
29  2018-01-08        0.499586          0.262571        0.388571   
neutral_score  compound_diff  consecutive_compound  
0        0.349500            NaN                     0  
1        0.308000       0.278550                     1  
2        0.274143      -0.085664                     0  
3        0.344000       0.160989                     1  
4        0.302200       0.045665                     1  
5        0.367429      -0.257054                     0  
6        0.298000      -0.131586                     0  
7        0.379000       0.328600                     1  
8        0.415111      -0.331167                     0  
9        0.333800      -0.193733                     0  
10       0.311000       0.298322                     1  
11       0.340800       0.149298                     1  
12       0.350375      -0.135495                     0  
13       0.537500      -0.261225                     0  
14       0.222000       0.234900                     1  
15       0.270800       0.224240                     1  
16       0.260000       0.074485                     1  
17       0.414000      -0.132950                     0  
18       0.407909      -0.048402                     0  
19       0.420923      -0.200357                     0  
20       0.323000       0.049385                     1  
21       0.197000      -0.545000                     0  
22       0.268000       0.632233                     1  
23       0.335250      -0.135183                     0  
24       0.264429       0.229407                     1  
....

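I want to apply a calculation to the DataFrame in which the value for each row depends on the 14 rows that precede it.

I tried passing a shifted DataFrame built from the row itself, but I can't work out how to give the function the current row and have it look back 14 days from there.

I tried the following attempts; each one either returns NaN or raises an error: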

    def get_up_down_pct_ratio(df):
        up_days_pct = df.loc[df[COMPOUND_DIFF] > 0, COMPOUND_DIFF].sum()
        fall_days_pct = df.loc[df[COMPOUND_DIFF] < 0, COMPOUND_DIFF].sum()
        total = up_days_pct + fall_days_pct
        return percent(up_days_pct, total)

    d['up_down_ratio'] = d.apply(lambda x: get_up_down_pct_ratio(x.shift(14)), axis=1)

This just assigns NaN to the whole column.

    def get_up_down_pct_ratio(row):
        up_days_pct = row[row['compound_diff'] > 0, 'compound_diff'].sum()
        fall_days_pct = row[row['compound_diff'] > 0, 'compound_diff'].sum()
        total = up_days_pct + fall_days_pct
        return percent(up_days_pct, total)

    a['up_down_pct_ration'] = a.apply(lambda row: get_up_down_pct_ratio(row))

This raises an error:

ValueError: key of type tuple not found and not a MultiIndex

There are a couple of things to note.

  1. apply() needs axis=1 to work row by row
  2. The NaN case needs to be handled
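To illustrate the two notes above, here is a minimal sketch of what apply() hands to the function with and without axis=1, and one way to neutralise NaN before summing. The three-row frame and the names per_column, per_row and diff_filled are made up for the example; only the compound_diff column name comes from the question.

```python
import pandas as pd

# Tiny stand-in for the question's frame (values are illustrative only).
df = pd.DataFrame({
    "compound_diff": [float("nan"), 0.2, -0.1],
    "other": [1.0, 2.0, 3.0],
})

# Default axis=0: the function receives each COLUMN as a Series,
# so .name is the column label.
per_column = df.apply(lambda s: s.name)

# axis=1: the function receives each ROW as a Series,
# so .name is the row's index label.
per_row = df.apply(lambda s: s.name, axis=1)

# One way to handle the NaN case: treat a missing diff as 0.
diff_filled = df["compound_diff"].fillna(0.0)

print(list(per_column))      # column labels
print(list(per_row))         # row labels
print(diff_filled.tolist())
```

Without axis=1 the function is handed whole columns rather than rows, and a single row has no access to its neighbours either way, which is probably why neither attempt in the question could look back 14 days.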

Below is a different approach: I created a class that accumulates a 14-day window and handles all the cases, i.e. UP days vs. FALL days.

    class accumulate(object):
        def __init__(self):
            # Window of the last 14 values, zero-filled at the start
            self.accumList = [0 for n in range(14)]

        def newDate(self, v, up=True):
            # Drop the oldest value by shifting the window left by one
            self.accumList[0:13] = self.accumList[1:]
            v = float(v)
            if (v + 0.0) != v:
                # NaN never compares equal to itself: treat it as 0
                v = 0.0
            elif up and (v < 0):
                # tracking UP days: ignore negative values
                v = 0.0
            elif (not up) and (v > 0):
                # tracking FALL days: ignore positive values
                v = 0.0
            self.accumList[13] = v
            return sum(self.accumList)

    a = accumulate()
    df['up'] = df.apply(lambda r: a.newDate(r.compound_diff), axis=1)
    a = accumulate()  # restart rolling amounts
    df['fall'] = df.apply(lambda r: a.newDate(r.compound_diff, up=False), axis=1)
    df['pct'] = df.up / (df.up + df.fall)
    df.head()
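For comparison, the same running sums can probably be obtained without a stateful class by using pandas' built-in rolling windows. The sketch below is not part of the answer above; rolling_up_fall is a hypothetical helper name, and it assumes NaN should count as 0 and that the window includes the current row, as in the class.

```python
import numpy as np
import pandas as pd

def rolling_up_fall(diff: pd.Series, window: int = 14) -> pd.DataFrame:
    """Rolling sums of positive ('up') and negative ('fall') values.

    Hypothetical helper: mirrors the accumulator by treating NaN as 0
    and summing over a window that ends at (and includes) the current row.
    """
    clean = diff.fillna(0.0)
    # clip(lower=0) zeroes out negative values, clip(upper=0) positive ones.
    up = clean.clip(lower=0.0).rolling(window, min_periods=1).sum()
    fall = clean.clip(upper=0.0).rolling(window, min_periods=1).sum()
    return pd.DataFrame({"up": up, "fall": fall})

# First few compound_diff values from the question, with a short window
# so the result is easy to check by hand.
diff = pd.Series([np.nan, 0.278550, -0.085664, 0.160989, 0.045665, -0.257054])
result = rolling_up_fall(diff, window=3)
print(result)
```

min_periods=1 lets the first rows produce partial sums, which matches the zero-filled start of the accumulator.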

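@frankr6591's answer didn't get me all the way to what I needed, but it did point me in the right direction.

I need to apply this logic to the DataFrame in several different ways, so I wrote a simpler, more generic function. It still needs optimizing, but for now it works well on the different columns I pass to it: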

    def calculate_two_weeks_data(new_col_name, col_to_run_on):
        # Note: operates on the module-level df

        def calculate_ratio_value(row, df_, col):
            index = row['index']
            start_idx = index - 14
            if start_idx < 0:
                # fewer than 14 previous rows available
                return None
            else:
                # the 14 rows before the current one (current row excluded)
                prev_rows = df_.iloc[start_idx:index]
                col_to_list = prev_rows[col].tolist()
                up_values = 0
                down_values = 0
                for value in col_to_list:
                    if value > 0:
                        up_values += value
                    else:
                        down_values += value
                up_ratio = up_values / (up_values + down_values)
                return up_ratio

        # reset_index adds the 'index' column that calculate_ratio_value reads
        df.reset_index(inplace=True)
        df[new_col_name] = df.apply(calculate_ratio_value, args=[df, col_to_run_on], axis=1)
        df.dropna(inplace=True)
        return df
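As one possible direction for the optimization mentioned above, the per-row loop might be replaced by shift(1) followed by a rolling sum, so that each row sees only the rows before it. This is a sketch under assumptions, not the author's method: prev_window_up_ratio is a hypothetical name, the formula up / (up + down) is copied from the function above, and a window containing NaN is left as NaN (the function above ends up with NaN in that case too).

```python
import numpy as np
import pandas as pd

def prev_window_up_ratio(values: pd.Series, window: int = 14) -> pd.Series:
    """Ratio up / (up + down) over the `window` rows BEFORE each row.

    Hypothetical helper: rows without a full window of previous rows,
    or whose window contains NaN, come out as NaN.
    """
    prev = values.shift(1)  # exclude the current row from its own window
    up = prev.clip(lower=0.0).rolling(window).sum()    # sum of positive values
    down = prev.clip(upper=0.0).rolling(window).sum()  # sum of negative values
    return up / (up + down)

# Small hand-checkable example with a 2-row look-back.
values = pd.Series([np.nan, 0.2, -0.1, 0.3, 0.1, -0.2])
ratio = prev_window_up_ratio(values, window=2)
print(ratio)
```

Unlike the Python loop, a window whose positive and negative sums cancel does not raise ZeroDivisionError here; pandas yields inf or NaN instead, so such rows may need separate handling.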
