Find the number of rows between a value and a specific row value in a Python DataFrame

I have the following DataFrame df, and I want to add a "distance" column to it, like this:

date        active  distance
01/09/2022  1       0
02/09/2022  0       1
05/09/2022  0       2
06/09/2022  0       3
07/09/2022  0       4
08/09/2022  1       0
09/09/2022  0       1

Create groups by comparing against 1 and applying Series.cumsum, then count cumulatively within each group using GroupBy.cumcount:

df['distance'] = df.groupby(df['active'].eq(1).cumsum()).cumcount()
print(df)
         date  active  distance
0  01/09/2022       1         0
1  02/09/2022       0         1
2  05/09/2022       0         2
3  06/09/2022       0         3
4  07/09/2022       0         4
5  08/09/2022       1         0
6  09/09/2022       0         1
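
To see why this works, it can help to print the intermediate grouping key. The following is a minimal sketch that rebuilds the sample frame by hand; the names `is_start` and `group_id` are only illustrative and are not part of the original answer:

```python
import pandas as pd

# Sample data rebuilt by hand from the question
df = pd.DataFrame({
    "date": ["01/09/2022", "02/09/2022", "05/09/2022", "06/09/2022",
             "07/09/2022", "08/09/2022", "09/09/2022"],
    "active": [1, 0, 0, 0, 0, 1, 0],
})

# True on every row where a new run starts (active == 1)
is_start = df["active"].eq(1)

# The running sum of the booleans gives each run its own label
group_id = is_start.cumsum()

# cumcount() numbers the rows inside each group, starting at 0
df["distance"] = df.groupby(group_id).cumcount()

print(group_id.tolist())        # [1, 1, 1, 1, 1, 2, 2]
print(df["distance"].tolist())  # [0, 1, 2, 3, 4, 0, 1]
```

Note that any rows before the first 1 would all fall into group 0 and also be counted from 0, which may or may not be what you want.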

Your column can be computed entirely from the "active" column. Your formula is equivalent to:

count_up = pd.Series(np.arange(len(df)), index=df.index)
# where() needs a boolean mask, so cast active (0/1) to bool first
distance = (count_up - count_up.where(df['active'].astype(bool)).ffill()).astype(int)
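
The intermediate values make the idea easier to follow. This is a rough sketch on the sample `active` column only, with an explicit boolean cast because the condition passed to `where` is documented as boolean; the name `last_active` is introduced here purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"active": [1, 0, 0, 0, 0, 1, 0]})

# Row positions 0..n-1
count_up = pd.Series(np.arange(len(df)), index=df.index)

# Keep the position only where active is set; all other rows become NaN
last_active = count_up.where(df["active"].astype(bool))

# Forward-fill so every row carries the position of the latest active row
last_active = last_active.ffill()

# Distance = own position minus the position of the latest active row
distance = (count_up - last_active).astype(int)
print(distance.tolist())  # [0, 1, 2, 3, 4, 0, 1]
```

If the first row were not active, the leading rows would stay NaN after `ffill()` and the final `astype(int)` would fail, so that case would need separate handling.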

Use cumsum to label the active groups.

g = (df['active'] == 1).cumsum()
# assign() returns a new DataFrame, so the result has to be stored
df = df.assign(distance=g.groupby(g).transform(lambda x: range(len(x))))
print(df)

Result

         date   active  distance
0  01/09/2022        1         0
1  02/09/2022        0         1
2  05/09/2022        0         2
3  06/09/2022        0         3
4  07/09/2022        0         4
5  08/09/2022        1         0
6  09/09/2022        0         1

There are certainly countless ways to get the same result. Here are six of them:

# ======================================================================
# ----------------------------------------------------------------------
# Provided in other answers (and fixed where necessary)
# Using only pandas' own methods:
df['distance'] = df.groupby(df['active'].eq(1).cumsum()).cumcount()
#     nice pure pandas and short one - in my eyes the best choice
print(df)
# -------------------------------
cnt = pd.Series(np.arange(df.shape[0]), index=df.index)
distance = (cnt-cnt.where(df.active.astype(bool)).ffill()).astype(int)
df['distance'] = distance
#     a much longer pure pandas one
print(df)
# -------------------------------
g = (df['active']==1).cumsum()
df = df.assign(distance=g.groupby(g).transform(lambda x: range(len(x))))
#     using in addition a function as replacement for .cumcount()
print(df)
# ======================================================================
# ----------------------------------------------------------------------
# Using a loop over values in column 'active':
d = []; c = -1
for i in df['active']:
    c += 1
    if i: c = 0
    d.append(c)
df["distance"] = d
print(df)
# ----------------------------------------------------------------------
# Using a function  
c = -1
def f(i):
    global c
    if i: c = 0
    else: c += 1
    return c
# -------------------------------
# with a list comprehension:
df['distance'] = [ f(i) for i in df['active'] ]
print(df)
# -------------------------------
# or pandas apply() function (reset the global counter before a second pass):
c = -1
df['distance'] = df['active'].apply(f)
print(df)
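
A quick way to convince yourself that the variants agree is to compare them on the sample data. The sketch below checks only two of them (the `cumcount` one-liner and the plain loop); the helper names `distance_cumcount` and `distance_loop` are made up for this illustration:

```python
import pandas as pd

def distance_cumcount(active: pd.Series) -> list:
    """Vectorized variant: group by the running count of 1s, then cumcount."""
    return active.groupby(active.eq(1).cumsum()).cumcount().tolist()

def distance_loop(active: pd.Series) -> list:
    """Loop variant: reset to 0 on an active row, otherwise count up."""
    result, counter = [], -1
    for value in active:
        counter = 0 if value else counter + 1
        result.append(counter)
    return result

active = pd.Series([1, 0, 0, 0, 0, 1, 0])
assert distance_cumcount(active) == distance_loop(active) == [0, 1, 2, 3, 4, 0, 1]
```

Keeping the counter local to a function also avoids the module-level `c`, which otherwise has to be reset before every pass.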

Below is one of them, with complete code and data:

import io
import pandas as pd

df_print = """\
date         active
01/09/2022   1
02/09/2022   0
05/09/2022   0
06/09/2022   0
07/09/2022   0
08/09/2022   1
09/09/2022   0"""
# Parse the whitespace-separated text directly from memory
df = pd.read_csv(io.StringIO(df_print), sep=r'\s+')
print(df)

distance = []
counter = -1
for index, row in df.iterrows():
    if row['active']:
        # An active row restarts the count
        counter = 0
    else:
        counter += 1
    distance.append(counter)
df["distance"] = distance
print(df)

This gives:

         date  active
0  01/09/2022       1
1  02/09/2022       0
2  05/09/2022       0
3  06/09/2022       0
4  07/09/2022       0
5  08/09/2022       1
6  09/09/2022       0
         date  active  distance
0  01/09/2022       1         0
1  02/09/2022       0         1
2  05/09/2022       0         2
3  06/09/2022       0         3
4  07/09/2022       0         4
5  08/09/2022       1         0
6  09/09/2022       0         1
