我希望通过df进行解析,并根据条件返回特定的行。具体来说,对于等于A
的ColString
,我希望返回等于任一'B','C' or 'D'
的下一行。如果在下一个A之前这些值都不存在,那么我希望返回一条消息,声明No item found for:
IDString或类似的内容。
注:此处不适用isin
:
Item = ['A','B','C','D']
df = df[df['String'].isin(Item)]
因为我只想返回出现在A
之后的'B','C' or 'D'
的第一个字符串。我不希望所有这些字符串出现在任何位置。我还想确定A是否存在,但在下一个A
之前没有'B','C' or 'D'
。
import pandas as pd
df1 = pd.DataFrame({
'String' : ['A','F','B','C','D','A','X','C','B','D','A','Y','A','C','A','D','C','B'],
'ID' : [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18],
})
mask = df1.loc[df1['String'] == 'A', 'ID']
# Next value required
# Something like. where mask == True find next row equal to val in item list.
# If None: print(No item found for: ID)
item = ['B','C','D']
预期输出:
String ID
0 A 1
2 B 3
5 A 6
7 C 8
10 A 11
12 A 13
13 C 14
14 A 15
15 D 16
No item found for: ID 11
这里有一个解决方案,需要几个步骤来澄清:
df1["is_a"] = df1.String == "A"
df1["a_group"] = df1.is_a.cumsum()
def foo(s):
matches = s[s.String.isin(["B", "C", "D"]) ]
if len(matches):
return s.loc[[s.index[0], matches.index[0]]]
else:
print(f"No match for A in location {s.index[0]}")
df1.groupby("a_group").apply(foo)
输出为:
No match for A in location 10
String ID is_a a_group
a_group
1 0 A 1 True 1
2 B 3 False 1
2 5 A 6 True 2
7 C 8 False 2
4 12 A 13 True 4
13 C 14 False 4
5 14 A 15 True 5
15 D 16 False 5
使用shift
的解决方案
item = ['A', 'B','C','D']
# filter df
df1 = df1.loc[df1.String.isin(item)]
# find missing ID's using shift
for line in df1[(df1.String == 'A') & (df1.String.shift(-1) =='A')].iterrows():
print(f'No item found for: ID {line[1].ID}')
# find final df using shift
print(df1.loc[(df1.String == 'A') | ((df1.String.shift(1) == 'A') & (df1.String.isin(item)))])
#prints
No item found for: ID 11
String ID
0 A 1
2 B 3
5 A 6
7 C 8
10 A 11
12 A 13
13 C 14
14 A 15
15 D 16