How to search an entire DataFrame and return the value next to the match



I have the following sample dataset:

Protocol Number:    xx-yzm2 

Section Major       Task                        Budget
1                   Study Setup                 25303.18
2                   Study Setup Per-Location    110037.8
3                   Site Identified by CRO      29966.25
4                   Pre-study Site Visit (PSSV) 130525.92

I want to search the entire DataFrame with contains, passing the keyword "protocol", and return the value in the cell next to the match.

In theory the layout of the form can change, so I can't filter on a fixed column. Is this possible with pandas?

Input keyword: protocol
Expected output: xx-yzm2

You can try the following:

import numpy as np
import pandas as pd

data = {0: ['Protocol Number:', np.nan, 'Section Major', '1', '2', '3', '4'],
        1: ['xx-yzm2', np.nan, 'Task', 'Study Setup', 'Study Setup Per-Location',
            'Site Identified by CRO', 'Pre-study Site Visit (PSSV)'],
        2: [np.nan, np.nan, 'Budget', '25303.18', '110037.8', '29966.25', '130525.92']}
df = pd.DataFrame(data)
print(df)
#                   0                            1          2
# 0  Protocol Number:                      xx-yzm2        NaN
# 1               NaN                          NaN        NaN
# 2     Section Major                         Task     Budget
# 3                 1                  Study Setup   25303.18
# 4                 2     Study Setup Per-Location   110037.8
# 5                 3       Site Identified by CRO   29966.25
# 6                 4  Pre-study Site Visit (PSSV)  130525.92

keyword = 'protocol'
# Build a boolean mask of every cell that contains the keyword;
# case=False makes the match case-insensitive
mask = df.apply(lambda x: x.astype(str).str.contains(keyword, case=False))
# np.where returns the positions of the True cells;
# here row is array([0]) and col is array([0])
row, col = np.where(mask.to_numpy())
# Take the cell one column to the right of the first match
result = df.iat[row[0], col[0] + 1]
print(result)
# xx-yzm2
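
Note that the snippet above raises an IndexError when the keyword is not found at all, or when the first match sits in the last column. A minimal sketch of a guarded version is below. The helper name `find_next_value`, the trimmed-down `sample` frame and the keyword "sponsor" are made up for illustration. It also blanks out NaN before searching so a missing cell can't match as the string 'nan', and passes regex=False so the keyword is treated as literal text; both are my assumptions about what you want, not part of the original answer.

```python
import numpy as np
import pandas as pd


def find_next_value(df, keyword):
    """Return the cell right of the first cell containing keyword, else None."""
    # Replace NaN with '' so missing cells cannot match as the text 'nan'.
    text = df.fillna("").astype(str)
    # regex=False treats the keyword as plain text, not as a pattern.
    mask = text.apply(lambda col: col.str.contains(keyword, case=False, regex=False))
    rows, cols = np.where(mask.to_numpy())
    if len(rows) == 0:
        return None  # keyword not found anywhere
    if cols[0] + 1 >= df.shape[1]:
        return None  # match is in the last column, nothing to its right
    return df.iat[rows[0], cols[0] + 1]


# Hypothetical, shortened version of the sample data
sample = pd.DataFrame({
    0: ["Protocol Number:", np.nan, "Section Major"],
    1: ["xx-yzm2", np.nan, "Task"],
    2: [np.nan, np.nan, "Budget"],
})

found = find_next_value(sample, "protocol")   # match in column 0
missing = find_next_value(sample, "sponsor")  # no match at all
last_col = find_next_value(sample, "budget")  # match in the last column
```

Returning None in both failure cases is just one option; raising a descriptive exception may suit you better if a missing keyword should stop processing.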

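If there are multiple matches, the code above only returns the value for the first one. If you want all of them, just use a loop. In that case you will probably want to add some checks to handle edge cases.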

for i in range(len(row)):
    # Only read the neighbour if the match is not in the last column
    if col[i] + 1 < len(df.columns):
        print(df.iat[row[i], col[i] + 1])
    else:
        # error handling: your keyword was found in the last column,
        # i.e. there is no `next` col
        pass
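
If you would rather collect the values than print them, the same idea can be sketched as a function that filters out last-column matches up front and returns a list. The name `find_all_next_values` and the two-column `demo` frame are hypothetical, chosen so that one match has a neighbour and one does not.

```python
import numpy as np
import pandas as pd


def find_all_next_values(df, keyword):
    """Return the cells to the right of every cell containing keyword."""
    text = df.fillna("").astype(str)
    mask = text.apply(lambda col: col.str.contains(keyword, case=False, regex=False))
    rows, cols = np.where(mask.to_numpy())
    # Keep only matches that actually have a column to their right.
    has_next = cols + 1 < df.shape[1]
    return [df.iat[r, c + 1] for r, c in zip(rows[has_next], cols[has_next])]


# Hypothetical data: one match in column 0, one match in the last column
demo = pd.DataFrame({
    0: ["Protocol Number:", "Section", "Notes"],
    1: ["xx-yzm2", "Task", "see protocol"],
})

values = find_all_next_values(demo, "protocol")
nothing = find_all_next_values(demo, "zzz")
```

Matches come back in row-major order (top row first, then left to right), because that is the order in which np.where reports the True cells.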
