Can pandas findall() return a str instead of a list?



I have a pandas DataFrame with many variables:

df.columns
Out[0]: 
Index(['COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE',
'COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE',
'COUNJUV_SOIL_P_NUMBER_128_DA_B_V6_count_nr_lesion_PRATZE',
'COUNADU_SOIL_P_SAUDPC_150_DA_B_V6_lesion_saudpc_PRATZE',
'CONTRO_SOIL_P_pUNCK_150_DA_B_V6_lesion_p_control_PRATZE',
'COUNJUV_SOIL_P_p_0_100_16_DA_B_V6_lesion_incidence_PRATZE',
'COUNADU_SOIL_P_p_0_100_50_DA_B_VT_lesion_incidence_PRATZE',
'COUNEGG_SOIL_P_p_0_100_128_DA_B_VT_lesion_incidence_PRATZE',
'COUNEGG_SOIL_P_NUMBER_50_DA_B_V6_count_nr_spiral_HELYSP',
'COUNJUV_SOIL_P_NUMBER_128_DA_B_V10_count_nr_spiral_HELYSP', # and so on

I only want to keep the number followed by DA, so for the first column that would be 16_DA. I have been using the pandas function findall():

df.columns.str.findall(r'[0-9]*_DA')
Out[595]: 
Index([ ['16_DA'],  ['50_DA'], ['128_DA'], ['150_DA'], ['150_DA'],
['16_DA'],  ['50_DA'], ['128_DA'],  ['50_DA'], ['128_DA'], ['150_DA'],
['150_DA'],  ['50_DA'], ['128_DA'],
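To reproduce this behaviour on a small scale, here is a minimal sketch that uses three names copied from the column list above. findall always wraps its results in a list, because a single name could contain several matches. The `.str[0]` step is an assumed way to unwrap each list (it takes the first element of each one) and does not come from the original post.

```python
import pandas as pd

# Three of the column names from the question, kept short for illustration
cols = pd.Index([
    "COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE",
    "COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE",
    "COUNJUV_SOIL_P_NUMBER_128_DA_B_V6_count_nr_lesion_PRATZE",
])

# findall: one list of matches per name (here every list has a single element)
matches = cols.str.findall(r"\d+_DA")
print(list(matches))

# .str[0] picks the first element of each list, giving plain strings
first = matches.str[0]
print(list(first))  # ['16_DA', '50_DA', '128_DA']
```

If a name had no match, its list would be empty and `.str[0]` would yield NaN for that entry.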

But this returns a list for each column, which I want to avoid. I would like to end up with a column index like this:

df.columns
Out[595]: 
Index(['16_DA',  '50_DA', '128_DA', '150_DA', '150_DA',
'16_DA',  '50_DA', '128_DA',  '50_DA', '128_DA', '150_DA',

Is there a smoother way to do this?

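You can use .str.join(", ") to join all the matches found in each name with a comma and a space (a name with a single match simply becomes that match):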

df.columns.str.findall(r'\d+_DA').str.join(", ")
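As a minimal sketch of the join approach, here is a small DataFrame with no rows. It reuses three column names from the question plus one hypothetical name (`HYPOTHETICAL_16_DA_X_50_DA`, invented here) that contains two matches, so the separator becomes visible:

```python
import pandas as pd

# Empty DataFrame; only the column names matter here.
# The last name is hypothetical and contains two "<digits>_DA" matches.
df = pd.DataFrame(columns=[
    "COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE",
    "COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE",
    "CONTRO_SOIL_P_pUNCK_150_DA_B_V6_lesion_p_control_PRATZE",
    "HYPOTHETICAL_16_DA_X_50_DA",
])

# findall gives a list per name; str.join collapses each list into one string
new_cols = df.columns.str.findall(r"\d+_DA").str.join(", ")
df.columns = new_cols

print(list(df.columns))  # ['16_DA', '50_DA', '150_DA', '16_DA, 50_DA']
```

Because every real column in the question has exactly one match, the joined result is just that match. Note that the column labels may end up duplicated (for example, several 150_DA columns).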

Or, to get only the first match, use str.extract. With expand=False it returns an Index of strings instead of a DataFrame:

df.columns.str.extract(r'(\d+_DA)', expand=False)
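Here is a small sketch of the extract route. The names are taken from the question, except `NO_MATCH_HERE`, which is a hypothetical name without a `<digits>_DA` part. It shows what happens to names that do not match:

```python
import pandas as pd

cols = pd.Index([
    "COUNJUV_SOIL_P_p_0_100_16_DA_B_V6_lesion_incidence_PRATZE",
    "COUNEGG_SOIL_P_p_0_100_128_DA_B_VT_lesion_incidence_PRATZE",
    "NO_MATCH_HERE",  # hypothetical name with no "<digits>_DA" part
])

# One capture group + expand=False -> an Index holding the first match per name
first = cols.str.extract(r"(\d+_DA)", expand=False)

# The first two entries are '16_DA' and '128_DA'; the unmatched name becomes NaN
print(list(first))
```

The `\d+` here matters for names such as `..._p_0_100_16_DA_...`. The digits "0" and "100" are not directly followed by `_DA`, so the regex skips them and lands on `16_DA`.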
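
Alternatively, if you want every match flattened into a single comma-separated string (rather than an index), concatenate the per-column lists first:

from typing import List

pattern = r'[0-9]+_DA'  # "+" requires at least one digit before "_DA"
# sum(..., []) concatenates the lists returned by findall into one flat list
flattened: List[str] = sum(df.columns.str.findall(pattern), [])
output: str = ",".join(flattened)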
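Using `sum(..., [])` to flatten copies the accumulated list on every addition, which becomes quadratic for very many columns. A linear-time variant, sketched here with `itertools.chain.from_iterable` swapped in for `sum` and three names from the question, produces the same single string:

```python
from itertools import chain
from typing import List

import pandas as pd

cols = pd.Index([
    "COUNADU_SOIL_P_NUMBER_16_DA_B_VE_count_nr_lesion_PRATZE",
    "COUNEGG_SOIL_P_NUMBER_50_DA_B_VT_count_nr_lesion_PRATZE",
    "COUNJUV_SOIL_P_NUMBER_128_DA_B_V6_count_nr_lesion_PRATZE",
])

# chain.from_iterable walks each per-name match list once, in order
flattened: List[str] = list(chain.from_iterable(cols.str.findall(r"\d+_DA")))
output: str = ",".join(flattened)

print(output)  # 16_DA,50_DA,128_DA
```

Like the `sum` version, this yields one string for all columns, not a new column index.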
