I have a DataFrame like this:
ID Series
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')]
1500 [('forgot data pages info', 0, 22, 'NP')]
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]
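I am trying to parse the text in the column named Series into separate columns named Series1, Series2, and so on, up to the maximum number of texts parsed from any row.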
df_parsed = df['Series'].str[1:-1].str.split(', ', expand = True)
I want something like this:
ID Series Series1 Series2 Series3
1102 [('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), ('the atc taxi clearance', 89, 111, 'NP')] taxi instructions consistent basis the atc taxi clearance
1500 [('forgot data pages info', 0, 22, 'NP')] forgot data pages info
649 [('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')] hud correctly fotr approach
The format of the final result is not easy to understand, but perhaps you can create a new column along these lines:
def process(ls):
    # Assumes each cell is a list of tuples; join the first element of each tuple with spaces
    return ' '.join([x[0] for x in ls])

df['Series_new'] = df['Series'].apply(process)
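If you want to create N new columns, where N is the length of the longest list in Series, I think you can compute N first. Then, following the same idea, pad the shorter rows with NaN to create the N new columns.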
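The N-column idea can be sketched as follows. This is only a minimal sketch under some assumptions: the sample frame is rebuilt from the rows in the question, the cells of Series are assumed to be stored as strings (so they are parsed with ast.literal_eval first), and the helper name first_items is made up for illustration. Building a DataFrame from lists of unequal length pads the shorter rows with missing values, so N does not have to be computed by hand.

```python
import ast
import pandas as pd

# Sample data rebuilt from the question; cells are assumed to be strings
df = pd.DataFrame({
    "ID": [1102, 1500, 649],
    "Series": [
        "[('taxi instructions', 13, 30, 'NP'), ('consistent basis', 31, 47, 'NP'), "
        "('the atc taxi clearance', 89, 111, 'NP')]",
        "[('forgot data pages info', 0, 22, 'NP')]",
        "[('hud', 0, 3, 'NP'), ('correctly fotr approach', 12, 35, 'NP')]",
    ],
})

def first_items(cell):
    # Hypothetical helper: parse string cells into real lists, then keep each tuple's first element
    tuples = ast.literal_eval(cell) if isinstance(cell, str) else cell
    return [t[0] for t in tuples]

phrases = df["Series"].apply(first_items)

# Lists of unequal length become rows padded with missing values;
# the number of columns equals the longest list (N)
expanded = pd.DataFrame(phrases.tolist(), index=df.index)
expanded.columns = [f"Series{i + 1}" for i in range(expanded.shape[1])]

result = df.join(expanded)
print(result)
```

If the column already holds real lists rather than strings, the literal_eval step is skipped and the rest works unchanged.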