Split the text in one column on the text from another column in a Pandas DataFrame



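I have two columns in my DataFrame, "Subject" and "Description". I am trying to clean up the "Description" column by splitting it on the text from the "Subject" column, since that text is contained in all rows of "Description".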

Here is a snippet of the Subject column:

Subject
1     Question about the program   
2  Technical issue with the site    

The Description column:

Description 
1  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Question about the program Hello Hello I was wondering if there 
is going to be a product review coming up soon?
2  An HTML only email was received and a rough conversion is below. 
Please refer to the Emails related list for the HTML contents of the 
message. Technical issue with the site Reviews I received emails stating 
that I need to rewrite two of my reviews    

For example, in row 1, I want to split the Description on "Question about the program" and capture only the text after that string.

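I tried df['Description'] = df.apply(lambda x: x['Description'].split(x['Subject'], 1), axis=1)['Description'] but had no luck: I get the error "TypeError: ('must be str or None, not float'" on an index where the description does not contain the subject. How can I handle rows that do not contain that exact text while still splitting the rows that do?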

Any help would be appreciated. Thank you very much.

I also tried the suggested answer, but got this error: IndexError: ('list index out of range', 'occurred at index 1')

You need to split each string in df['Description'] on the corresponding value in Subject and take the part that follows the split.

df.apply(lambda x: x['Description'].split(x['Subject'])[1], axis=1)

Output:

0     Hello Hello I was wondering if there is going...
1     Reviews I received emails stating that I need...
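
The two errors reported in the question most likely have different causes. str.split raises "must be str or None, not float" when the separator itself is NaN (a float), that is, when Subject is missing in that row. Indexing the result with [1] raises IndexError when the subject does not occur in the description, because split then returns a one-element list. Below is a minimal sketch that tolerates both cases by using str.partition. The helper name text_after_subject and the sample rows (including the "Billing" row and the missing-value row) are made up for illustration, and returning the description unchanged when no split is possible is an assumption about the desired behavior.

```python
import pandas as pd


def text_after_subject(row):
    """Return the part of Description that follows Subject, if present."""
    description, subject = row["Description"], row["Subject"]
    # NaN is a float, so missing or empty cells are passed through untouched.
    if not isinstance(description, str) or not isinstance(subject, str) or not subject:
        return description
    # partition never raises: `found` is "" when the subject is absent.
    _, found, after = description.partition(subject)
    return after.strip() if found else description


# Hypothetical sample data, shortened from the rows in the question.
df = pd.DataFrame({
    "Subject": [
        "Question about the program",
        "Technical issue with the site",
        "Billing",
        None,
    ],
    "Description": [
        "message. Question about the program Hello Hello I was wondering",
        "message. Technical issue with the site Reviews I received emails",
        "No subject line in this text",
        float("nan"),
    ],
})

df["Description"] = df.apply(text_after_subject, axis=1)
print(df["Description"])
```

Rows whose description contains the subject keep only the trailing text. The other rows are left as they were, so they can be inspected or filtered afterwards.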
