Parsing a text column in pandas



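I have a CSV file in which one column contains text from chat logs. Every row of text follows the same format: the sender's name and the time of the message (padded with extra spaces before and after), followed by the message content. A single-row example from the text column: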

'  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'

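I want to convert this single string column into multiple columns (the number of columns depends on the number of messages), with one column per individual message, like this: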

  • Siri (3:15pm) Hello how can I help you
  • John Wayne (3:17pm) what day of the week is today
  • Siri (3:18pm) it is Monday

How can I parse the text in a pandas DataFrame column so that the chat log is separated into individual message columns?

If you have this DataFrame:

                                               Messages
0  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.

then you can do:

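# Strip the outer padding, then split on runs of two or more whitespace characters.
x = df["Messages"].str.strip().str.split(r"\s{2,}").explode()
# Even positions hold "Name (time)", odd positions hold the message text.
out = (x[::2] + " " + x[1::2]).to_frame()
print(out)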

Prints:

                                            Messages
0            Siri (3:15pm) Hello how can I help you?
0  John Wayne (3:17pm) what day of the week is today
0                        Siri (3:18pm) it is Monday.

Note: this only works if there are two or more spaces between the name and the text.
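
If the padding cannot be relied on, one alternative is to anchor on the `(h:mmam/pm)` timestamp instead of on whitespace. The following is a minimal sketch and not part of the answer above: the helper `split_chat` and the pattern `HEADER` are hypothetical names, and it assumes every sender name is a run of capitalized words directly before the timestamp. A message that ends in a capitalized word with no punctuation would be misread as part of the next name.

```python
import re
import pandas as pd

# Assumed header shape: capitalized word(s), then "(h:mmam)" or "(h:mmpm)".
HEADER = re.compile(r"((?:[A-Z]\w*\s+)+\(\d{1,2}:\d{2}[ap]m\))")

def split_chat(text):
    """Return one 'Name (time) message' string per message in text."""
    # re.split with a capturing group keeps the headers in the result:
    # [text before first header, header1, body1, header2, body2, ...]
    pieces = HEADER.split(text)
    headers, bodies = pieces[1::2], pieces[2::2]
    # Collapse any padding inside the header and trim the message body.
    return [f"{' '.join(h.split())} {b.strip()}" for h, b in zip(headers, bodies)]

df = pd.DataFrame({"Messages": [
    "Siri (3:15pm) Hello how can I help you? John Wayne (3:17pm) "
    "what day of the week is today Siri (3:18pm) it is Monday."
]})
out = df["Messages"].apply(split_chat).explode().to_frame()
print(out)
```

Because the split does not depend on the amount of whitespace, the same function should handle both the single-spaced input shown here and the double-space-padded original.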

This is how I did it. It took me a while, but we got there!

import pandas as pd

s = pd.Series(['  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  what day of the week is today  Siri (3:18pm)  it is Monday.'])
s = s.str.split(r"  ", expand=True)  # split on two spaces, one column per piece
s = s.drop(labels=[0], axis=1)       # drop the empty piece left by the leading padding
s = s.transpose()
list_1 = list(s[0])
names = []
messages = []
for i in range(0, len(list_1)):
    if i % 2:
        messages.append(list_1[i])   # odd positions: message text
    else:
        names.append(list_1[i])      # even positions: name and time
d = {'Name': names, 'Message': messages}
df = pd.DataFrame(data=d)
df

Output:

                  Name                        Message
0        Siri (3:15pm)      Hello how can I help you?
1  John Wayne (3:17pm)  what day of the week is today
2        Siri (3:18pm)                  it is Monday.
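
Both answers return one row per message, whereas the question asks for one column per message. Below is a minimal sketch of reshaping into that wide layout, assuming the same two-or-more-spaces padding as in the example. The second row and the `message_N` column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Messages": [
    "  Siri (3:15pm)  Hello how can I help you?  John Wayne (3:17pm)  "
    "what day of the week is today  Siri (3:18pm)  it is Monday.",
    "  Siri (9:00am)  Good morning.",  # made-up second row with a single message
]})

# Strip the outer padding, then split on runs of two or more whitespace characters.
parts = df["Messages"].str.strip().str.split(r"\s{2,}")

# Pair each "Name (time)" piece with the message text that follows it.
rows = [[f"{name} {text}" for name, text in zip(p[::2], p[1::2])] for p in parts]

# Rows with fewer messages are padded with missing values on the right.
wide = pd.DataFrame(rows, index=df.index)
wide.columns = [f"message_{i + 1}" for i in range(wide.shape[1])]
print(wide)
```

The number of columns follows the longest chat log in the column, so shorter logs end up with missing values in the trailing columns.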
