I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'Text': ['Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.']})
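What I want to do is first split the text in each row, then convert the resulting lists into a dataframe, and create a column that keeps track of the sentences in each text, i.e. which text each sentence comes from and its position within that text.
To split the text I do: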
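For reference, the plan above (split, turn the lists into rows, number the sentences) can be sketched with a plain Python loop. This is a minimal sketch, assuming the texts are always separated by '</p>'; the sample data is made up, and the text number comes from the row position rather than the dataframe's index labels. It avoids explode, which only exists in pandas 0.25 and later.

```python
import pandas as pd

# Hypothetical sample: two texts with a different number of sentences
df = pd.DataFrame({'Text': ['One.</p> Two.</p> Three.', 'Four.</p> Five.']})

rows = []
# enumerate from 1 so the text number matches the desired "1.x", "2.x" labels
for text_no, text in enumerate(df['Text'], start=1):
    # split on the '</p>' marker and trim the whitespace around each piece
    sentences = [s.strip() for s in text.split('</p>')]
    for sent_no, sentence in enumerate(sentences, start=1):
        rows.append({'ID': f'{text_no}.{sent_no}', 'Text': sentence})

result = pd.DataFrame(rows)
print(result)
```

Because each text is numbered independently, this works regardless of how many sentences a text contains.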
df.Text.str.split('</p>')
df.Text.str.split('</p>')[0]
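To make the intermediate result concrete: str.split returns one Python list per row, and indexing that Series with [0] selects the list for the row labelled 0 (not the first sentence). A small sketch on a made-up sample also shows that the pieces keep the spaces that surrounded each '</p>', so they usually need stripping.

```python
import pandas as pd

# Hypothetical short sample in the same format as the texts above
s = pd.Series(['A.</p> B. </p> C.', 'D.</p> E.'])

parts = s.str.split('</p>')
# parts[0] is the whole list for the row labelled 0
first = parts[0]
print(first)  # ['A.', ' B. ', ' C.']

# remove the leading/trailing spaces left over from the split
cleaned = [p.strip() for p in first]
print(cleaned)  # ['A.', 'B.', 'C.']
```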
As you can see, each element of the original dataframe contains 4 sentences, which the split separates. I now want to create a dataframe like this:
ID Text
1.1 Hello, I have some text.
1.2 I would like to split it into sentences.
1.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
1.4 I also need to convert lists in df which is tricky.
2.1 Hello, I have some text.
2.2 I would like to split it into sentences.
2.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
2.4 I also need to convert lists in df which is tricky.
3.1 Hello, I have some text.
3.2 I would like to split it into sentences.
3.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
3.4 I also need to convert lists in df which is tricky.
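Can anyone help me with this?
Thanks!
PS. In the real data, the texts do not all contain the same number of sentences, unlike the example above.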
You can use str.split to split the strings, then explode to create one row per sentence, and finally rebuild the index:
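# split each text on '</p>' and give every sentence its own row
df2 = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')
      )
# remove the spaces left around each sentence by the split
df2['Text'] = df2['Text'].str.strip()
# text number: the original 0-based row label + 1
idx = df2.index.to_series().add(1).astype(str)
# sentence number: position within each original row, starting at 1
idx2 = idx.groupby(idx).cumcount().add(1).astype(str)
df2.index = idx + '.' + idx2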
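The question's desired layout shows the label as an ID column rather than as the index. A minimal variant of the same split/explode/cumcount idea, on made-up sample data, puts the label in a column named ID (that column name is taken from the question's example) and resets the index; grouping with level=0 is an equivalent way of numbering sentences within each original row.

```python
import pandas as pd

# Hypothetical sample with 3 and 2 sentences
df = pd.DataFrame({'Text': ['A.</p> B.</p> C.', 'D.</p> E.']})

df2 = df.assign(Text=df['Text'].str.split('</p>')).explode('Text')
df2['Text'] = df2['Text'].str.strip()

# text number from the original row label, sentence number within that row
text_no = df2.index.to_series().add(1).astype(str)
sent_no = df2.groupby(level=0).cumcount().add(1).astype(str)

# store the label as a column and replace the duplicated index with 0..n-1
df2 = (df2.assign(ID=text_no + '.' + sent_no)
          .reset_index(drop=True)[['ID', 'Text']])
print(df2)
```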
Output:
Text
1.1 Hello, I have some text.
1.2 I would like to split it into sentences.
1.3 However, when it comes to splitting I want se...
1.4 I also need to convert lists in df which is t...
2.1 Hello, I have some text.
2.2 I would like to split it into sentences.
2.3 However, when it comes to splitting I want se...
2.4 I also need to convert lists in df which is t...
3.1 Hello, I have some text.
3.2 I would like to split it into sentences.
3.3 However, when it comes to splitting I want se...
3.4 I also need to convert lists in df which is t...
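Regarding the PS: the sentence number comes from cumcount within each original row, so the approach does not assume that every text has the same number of sentences. A quick check on hypothetical texts with 3, 1 and 2 sentences is sketched below; it also assumes a text may end with '</p>', which would otherwise leave an empty string as an extra "sentence", so empty pieces are dropped before numbering.

```python
import pandas as pd

# Hypothetical texts with 3, 1 and 2 sentences; the last one ends with '</p>'
df = pd.DataFrame({'Text': ['A.</p> B.</p> C.', 'D.', 'E.</p> F.</p>']})

df2 = df.assign(Text=df['Text'].str.split('</p>')).explode('Text')
df2['Text'] = df2['Text'].str.strip()
# drop empty pieces produced by a trailing '</p>'
df2 = df2[df2['Text'] != '']

idx = df2.index.to_series().add(1).astype(str)
idx2 = idx.groupby(idx).cumcount().add(1).astype(str)
df2.index = idx + '.' + idx2
print(df2)
```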