How to index list elements in a DataFrame in Python



I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'Text': ['Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.']})

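What I want to do is first split the text in each row, then turn the resulting lists into a DataFrame with a column that keeps track of the sentences in each text.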

To split the text I do:

df.Text.str.split('</p>')
df.Text.str.split('</p>')[0]
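For reference, here is a minimal sketch of what this split returns. It uses a shortened, hypothetical two-row frame (`small`, not the data above) so the result is easy to read: `str.split` gives a Series whose elements are Python lists, and `.str.len()` counts the pieces in each row.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the frame above
small = pd.DataFrame({'Text': ['A.</p> B. </p> C.', 'D.</p> E.']})

# Each element of `parts` is a list of the pieces between '</p>' markers
parts = small['Text'].str.split('</p>')

# Number of pieces (sentences) in each original row
n_sentences = parts.str.len()

print(parts[0])
print(n_sentences.tolist())
```

Note that the pieces keep the whitespace that surrounded the `</p>` marker.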

As you can see, each element of the original DataFrame contains 4 sentences, which I split. I would now like to create a DataFrame like this:

ID    Text
1.1   Hello, I have some text.
1.2   I would like to split it into sentences.
1.3   However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
1.4   I also need to convert lists in df which is tricky.
2.1   Hello, I have some text.
2.2   I would like to split it into sentences.
2.3   However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
2.4   I also need to convert lists in df which is tricky.
3.1   Hello, I have some text.
3.2   I would like to split it into sentences.
3.3   However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
3.4   I also need to convert lists in df which is tricky.

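Can anyone help me with this?

Thanks!

P.S. In the real data, the texts do not split as evenly as above; the number of sentences differs from one text to another.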

You can use str.split to split the strings, then explode to create a new row for each piece, and finally rebuild the index:

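# Split each text on '</p>' into a list, then give every list element its own row;
# explode repeats the original row label (0, 0, 0, 0, 1, ...) for each piece.
df2 = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')
      )
# 1-based number of the original row, as a string
idx = df2.index.to_series().add(1).astype(str)
# 1-based position of each sentence within its original row
idx2 = idx.groupby(idx).cumcount().add(1).astype(str)
# Combine both into labels such as '1.1', '1.2', ...
df2.index = idx+'.'+idx2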

Output:

Text
1.1                           Hello, I have some text.
1.2          I would like to split it into sentences. 
1.3   However, when it comes to splitting I want se...
1.4   I also need to convert lists in df which is t...
2.1                           Hello, I have some text.
2.2          I would like to split it into sentences. 
2.3   However, when it comes to splitting I want se...
2.4   I also need to convert lists in df which is t...
3.1                           Hello, I have some text.
3.2          I would like to split it into sentences. 
3.3   However, when it comes to splitting I want se...
3.4   I also need to convert lists in df which is t...
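Since the question asks for an `ID` column and for a way to re-join the sentences later, here is one possible variation on the same split/explode idea. It is only a sketch under some assumptions: the sample data is made up (with an uneven number of sentences per row, as in the P.S.), the helper column names `row` and `sent` are hypothetical, and surrounding whitespace is stripped from each sentence.

```python
import pandas as pd

# Hypothetical sample with a different number of sentences per row
df = pd.DataFrame({'Text': ['First one.</p> Second one. </p> Third one.',
                            'Only one.',
                            'Alpha.</p> Beta.']})

out = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')       # one row per sentence, original label repeated
         .rename_axis('row')    # name the index so it becomes a 'row' column
         .reset_index())

# Drop the whitespace left around the '</p>' markers
out['Text'] = out['Text'].str.strip()

# 1-based position of each sentence within its original row
out['sent'] = out.groupby('row').cumcount() + 1

# Build the 'row.sentence' label as a string column
out['ID'] = (out['row'] + 1).astype(str) + '.' + out['sent'].astype(str)
result = out[['ID', 'Text']]

# Re-join the sentences of each original row, in their original order
rejoined = out.groupby('row')['Text'].agg('</p> '.join)

print(result)
print(rejoined.tolist())
```

Keeping `row` and `sent` as integer columns next to the string `ID` makes sorting and re-joining easier, because string labels such as `1.10` and `1.2` do not sort in sentence order. Because of the stripping step, the re-joined text can differ from the original in whitespace.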
