Python: Split a string every three words in a DataFrame

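I have been searching for a while but can't seem to find an answer to this small problem.

I have this code, which is supposed to split each string after every three words: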

import pandas as pd
import numpy as np

df1 = {
    'State': ['Arizona AZ asdf hello abc', 'Georgia GG asdfg hello def', 'Newyork NY asdfg hello ghi', 'Indiana IN asdfg hello jkl', 'Florida FL ASDFG hello mno']}
df1 = pd.DataFrame(df1, columns=['State'])
df1

def splitTextToTriplet(df):
    text = df['State'].str.split()
    n = 3
    grouped_words = [' '.join(str(text[i:i+n]) for i in range(0, len(text), n))]
    return grouped_words

splitTextToTriplet(df1)

The current output is:

['0     [Arizona, AZ, asdf, hello, abc]\n1    [Georgia, GG, asdfg, hello, def]\nName: State, dtype: object 2    [Newyork, NY, asdfg, hello, ghi]\n3    [Indiana, IN, asdfg, hello, jkl]\nName: State, dtype: object 4    [Florida, FL, ASDFG, hello, mno]\nName: State, dtype: object']

But what I actually expect is 5 rows in a single column of the DataFrame:

['Arizona AZ asdf', 'hello abc']
['Georgia GG asdfg', 'hello def']
['Newyork NY asdfg', 'hello ghi']
['Indiana IN asdfg', 'hello jkl']
['Florida FL ASDFG', 'hello mno']

How can I change the regex so that it produces the expected output?

For efficiency, you can use a regex with str.extractall + groupby/agg:

(df1['State']
   .str.extractall(r'((?:\w+\b\s*){1,3})')[0]
   .groupby(level=0).agg(list)
)

Output:

0     [Arizona AZ asdf , hello abc]
1    [Georgia GG asdfg , hello def]
2    [Newyork NY asdfg , hello ghi]
3    [Indiana IN asdfg , hello jkl]
4    [Florida FL ASDFG , hello mno]

The regex:

(             # start capturing
(?:\w+\b\s*)  # a word followed by optional whitespace
{1,3}         # repeated, up to a maximum of three
)             # end capturing
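Note that because `\s*` sits inside the capturing group, the first chunk of each row keeps its trailing space, as the output above shows. A minimal sketch of one way to remove it, assuming a `.str.strip()` step between extraction and aggregation is acceptable (the shortened sample frame and the name `chunks` are only for illustration):

```python
import pandas as pd

# Shortened sample data, same shape as the question's DataFrame.
df1 = pd.DataFrame({'State': ['Arizona AZ asdf hello abc',
                              'Georgia GG asdfg hello def']})

chunks = (
    df1['State']
    .str.extractall(r'((?:\w+\b\s*){1,3})')[0]  # one row per match, MultiIndex (row, match)
    .str.strip()                                # drop whitespace captured by \s*
    .groupby(level=0).agg(list)                 # collect matches back per original row
)
print(chunks)
```

The result is a Series indexed like the original DataFrame, with one list of cleaned chunks per row.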

You can do:

def splitTextToTriplet(row):
    # Split one row's string into words, then rejoin them in groups of n
    text = row['State'].split()
    n = 3
    grouped_words = [' '.join(text[i:i+n]) for i in range(0, len(text), n)]
    return grouped_words

df1.apply(lambda row: splitTextToTriplet(row), axis=1)

which outputs the following DataFrame:

	0
0	['Arizona AZ asdf', 'hello abc']
1	['Georgia GG asdfg', 'hello def']
2	['Newyork NY asdfg', 'hello ghi']
3	['Indiana IN asdfg', 'hello jkl']
4	['Florida FL ASDFG', 'hello mno']
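The attempt in the question fails because `df['State'].str.split()` returns a whole Series, so `text[i:i+n]` slices rows rather than words. The same per-string idea can also be sketched with `Series.apply` on the column itself, which avoids the row-wise `axis=1` call. The helper name `chunk_words` and the new column name `Chunks` are hypothetical, chosen only for this example:

```python
import pandas as pd

def chunk_words(text, n=3):
    """Split a string into words and rejoin them in groups of n."""
    words = text.split()
    return [' '.join(words[i:i + n]) for i in range(0, len(words), n)]

df1 = pd.DataFrame({'State': ['Arizona AZ asdf hello abc',
                              'Georgia GG asdfg hello def',
                              'Newyork NY asdfg hello ghi',
                              'Indiana IN asdfg hello jkl',
                              'Florida FL ASDFG hello mno']})

# Apply the helper to each string in the column and store the lists.
df1['Chunks'] = df1['State'].apply(chunk_words)
print(df1['Chunks'])
```

Since `str.split()` with no arguments discards surrounding whitespace, the chunks come out without trailing spaces.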
