Convert a list of lists to strings in a pandas DataFrame



Background

I have the following toy df, which contains lists in the Before and After columns, as shown below:

import pandas as pd

# Each row of 'Before' and 'After' holds a list of words.
before = [['in', 'the', 'bright', 'blue', 'box'],
          ['because', 'they', 'go', 'really', 'fast'],
          ['to', 'ride', 'and', 'have', 'fun']]
after = [['there', 'are', 'many', 'different'],
         ['i', 'like', 'a', 'lot', 'of', 'sports'],
         ['the', 'middle', 'east', 'has', 'many']]
df = pd.DataFrame({'Before': before,
                   'After': after,
                   'P_ID': [1, 2, 3],
                   'Word': ['crayons', 'cars', 'camels'],
                   'N_ID': ['A1', 'A2', 'A3']})

Output

                              Before                           After  P_ID     Word N_ID
0       [in, the, bright, blue, box]   [there, are, many, different]     1  crayons   A1
1  [because, they, go, really, fast]   [i, like, a, lot, of, sports]     2     cars   A2
2         [to, ride, and, have, fun]  [the, middle, east, has, many]     3   camels   A3

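Problem

Using the following code block:

df.loc[:, ['After', 'Before']] = df[['After', 'Before']].apply(lambda x: x.str[0].str.replace(',', ''))

taken from "Removing commas and unlisting a dataframe", produces the following output:

Output that is close to what I want, but not quite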

    Before  After  P_ID     Word N_ID
0       in  there     1  crayons   A1
1  because      i     2     cars   A2
2       to    the     3   camels   A3

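This output is close, but it is not what I want: the Before and After columns contain only the first word of each list (e.g. there), whereas I want the full sentence, as shown below.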

Desired output

                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3

Question

How do I get the desired output?

agg + join. The commas are not present in the lists; they are only part of the list's __repr__.
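To see this for yourself, here is a minimal sketch (the variable name `words` is only for illustration). The commas show up when the list is printed, not in its elements, which is why replacing ',' has nothing to remove and why joining the elements is what is needed.

```python
words = ['in', 'the', 'bright', 'blue', 'box']

# The commas appear only in the printed representation of the list ...
list_repr = repr(words)
has_comma_in_repr = ',' in list_repr

# ... while none of the elements contains one.
has_comma_in_items = any(',' in w for w in words)

# Joining the elements with a space yields the plain sentence.
joined = ' '.join(words)
print(list_repr)
print(joined)
```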


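str_cols = ['Before', 'After']
# Map each list column to the function that joins its words with a space.
d = {k: ' '.join for k in str_cols}
# Join the converted columns back onto the remaining, untouched columns.
df.agg(d).join(df.drop(columns=str_cols))

                        Before                     After  P_ID     Word N_ID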
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3

If you prefer to do it in place (faster):

df[str_cols] = df.agg(d)

applymap

Inline

A new copy of the dataframe with the desired result

df.assign(**df[['After', 'Before']].applymap(' '.join))
                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3

In place

Mutate the existing df

df.update(df[['After', 'Before']].applymap(' '.join))
df
                        Before                     After  P_ID     Word N_ID
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3
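As far as I know, applymap was deprecated in pandas 2.1 in favour of DataFrame.map, so the calls above may emit a warning on recent versions. Below is a sketch of an element-wise variant that does not depend on either name, using Series.map on each column. The helper name `join_list_columns` is made up for this example, and the frame is a shortened copy of the one in the question.

```python
import pandas as pd

def join_list_columns(frame, cols, sep=' '):
    """Return a copy of frame with the list-valued columns in cols joined into strings."""
    joined = {col: frame[col].map(sep.join) for col in cols}
    return frame.assign(**joined)

# Shortened version of the toy frame from the question.
df = pd.DataFrame({
    'Before': [['in', 'the', 'bright', 'blue', 'box'],
               ['to', 'ride', 'and', 'have', 'fun']],
    'After': [['there', 'are', 'many', 'different'],
              ['the', 'middle', 'east', 'has', 'many']],
    'P_ID': [1, 3],
})

result = join_list_columns(df, ['Before', 'After'])
print(result)
```

Because the helper goes through assign, the original df keeps its lists; assign the result back to df if you want the in-place behaviour.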

stack + str.join

We can use this result in a similar "inline" or "in place" way, as shown above.

df[['After', 'Before']].stack().str.join(' ').unstack()
                      After                       Before
0  there are many different       in the bright blue box
1    i like a lot of sports  because they go really fast
2  the middle east has many         to ride and have fun
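For instance, a sketch of both uses, assuming the df from the question (rebuilt here in shortened form so the snippet runs on its own; the names `joined` and `new_df` are only illustrative):

```python
import pandas as pd

# Shortened version of the toy frame from the question.
df = pd.DataFrame({
    'Before': [['in', 'the', 'bright', 'blue', 'box'],
               ['to', 'ride', 'and', 'have', 'fun']],
    'After': [['there', 'are', 'many', 'different'],
              ['the', 'middle', 'east', 'has', 'many']],
    'Word': ['crayons', 'camels'],
})

cols = ['After', 'Before']
# Stack the list columns into one Series, join each list, then restore the columns.
joined = df[cols].stack().str.join(' ').unstack()

# Inline: build a new DataFrame and leave df untouched.
new_df = df.assign(**joined)

# In place: overwrite the list columns of df with the joined strings.
df[cols] = joined
print(df)
```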

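We can specify the list columns we want to convert to strings and then use .apply in a for loop:

lst_cols = ['Before', 'After']
for col in lst_cols:
    # Join the words of each list in this column with a space.
    df[col] = df[col].apply(' '.join)
                        Before                     After  P_ID     Word N_ID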
0       in the bright blue box  there are many different     1  crayons   A1
1  because they go really fast    i like a lot of sports     2     cars   A2
2         to ride and have fun  the middle east has many     3   camels   A3
