Stack values by duplicating the other columns in a pandas DataFrame



I have a DataFrame like this:

df

ID              Brands          Age           Gender           City
1             BMW_Audi_VW       50             M               Milano
2               VW_BMW          45             F               SF

I want to split the Brands column on "_" and duplicate every other column except City on the resulting rows.

I can do this based on the ID column:

df = df.set_index('ID').stack().str.split('_', expand=True).unstack(-1).stack(0).reset_index()

But that only duplicates the ID column. I need all of the columns duplicated, except for City.

This is the desired output I am looking for:

ID              Brands          Age           Gender           City
1             BMW               50             M               Milano
1             Audi              50             M               None
1             VW                50             M               None
2             VW                45             F               SF
2             BMW               45             F               None

Use DataFrame.explode on the Brands values split by Series.str.split, then set None with DataFrame.mask on the rows whose index label is repeated:

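# split Brands into lists, then create one row per list element
df = df.assign(Brands = df['Brands'].str.split('_')).explode('Brands')
# columns to repeat on every row; all remaining columns are blanked
include = ['ID','Brands','Age','Gender']
cols = df.columns.difference(include)
# explode repeats the original index label, so duplicated labels mark the extra rows
df[cols] = df[cols].mask(df.index.to_series().duplicated(), None)
df = df.reset_index(drop=True)
print(df)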
   ID Brands  Age Gender    City
0   1    BMW   50      M  Milano
1   1   Audi   50      M    None
2   1     VW   50      M    None
3   2     VW   45      F      SF
4   2    BMW   45      F    None
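
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question; the names `out` and `is_repeat` are only illustrative. It is also a slight variation on the snippet above, not the same code: it resets the index first and then blanks the extra rows with a positional boolean array through `loc`, which should sidestep any alignment on duplicated index labels.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "Brands": ["BMW_Audi_VW", "VW_BMW"],
    "Age": [50, 45],
    "Gender": ["M", "F"],
    "City": ["Milano", "SF"],
})

# Split each Brands string into a list, then emit one row per list element.
# explode repeats the original index label for every row it generates.
out = df.assign(Brands=df["Brands"].str.split("_")).explode("Brands")

# Columns to repeat on every row; every other column is blanked on extra rows.
include = ["ID", "Brands", "Age", "Gender"]
cols = out.columns.difference(include)

# Boolean array: True for each row after the first one of an original record.
is_repeat = out.index.duplicated()

# Make the index unique first, then blank the selected columns by position.
out = out.reset_index(drop=True)
out.loc[is_repeat, cols] = None
print(out)
```

Depending on the pandas version and the column dtype, the blanked cells may display as None or as NaN; `isna()` treats both the same way.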

EDIT:

Check the difference:

#Brands column is assigned to Brands column (to same column)
df1 = df.assign(Brands = df['Brands'].str.split('_')).explode('Brands')
print (df1)
   ID Brands  Age Gender    City
0   1    BMW   50      M  Milano
0   1   Audi   50      M  Milano
0   1     VW   50      M  Milano
1   2     VW   45      F      SF
1   2    BMW   45      F      SF
#Brands column is assigned to Brands1 column (to another column)
df2 = df.assign(Brands1 = df['Brands'].str.split('_')).explode('Brands')
print (df2)
   ID       Brands  Age Gender    City          Brands1
0   1  BMW_Audi_VW   50      M  Milano  [BMW, Audi, VW]
1   2       VW_BMW   45      F      SF        [VW, BMW]
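
The second frame comes back unchanged because explode was pointed at Brands, which still holds plain strings there, while the lists sit in Brands1. If you do want to keep the original column, a minimal sketch (the name `df3` is only illustrative, and the sample frame is rebuilt from the question) is to explode the column that actually contains the lists:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [1, 2],
    "Brands": ["BMW_Audi_VW", "VW_BMW"],
    "Age": [50, 45],
    "Gender": ["M", "F"],
    "City": ["Milano", "SF"],
})

# Keep the original Brands strings and store the split lists in Brands1,
# then explode Brands1, the column that holds the lists.
df3 = df.assign(Brands1=df["Brands"].str.split("_")).explode("Brands1")
print(df3)
```

Here the original Brands string is repeated on every generated row, and the index labels repeat in the same way as in df1.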
