I have a DataFrame like this:
df
ID Brands Age Gender City
1 BMW_Audi_VW 50 M Milano
2 VW_BMW 45 F SF
I want to split the Brands column on "_" and duplicate the values of every column except City.
I can do this based on the ID column:
df = df.set_index('ID').stack().str.split('_', expand=True).unstack(-1).stack(0).reset_index()
But that only duplicates the ID column. I need all columns duplicated, except City.
This is the desired output I am looking for:
ID Brands Age Gender City
1 BMW 50 M Milano
1 Audi 50 M None
1 VW 50 M None
2 VW 45 F SF
2 BMW 45 F None
Use DataFrame.explode on the column after splitting its values with Series.str.split, then set None with DataFrame.mask:
df = df.assign(Brands = df['Brands'].str.split('_')).explode('Brands')
include = ['ID','Brands','Age','Gender']
cols = df.columns.difference(include)
df[cols] = df[cols].mask(df.index.to_series().duplicated(), None)
df = df.reset_index(drop=True)
print (df)
ID Brands Age Gender City
0 1 BMW 50 M Milano
1 1 Audi 50 M None
2 1 VW 50 M None
3 2 VW 45 F SF
4 2 BMW 45 F None
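For a version that can be pasted and run as-is, here is a minimal sketch of the same idea. It rebuilds the sample frame from the question and, as a variation on the code above, flags the repeated rows with Index.duplicated and blanks City through .loc instead of DataFrame.mask. The names out and repeated are only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2],
    "Brands": ["BMW_Audi_VW", "VW_BMW"],
    "Age": [50, 45],
    "Gender": ["M", "F"],
    "City": ["Milano", "SF"],
})

# Split each Brands string into a list, then emit one row per list element.
# explode repeats the original index label for every row it creates.
out = df.assign(Brands=df["Brands"].str.split("_")).explode("Brands")

# True for every row whose index label has already appeared,
# i.e. the second and later brands of each original row.
repeated = out.index.duplicated()

# Switch to a fresh 0..n-1 index, then blank City on the repeated rows.
out = out.reset_index(drop=True)
out.loc[repeated, "City"] = None

print(out)
```

Depending on the pandas version and the column dtype, the blanked cells may be shown as None or NaN; either way out["City"].isna() is True for them.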
Edit:
Check the difference:
#Brands column is assigned to Brands column (to same column)
df1 = df.assign(Brands = df['Brands'].str.split('_')).explode('Brands')
print (df1)
ID Brands Age Gender City
0 1 BMW 50 M Milano
0 1 Audi 50 M Milano
0 1 VW 50 M Milano
1 2 VW 45 F SF
1 2 BMW 45 F SF
#Brands column is assigned to Brands1 column (to another column)
df2 = df.assign(Brands1 = df['Brands'].str.split('_')).explode('Brands')
print (df2)
ID Brands Age Gender City Brands1
0 1 BMW_Audi_VW 50 M Milano [BMW, Audi, VW]
1 2 VW_BMW 45 F SF [VW, BMW]
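The second result is unchanged because explode only expands list-like values in the column it is given, and Brands still holds plain strings there. If the goal is to keep the original strings alongside the split values, one option is to explode the new list column instead. This is a small sketch on a reduced frame; with_lists, unchanged and expanded are illustrative names.

```python
import pandas as pd

# Reduced sample: only the columns needed to show the difference
df = pd.DataFrame({"ID": [1, 2], "Brands": ["BMW_Audi_VW", "VW_BMW"]})

# Brands keeps the original strings, Brands1 holds the split lists
with_lists = df.assign(Brands1=df["Brands"].str.split("_"))

# Exploding the string column leaves the frame as it was (2 rows)
unchanged = with_lists.explode("Brands")

# Exploding the list column yields one row per brand (5 rows)
expanded = with_lists.explode("Brands1")

print(unchanged)
print(expanded)
```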