I have data in the following format stored in a pandas DataFrame:
ID Block
MGKfdkldr Product 1
MGKfdkldr Product 2
MGKfdkldr Product 3
GLOsdasd Product 2
GLOsdasd Product 3
NewNew Product 1
OldOld Product 4
OldOld Product 8
Here is the code for the sample DataFrame:
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['MGKfdkldr', 'MGKfdkldr', 'MGKfdkldr', 'GLOsdasd', 'GLOsdasd', 'NewNew', 'OldOld', 'OldOld'],
    'Block': ['Product 1', 'Product 2', 'Product 3', 'Product 2', 'Product 3', 'Product 1', 'Product 4', 'Product 8'],
})
I am looking for the following data format (expected output):
ID Block-1 Block-2 Block-3
MGKfdkldr Product 1 Product 2 Product 3
GLOsdasd Product 2 Product 3
NewNew Product 1
OldOld Product 4 Product 8
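I tried to reshape it with the pd.melt function, but that just moves the data around relative to the column headers, and I am looking for something a bit different. Is there another way to get the expected output?
Can anyone help me, please?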
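The function you are looking for is pivot, not melt. You will also need to provide a "counter" column that simply numbers the repeated "ID" values, so that everything lines up correctly.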
df1["Block_id"] = df1.groupby("ID").cumcount() + 1
new_df = (df1.pivot(index="ID", columns="Block_id", values="Block")  # reshapes our data (keyword arguments are required in pandas 2.0+)
          .add_prefix("Block-")        # adds "Block-" to our column names
          .rename_axis(columns=None)   # fixes funky column index name
          .reset_index())              # inserts "ID" as a regular column instead of an Index
print(new_df)
          ID    Block-1    Block-2    Block-3
0   GLOsdasd  Product 2  Product 3        NaN
1  MGKfdkldr  Product 1  Product 2  Product 3
2     NewNew  Product 1        NaN        NaN
3     OldOld  Product 4  Product 8        NaN
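To see why the counter column is needed, it may help to look at what groupby("ID").cumcount() produces on its own. The following is only a small sketch that rebuilds the sample data from the question; the variable name counter is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})

# cumcount() numbers the rows within each ID group starting at 0,
# so adding 1 gives the 1-based position used for the Block-N columns.
counter = df1.groupby("ID").cumcount() + 1  # hypothetical name, for illustration only
print(counter.tolist())  # [1, 2, 3, 1, 2, 1, 1, 2]
```

Each (ID, counter) pair is unique, which is what pivot needs: the ID picks the row and the counter picks the column.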
If you want actual blanks (e.g. empty strings "") instead of NaN, you can use new_df.fillna("").
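Putting it together, here is a rough end-to-end sketch with the blanks filled in. Note that in the printed result above the IDs come out sorted alphabetically, whereas the expected output in the question lists them in order of first appearance. The reindex step below is my own addition (not part of the original answer) to restore that order; treat it as one possible approach rather than the only one. The name wide is made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})
df1["Block_id"] = df1.groupby("ID").cumcount() + 1

# IDs in order of first appearance (assumption: this is the order you want)
first_seen = pd.Index(df1["ID"].unique(), name="ID")

wide = (
    df1.pivot(index="ID", columns="Block_id", values="Block")
    .add_prefix("Block-")        # 1, 2, 3 -> "Block-1", "Block-2", "Block-3"
    .rename_axis(columns=None)   # drop the leftover "Block_id" axis name
    .reindex(first_seen)         # undo the alphabetical sort done by pivot
    .fillna("")                  # empty strings instead of NaN
    .reset_index()               # turn "ID" back into a regular column
)
print(wide)
```

If the alphabetical order is fine for your use case, the reindex line can simply be dropped.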