>我有一个最初从 excel 读入的数据帧,如下所示 输入: 输入表
我想将其转换如下,
输出: 输出表
你能帮我怎么用熊猫做到这一点吗?
DACbuilder,
下面是一个快速工作示例:
df = pd.DataFrame(data=[
['heading', 'toys', ],
['item', 'car', 4],
['item', 'bike', 3],
['heading', 'food', ],
['item', 'apple', 2],
['item', 'orange', 5]],
columns=('type', 'name', 'cnt')) # sample data
df.loc[df.type=='heading','heading'] = df.name # create column with headings
df.heading.fillna(method='ffill', inplace=True) # fill headings down for corresponding items
df.loc[df.type=='item', 'name'] = df.heading + ': ' + df.name # combine headings with names
df.loc[df.type=='item', ['name','cnt']] # this is the desired output
name cnt
1 toys: car 4.0
2 toys: bike 3.0
4 food: apple 2.0
5 food: orange 5.0
您可以使用的另一种可能性是根据标题创建行组。下面是示例:
import pandas as pd
df = pd.DataFrame({
"Type": [ "Heading", "item", "item", "Heading", "item", "item" ],
"Name": [ "construction", "cement", "sand", "fruit", "apple", "orange" ],
"Count": [ 0, 3, 0, 0, 7, 78 ]
})
# Creating groups based on Type == "Heading"
df["group_id"] = ( df.Type == "Heading" ).cumsum()
# Output:
# Type Name Count group_id
# 0 Heading construction 0 1
# 1 item cement 3 1
# 2 item sand 0 1
# 3 Heading fruit 0 2
# 4 item apple 7 2
# 5 item orange 78 2
# For each group, set to all rows the respective heading
df = df.groupby("group_id")
df = df.apply(lambda p: p.assign(Heading=p.Name[ p.Type == "Heading" ].values[0]))
df.reset_index(drop=True, inplace=True)
# Output:
# Type Name Count group_id Heading
# 0 Heading construction 0 1 construction
# 1 item cement 3 1 construction
# 2 item sand 0 1 construction
# 3 Heading fruit 0 2 fruit
# 4 item apple 7 2 fruit
# 5 item orange 78 2 fruit
# Transforming the name, and selecting right rows
df["Name"] = df.Heading + ': ' + df.Name
df.loc[ df.Type != "Heading", ["Type", "Name", "Count"]]
# Output:
# Type Name Count
# 1 item construction: cement 3
# 2 item construction: sand 0
# 4 item fruit: apple 7
# 5 item fruit: orange 78
管道版本:
df
.assign(group_id=lambda p: ( p.Type == "Heading" ).cumsum())
.groupby("group_id")
.apply(lambda p: p.assign(Heading=p.Name[ p.Type == "Heading" ].values[0]))
.reset_index(drop=True)
.assign(Name=lambda p: p.Heading + ': ' + p.Name)
.query("Type != 'Heading'")
.get([ "Type", "Name", "Count" ])