合并 Panda 数据帧中所选前几行的字符串



>我有一个最初从 excel 读入的数据帧,如下所示 输入: 输入表

我想将其转换如下,

输出: 输出表

你能帮我怎么用熊猫做到这一点吗?

DACbuilder,

下面是一个快速工作示例:

df = pd.DataFrame(data=[
['heading', 'toys', ],
['item', 'car', 4],
['item', 'bike', 3],
['heading', 'food', ],
['item', 'apple', 2],
['item', 'orange', 5]],
columns=('type', 'name', 'cnt'))  # sample data
df.loc[df.type=='heading','heading'] = df.name  # create column with headings
df.heading.fillna(method='ffill', inplace=True)  # fill headings down for corresponding items
df.loc[df.type=='item', 'name'] = df.heading + ': ' + df.name  # combine headings with names
df.loc[df.type=='item', ['name','cnt']]  # this is the desired output
name  cnt
1     toys: car  4.0
2    toys: bike  3.0
4   food: apple  2.0
5  food: orange  5.0

您可以使用的另一种可能性是根据标题创建行组。下面是示例:

import pandas as pd
df = pd.DataFrame({
"Type": [ "Heading", "item", "item", "Heading", "item", "item" ],
"Name": [ "construction", "cement", "sand", "fruit", "apple", "orange" ],
"Count": [ 0, 3, 0, 0, 7, 78 ]
})
# Creating groups based on Type == "Heading"
df["group_id"] = ( df.Type == "Heading" ).cumsum()
# Output:
#       Type          Name  Count  group_id
# 0  Heading  construction      0         1
# 1     item        cement      3         1
# 2     item          sand      0         1
# 3  Heading         fruit      0         2
# 4     item         apple      7         2
# 5     item        orange     78         2
# For each group, set to all rows the respective heading
df = df.groupby("group_id")
df = df.apply(lambda p: p.assign(Heading=p.Name[ p.Type == "Heading" ].values[0]))
df.reset_index(drop=True, inplace=True)
# Output:
#       Type          Name  Count  group_id       Heading
# 0  Heading  construction      0         1  construction
# 1     item        cement      3         1  construction
# 2     item          sand      0         1  construction
# 3  Heading         fruit      0         2         fruit
# 4     item         apple      7         2         fruit
# 5     item        orange     78         2         fruit
# Transforming the name, and selecting right rows
df["Name"] = df.Heading + ': ' + df.Name
df.loc[ df.Type != "Heading", ["Type", "Name", "Count"]]
# Output:
#    Type                  Name  Count
# 1  item  construction: cement      3
# 2  item    construction: sand      0
# 4  item          fruit: apple      7
# 5  item         fruit: orange     78

管道版本:

df 
.assign(group_id=lambda p: ( p.Type == "Heading" ).cumsum()) 
.groupby("group_id") 
.apply(lambda p: p.assign(Heading=p.Name[ p.Type == "Heading" ].values[0])) 
.reset_index(drop=True) 
.assign(Name=lambda p: p.Heading + ': ' + p.Name) 
.query("Type != 'Heading'") 
.get([ "Type", "Name", "Count" ])

最新更新