Combining totals for old and new job #s



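>I have a df with expense and invoice values. Some jobs actually have 2 job #s, an old one and a new one. For jobs that have both an old and a new job #, I need to sum the expense and invoice values so that the result is a single row. Then I need to drop the rows for the new job # entries.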

I have an Excel file listing the expense and invoice totals for each job. In my code this is crcy:

Job#    Expenses    Invoice
1          5            2
2         10            27
3         15            33
10        60            4
20        57            21
12         9            36
22        11            18

Then I have an Excel file that lists the old and new job #s:

Job#    New Job#
10      20
11      21
12      22
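The two tables above can be built directly in pandas, so the rest of the thread can be reproduced without `crcy1.xlsx` and `jobs.xlsx`. This is a minimal sketch. The values are copied from the tables, and it assumes the sheets contain exactly these columns.

```python
import pandas as pd

# Inline copy of the expenses/invoice sheet (crcy1.xlsx in the question).
crcy = pd.DataFrame({
    'Job#': [1, 2, 3, 10, 20, 12, 22],
    'Expenses': [5, 10, 15, 60, 57, 9, 11],
    'Invoice': [2, 27, 33, 4, 21, 36, 18],
})
# Inline copy of the old -> new job number sheet (jobs.xlsx in the question).
jobs = pd.DataFrame({
    'Job#': [10, 11, 12],
    'New Job#': [20, 21, 22],
})
print(crcy)
print(jobs)
```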

I'm not sure which pandas operation to use here, so I don't know what to try. Any suggestions are much appreciated.

import pandas as pd
# Pull in excel data
crcy = pd.read_excel('crcy1.xlsx')
jobs = pd.read_excel('jobs.xlsx')
# Merge on job#
df3 = crcy.merge(jobs, on='Job#', how='outer')
# Drop rows where new job # is pulled in but 
# has no Expense/Invoice entries.
df3 = df3.dropna(thresh=3)
print(df3)
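In the attempt above, `dropna(thresh=3)` removes job 11 only indirectly, by counting non-null cells. A more explicit option is `merge`'s `indicator=True`, which labels each row with the frame it came from. The sketch below uses the sample data from the tables. It still only lines rows up side by side. The old and new rows for jobs 10 and 12 stay separate until a `groupby` combines them.

```python
import pandas as pd

# Sample data copied from the question's tables.
crcy = pd.DataFrame({
    'Job#': [1, 2, 3, 10, 20, 12, 22],
    'Expenses': [5, 10, 15, 60, 57, 9, 11],
    'Invoice': [2, 27, 33, 4, 21, 36, 18],
})
jobs = pd.DataFrame({'Job#': [10, 11, 12], 'New Job#': [20, 21, 22]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
df3 = crcy.merge(jobs, on='Job#', how='outer', indicator=True)
# Job 11 appears only in jobs (no Expenses/Invoice), so drop 'right_only' rows.
df3 = df3[df3['_merge'] != 'right_only'].drop(columns='_merge')
print(df3)
```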

Actual result:

Job#  Expenses  Invoice  New Job#
1       5.0      2.0       0.0
2      10.0     27.0       0.0
3      15.0     33.0       0.0
10     60.0      4.0      20.0
20     57.0     21.0       0.0
12      9.0     36.0      22.0
22     11.0     18.0       0.0

Desired result:

Job#  Expenses  Invoice  New Job#
1       5.0      2.0       0.0
2      10.0     27.0       0.0
3      15.0     33.0       0.0
10    117.0     25.0      20.0
12     20.0     54.0      22.0

# Left-merge crcy with jobs so each old Job# gets its New Job# on the same row
df = crcy.merge(jobs, on='Job#', how='left')
print(df)
Job#  Expenses  Invoice  New Job#
0     1         5        2       NaN
1     2        10       27       NaN
2     3        15       33       NaN
3    10        60        4      20.0
4    20        57       21       NaN
5    12         9       36      22.0
6    22        11       18       NaN
# then left merge with jobs again but this time match Job# on the left with New Job# on the right
df = df.merge(jobs, left_on='Job#', right_on='New Job#', how='left')
print(df)
Job#_x  Expenses  Invoice  New Job#_x  Job#_y  New Job#_y
0       1         5        2         NaN     NaN         NaN
1       2        10       27         NaN     NaN         NaN
2       3        15       33         NaN     NaN         NaN
3      10        60        4        20.0     NaN         NaN
4      20        57       21         NaN    10.0        20.0
5      12         9       36        22.0     NaN         NaN
6      22        11       18         NaN    12.0        22.0
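# Fill the missing Job#_y values with Job#_x.
# Job#_y is used to group each old job # with its new one and sum their expenses and invoices.
# Assign the result back rather than using inplace=True on a column selection, which newer pandas versions warn about.
df['Job#_y'] = df['Job#_y'].fillna(df['Job#_x'])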
print(df)
Job#_x  Expenses  Invoice  New Job#_x  Job#_y  New Job#_y
0       1         5        2         NaN     1.0         NaN
1       2        10       27         NaN     2.0         NaN
2       3        15       33         NaN     3.0         NaN
3      10        60        4        20.0    10.0         NaN
4      20        57       21         NaN    10.0        20.0
5      12         9       36        22.0    12.0         NaN
6      22        11       18         NaN    12.0        22.0
# Do the same for New Job#_x
df['New Job#_x'] = df['New Job#_x'].fillna(df['Job#_x'])
print(df)
Job#_x  Expenses  Invoice  New Job#_x  Job#_y  New Job#_y
0       1         5        2         1.0     1.0         NaN
1       2        10       27         2.0     2.0         NaN
2       3        15       33         3.0     3.0         NaN
3      10        60        4        20.0    10.0         NaN
4      20        57       21        20.0    10.0        20.0
5      12         9       36        22.0    12.0         NaN
6      22        11       18        22.0    12.0        22.0
# Group on Job#_y and New Job#_x (neither has nulls now), sum the expenses and invoices,
# then reset the index
df = df.groupby(['Job#_y', 'New Job#_x'])[['Expenses', 'Invoice']].sum().reset_index()
print(df)
Job#_y  New Job#_x  Expenses  Invoice
0     1.0         1.0         5        2
1     2.0         2.0        10       27
2     3.0         3.0        15       33
3    10.0        20.0       117       25
4    12.0        22.0        20       54
# Strip the _x/_y suffixes from the column names (regex=True treats the pattern as a regular expression)
df.columns = df.columns.str.replace(r'_[a-z]', '', regex=True)
print(df)
Job#  New Job#  Expenses  Invoice
0   1.0       1.0         5        2
1   2.0       2.0        10       27
2   3.0       3.0        15       33
3  10.0      20.0       117       25
4  12.0      22.0        20       54
# Set New Job# to 0 where it equals the old Job#, i.e. the job was never renumbered
df['New Job#'] = df['New Job#'].mask(df['Job#'] == df['New Job#'], 0)
print(df)
Job#  New Job#  Expenses  Invoice
0   1.0       0.0         5        2
1   2.0       0.0        10       27
2   3.0       0.0        15       33
3  10.0      20.0       117       25
4  12.0      22.0        20       54
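The steps above can be collected into one function. This is a sketch that assumes the column names from the question and uses the inline sample data in place of the Excel files. The function name `combine_old_new` is hypothetical. It also reorders the columns to match the desired layout.

```python
import pandas as pd


def combine_old_new(crcy: pd.DataFrame, jobs: pd.DataFrame) -> pd.DataFrame:
    """Sum each old job's Expenses/Invoice together with its new job's row."""
    # Old-job rows pick up their New Job#.
    df = crcy.merge(jobs, on='Job#', how='left')
    # New-job rows pick up their old Job# (overlapping names get _x/_y suffixes).
    df = df.merge(jobs, left_on='Job#', right_on='New Job#', how='left')
    # Group keys: old Job# and New Job#, falling back to the row's own Job#.
    df['Job#_y'] = df['Job#_y'].fillna(df['Job#_x'])
    df['New Job#_x'] = df['New Job#_x'].fillna(df['Job#_x'])
    out = (
        df.groupby(['Job#_y', 'New Job#_x'])[['Expenses', 'Invoice']]
        .sum()
        .reset_index()
        .rename(columns={'Job#_y': 'Job#', 'New Job#_x': 'New Job#'})
    )
    # A job whose two keys are equal was never renumbered: show 0 as New Job#.
    out['New Job#'] = out['New Job#'].mask(out['Job#'] == out['New Job#'], 0)
    return out[['Job#', 'Expenses', 'Invoice', 'New Job#']]


# Sample data copied from the question's tables.
crcy = pd.DataFrame({
    'Job#': [1, 2, 3, 10, 20, 12, 22],
    'Expenses': [5, 10, 15, 60, 57, 9, 11],
    'Invoice': [2, 27, 33, 4, 21, 36, 18],
})
jobs = pd.DataFrame({'Job#': [10, 11, 12], 'New Job#': [20, 21, 22]})

result = combine_old_new(crcy, jobs)
print(result)
```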

Try this:

# Rename the columns for easier reference
jobs.columns = ['Old Job#', 'New Job#']
# For each job, look up its old Job# (only new job numbers will match)
crcy = crcy.merge(jobs, left_on='Job#', right_on='New Job#', how='left')
# Report under the Old Job# when there is one, otherwise under the job's own Job#
crcy['Report Job#'] = crcy['Old Job#'].combine_first(crcy['Job#'])
crcy.groupby('Report Job#').agg({
'Expenses': 'sum',
'Invoice': 'sum',
'Old Job#': 'first'
})

Result:

Expenses  Invoice  Old Job#
Report Job#                             
1.0                 5        2       NaN
2.0                10       27       NaN
3.0                15       33       NaN
10.0              117       25      10.0
12.0               20       54      12.0
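The sketch below uses a different technique from both answers. It builds a dict from New Job# to old Job#, relabels each row's Job# through it with `Series.replace`, and then a single `groupby` does the summing. It also rebuilds a `New Job#` column, with 0 for jobs that have no new number, to match the desired output. The names `new_to_old`, `old_to_new` and `key` are illustrative, and the data is the question's sample tables.

```python
import pandas as pd

# Sample data copied from the question's tables.
crcy = pd.DataFrame({
    'Job#': [1, 2, 3, 10, 20, 12, 22],
    'Expenses': [5, 10, 15, 60, 57, 9, 11],
    'Invoice': [2, 27, 33, 4, 21, 36, 18],
})
jobs = pd.DataFrame({'Job#': [10, 11, 12], 'New Job#': [20, 21, 22]})

# new -> old, e.g. {20: 10, 21: 11, 22: 12}; old -> new for the output column.
new_to_old = dict(zip(jobs['New Job#'], jobs['Job#']))
old_to_new = dict(zip(jobs['Job#'], jobs['New Job#']))

# Rows with a new job # are relabeled with the old #; others keep their own.
key = crcy['Job#'].replace(new_to_old)
result = crcy.groupby(key)[['Expenses', 'Invoice']].sum().reset_index()
# After reset_index the group key becomes a 'Job#' column (the Series name).
result['New Job#'] = result['Job#'].map(old_to_new).fillna(0)
print(result)
```

This assumes each new job # maps to exactly one old job # and that a new number is never reused as an unrelated old number. Chains of renumbering (old to new to newer) would need the mapping applied more than once.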
