Python -有人介意帮助解决这个Pandas数据框架问题吗?紧迫



我在使用Pandas的合并功能时遇到了一些困难。我正在寻找某种Vlookup公式来帮助我。然而,我无法解决我的问题。我的数据是巨大的,我不能在这里分享,因为保密。然而,我试图在这里得出类似的数据。

新代码名称tbody> <<tr>
旧代码发票日期
1001011NA奶酪蛋糕02/02/2021
1001012NA咖啡03/05/2021
1001011NA奶酪蛋糕30/05/2021
NA2002093茉莉花茶21/08/2021
NA2002042饼干31/12/2021
NA2002080咖啡09/01/2022
NA2002093茉莉花茶05/05/2022
NA2002058奶酪蛋糕07/06/2022

合并时需要在两边都有月份和代码号,所以:

  1. 在发票数据框中创建与成本表
  2. 一致的年-月列
  3. 合并两个成本表
  4. 分别合并新代码和旧代码

import pandas as pd
import io
import datetime
invoice_data_text = '''Old Code New Code    Name    Invoice Date
1001011 NA  Cheese Cake 02/02/2021
1001012 NA  Coffee  03/05/2021
1001011 NA  Cheese Cake 30/05/2021
NA  2002093 Jasmine Tea 21/08/2021
NA  2002042 Cookies 31/12/2021
NA  2002080 Coffee  09/01/2022
NA  2002093 Jasmine Tea 05/05/2022
NA  2002058 Cheese Cake 07/06/2022
'''
cost_2021_text = '''
Old Code    New Code    Name    Jan-21  Feb-21  Mar-21  Apr-21  May-21  June-21 Jul-21  Aug-21  Sep-21  Oct-21  Nov-21  Dec-21
1001011 2002058 Cheese Cake 50  51  50  53  54  52  55  53  50  52  53  53
1001012 2002080 Coffee  5   6   5   6   6   5   7   5   6   5   6   6
1001015 2002093 Jasmine Tea 4   3   3   4   4   3   5   3   3   3   3   4
1001020 2002042 Cookies 20  20  21  20  22  20  21  20  22  20  21  22
'''
cost_2022_text = '''
Old Code    New Code    Name    Jan-22  Feb-22  Mar-22  Apr-22  May-22  June-22 Jul-22  Aug-22  Sep-22  Oct-22  Nov-22  Dec-22
1001011 2002058 Cheese Cake 52  52  55  55  56  52  NA  NA  NA  NA  NA  NA
1001012 2002080 Coffee  5   6   5   6   6   6.5 NA  NA  NA  NA  NA  NA
1001015 2002093 Jasmine Tea 4   3   3   5   5   5.5 NA  NA  NA  NA  NA  NA
1001020 2002042 Cookies 22  22  23  23  23.5    23  NA  NA  NA  NA  NA  NA
'''
# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='t')
cost22 =  pd.read_csv(io.StringIO(cost_2022_text),sep='t')
# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))
# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)
# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")
# Combine result
pd.concat([old_code_result,new_code_result])

最新更新