我在使用Pandas的合并功能时遇到了一些困难。我正在寻找某种Vlookup公式来帮助我。然而,我无法解决我的问题。我的数据是巨大的,我不能在这里分享,因为保密。然而,我试图在这里得出类似的数据。
旧代码 | 新代码名称发票日期 | 1001011 | NA | 奶酪蛋糕 | 02/02/2021 |
---|---|---|---|
1001012 | NA | 咖啡 | 03/05/2021 |
1001011 | NA | 奶酪蛋糕 | 30/05/2021 |
NA | 2002093 | 茉莉花茶 | 21/08/2021 |
NA | 2002042 | 饼干 | 31/12/2021 |
NA | 2002080 | 咖啡 | 09/01/2022 |
NA | 2002093 | 茉莉花茶 | 05/05/2022 |
NA | 2002058 | 奶酪蛋糕 | 07/06/2022 |
合并时需要在两边都有月份和代码号,所以:
- 在发票数据框中创建与成本表 一致的年-月列
- 合并两个成本表
- 分别合并新代码和旧代码
import pandas as pd
import io
import datetime
invoice_data_text = '''Old Code New Code Name Invoice Date
1001011 NA Cheese Cake 02/02/2021
1001012 NA Coffee 03/05/2021
1001011 NA Cheese Cake 30/05/2021
NA 2002093 Jasmine Tea 21/08/2021
NA 2002042 Cookies 31/12/2021
NA 2002080 Coffee 09/01/2022
NA 2002093 Jasmine Tea 05/05/2022
NA 2002058 Cheese Cake 07/06/2022
'''
cost_2021_text = '''
Old Code New Code Name Jan-21 Feb-21 Mar-21 Apr-21 May-21 June-21 Jul-21 Aug-21 Sep-21 Oct-21 Nov-21 Dec-21
1001011 2002058 Cheese Cake 50 51 50 53 54 52 55 53 50 52 53 53
1001012 2002080 Coffee 5 6 5 6 6 5 7 5 6 5 6 6
1001015 2002093 Jasmine Tea 4 3 3 4 4 3 5 3 3 3 3 4
1001020 2002042 Cookies 20 20 21 20 22 20 21 20 22 20 21 22
'''
cost_2022_text = '''
Old Code New Code Name Jan-22 Feb-22 Mar-22 Apr-22 May-22 June-22 Jul-22 Aug-22 Sep-22 Oct-22 Nov-22 Dec-22
1001011 2002058 Cheese Cake 52 52 55 55 56 52 NA NA NA NA NA NA
1001012 2002080 Coffee 5 6 5 6 6 6.5 NA NA NA NA NA NA
1001015 2002093 Jasmine Tea 4 3 3 5 5 5.5 NA NA NA NA NA NA
1001020 2002042 Cookies 22 22 23 23 23.5 23 NA NA NA NA NA NA
'''
# Prepare
invoice_df = pd.read_csv(io.StringIO(invoice_data_text),sep="t",parse_dates=["Invoice Date"])
cost21 = pd.read_csv(io.StringIO(cost_2021_text),sep='t')
cost22 = pd.read_csv(io.StringIO(cost_2022_text),sep='t')
# Create Month column for merging
invoice_df["Month"] = invoice_df["Invoice Date"].map(lambda x:datetime.datetime.strftime(x,"%b-%y"))
# Combine two cost tables
cost21_stack = cost21.set_index(list(cost21.columns[:3])).stack().reset_index(name="Cost")
cost22_stack = cost22.set_index(list(cost22.columns[:3])).stack().reset_index(name="Cost")
cost_table = pd.concat([cost21_stack,cost22_stack]).rename({"level_3":"Month"},axis=1)
# Merge with new code and old code respectively
old_code_result = pd.merge(invoice_df[pd.isna(invoice_df["Old Code"]) == False], cost_table[["Old Code","Month","Cost"]], on=["Old Code","Month"] ,how="left")
new_code_result = pd.merge(invoice_df[pd.isna(invoice_df["New Code"]) == False], cost_table[["New Code","Month","Cost"]], on=["New Code","Month"] ,how="left")
# Combine result
pd.concat([old_code_result,new_code_result])