How do I pull values from one column when several columns match across two DataFrames?

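I am trying to write a script that searches a database resembling Table 1 using the product/geography/year specifications outlined in Table 2. The plan is to search Table 1 for rows that match the specifications in Table 2, then extract the observation value for each match to produce the result.

I need this code to run several loops in which the year criterion is progressively relaxed. For example, loop 1 searches for matches on Product_L1, Geography_L1 and Year; loop 2 searches for matches on Product_L1, Geography_L1 and Year-1; and so on.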

Table 1

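Product_L1	Product_L2	Geography_L1	Geography_L2	Year	obs_value
Portland cement	Cement	Peru	South America	2021	1
Portland cement	Cement	Switzerland	Europe	2021	2
Portland cement	Cement	USA	North America	2021	3
Portland cement	Cement	Brazil	South America	2021	4
Portland cement	Cement	South Africa	Africa	2021	5
Portland cement	Cement	India	Asia	2021	6
Portland cement	Cement	Brazil	South America	2020	7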

You can perform a merge and pass the list of columns you want from Table_1.

import pandas as pd

# Table_1: the database to search.
Table_1 = pd.DataFrame({
    "Product_L1": ["Portland cement", "Portland cement", "Portland cement", "Portland cement", "Portland cement", "Portland cement", "Portland cement"],
    "Product_L2": ["Cement", "Cement", "Cement", "Cement", "Cement", "Cement", "Cement"],
    "Geography_L1": ["Peru", "Switzerland", "USA", "Brazil", "South Africa", "India", "Brazil"],
    "Geography_L2": ["South America", "Europe", "North America", "South America", "Africa", "Asia", "South America"],
    "Year": [2021, 2021, 2021, 2021, 2021, 2021, 2020],
    "obs_value": [1, 2, 3, 4, 5, 6, 7]
})

# Table_2: the product/geography/year specifications to look up.
Table_2 = pd.DataFrame({
    "Product_L1": ["Portland cement", "Portland cement"],
    "Product_L2": ["Cement", "Cement"],
    "Geography_L1": ["Brazil", "Switzerland"],
    "Geography_L2": ["South America", "Europe"],
    "Year": [2021, 2021]
})

# With no `on` argument, merge joins on every column name the two frames share.
columns_list = ['Product_L1', 'Product_L2', 'Geography_L1', 'Geography_L2', 'Year', 'obs_value']
result = pd.merge(Table_2, Table_1[columns_list], how='left')

result is a new DataFrame:

        Product_L1 Product_L2 Geography_L1   Geography_L2  Year  obs_value
0  Portland cement     Cement       Brazil  South America  2021          4
1  Portland cement     Cement  Switzerland         Europe  2021          2
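
If you would rather not rely on merge picking up every shared column name, the join keys can be spelled out. The following is a minimal sketch on a trimmed-down copy of the tables; the names key_columns, checked and unmatched, and the "Japan" row, are made up for illustration and are not part of the original data. It passes the keys through on= and uses indicator=True to flag specification rows that found no match:

```python
import pandas as pd

# Trimmed-down stand-ins for Table_1 and Table_2 (fewer columns and rows).
Table_1 = pd.DataFrame({
    "Product_L1": ["Portland cement"] * 3,
    "Geography_L1": ["Peru", "Switzerland", "Brazil"],
    "Year": [2021, 2021, 2021],
    "obs_value": [1, 2, 4],
})
Table_2 = pd.DataFrame({
    "Product_L1": ["Portland cement"] * 2,
    "Geography_L1": ["Brazil", "Japan"],  # "Japan" is a made-up row with no match
    "Year": [2021, 2021],
})

# Name the join keys explicitly instead of relying on shared column names.
key_columns = ["Product_L1", "Geography_L1", "Year"]

# indicator=True adds a "_merge" column: "both" for matches, "left_only" otherwise.
checked = pd.merge(Table_2, Table_1, on=key_columns, how="left", indicator=True)

# Specification rows for which Table_1 had no matching observation.
unmatched = checked[checked["_merge"] == "left_only"]
```

Rows listed in unmatched are the ones that would need a looser criterion, such as an earlier year.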

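Edit: Based on the updated question, I think what you are trying to do can be achieved with set_index and unstack. This creates a new DataFrame in which the observation values are listed in columns named "Year_2020", "Year_2021", and so on.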

index_columns = ['Product_L1','Product_L2','Geography_L1','Geography_L2', 'Year']
edit_df = Table_1.set_index(index_columns)['obs_value'].unstack().add_prefix('Year_').reset_index()
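
For the relaxed-year loops described in the question (Year, then Year-1, and so on), one possible approach is to repeat the merge while shifting the database years. This is only a sketch under assumptions: the function name match_with_relaxed_year, the max_offset parameter, the matched_year and year_offset columns, and the 2022 Switzerland row are all hypothetical, and the tables are trimmed-down stand-ins for the ones above.

```python
import pandas as pd

# Trimmed-down stand-ins for Table_1 and Table_2.
Table_1 = pd.DataFrame({
    "Product_L1": ["Portland cement"] * 3,
    "Geography_L1": ["Switzerland", "Brazil", "Brazil"],
    "Year": [2021, 2021, 2020],
    "obs_value": [2, 4, 7],
})
Table_2 = pd.DataFrame({
    "Product_L1": ["Portland cement"] * 2,
    "Geography_L1": ["Brazil", "Switzerland"],
    "Year": [2021, 2022],  # hypothetical: Switzerland has no 2022 observation
})


def match_with_relaxed_year(targets, database, max_offset=2):
    """Match on product and geography, trying Year, then Year-1, Year-2, ..."""
    keys = ["Product_L1", "Geography_L1", "Year"]
    pending = targets.copy()
    found = []
    for offset in range(max_offset + 1):
        # Shift database years forward by `offset` so that a target Year
        # lines up with database rows from Year - offset.
        shifted = database.assign(
            matched_year=database["Year"],
            Year=database["Year"] + offset,
        )
        merged = pending.merge(shifted, on=keys, how="left", indicator=True)
        hit = merged["_merge"] == "both"
        found.append(merged[hit].drop(columns="_merge").assign(year_offset=offset))
        # Only rows that are still unmatched go on to the next, looser loop.
        pending = merged.loc[~hit, list(targets.columns)]
        if pending.empty:
            break
    return pd.concat(found, ignore_index=True)


result = match_with_relaxed_year(Table_2, Table_1)
```

Each target row is matched at the smallest offset that finds an observation, and year_offset records how far the year had to be relaxed. Target rows that never match within max_offset are dropped from the result.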
