用于合并两个电子表格的 Python 代码,类似于 Excel 中的"Index-Match-Match"函数



我有两个独立的电子表格,并使用panda将它们转换为数据帧。我希望从电子表格2中提取信息,并将这些信息合并到电子表格1中。我知道如何通过使用"索引匹配匹配"函数在excel中提取信息,但我在Python中很难理解这一点,因为它似乎不适合合并或联接方法。。。或者我只是错过了连接。

电子表格1如下所示:电子表格1

电子表格2如下所示:电子表格2

最终结果的负责人会这样说:

地区分支主产品名称检索#

  • North,Fresno,产品1:5060
  • North,Fresno,产品2:22429
  • North,Sacramento,产品3:405
  • North,Fresno,产品6:10

程序将插入一个新列,并根据对列和行进行索引来插入数据,以获得单个数据点。因此,对于我的示例中的第一行,即Fresno和Product 1……它将在电子表格2中查找最上面的一行,直到它到达Product 1,然后向下查找,直到它找到Fre斯诺和插入5060。

在当前的情况下,拥有Region列并不重要,但我可以设想拥有基于多个标准进行检索的能力。

IIUC您可以从Spreadsheet 2创建stack数据帧df2。然后您需要从Master Product Name行转换成rename列。

如果需要按列RegionMaster Product NameBranch排列merge,则必须将这两列Branch都小写才能匹配。

df1 = pd.read_csv('s1.csv', sep=";", index_col=None)
print df1
   Region         Branch Master Product Name
0   North         FRESNO           Product 1
1   North         FRESNO           Product 2
2   North     SACRAMENTO           Product 2
3   North         FRESNO          Product 12
4   North  INLAND EMPIRE          Product 13
5   North         FRESNO           Product 4
6   North         FRESNO           Product 5
7   North      SAN DIEGO           Product 3
8   North         FRESNO           Product 3
9   North         FRESNO           Product 3
10  North    LOS ANGELES           Product 3
11  North         FRESNO           Product 6
12  North         FRESNO           Product 7
13  North         FRESNO           Product 8
14  North  SAN FRANCISCO           Product 8
15  North         FRESNO          Product 11
16  North         FRESNO           Product 9
17  South    LOS ANGELES          Product 12
18  South    LOS ANGELES           Product 3
19  North  SAN FRANCISCO           Product 1
20  North  SAN FRANCISCO           Product 2
21  North  SAN FRANCISCO          Product 12
22  North  SAN FRANCISCO          Product 13
23  North  SAN FRANCISCO           Product 4
24  North  SAN FRANCISCO           Product 5
25  South       SAN JOSE           Product 3
26  South       SAN JOSE           Product 3
df2 = pd.read_csv('s2.csv', sep=";", index_col=None)
print df2
          Branch Region Product 1 Product 2  Product 3  Product 4  Product 5  
0         Fresno  North     5 060     2 429        675        540         10   
1     Sacramento  North     5 534     2 344        405        324        118   
2  San Francisco  North     2 924     1 167        111         89         32   
3       San Jose  North     4 039     2 019        179        143        102   
4    Los Angeles  South     4 453     2 273        302        298         -2   
5  Inland Empire  South     4 680     2 091        404        400         21   
6  Orange County  South     3 167     1 493         95         94         41   
7      San Diego  South     5 468     2 439        282        279         16   
   Product 6  
0         10  
1        117  
2         32  
3        102  
4         76  
5         41  
6         67  
7        188  
df2 = df2.set_index(['Region', 'Branch'])
df2 = df2.stack()
df2 = df2.reset_index(name = 'Count')
df2 = df2.rename(columns={'level_2':'Master Product Name'})
print df2
   Region         Branch Master Product Name  Count
0   North         Fresno           Product 1  5 060
1   North         Fresno           Product 2  2 429
2   North         Fresno           Product 3    675
3   North         Fresno           Product 4    540
4   North         Fresno           Product 5     10
5   North         Fresno           Product 6     10
6   North     Sacramento           Product 1  5 534
7   North     Sacramento           Product 2  2 344
8   North     Sacramento           Product 3    405
9   North     Sacramento           Product 4    324
10  North     Sacramento           Product 5    118
11  North     Sacramento           Product 6    117
12  North  San Francisco           Product 1  2 924
13  North  San Francisco           Product 2  1 167
14  North  San Francisco           Product 3    111
15  North  San Francisco           Product 4     89
16  North  San Francisco           Product 5     32
17  North  San Francisco           Product 6     32
18  North       San Jose           Product 1  4 039
19  North       San Jose           Product 2  2 019
20  North       San Jose           Product 3    179
21  North       San Jose           Product 4    143
22  North       San Jose           Product 5    102
23  North       San Jose           Product 6    102
24  South    Los Angeles           Product 1  4 453
25  South    Los Angeles           Product 2  2 273
26  South    Los Angeles           Product 3    302
27  South    Los Angeles           Product 4    298
28  South    Los Angeles           Product 5     -2
29  South    Los Angeles           Product 6     76
30  South  Inland Empire           Product 1  4 680
31  South  Inland Empire           Product 2  2 091
32  South  Inland Empire           Product 3    404
33  South  Inland Empire           Product 4    400
34  South  Inland Empire           Product 5     21
35  South  Inland Empire           Product 6     41
36  South  Orange County           Product 1  3 167
37  South  Orange County           Product 2  1 493
38  South  Orange County           Product 3     95
39  South  Orange County           Product 4     94
40  South  Orange County           Product 5     41
41  South  Orange County           Product 6     67
42  South      San Diego           Product 1  5 468
43  South      San Diego           Product 2  2 439
44  South      San Diego           Product 3    282
45  South      San Diego           Product 4    279
46  South      San Diego           Product 5     16
47  South      San Diego           Product 6    188
#for metching by columns Branch - string to lowercase
df1['Branch'] = df1['Branch'].str.lower()
df2['Branch'] = df2['Branch'].str.lower()
df = pd.merge(df1, df2, on=['Region','Master Product Name', 'Branch'])
print df
   Region         Branch Master Product Name  Count
0   North         fresno           Product 1  5 060
1   North         fresno           Product 2  2 429
2   North     sacramento           Product 2  2 344
3   North         fresno           Product 4    540
4   North         fresno           Product 5     10
5   North         fresno           Product 3    675
6   North         fresno           Product 3    675
7   North         fresno           Product 6     10
8   South    los angeles           Product 3    302
9   North  san francisco           Product 1  2 924
10  North  san francisco           Product 2  1 167
11  North  san francisco           Product 4     89
12  North  san francisco           Product 5     32

相关内容

最新更新