我有两个独立的电子表格,并使用panda将它们转换为数据帧。我希望从电子表格2中提取信息,并将这些信息合并到电子表格1中。我知道如何通过使用"索引匹配匹配"函数在excel中提取信息,但我在Python中很难理解这一点,因为它似乎不适合合并或联接方法。。。或者我只是错过了连接。
电子表格1如下所示:电子表格1
电子表格2如下所示:电子表格2
最终结果的负责人会这样说:
地区分支主产品名称检索#
- North,Fresno,产品1:5060
- North,Fresno,产品2:22429
- North,Sacramento,产品3:405
- North,Fresno,产品6:10
程序将插入一个新列,并根据对列和行进行索引来插入数据,以获得单个数据点。因此,对于我的示例中的第一行,即Fresno和Product 1……它将在电子表格2中查找最上面的一行,直到它到达Product 1,然后向下查找,直到它找到Fre斯诺和插入5060。
在当前的情况下,拥有Region列并不重要,但我可以设想拥有基于多个标准进行检索的能力。
IIUC您可以从Spreadsheet 2
创建stack
数据帧df2
。然后您需要从Master Product Name
行转换成rename
列。
如果需要按列Region
、Master Product Name
和Branch
排列merge
,则必须将这两列Branch
都小写才能匹配。
df1 = pd.read_csv('s1.csv', sep=";", index_col=None)
print df1
Region Branch Master Product Name
0 North FRESNO Product 1
1 North FRESNO Product 2
2 North SACRAMENTO Product 2
3 North FRESNO Product 12
4 North INLAND EMPIRE Product 13
5 North FRESNO Product 4
6 North FRESNO Product 5
7 North SAN DIEGO Product 3
8 North FRESNO Product 3
9 North FRESNO Product 3
10 North LOS ANGELES Product 3
11 North FRESNO Product 6
12 North FRESNO Product 7
13 North FRESNO Product 8
14 North SAN FRANCISCO Product 8
15 North FRESNO Product 11
16 North FRESNO Product 9
17 South LOS ANGELES Product 12
18 South LOS ANGELES Product 3
19 North SAN FRANCISCO Product 1
20 North SAN FRANCISCO Product 2
21 North SAN FRANCISCO Product 12
22 North SAN FRANCISCO Product 13
23 North SAN FRANCISCO Product 4
24 North SAN FRANCISCO Product 5
25 South SAN JOSE Product 3
26 South SAN JOSE Product 3
df2 = pd.read_csv('s2.csv', sep=";", index_col=None)
print df2
Branch Region Product 1 Product 2 Product 3 Product 4 Product 5
0 Fresno North 5 060 2 429 675 540 10
1 Sacramento North 5 534 2 344 405 324 118
2 San Francisco North 2 924 1 167 111 89 32
3 San Jose North 4 039 2 019 179 143 102
4 Los Angeles South 4 453 2 273 302 298 -2
5 Inland Empire South 4 680 2 091 404 400 21
6 Orange County South 3 167 1 493 95 94 41
7 San Diego South 5 468 2 439 282 279 16
Product 6
0 10
1 117
2 32
3 102
4 76
5 41
6 67
7 188
df2 = df2.set_index(['Region', 'Branch'])
df2 = df2.stack()
df2 = df2.reset_index(name = 'Count')
df2 = df2.rename(columns={'level_2':'Master Product Name'})
print df2
Region Branch Master Product Name Count
0 North Fresno Product 1 5 060
1 North Fresno Product 2 2 429
2 North Fresno Product 3 675
3 North Fresno Product 4 540
4 North Fresno Product 5 10
5 North Fresno Product 6 10
6 North Sacramento Product 1 5 534
7 North Sacramento Product 2 2 344
8 North Sacramento Product 3 405
9 North Sacramento Product 4 324
10 North Sacramento Product 5 118
11 North Sacramento Product 6 117
12 North San Francisco Product 1 2 924
13 North San Francisco Product 2 1 167
14 North San Francisco Product 3 111
15 North San Francisco Product 4 89
16 North San Francisco Product 5 32
17 North San Francisco Product 6 32
18 North San Jose Product 1 4 039
19 North San Jose Product 2 2 019
20 North San Jose Product 3 179
21 North San Jose Product 4 143
22 North San Jose Product 5 102
23 North San Jose Product 6 102
24 South Los Angeles Product 1 4 453
25 South Los Angeles Product 2 2 273
26 South Los Angeles Product 3 302
27 South Los Angeles Product 4 298
28 South Los Angeles Product 5 -2
29 South Los Angeles Product 6 76
30 South Inland Empire Product 1 4 680
31 South Inland Empire Product 2 2 091
32 South Inland Empire Product 3 404
33 South Inland Empire Product 4 400
34 South Inland Empire Product 5 21
35 South Inland Empire Product 6 41
36 South Orange County Product 1 3 167
37 South Orange County Product 2 1 493
38 South Orange County Product 3 95
39 South Orange County Product 4 94
40 South Orange County Product 5 41
41 South Orange County Product 6 67
42 South San Diego Product 1 5 468
43 South San Diego Product 2 2 439
44 South San Diego Product 3 282
45 South San Diego Product 4 279
46 South San Diego Product 5 16
47 South San Diego Product 6 188
#for metching by columns Branch - string to lowercase
df1['Branch'] = df1['Branch'].str.lower()
df2['Branch'] = df2['Branch'].str.lower()
df = pd.merge(df1, df2, on=['Region','Master Product Name', 'Branch'])
print df
Region Branch Master Product Name Count
0 North fresno Product 1 5 060
1 North fresno Product 2 2 429
2 North sacramento Product 2 2 344
3 North fresno Product 4 540
4 North fresno Product 5 10
5 North fresno Product 3 675
6 North fresno Product 3 675
7 North fresno Product 6 10
8 South los angeles Product 3 302
9 North san francisco Product 1 2 924
10 North san francisco Product 2 1 167
11 North san francisco Product 4 89
12 North san francisco Product 5 32