Why does the pandas merge between these two dataframes look like this?



I have a dataframe, df_day, shown below:

Day
2022-05-02  2
2022-05-03  3
2022-05-04  4
2022-05-05  5
2022-05-06  6
2022-05-07  7
2022-05-08  8
2022-05-09  9
2022-05-10  10
2022-05-11  11
2022-05-12  12
2022-05-13  13
2022-05-14  14
2022-05-15  15
2022-05-16  16
2022-05-17  17
2022-05-18  18
2022-05-19  19
2022-05-20  20
2022-05-21  21
2022-05-22  22
2022-05-23  23
2022-05-24  24
2022-05-25  25
2022-05-26  26
2022-05-27  27
2022-05-28  28
2022-05-29  29
2022-05-30  30
2022-05-31  31
2022-06-01  1
2022-06-02  2
2022-06-03  3
2022-06-04  4

And a second dataframe, df_count, shown below:

Weekday Day Count
0   Mon     2   44
1   Tue     3   44
2   Wed     4   32
3   Thu     5   26
4   Fri     6   39
5   Sat     7   39
0   Mon     9   37
1   Tue    10   30
2   Wed    11   33
3   Thu    12   41
4   Fri    13   36
5   Sat    14   38
0   Mon    16   32
1   Tue    17   35
2   Wed    18   35
3   Thu    19   31
4   Fri    20   44
5   Sat    21   31
0   Mon    23   57
1   Tue    24   32
2   Wed    25   34
3   Thu    26   42
4   Fri    27   42
5   Sat    28   29
0   Mon    30   33
1   Tue    31   33
2   Wed    1    33
3   Thu    2    33
4   Fri    3    33
5   Sat    4    33

When I try to merge them using df_merged = pd.merge(df_day, df_count, 'outer')

I get this resulting dataframe, which honestly is not what it should look like:

Day Weekday Count
0   2   Mon     44.0
1   2   Thu     33.0
2   2   Mon     44.0
3   2   Thu     33.0
4   3   Tue     44.0
5   3   Fri     33.0
6   3   Tue     44.0
7   3   Fri     33.0
8   4   Wed     32.0
9   4   Sat     33.0
10  4   Wed     32.0
11  4   Sat     33.0
12  5   Thu     26.0
13  6   Fri     39.0
14  7   Sat     39.0
15  8   NaN     NaN
16  9   Mon     37.0
17  10  Tue     30.0
18  11  Wed     33.0
19  12  Thu     41.0
20  13  Fri     36.0
21  14  Sat     38.0
22  15  NaN     NaN
23  16  Mon     32.0
24  17  Tue     35.0
25  18  Wed     35.0
26  19  Thu     31.0
27  20  Fri     44.0
28  21  Sat     31.0
29  22  NaN     NaN
30  23  Mon     57.0
31  24  Tue     32.0
32  25  Wed     34.0
33  26  Thu     42.0
34  27  Fri     42.0
35  28  Sat     29.0
36  29  NaN     NaN
37  30  Mon     33.0
38  31  Tue     33.0
39  1   Wed     33.0

What is going on at the top of df_merged?

0   2   Mon     44.0
1   2   Thu     33.0
2   2   Mon     44.0
3   2   Thu     33.0
4   3   Tue     44.0
5   3   Fri     33.0
6   3   Tue     44.0
7   3   Fri     33.0
8   4   Wed     32.0
9   4   Sat     33.0
10  4   Wed     32.0
11  4   Sat     33.0

Day should be 2, 3, 4, 5, etc., and Weekday should be Mon, Tue, Wed, Thu, Fri, etc.

If I do df_merged = pd.merge(df_day, df_count, left_on='Day', right_on='Day', how='right')

then it looks almost right, but still not quite:

Day Weekday Count
0   2   Mon     44
1   2   Mon     44
2   3   Tue     44
3   3   Tue     44
4   4   Wed     32
5   4   Wed     32
6   5   Thu     26
7   6   Fri     39
8   7   Sat     39
9   9   Mon     37
10  10  Tue     30
11  11  Wed     33
12  12  Thu     41
13  13  Fri     36
14  14  Sat     38
... etc.
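
The repeated rows are consistent with duplicate join keys: Day values 2, 3 and 4 occur twice in each frame (once in May, once in June), and a merge on Day pairs every matching left row with every matching right row. The sketch below illustrates this on a trimmed-down copy of the data; how the frames are constructed here is an assumption, since the question only shows their printed form.

```python
import pandas as pd

# Trimmed-down stand-ins for the frames in the question (assumed construction).
dates = pd.to_datetime(["2022-05-02", "2022-05-03", "2022-06-02", "2022-06-03"])
df_day = pd.DataFrame({"Day": dates.day.astype("int64")}, index=dates)

df_count = pd.DataFrame({
    "Weekday": ["Mon", "Tue", "Thu", "Fri"],
    "Day": [2, 3, 2, 3],
    "Count": [44, 44, 33, 33],
})

# Count how often each key value occurs on each side of the merge.
left_counts = df_day["Day"].value_counts()
right_counts = df_count["Day"].value_counts()

# Each Day value occurs twice on the left and twice on the right,
# so the merge emits 2 * 2 rows per Day value.
merged = pd.merge(df_day, df_count, on="Day", how="outer")
print(left_counts, right_counts, merged, sep="\n\n")
```

Checking `value_counts()` on the key column of both frames before merging is a quick way to see whether the key is unique.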

Try specifying the merge type and the column to join on:

df_merged = pd.merge(df_day, df_count, left_on = 'Day', right_on = 'Day', how = 'left')
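
Note that how='left' on its own may still repeat rows if a Day value occurs more than once on both sides. One possible way around that, sketched here under assumptions, is to merge on a key that is unique, such as the full date. The `Date` column on df_count is hypothetical (the original frame only has Weekday, Day and Count), and the data is a shortened stand-in for the frames in the question. The `validate` argument of `pd.merge` is used to make a non-unique key fail loudly rather than silently multiply rows.

```python
import pandas as pd

dates = pd.to_datetime(["2022-05-02", "2022-05-03", "2022-06-02", "2022-06-03"])

# Left frame: keep the full date as a column next to the day of month.
df_day = pd.DataFrame({"Date": dates, "Day": dates.day.astype("int64")})

# Right frame: the "Date" column is hypothetical, added for illustration.
df_count = pd.DataFrame({
    "Date": dates,
    "Weekday": ["Mon", "Tue", "Thu", "Fri"],
    "Day": [2, 3, 2, 3],
    "Count": [44, 44, 33, 33],
})

# validate="one_to_one" raises MergeError when the key repeats on either side.
try:
    pd.merge(df_day, df_count, on="Day", how="left", validate="one_to_one")
    day_is_unique_key = True
except pd.errors.MergeError:
    day_is_unique_key = False

# Merging on the date (unique in this data) yields one row per input row.
df_merged = pd.merge(
    df_day,
    df_count.drop(columns="Day"),
    on="Date",
    how="left",
    validate="one_to_one",
)
print(day_is_unique_key)
print(df_merged)
```

If the real df_count cannot be given a date, any other column combination that identifies a row uniquely (for example month plus day) would serve the same purpose.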

Let me know if this code helps.
