在Pandas中,两个数组如何通过一个公共列连接?



我有两个Pandas数组,如:

A = 
Date       Value
0    2022-03-01     50
1    2022-03-01     50
2    2022-03-03     50
3    2022-03-04     50
4    2022-03-04     50
5    2022-03-04     50
6    2022-03-04     50
7    2022-03-04     50
8    2022-03-15     50
9    2022-03-19     50
10   2022-03-23     50

B =
Value   Date
Date                                                                             
2022-03-01  0.0   2022-03-01
2022-03-02  0.0   2022-03-02
2022-03-03  0.0   2022-03-03
2022-03-04  0.0   2022-03-04
2022-03-05  0.0   2022-03-05
2022-03-06  0.0   2022-03-06
2022-03-07  0.0   2022-03-07
2022-03-08  0.0   2022-03-08

另外,A和B列表的长度不同。

我需要形成一个列表'C=',其中日期将按顺序排列(如列表'B='),并且每个日期的'值'等于列表' a ='中此日期的'值'的总和…列表'A='中未包含在列表'B='中日期的所有'值'都不应出现在列表'C='中。

我是python的新手我已经尝试过很多选择了……但一切都不工作:

for i in range(len(A)):
for n in range(len(B)):
if A['Date'][i] == B['Date'][i]:
C['Date'] == B['Date'][i]
C['Value'] == A['Value'][i]
else:
C['Date'] == B['Date'][i]
C['Value'] == B['Value'][i]
结果数组应该如下所示:
B =
Value   Date                                                                             
1  100.0   2022-03-01
2  0.0     2022-03-02
3  50.0    2022-03-03
4  250.0   2022-03-04
5  0.0     2022-03-05
6  0.0     2022-03-06
7  0.0     2022-03-07
8  0.0     2022-03-08

/////////真正的数组是:

Date:           Value:
0   2022-05-06       0.000000e+00
1   2022-05-04      -4.042888e+06
2   2022-04-15      -2.742782e+06
3   2022-04-15      -9.152107e+05
4   2022-04-01      -3.608648e+07
5   2022-04-01      -1.072268e+07
6   2022-04-01      -1.885982e+07
7   2022-04-01      -1.087363e+07
8   2022-03-04       0.000000e+00
9   2022-03-04       0.000000e+00
10  2022-03-04       0.000000e+00
11  2022-03-04       0.000000e+00
12  2022-03-04       0.000000e+00
13  2022-03-04       0.000000e+00
14  2022-03-04       0.000000e+00
15  2022-03-04       0.000000e+00
16  2022-02-03      -4.369548e+06
17  2022-02-01       0.000000e+00
18  2022-02-01       0.000000e+00
19  2022-02-01       0.000000e+00
20  2022-02-01      -3.441539e+05
21  2022-02-01       0.000000e+00
22  2022-02-01       0.000000e+00
23  2022-02-01       0.000000e+00
24  2022-02-01       0.000000e+00
25  2021-11-12      -1.350750e+06
26  2021-11-09       0.000000e+00
27  2021-11-09       0.000000e+00
28  2021-11-01      -3.716895e+06
29  2021-10-19      -2.464551e+07
30  2021-10-15      -5.662351e+06
31  2021-10-15      -1.243596e+06
32  2021-10-01      -4.992055e+07
33  2021-10-01      -2.695814e+07
34  2021-10-01      -2.689851e+07
35  2021-10-01      -2.689851e+07
36  2021-09-26       0.000000e+00
37  2021-09-26       0.000000e+00
38  2021-09-26       0.000000e+00
39  2021-09-26       0.000000e+00
40  2021-09-26       0.000000e+00
41  2021-09-26       0.000000e+00
42  2021-08-24      -7.515941e+08
43  2021-08-20       0.000000e+00
44  2021-08-05      -4.674707e+06
45  2021-08-02      -2.480048e+06
46  2021-05-03      -1.607934e+07
47  2021-05-03      -2.257631e+06
48  2021-04-15      -5.224200e+06
Value:   Date:
Date                                    
2021-11-01                0.0 2021-11-01
2021-11-02                0.0 2021-11-02
2021-11-03                0.0 2021-11-03
2021-11-04                0.0 2021-11-04
2021-11-05                0.0 2021-11-05
...                       ...        ...
2022-05-10                0.0 2022-05-10
2022-05-11                0.0 2022-05-11
2022-05-12                0.0 2022-05-12
2022-05-13                0.0 2022-05-13
2022-05-16                0.0 2022-05-16
[136 rows x 2 columns]

应用Nick描述的方法后,'C='中的所有'Values'都等于0.0

Date:               Value:
Date                                    
2021-11-01 2021-11-01                0.0
2021-11-02 2021-11-02                0.0
2021-11-03 2021-11-03                0.0
2021-11-04 2021-11-04                0.0
2021-11-05 2021-11-05                0.0
...               ...                ...
2022-05-10 2022-05-10                0.0
2022-05-11 2022-05-11                0.0
2022-05-12 2022-05-12                0.0
2022-05-13 2022-05-13                0.0
2022-05-16 2022-05-16                0.0
[136 rows x 2 columns]

您应该能够将joinBADate分组并求和:

C = B.join(A.groupby('Date').sum(), how='inner', lsuffix='_l').drop('Value_l', axis=1)

输出(样本数据):

Date  Value
Date
2022-03-01  2022-03-01    100
2022-03-03  2022-03-03     50
2022-03-04  2022-03-04    250

如果您想保留B的日期而不包含总和,则使用左连接(默认)代替fillna,并使用0:

C = B.join(A.groupby('Date').sum(), lsuffix='_l').drop('Value_l', axis=1).fillna(0.0)

输出:

Date  Value
Date
2022-03-01  2022-03-01  100.0
2022-03-02  2022-03-02    0.0
2022-03-03  2022-03-03   50.0
2022-03-04  2022-03-04  250.0
2022-03-05  2022-03-05    0.0
2022-03-06  2022-03-06    0.0
2022-03-07  2022-03-07    0.0
2022-03-08  2022-03-08    0.0

最后,如果你想删除Date作为索引,使用reset_index:

C = C.reset_index(drop=True)

输出:

Date  Value
0  2022-03-01  100.0
1  2022-03-02    0.0
2  2022-03-03   50.0
3  2022-03-04  250.0
4  2022-03-05    0.0
5  2022-03-06    0.0
6  2022-03-07    0.0
7  2022-03-08    0.0

最新更新