I have a dictionary, dictionary1, containing
{'A': Timestamp('2022-05-23 00:00:00'), 'L': Timestamp('2017-06-21 00:00:00'), 'S': Timestamp('2021-11-02 00:00:00'), 'D': Timestamp('2021-11-08 00:00:00')}
Then I have another dictionary, dictionary2, that looks like
{'A': [Timestamp('2022-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2022-01-12 00:00:00'),
Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2021-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]}
For each of A, L, S, and D, I want to keep only the dates that are GREATER than the corresponding date in dictionary1:
{'A': [Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]}
Here is a pandas solution:
import pandas as pd
from pandas import Timestamp
d1 = {'A': Timestamp('2022-05-23 00:00:00'),
'L': Timestamp('2017-06-21 00:00:00'),
'S': Timestamp('2021-11-02 00:00:00'),
'D': Timestamp('2021-11-08 00:00:00')}
d2 = {'A': [Timestamp('2022-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2022-01-12 00:00:00'),
Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2021-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]}
# convert to a series and explode lists
s1 = pd.Series(d1, name='date')
s2 = pd.Series(d2, name='date').explode()
# merge your pd.Series together on the index
m = pd.merge(s2, s1, right_index=True, left_index=True, how='left')
# boolean indexing to filter your dates where s2 date > s1 date
new_df = m[m['date_x'] > m['date_y']]
      date_x     date_y
A 2023-01-10 2022-05-23
D 2023-01-16 2021-11-08
D 2022-10-18 2021-11-08
L 2023-01-16 2017-06-21
L 2023-01-13 2017-06-21
L 2023-01-12 2017-06-21
S 2022-01-13 2021-11-02
S 2023-01-12 2021-11-02
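The filtered frame is in long format, with one row per surviving date. If you need the dict-of-lists shape from the question, one possible way back is to group by the index and collect each group into a list. This is only a sketch, not part of the solution above: the `pd.to_datetime` step and the names `grouped` and `result` are assumptions added here, and the last line assumes that a key with no surviving dates should map to an empty list.

```python
import pandas as pd
from pandas import Timestamp

d1 = {'A': Timestamp('2022-05-23'), 'L': Timestamp('2017-06-21'),
      'S': Timestamp('2021-11-02'), 'D': Timestamp('2021-11-08')}
d2 = {'A': [Timestamp('2022-01-16'), Timestamp('2022-01-13'),
            Timestamp('2022-01-12'), Timestamp('2023-01-10')],
      'L': [Timestamp('2023-01-16'), Timestamp('2023-01-13'),
            Timestamp('2023-01-12')],
      'S': [Timestamp('2021-01-16'), Timestamp('2022-01-13'),
            Timestamp('2023-01-12')],
      'D': [Timestamp('2023-01-16'), Timestamp('2022-10-18')]}

s1 = pd.Series(d1, name='date')
# explode() leaves an object-dtype Series, so convert it back to
# datetime64 before comparing (an added step, assumed to be harmless)
s2 = pd.to_datetime(pd.Series(d2, name='date').explode())

# same merge and boolean filter as in the solution above
m = pd.merge(s2, s1, left_index=True, right_index=True, how='left')
new_df = m[m['date_x'] > m['date_y']]

# collect the surviving dates per key into plain Python lists
grouped = new_df.groupby(level=0, sort=False)['date_x'].apply(list).to_dict()

# keys whose dates were all filtered out are absent from the groupby
# result, so fall back to an empty list for them
result = {key: grouped.get(key, []) for key in d2}
print(result)
```

The values in each list are pandas Timestamp objects again, in the same order as in the original lists.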
Given the two data sources, you can use a comprehension to build a new list for each key based on the condition:
import datetime
Timestamp = lambda s: datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
lookup = {
'A': Timestamp('2022-05-23 00:00:00'),
'L': Timestamp('2017-06-21 00:00:00'),
'S': Timestamp('2021-11-02 00:00:00'),
'D': Timestamp('2021-11-08 00:00:00')
}
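# the unfiltered data from the question (dictionary2)
data_in = {
    'A': [
        Timestamp('2022-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2022-01-12 00:00:00'),
        Timestamp('2023-01-10 00:00:00')
    ],
    'L': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2023-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'S': [
        Timestamp('2021-01-16 00:00:00'),
        Timestamp('2022-01-13 00:00:00'),
        Timestamp('2023-01-12 00:00:00')
    ],
    'D': [
        Timestamp('2023-01-16 00:00:00'),
        Timestamp('2022-10-18 00:00:00')
    ]
}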
# build one dict keyed like the input, keeping only the later dates
data_out = {
    key: [v for v in value if v > lookup[key]]
    for key, value in data_in.items()
}
print(data_out)
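One thing to watch with the comprehension above: `lookup[key]` raises a KeyError if a key of the data has no reference date. Below is a rough sketch of a small helper that tolerates that case. The function name `keep_later`, the key 'X', and the policy of keeping everything when no reference date exists are all assumptions made for illustration; pick whatever policy fits your data.

```python
from datetime import datetime


def keep_later(data, cutoffs):
    """Return a dict holding, per key, only the values strictly later than cutoffs[key]."""
    result = {}
    for key, stamps in data.items():
        cutoff = cutoffs.get(key)
        if cutoff is None:
            # assumed policy: no reference date means nothing is filtered out
            result[key] = list(stamps)
        else:
            result[key] = [s for s in stamps if s > cutoff]
    return result


cutoffs = {'A': datetime(2022, 5, 23), 'S': datetime(2021, 11, 2)}
data = {
    'A': [datetime(2022, 1, 16), datetime(2023, 1, 10)],
    'S': [datetime(2021, 1, 16), datetime(2022, 1, 13)],
    'X': [datetime(2020, 1, 1)],  # hypothetical key with no reference date
}

filtered = keep_later(data, cutoffs)
print(filtered)
```

The comparison is strict, so a date equal to the reference date is dropped; use `>=` if it should be kept.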
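I don't know what Timestamp is, but if it has a function that returns the date as a string (or as any other data structure for which > is defined), you can do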
# This is some class that knows its stamp value (the "date")
class Timestamp:
    def __init__(self, value):
        self._value = value

    @property
    def value(self):
        return self._value
# This is your reference dict.
d1 = {
'A': Timestamp('2022-05-23 00:00:00'),
'L': Timestamp('2017-06-21 00:00:00'),
'S': Timestamp('2021-11-02 00:00:00'),
'D': Timestamp('2021-11-08 00:00:00')
}
# This is the data you want to clean.
d2 = {
'A': [
Timestamp('2022-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2022-01-12 00:00:00'),
Timestamp('2023-01-10 00:00:00')
],
'L': [
Timestamp('2023-01-16 00:00:00'),
Timestamp('2023-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')
],
'S': [
Timestamp('2021-01-16 00:00:00'),
Timestamp('2022-01-13 00:00:00'),
Timestamp('2023-01-12 00:00:00')
],
'D': [Timestamp('2023-01-16 00:00:00'),
Timestamp('2022-10-18 00:00:00')]
}
# This is the new dict you want.
d3 = {
key: [stamp for stamp in stamplist if stamp.value > d1[key].value]
for (key, stamplist) in d2.items()
}
# Check it:
for key, stamplist in d3.items():
    for stamp in stamplist:
        print(stamp.value)
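Comparing the dates as strings, as the code above does, only gives the chronological order because the strings are in a zero-padded year-month-day format. The sketch below checks that against real datetime objects and shows a day-first format where string comparison gives the wrong answer. The variable names and the day-first sample strings are made up for illustration.

```python
from datetime import datetime

ISO_FMT = "%Y-%m-%d %H:%M:%S"
DMY_FMT = "%d-%m-%Y"

# year-first, zero-padded strings: text order matches date order
iso_ref = "2022-05-23 00:00:00"
iso_new = "2023-01-10 00:00:00"
iso_string_order = iso_new > iso_ref
iso_date_order = datetime.strptime(iso_new, ISO_FMT) > datetime.strptime(iso_ref, ISO_FMT)

# day-first strings: "10-..." sorts before "23-..." although the date is later
dmy_ref = "23-05-2022"
dmy_new = "10-01-2023"
dmy_string_order = dmy_new > dmy_ref
dmy_date_order = datetime.strptime(dmy_new, DMY_FMT) > datetime.strptime(dmy_ref, DMY_FMT)

print(iso_string_order, iso_date_order)  # both comparisons agree
print(dmy_string_order, dmy_date_order)  # the two comparisons disagree
```

If the objects themselves already support >, as datetime.datetime and pandas Timestamp do, comparing them directly avoids the issue.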
With pandas, one approach is to use the pandas.Series constructor with a dict/list comprehension:
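import pandas as pd
from pandas import Timestamp
# dictionary1 and dictionary2 are the two dicts from the question
s1 = pd.Series(dictionary1)
s2 = pd.Series(dictionary2)
# compare each date v (not the key k) with the reference date for that key
out = {k: [v for v in s2[k] if v > s1[k]] for k in s2.index}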
Output:
{'A': [Timestamp('2023-01-10 00:00:00')],
'L': [Timestamp('2023-01-16 00:00:00'), Timestamp('2023-01-13 00:00:00'), Timestamp('2023-01-12 00:00:00')],
'S': [Timestamp('2022-01-13 00:00:00'), Timestamp('2023-01-12 00:00:00')],
'D': [Timestamp('2023-01-16 00:00:00'), Timestamp('2022-10-18 00:00:00')]}