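Assume that "sequential" datetimes are datetimes within a fixed interval of each other (here, thirty minutes); non-sequential datetimes are separated by more than that interval.
Given an input consisting of a list of datetimes (as strings), I want to derive a list of lists of sequential datetimes. My solution is below, but I'd like to know whether there is a better way: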
from datetime import datetime

list_of_datetime_strings = ['2016-02-26 10:30:00', '2016-02-26 11:00:00',
                            '2016-02-25 11:30:00', '2016-02-25 12:00:00', '2016-02-25 12:30:00',
                            '2016-02-26 12:30:00']

def find_datetime_sequences(list_of_datetime_strings, increment_in_minutes=30):
    if not list_of_datetime_strings:
        return
    str_to_datetime = lambda cur_datetime: datetime.strptime(cur_datetime, "%Y-%m-%d %H:%M:%S")
    list__datetimes_sorted = sorted([str_to_datetime(cur_datetime) for cur_datetime in list_of_datetime_strings])
    # Start the first group with the earliest datetime
    list_of_datetime_lists = [[list__datetimes_sorted[0]]]
    for cur_datetime in list__datetimes_sorted[1:]:
        # total_seconds() covers whole days as well; .seconds alone drops them
        time_difference = (cur_datetime - list_of_datetime_lists[-1][-1]).total_seconds() / 60
        if time_difference == increment_in_minutes:
            list_of_datetime_lists[-1].append(cur_datetime)
        else:
            list_of_datetime_lists.append([cur_datetime])
    return list_of_datetime_lists

find_datetime_sequences(list_of_datetime_strings)
Output:
list_of_datetime_lists: [[datetime.datetime(2016, 2, 25, 11, 30),
datetime.datetime(2016, 2, 25, 12, 0), datetime.datetime(2016, 2, 25, 12, 30)],
[datetime.datetime(2016, 2, 26, 10, 30), datetime.datetime(2016, 2, 26, 11, 0)],
[datetime.datetime(2016, 2, 26, 12, 30)]]
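A note on computing the gap in minutes: `timedelta.seconds` holds only the seconds component of the difference and ignores whole days, while `timedelta.total_seconds()` covers the full duration. The sketch below uses two made-up datetimes (not taken from the sample list) that are one day and thirty minutes apart to show how the two readings differ.

```python
from datetime import datetime

# Hypothetical pair: one day and thirty minutes apart
earlier = datetime(2016, 2, 25, 12, 0)
later = datetime(2016, 2, 26, 12, 30)

gap = later - earlier
minutes_from_seconds_attr = gap.seconds / 60    # day component is dropped
minutes_from_total = gap.total_seconds() / 60   # full duration

print(minutes_from_seconds_attr)  # 30.0
print(minutes_from_total)         # 1470.0
```

With the `.seconds` reading, such a pair would look like a thirty-minute step and end up in the same group.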
Is there a better way to accomplish the above?
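I don't have a better way to build the datetime objects from the strings or to sort them. But I think the rest can be improved (in readability, if nothing else) by using a generator instead of a regular function: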
import datetime

def sequencify(sorted_datetimes, increment_in_minutes=30):
    """Take a sorted list of datetime objects. Yield sequences as lists."""
    if not sorted_datetimes:
        return
    first, *rest = sorted_datetimes
    # python 2: first, rest = sorted_datetimes[0], sorted_datetimes[1:]
    sequence = [first]
    delta = datetime.timedelta(minutes=increment_in_minutes)
    while rest:
        first, *rest = rest
        if first - sequence[-1] > delta:
            # Gap is too large: emit the finished group and start a new one
            yield sequence
            sequence = [first]
        else:
            sequence.append(first)
    yield sequence
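The `first, *rest = rest` unpacking builds a new list of the remaining items on every pass. As a possible variation (a sketch, not part of the original answer), the same grouping can walk an iterator instead; the name `sequencify_iter` is made up for this illustration.

```python
import datetime

def sequencify_iter(sorted_datetimes, increment_in_minutes=30):
    """Yield runs of datetimes whose neighbours are at most one increment apart."""
    delta = datetime.timedelta(minutes=increment_in_minutes)
    iterator = iter(sorted_datetimes)
    try:
        sequence = [next(iterator)]
    except StopIteration:
        return  # empty input: yield nothing
    for current in iterator:
        if current - sequence[-1] > delta:
            # Gap is too large: emit the finished group and start a new one
            yield sequence
            sequence = [current]
        else:
            sequence.append(current)
    yield sequence

# Hypothetical sample: two datetimes 30 minutes apart, then one 3 hours after the first
base = datetime.datetime(2016, 2, 25, 11, 30)
times = [base,
         base + datetime.timedelta(minutes=30),
         base + datetime.timedelta(hours=3)]
groups = list(sequencify_iter(times))
print([len(g) for g in groups])  # [2, 1]
```

Because it only relies on iteration, this form should also accept any sorted iterable rather than just a list.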
An alternative version using an index-based approach, similar to what @SimeonVisser did:
import datetime

def sequencify(sorted_datetimes, increment_in_minutes=30):
    """Take a sorted list of datetime objects. Yield sequences as lists."""
    delta = datetime.timedelta(minutes=increment_in_minutes)
    start = 0
    for i in range(start, len(sorted_datetimes) - 1):
        if sorted_datetimes[i+1] - sorted_datetimes[i] > delta:
            # Slice out the finished group and move the start past it
            yield sorted_datetimes[start:i+1]
            start = i + 1
    if sorted_datetimes:
        yield sorted_datetimes[start:]
Either way, the caller needs only a minimal change: just add a list() call:
strings = [
    '2016-02-26 10:30:00',
    '2016-02-26 11:00:00',
    '2016-02-25 11:30:00',
    '2016-02-25 12:00:00',
    '2016-02-25 12:30:00',
    '2016-02-26 12:30:00'
]
sorted_datetimes = sorted(datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
                          for s in strings)
print(list(sequencify(sorted_datetimes)))  # explicit conversion to list
Output:
[[datetime.datetime(2016, 2, 25, 11, 30),
datetime.datetime(2016, 2, 25, 12, 0),
datetime.datetime(2016, 2, 25, 12, 30)],
[datetime.datetime(2016, 2, 26, 10, 30),
datetime.datetime(2016, 2, 26, 11, 0)],
[datetime.datetime(2016, 2, 26, 12, 30)]]
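One side effect of the generator form that may matter to callers: for empty input it yields nothing, so `list()` gives an empty list rather than `None`. The sketch below repeats the index-based `sequencify` only so that it runs on its own; the sample values are made up for illustration.

```python
import datetime

def sequencify(sorted_datetimes, increment_in_minutes=30):
    """Index-based generator, repeated here so the snippet is self-contained."""
    delta = datetime.timedelta(minutes=increment_in_minutes)
    start = 0
    for i in range(start, len(sorted_datetimes) - 1):
        if sorted_datetimes[i+1] - sorted_datetimes[i] > delta:
            yield sorted_datetimes[start:i+1]
            start = i + 1
    if sorted_datetimes:
        yield sorted_datetimes[start:]

empty_result = list(sequencify([]))
print(empty_result)  # []

# Groups can also be consumed one at a time, without building the outer list
single = [datetime.datetime(2016, 2, 26, 12, 30)]
for group in sequencify(single):
    print(len(group))  # 1
```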
The following approach is essentially the same, but may be easier to maintain:
import datetime

strings = [
    '2016-02-26 10:30:00',
    '2016-02-26 11:00:00',
    '2016-02-25 11:30:00',
    '2016-02-25 12:00:00',
    '2016-02-25 12:30:00',
    '2016-02-26 12:30:00',
]

def find_datetime_sequences(strings, increment_in_minutes=30):
    if not strings:
        return
    dates = sorted([
        datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
        for s in strings
    ])
    delta = datetime.timedelta(minutes=increment_in_minutes)
    start = 0
    n_items = len(dates)
    cuts = []
    for index in range(n_items):
        next_index = index + 1
        if next_index == n_items and start != next_index:
            cuts.append((start, next_index))
        elif dates[next_index] - dates[index] != delta:
            cuts.append((start, next_index))
            start = next_index
    return [dates[i:j] for i, j in cuts]
This part detects when the difference between two dates is not 30 minutes, which is where we need to cut:
elif dates[next_index] - dates[index] != delta:
    cuts.append((start, next_index))
    start = next_index
This part makes sure the final group is emitted, even when it consists of a single trailing datetime:
if next_index == n_items and start != next_index:
    cuts.append((start, next_index))
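Note that the `!= delta` test cuts on any gap that is not exactly the increment, including shorter ones, whereas the `> delta` test in the generator version cuts only on longer gaps. A small sketch with made-up datetimes (a 15-minute gap followed by a 30-minute gap) shows where the two tests would disagree:

```python
import datetime

delta = datetime.timedelta(minutes=30)
base = datetime.datetime(2016, 2, 25, 11, 30)

# Hypothetical sample: a 15-minute gap followed by a 30-minute gap
dates = [base,
         base + datetime.timedelta(minutes=15),
         base + datetime.timedelta(minutes=45)]

# Difference between each datetime and its successor
gaps = [later - earlier for earlier, later in zip(dates, dates[1:])]

cut_if_not_equal = [gap != delta for gap in gaps]  # "!=" test
cut_if_greater = [gap > delta for gap in gaps]     # ">" test

print(cut_if_not_equal)  # [True, False]
print(cut_if_greater)    # [False, False]
```

Which test is appropriate depends on whether "sequential" is meant as exactly one increment apart or at most one increment apart.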