An efficient way to split a series of datetimes into lists of "sequential" datetimes?



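Assume that "sequential" datetimes are datetimes that lie within a certain time interval of each other (i.e. thirty minutes); non-sequential datetimes are those separated by more than that interval.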

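Given an input consisting of a list of datetimes (as strings), I want to derive a list of lists of sequential datetimes. My solution is below, but I wonder whether there is a better way: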

from datetime import datetime

list_of_datetime_strings = ['2016-02-26 10:30:00', '2016-02-26 11:00:00',
                            '2016-02-25 11:30:00', '2016-02-25 12:00:00', '2016-02-25 12:30:00',
                            '2016-02-26 12:30:00']

def find_datetime_sequences(list_of_datetime_strings, increment_in_minutes=30):
    if not list_of_datetime_strings:
        return
    str_to_datetime = lambda cur_datetime: datetime.strptime(cur_datetime, "%Y-%m-%d %H:%M:%S")
    list__datetimes_sorted = sorted([str_to_datetime(cur_datetime) for cur_datetime in list_of_datetime_strings])
    # Start the first group with the earliest datetime.
    list_of_datetime_lists = [[list__datetimes_sorted[0]]]
    for cur_datetime in list__datetimes_sorted[1:]:
        # total_seconds() includes whole days; .seconds alone would drop them.
        time_difference = (cur_datetime - list_of_datetime_lists[-1][-1]).total_seconds() / 60
        if time_difference == increment_in_minutes:
            list_of_datetime_lists[-1].append(cur_datetime)
        else:
            list_of_datetime_lists.append([cur_datetime])
    return list_of_datetime_lists

find_datetime_sequences(list_of_datetime_strings)

Output:

list_of_datetime_lists: [[datetime.datetime(2016, 2, 25, 11, 30), 
     datetime.datetime(2016, 2, 25, 12, 0), datetime.datetime(2016, 2, 25, 12, 30)], 
    [datetime.datetime(2016, 2, 26, 10, 30), datetime.datetime(2016, 2, 26, 11, 0)], 
    [datetime.datetime(2016, 2, 26, 12, 30)]]

Is there a better way to do the above?

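I don't have a better way to build the datetime objects from the strings or to sort them. But I think the rest can be improved, in readability if nothing else, by using a generator instead of a regular function.

import datetime

def sequencify(sorted_datetimes, increment_in_minutes=30):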
    """Take a sorted list of datetime objects. Yield sequences as lists."""
    if not sorted_datetimes:
        return
    first, *rest = sorted_datetimes
    # python 2: first, rest = sorted_datetimes[0], sorted_datetimes[1:]
    sequence = [first]
    delta = datetime.timedelta(minutes=increment_in_minutes)
    while rest:
        first, *rest = rest
        if first - sequence[-1] > delta:
            yield sequence
            sequence = [first]
        else:
            sequence.append(first)
    yield sequence

An alternative version using an index-based approach, similar to what @SimeonVisser did:

def sequencify(sorted_datetimes, increment_in_minutes=30):
    """Take a sorted list of datetime objects. Yield sequences as lists."""
    delta = datetime.timedelta(minutes=increment_in_minutes)
    start = 0
    for i in range(start, len(sorted_datetimes) - 1):
        if sorted_datetimes[i+1] - sorted_datetimes[i] > delta:
            yield sorted_datetimes[start:i+1]
            start = i + 1
    if sorted_datetimes:
        yield sorted_datetimes[start:]

Either way, the caller needs only a minimal change: just wrap the call in list():

strings = [
    '2016-02-26 10:30:00',
    '2016-02-26 11:00:00',
    '2016-02-25 11:30:00',
    '2016-02-25 12:00:00',
    '2016-02-25 12:30:00',
    '2016-02-26 12:30:00'
]
sorted_datetimes = sorted(datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S')
                          for s in strings)
print(list(sequencify(sorted_datetimes)))  # explicit conversion to list

Output:

[[datetime.datetime(2016, 2, 25, 11, 30),
  datetime.datetime(2016, 2, 25, 12, 0),
  datetime.datetime(2016, 2, 25, 12, 30)],
 [datetime.datetime(2016, 2, 26, 10, 30),
  datetime.datetime(2016, 2, 26, 11, 0)],
 [datetime.datetime(2016, 2, 26, 12, 30)]]
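
For what it's worth, the same grouping can probably also be written as a single pass that only looks at the last item of the current run, which avoids the repeated unpacking and index bookkeeping. This is only a minimal sketch: the name `iter_sequences` is hypothetical (not from the code above), and it assumes the input is already sorted.

```python
import datetime

def iter_sequences(sorted_datetimes, increment_in_minutes=30):
    """Yield runs of datetimes whose neighbours are at most the increment apart.

    Hypothetical variant of sequencify; assumes the input is already sorted.
    """
    delta = datetime.timedelta(minutes=increment_in_minutes)
    sequence = []
    for current in sorted_datetimes:
        # Start a new run when the gap to the previous item is too large.
        if sequence and current - sequence[-1] > delta:
            yield sequence
            sequence = []
        sequence.append(current)
    if sequence:
        # Emit the final run (skipped entirely for empty input).
        yield sequence

strings = ['2016-02-26 10:30:00', '2016-02-25 11:30:00', '2016-02-25 12:00:00']
parsed = sorted(datetime.datetime.strptime(s, '%Y-%m-%d %H:%M:%S') for s in strings)
groups = list(iter_sequences(parsed))
print([len(g) for g in groups])  # [2, 1]
```

Because it never slices or indexes, this version should also accept any sorted iterable, not just a list.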

The following approach is essentially the same, but may be easier to maintain:

import datetime
strings = [
    '2016-02-26 10:30:00',
    '2016-02-26 11:00:00',
    '2016-02-25 11:30:00',
    '2016-02-25 12:00:00',
    '2016-02-25 12:30:00',
    '2016-02-26 12:30:00',
]
def find_datetime_sequences(strings, increment_in_minutes=30):
    if not strings:
        return
    dates = sorted([
        datetime.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")
        for s in strings
    ])
    delta = datetime.timedelta(minutes=increment_in_minutes)
    start = 0
    n_items = len(dates)
    cuts = []
    for index in range(n_items):
        next_index = index + 1
        if next_index == n_items and start != next_index:
            cuts.append((start, next_index))
        elif dates[next_index] - dates[index] != delta:
            cuts.append((start, next_index))
            start = next_index
    return [dates[i:j] for i, j in cuts]

This part detects when the difference between two dates is not 30 minutes, which is where we need to cut:

elif dates[next_index] - dates[index] != delta:
    cuts.append((start, next_index))
    start = next_index
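
As a small illustration of that comparison (the values are picked only for the example): subtracting two datetime objects gives a timedelta, which can be compared directly against `delta`. Note that `!=` also cuts when the gap is shorter than 30 minutes, whereas a `>` test would not.

```python
import datetime

delta = datetime.timedelta(minutes=30)
a = datetime.datetime(2016, 2, 25, 12, 0)
b = datetime.datetime(2016, 2, 25, 12, 30)
c = datetime.datetime(2016, 2, 25, 12, 45)

gap_ab = b - a  # exactly 30 minutes
gap_bc = c - b  # only 15 minutes

print(gap_ab != delta)  # False: no cut between a and b
print(gap_bc != delta)  # True: "!=" cuts on the shorter gap
print(gap_bc > delta)   # False: ">" would keep c in the same group
```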

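And this part makes sure that the final group is closed at the end of the list, including the case where the last datetime needs to go into a group of its own: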

if next_index == n_items and start != next_index:
    cuts.append((start, next_index))
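
One way to see both branches at work is to run the loop on a tiny input and look at the cuts it produces. In the sketch below, `find_cuts` is a hypothetical helper that simply repeats the loop from the function above and returns the (start, end) index pairs; the three sample datetimes are made up for the illustration.

```python
import datetime

def find_cuts(dates, delta):
    """Hypothetical helper: same loop as above, returning (start, end) index pairs."""
    cuts = []
    start = 0
    n_items = len(dates)
    for index in range(n_items):
        next_index = index + 1
        if next_index == n_items and start != next_index:
            # End of the list: close whatever group is still open.
            cuts.append((start, next_index))
        elif dates[next_index] - dates[index] != delta:
            # Gap is not exactly delta: close the group and start a new one.
            cuts.append((start, next_index))
            start = next_index
    return cuts

delta = datetime.timedelta(minutes=30)
base = datetime.datetime(2016, 2, 26, 10, 30)
dates = [base, base + delta, base + 4 * delta]  # the last one is 90 minutes later
print(find_cuts(dates, delta))  # [(0, 2), (2, 3)]
```

The second pair, `(2, 3)`, is the lone trailing datetime ending up in its own group.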
