Python: recursively combining two lists of objects by comparing datetimes

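Following on from my previous question, I'm now reading two files, A and B, and putting the dates from 2009 into a list (subAB) on the AB2 object for the first line before them.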

class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []

For example:

file A: 20111225, 20111226, 20090101
file B: 20111225, 20111226, 20090101, 20090102, 20111227, 20090105

should result in (square brackets show the subAB list):

AB2(20111225, a = True, b = True, [])
AB2(20111226, a = True, b = True, 
    [AB2(20090101, a = True, b = True, []),
     AB2(20090102, a = False, b = True, [])])
AB2(20111227, a = False, b = True, 
    [AB2(20090105, a = False, b = True, [])])

Unfortunately, this complicates the previous solution:

list_of_objects = [(i, i in A, i in B) for i in set(A) | set(B)]

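because:

Order is important (the 2009 items go into the first 2011 item before them in the file)
There may be multiple items with the same datetime in a file
The list of subAB objects is now also of interest

For these reasons, we can't use a set as it stands (it removes duplicates and loses the order). I have looked into the OrderedSet recipe, but I can't think of a way to apply it here.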
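
To illustrate the first two points, here is a small sketch using made-up date strings (the values are only for illustration, not taken from real logs). A set collapses the repeated 2009 entry, whereas a plain list keeps every entry in file order.

```python
# Hypothetical log entries, in file order; 20090101 appears twice.
A = ["20111225", "20111226", "20090101", "20090101"]

unique = set(A)

# The duplicate 2009 entry has been collapsed into a single element,
# and a set makes no promise about iteration order.
print(len(A), len(unique))

# A list keeps every entry, in its original position.
print(A.index("20090101"), A.count("20090101"))
```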

My current code:

listA = open_and_parse(fileA) # list of parsed dates
listAObjects = [AB2(dt, True, None) for dt in listA] # list of AB2 Objects from list A
nested_listAObjects = nest(listAObjects) # puts 2009 objects into 2011 ones
# <same for file B>
return combine(nested_listAObjects, nested_listBObjects)

The nest method (puts the 2009 items into the preceding 2011 item; 2009 items at the start of the file are ignored):

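def nest(items):
    previous = None
    for item in items:
        if item.datetime.year <= 2009:
            # Attach a 2009 item to the most recent later-dated item, if any.
            if previous is not None:
                previous.subAB.append(item)
        else:
            # Remember the latest non-2009 item as the parent for what follows.
            previous = item
    return [item for item in items if item.datetime.year > 2009]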

But I'm a bit stuck on my combine function:

def combine(nestedA, nestedB):
    combined = nestedA + nestedB
    combined.sort(key=lambda x: x.datetime)
    <magic>
    return combined

At this point, without the magic, combined would look like this:

AB2(20111225, a = True, b = None, []) # \
AB2(20111225, a = None, b = True, []) # / these two should merge to AB2(20111225, a = True, b = True, [])
AB2(20111226, a = True, b = None, 
    [AB2(20090101, a = True, b = None, [])])
AB2(20111226, a = None, b = True, 
    [AB2(20090101, a = None, b = True, []),
     AB2(20090102, a = None, b = True, [])])
# The two items above should combine, and so should their subAB lists (but only recurse to that level, not infinitely)
AB2(20111227, a = None, b = True, 
    [AB2(20090105, a = None, b = True, [])])

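I hope it's OK that I've posted a new question; this calls for a completely different solution from my previous question. Sorry also for the long post. I thought it best to explain everything I'm doing so that you can fully understand the problem and perhaps suggest an alternative solution to the whole thing, rather than just the combine method. Thanks.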

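Edit: clarification:

Basically, I'm checking the logs of two connected computers and comparing whether they both shut down at a particular time or only one of them did. The computers start up in 2009 (but not always on January 1; sometimes January 4, etc.) if they reset before they have retrieved the real 2012 time. So I'm trying to associate the subsequent 2009 shutdowns with the preceding one, so that I can tell when a computer resets rapidly.

The 2011/2012 dates should be sorted, but the 2009 dates are not. One computer's log file (fileA in my example) might look like this: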

2011/12/15
2011/12/17
2011/12/19 # Something goes wrong, and causes the computer to reset 5 times rapidly
2009/01/01 
2009/01/01
2009/01/04
2009/01/01
2011/12/20 # And everything is better again
2011/12/25

In fact, they are really datetimes (e.g. 2009/01/01 01:57:01), so I can simply compare whether two datetimes are within some timedelta of each other.
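
As a minimal sketch of that comparison: the helper name `within`, the sample times, and the tolerances below are assumptions chosen for illustration, not values from the actual logs.

```python
from datetime import datetime, timedelta

def within(dt1, dt2, tolerance):
    """Return True if the two datetimes are no more than `tolerance` apart."""
    return abs(dt1 - dt2) <= tolerance

# Hypothetical shutdown times from the two logs, 30 seconds apart.
a = datetime.strptime("2009/01/01 01:57:01", "%Y/%m/%d %H:%M:%S")
b = datetime.strptime("2009/01/01 01:57:31", "%Y/%m/%d %H:%M:%S")

print(within(a, b, timedelta(minutes=1)))   # True
print(within(a, b, timedelta(seconds=10)))  # False
```

Taking `abs()` of the difference makes the check independent of which log's entry comes first.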

I'm looking for either a cleaner overall solution/approach, or a specific solution to the problem of combining these two lists of AB2 objects.

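The simplest way to combine the two is to iterate over the sorted combined list (with the 2009 objects already placed in their parents), check whether the next item has the same date as the current one, and build a new list from those items.

The comparison is a little tricky and there may be a cleaner way, but this seems to work and should be relatively efficient.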
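
One possible reading of that approach is sketched below. The names `merge_adjacent` and `make` are hypothetical helpers, and plain date strings stand in for real datetimes. The sketch also assumes that items with equal datetimes should always be merged (including duplicates within one file) and that it is acceptable to sort each subAB list before merging it; neither assumption is confirmed by the question.

```python
class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []

def merge_adjacent(items, recurse=True):
    """Merge neighbouring items that share a datetime.

    `items` must already be sorted by datetime. Flags are OR-ed together
    and the subAB lists are concatenated; the subAB lists are then merged
    the same way, but only one level deep.
    """
    merged = []
    for item in items:
        if merged and merged[-1].datetime == item.datetime:
            target = merged[-1]
            target.a = bool(target.a or item.a)
            target.b = bool(target.b or item.b)
            target.subAB.extend(item.subAB)
        else:
            # Normalise None flags to False before keeping the item.
            item.a, item.b = bool(item.a), bool(item.b)
            merged.append(item)
    if recurse:
        for item in merged:
            item.subAB.sort(key=lambda x: x.datetime)
            item.subAB = merge_adjacent(item.subAB, recurse=False)
    return merged

def make(dt, a=None, b=None, subs=()):
    """Hypothetical helper to build an already-nested AB2 for the demo."""
    obj = AB2(dt, a, b)
    obj.subAB = list(subs)
    return obj

# The nested lists for files A and B from the question.
nestedA = [make("20111225", a=True),
           make("20111226", a=True, subs=[make("20090101", a=True)])]
nestedB = [make("20111225", b=True),
           make("20111226", b=True, subs=[make("20090101", b=True),
                                          make("20090102", b=True)]),
           make("20111227", b=True, subs=[make("20090105", b=True)])]

combined = sorted(nestedA + nestedB, key=lambda x: x.datetime)
result = merge_adjacent(combined)
for item in result:
    print(item.datetime, item.a, item.b,
          [(s.datetime, s.a, s.b) for s in item.subAB])
```

Because the merge mutates the first item of each group in place, the input lists should not be reused afterwards.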

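I'm assuming the order of the dates matters. Increasing dates from the two input streams are compared and merged; when earlier dates show up, they are collected, merged, and attached to the preceding, larger date.

For brevity, I've just created tuples in this example rather than instances of the AB2 class.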

from io import StringIO
fileA = StringIO("""20111225, 20111226, 20090101""")
fileB = StringIO("""20111225, 20111226, 20090101, 20090102, 20111227, 20090105""")
def fileReader(infile):
  # Yield each comma-separated date string, stripped of whitespace.
  for line in infile:
    for part in line.split(','):
      yield part.strip()
def next_or_none(iterable):
  # Yield every value, then a single None as an end-of-stream sentinel.
  for value in iterable:
    yield value
  yield None
def combine(a,b):
  current_val = None
  hasA = hasB = False
  iter_a, iter_b = next_or_none(a), next_or_none(b)
  current_a, current_b = next(iter_a), next(iter_b)
  while True:
    if current_val is None:
      if current_a is None and current_b is None:
        # Both streams are exhausted. Stop here explicitly: on Python 3.7+
        # a StopIteration escaping a generator becomes a RuntimeError.
        break
      elif current_a == current_b:
        current_val = current_a
        hasA = hasB = True
        current_a, current_b = next(iter_a), next(iter_b)
      elif current_a is not None and (current_b is None or current_a < current_b):
        current_val = current_a
        hasA = True
        current_a = next(iter_a)
      elif current_b is not None and (current_a is None or current_b < current_a):
        current_val = current_b
        hasB = True
        current_b = next(iter_b)
      else:
        break
    else: # There's a current_val
      # Collect the earlier dates that follow current_val in each stream.
      sub_a = []
      while current_a is not None and current_a < current_val:
        sub_a.append(current_a)
        current_a = next(iter_a)
      sub_b = []
      while current_b is not None and current_b < current_val:
        sub_b.append(current_b)
        current_b = next(iter_b)
      if sub_a or sub_b:
        sub_ab = list(combine(sub_a,sub_b))
      else:
        sub_ab = []
      yield (current_val,hasA,hasB,sub_ab)
      current_val = None
      hasA = hasB = False
for row in combine(fileReader(fileA),fileReader(fileB)):
  print(row)

This yields:

('20111225', True, True, [])
('20111226', True, True, [('20090101', True, True, []), ('20090102', False, True, [])])
('20111227', False, True, [('20090105', False, True, [])])
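
If AB2 instances are wanted rather than tuples, the rows can be converted afterwards. The following is only a sketch: `to_ab2` is a hypothetical helper, and the `rows` list simply repeats the output shown above.

```python
class AB2(object):
    def __init__(self, datetime, a=False, b=False):
        self.datetime = datetime
        self.a = a
        self.b = b
        self.subAB = []

def to_ab2(row):
    """Build an AB2 (and its children) from a (value, hasA, hasB, subs) tuple."""
    value, has_a, has_b, subs = row
    obj = AB2(value, has_a, has_b)
    obj.subAB = [to_ab2(sub) for sub in subs]
    return obj

# The rows printed above, copied in by hand for the demo.
rows = [
    ('20111225', True, True, []),
    ('20111226', True, True, [('20090101', True, True, []),
                              ('20090102', False, True, [])]),
    ('20111227', False, True, [('20090105', False, True, [])]),
]

objects = [to_ab2(row) for row in rows]
child = objects[1].subAB[1]
print(child.datetime, child.a, child.b)
```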
