Keep only the latest revision of each item in a list in Python



I have a list of document names, some of which have revisions, for example:

docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002", "jk-9-12B", "jk-9-12C", "io-34-003"]

I want to keep only the latest revision of each document, meaning:

docs_final = ["ab-14-001B", "ab-14-002", "jk-9-12C", "io-34-003"]

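As you can see, some documents never appear in their initial state (with no revision letter at the end); they only appear with a revision letter (like jk-9-12B).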

Is there a quick way to parse this list and split it into docs_final and docs_old?

Thanks!


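Clarification: the trailing letter is the only thing that marks a revision. For example, these two documents are not revisions of each other; they are completely different documents: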

"ab-14-001" and "ab-14-002A"
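The clarification comes down to one rule: remove any trailing uppercase letters, and what remains identifies the document. Below is a minimal sketch of that rule. It assumes revisions are always uppercase ASCII letters, and the helper name `base_name` is hypothetical.

```python
import re

# Assumption (from the clarification): a revision is marked only by trailing
# uppercase letters; everything before them identifies the document.
REVISION_RE = re.compile(r"(.*?)([A-Z]*)")

def base_name(doc: str) -> str:
    """Return the document name with any trailing revision letters removed."""
    return REVISION_RE.fullmatch(doc).group(1)

print(base_name("ab-14-001"))   # ab-14-001
print(base_name("ab-14-002A"))  # ab-14-002  (a different document)
print(base_name("jk-9-12C"))    # jk-9-12
```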

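You can use itertools.groupby (I assume your list docs is sorted, i.e. all revisions of a document are adjacent and the newest one comes last):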

from itertools import groupby
docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "jk-9-12B", "jk-9-12C", "io-34-003"]
docs_old, docs_new = [], []
# group consecutive names by everything before the last dash
for _, g in groupby(docs, lambda k: k.rsplit('-', maxsplit=1)[0]):
    *a, b = g           # b is the last (newest) item of the group
    docs_old.extend(a)
    docs_new.append(b)
print('Old = ', docs_old)
print('New = ', docs_new)

Prints:

Old =  ['ab-14-001', 'ab-14-001A', 'jk-9-12B']
New =  ['ab-14-001B', 'jk-9-12C', 'io-34-003']
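A note on the sortedness assumption: groupby only merges consecutive items that share a key. If revisions of the same document are not adjacent, they end up in separate groups, and an older revision can be reported as the newest. The demonstration below uses a deliberately unsorted list, and the helper name `key` is only for illustration. Sorting first restores the expected grouping.

```python
from itertools import groupby

# Deliberately unsorted: the two "ab-14" names are not adjacent.
docs = ["ab-14-001B", "jk-9-12B", "ab-14-001", "jk-9-12C"]

def key(doc):
    # same key as above: everything before the last dash
    return doc.rsplit("-", maxsplit=1)[0]

# Without sorting, every name lands in its own group.
unsorted_groups = [(k, list(g)) for k, g in groupby(docs, key)]
# After sorting, revisions of the same document are adjacent again.
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(docs), key)]
print(unsorted_groups)
print(sorted_groups)
```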

Edit:

import re
from itertools import groupby
docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002", "jk-9-12B", "jk-9-12C", "io-34-003"]
docs_old, docs_new = [], []
# key = name without its trailing uppercase revision letters
for _, g in groupby(docs, lambda k: re.search(r'(.*?)[A-Z]*$', k).group(1)):
    *a, b = g           # b is the last (newest) item of the group
    docs_old.extend(a)
    docs_new.append(b)
print('Old = ', docs_old)
print('New = ', docs_new)

Prints:

Old =  ['ab-14-001', 'ab-14-001A', 'jk-9-12B']
New =  ['ab-14-001B', 'ab-14-002', 'jk-9-12C', 'io-34-003']
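If the list is not guaranteed to be sorted, a dict keyed on the base name removes the need for both the sort and groupby's adjacency requirement. This is a minimal sketch that assumes revisions are trailing uppercase letters ordered "", "A", ..., "Z", "AA", ... (the ordering beyond "Z" is an assumption). The helper names `split_revisions` and `revision_key` are hypothetical.

```python
import re

# Assumption: a revision is marked only by trailing uppercase ASCII letters.
REVISION_RE = re.compile(r"(.*?)([A-Z]*)")

def revision_key(rev: str) -> tuple:
    # Assumed ordering: "" < "A" < ... < "Z" < "AA" < ... (shorter suffix first).
    return (len(rev), rev)

def split_revisions(docs):
    """Return (docs_final, docs_old) without requiring docs to be sorted."""
    latest = {}  # base name -> (revision suffix, full document name)
    for doc in docs:
        base, rev = REVISION_RE.fullmatch(doc).groups()
        if base not in latest or revision_key(rev) > revision_key(latest[base][0]):
            latest[base] = (rev, doc)
    docs_final = [doc for _, doc in latest.values()]
    final_set = set(docs_final)
    # Every name that is not the newest revision of its document is "old".
    docs_old = [doc for doc in docs if doc not in final_set]
    return docs_final, docs_old

docs = ["jk-9-12C", "ab-14-001B", "io-34-003", "ab-14-001",
        "jk-9-12B", "ab-14-002", "ab-14-001A"]
docs_final, docs_old = split_revisions(docs)
print("Final =", docs_final)
print("Old =", docs_old)
```

docs_final lists documents in the order in which each base name first appears in the input, and docs_old keeps the input order.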

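You can use the following code sample for document names like these. Be careful, though: it only works for revisions that do not themselves contain a dash ("-"), because the version/revision number is taken to be everything after the last dash. If the revision naming style changes, you will have to adjust the code accordingly.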

docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "jk-9-12B", "jk-9-12C", "io-34-003"]
# sort the documents so that older revisions come before newer ones:
docs.sort()
# split each name into (document, revision) at the last dash:
splitted = [x.split('-') for x in docs]
revisions = [("-".join(doc[:-1]), doc[-1]) for doc in splitted]
# build a dict keyed by document; later (newer) revisions overwrite older ones:
result = ["-".join([doc, version]) for (doc, version) in {x[0]: x[1] for x in revisions}.items()]
print(result)

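As an alternative, this one-liner also returns the list of latest versions (it assumes docs has already been sorted, as above):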

#One Liner:
result = ["-".join([d, v]) for (d, v) in {x[0]: x[1] for x in [("-".join(doc[:-1]), doc[-1]) for doc in [x.split('-') for x in docs]]}.items()]
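The next sketch runs the same dash-splitting steps on the full list from the question, which also contains "ab-14-002", and shows the caveat in practice. "ab-14-001" and "ab-14-002" share the key "ab-14", so "ab-14-001B" is treated as an older version of "ab-14-002" and dropped. That conflicts with the question's clarification, so this approach seems to fit only if the revision really is everything after the last dash.

```python
# The full list from the question, including "ab-14-002".
docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002",
        "jk-9-12B", "jk-9-12C", "io-34-003"]
docs.sort()
# Same steps as above: key = everything before the last dash.
splitted = [x.split('-') for x in docs]
revisions = [("-".join(doc[:-1]), doc[-1]) for doc in splitted]
result = ["-".join([doc, version]) for (doc, version) in {x[0]: x[1] for x in revisions}.items()]
print(result)  # ['ab-14-002', 'io-34-003', 'jk-9-12C']  ("ab-14-001B" is missing)
```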
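
Alternatively, strip a single trailing uppercase revision letter to build a key, and let later (newer) entries of the sorted list overwrite earlier ones in a dict:

from string import ascii_uppercase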
docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002", "jk-9-12B", "jk-9-12C", "io-34-003"]
revisions = tuple(ascii_uppercase)
latest_docs = {}
for doc in sorted(docs):
    # drop one trailing revision letter (if any) to get the document key
    key = doc[:-1] if doc.endswith(revisions) else doc
    latest_docs[key] = doc  # newer revisions overwrite older ones
print(list(latest_docs.values()))

Output:

['ab-14-001B', 'ab-14-002', 'io-34-003', 'jk-9-12C']

It can be shortened with a dict comprehension, but I would definitely go with the version above:

from string import ascii_uppercase
docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002", "jk-9-12B", "jk-9-12C", "io-34-003"]
latest_docs = {doc[:-1] if doc.endswith(tuple(ascii_uppercase)) else doc:doc for doc in sorted(docs)}
print(list(latest_docs.values()))
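The question also asks for docs_old. The sketch below reuses the loop above: once latest_docs is built, every name that is not one of its values is an older revision. The names docs_final and docs_old come from the question, and the set is only there to make the membership test fast.

```python
from string import ascii_uppercase

docs = ["ab-14-001", "ab-14-001A", "ab-14-001B", "ab-14-002", "jk-9-12B", "jk-9-12C", "io-34-003"]
revisions = tuple(ascii_uppercase)
latest_docs = {}
for doc in sorted(docs):
    key = doc[:-1] if doc.endswith(revisions) else doc
    latest_docs[key] = doc

docs_final = list(latest_docs.values())
final_set = set(docs_final)
# everything that did not survive as the newest revision is old
docs_old = [doc for doc in docs if doc not in final_set]
print("Final =", docs_final)
print("Old =", docs_old)
```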
