Sorting data between two lists into a new list, and formatting a list of strings with the data held in the new list



Apologies if this isn't very clear. This is my first time asking a question here, so I hope I've explained my problem correctly.

I have the following lists of different values:

A_list = ['A', 'A', 'B', ['C', 'D'] ]
B_list = ['A1', 'W5', 'X6', 'A2', 'A3', 'T5', 'B0', 'Z9', 'C1', 'W3', 'D1']
C_list = []
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice"]

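I need to find the elements of A_list in B_list, append them to C_list, and produce as output the strings from string_list formatted with the elements of C_list.

After searching B_list for each A_list[i], C_list would end up like this: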

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1'] ]

And the output would be this:

A1 in Alpha,
A1 in Apple,
A2 in Alpha,
A2 in Apple,
A3 in Alpha,
A3 in Apple,
B0 in Bee,
C1 in Cheese and D1 in Dice

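I've been racking my brain over nested lists, trying to get them arranged in an order similar to A_list so that I can format the output with something like:

output = string_list[i].format(*C_list[i])  # just an example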

I've been trying to solve this with a mix of for loops and if statements. I can search B_list for the elements of A_list with a simple for loop:

for a in A_list:
    for b in B_list:
        if a in b:
            print(str(a) + " found in " + str(b))

What's driving me crazy is how to add the elements found in B_list in a shape similar to A_list, so that I end up with

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

rather than this:

C_list = ['A1', 'A2', 'A3', 'B0', 'C1', 'D1']

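Part 1: Getting C_list

You have to build the nested lists yourself and append them to C_list. Since an item of A_list can be either a string or a list of strings, there are two cases: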

def get_A_in_B(a_list: "list[str|list[str]]", b_list: "list[str]"):
    c_list = []  # shared by everything within this function

    # for neatness
    def process_base_item(a_str: "str", out_list: "list"):
        matches = sorted([b_str for b_str in b_list if b_str.startswith(a_str)])
        out_list.extend(matches)

    for a_item in a_list:  # case 1 - is list, extend nested
        if type(a_item) is list:
            sublist = a_item
            nested_list = []
            for sub_item in sublist:
                process_base_item(sub_item, nested_list)
            if nested_list:
                c_list.append(nested_list)
        else:  # case 2 - is string, extend c list
            process_base_item(a_item, c_list)
    return c_list

Usage:

A_list = ['A', 'B', ['C', 'D'] ]
B_list = ['A1', 'W5', 'X6', 'A2', 'A3', 'T5', 'B0', 'Z9', 'C1', 'W3', 'D1']
C_list = get_A_in_B(A_list, B_list)
print(C_list)

Output:

['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

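Part 2: Formatting

This works if the following two assumptions hold:
  1. There is only one of each letter in a format string.
  2. When the nesting is uneven, you want to loop through all the possibilities (e.g. ["C1", "C2", "D1"] => "C1" + "D1", "C2" + "D1").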

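This is the really tricky part. I match letters to format strings by checking whether the substring " in <letter>" (for example " in A") appears in the format string.

For the nested lists in C_list, I split them into further sublists by first letter, then feed the Cartesian product of those sublists into the format string as multiple arguments.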
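To make the Cartesian product step concrete, here is a minimal standalone sketch of assumption 2 applied to an uneven nested item. The names `nested`, `groups` and `combos` are only for illustration and do not appear in the function below.

```python
import itertools

# Hypothetical uneven nested item: two "C" entries but only one "D" entry.
nested = ["C1", "C2", "D1"]
fmt_str = "{0} in Cheese and {1} in Dice"

# Split the item into sublists keyed by first letter (letters end up in sorted order).
groups = {}
for item in sorted(nested):
    groups.setdefault(item[0], []).append(item)

# Each element of the Cartesian product supplies one argument per placeholder.
combos = [fmt_str.format(*combo) for combo in itertools.product(*groups.values())]
print("\n".join(combos))
```

This prints one line per combination, pairing each "C" entry with the single "D" entry.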

As before, there are two cases:

import itertools

def format_string_list(c_list, string_list):
    formatted_string_list = []
    for c_item in c_list:
        for fmt_str in string_list:
            if type(c_item) is list:  # case 1 - is list, match multiple
                c_sublist = c_item
                # assumption 1: letters are unique
                first_letters = sorted(set([c_str[0] for c_str in c_sublist]))
                matched_letters = []
                for letter in first_letters:
                    pat = f" in {letter}"
                    if pat in fmt_str:
                        matched_letters.append(letter)

                if first_letters == matched_letters:
                    # get dictionary of lists, indexed by first letter
                    c_str_d = {}
                    for letter in first_letters:
                        c_str_d[letter] = [c_str for c_str in c_sublist if c_str.startswith(letter)]

                    # assumption 2: get all combinations
                    for c_str_list in itertools.product(*c_str_d.values()):
                        c_fmtted = fmt_str.format(*c_str_list)
                        formatted_string_list.append(c_fmtted)
            else:  # case 2 - is string, match a single letter
                c_str = c_item
                first_letter = c_str[0]
                pat = f" in {first_letter}"
                if pat in fmt_str:
                    c_fmtted = fmt_str.format(c_str)
                    formatted_string_list.append(c_fmtted)

    return formatted_string_list

Usage:

C_list = ['A1', 'A2', 'A3', 'B0', ['C1', 'D1'] ]
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice"]
formatted_string_list = format_string_list(C_list,string_list)
# print output
print("\n".join(formatted_string_list))

Output:

A1 in Alpha
A1 in Apple
A2 in Alpha
A2 in Apple
A3 in Alpha
A3 in Apple
B0 in Bee
C1 in Cheese and D1 in Dice

It also works for more complex cases.

It doesn't handle more than one level of nesting, but I don't think you need that.

A_list = ['A', 'B', ['C', 'D', 'E']]
B_list = ['A1', 'W5', 'X6', 'D2', 'E1', 'A2', 'A3', 'T5', 'E2', 'B0', 'Z9', 'C1', 'W3', 'D1']
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice {2} in Egg"]

Output:

['A1', 'A2', 'A3', 'B0', ['C1', 'D1', 'D2', 'E1', 'E2']]
A1 in Alpha
A1 in Apple
A2 in Alpha
A2 in Apple
A3 in Alpha
A3 in Apple
B0 in Bee
C1 in Cheese and D1 in Dice E1 in Egg
C1 in Cheese and D1 in Dice E2 in Egg
C1 in Cheese and D2 in Dice E1 in Egg
C1 in Cheese and D2 in Dice E2 in Egg

The problem becomes much more manageable if you normalize each item of A_list as you process it, so that it is always a list of strings:

for a in A_list:
    # Normalize a to a list[str]
    a = a if isinstance(a, list) else [a]
    # Pop all matches from B_list into C_list.
    while True:
        c = []
        for i in a:
            for b in B_list.copy():
                if b.startswith(i):
                    c.append(b)
                    B_list.remove(b)
                    break
            if len(c) == len(a):
                break  # append this c and scan B again
        else:
            break  # no more matches, continue to next a
        # Convert c back to a str|list[str]
        C_list.append(c[0] if len(c) == 1 else c)
print(C_list)
# ['A1', 'A2', 'A3', 'B0', ['C1', 'D1']]

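I'd probably suggest keeping c as a list of strings in all cases, since that would likely make the formatting part easier, but hopefully the above helps you get over the initial hurdle of handling data in this awkward nested format (while still leaving you the option of converting it back to the original awkward format).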
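As a rough illustration of why keeping every entry as a list of strings might simplify formatting, here is a minimal sketch. The helper name `format_groups` is hypothetical, and the matching rule (looking for " in <letter>" in each format string) is an assumption that only fits format strings shaped like the ones in the question, with each group holding one item per placeholder in order.

```python
def format_groups(c_groups, string_list):
    """Format each group with every template whose " in <letter>" markers match it."""
    lines = []
    for group in c_groups:
        for fmt_str in string_list:
            # Assumption: a template matches when it has exactly one " in "
            # marker per item and each item's first letter follows a marker.
            same_size = fmt_str.count(" in ") == len(group)
            letters_match = all(f" in {item[0]}" in fmt_str for item in group)
            if same_size and letters_match:
                lines.append(fmt_str.format(*group))
    return lines


# Every entry is a list, so there is no str-versus-list special case.
c_groups = [["A1"], ["A2"], ["A3"], ["B0"], ["C1", "D1"]]
string_list = ["{0} in Alpha", "{0} in Apple", "{0} in Bee", "{0} in Cheese and {1} in Dice"]
lines = format_groups(c_groups, string_list)
print("\n".join(lines))
```

Because each group is unpacked with `*group`, single strings and pairs go through the same `format` call.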
