How to remove items from a list and also remove the associated items in other lists



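I'm working on a project and have reached the point where I need to remove any duplicates from a main list. I have three lists, and I'm trying to eliminate the duplicates in the flight_ID list. I managed to do that, but unfortunately I can't remove the elements in the other lists that are associated with the elements removed from flight_ID.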

# All lists have a length of 20
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', 
'1064775880', '1064645826', '1064645826', '1064664535', '1064659772', 
'1064659772', '1064614050', '1064614050', '1064614286', '1064614286', 
'1064614286', '1064614286', '1064614286', '1064614286', '1064614286', '1064646536']
flight_number = ['1827', '1585', '8409', '1465', '30', '9188', '2232', '3760', '579', '3309', '1259', '2193', '6566', '2231', '5214', '8601', '3169', '1601', '7832', '335']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX', 'UA', 'LH', 'U2', 'SK', 'A3', 'AF', 'KL', 'VS', 'UX', 'G3', 'UU', 'KQ', 'AF', 'AR', 'LO']

I used the following function to remove the duplicates from the main list:

def remove_dup(a):
    i = 0
    while i < len(a):
        j = i + 1
        while j < len(a):
            if a[i] == a[j]:
                del a[j]
            else:
                j += 1
        i += 1

remove_dup(flight_ID)
print(flight_ID)
# OUTPUT
['1064662221', '1064617390', '1064614152', '1064775880', '1064645826', '1064664535', '1064659772', '1064614050', '1064614286', '1064646536']
# 10 elements have been removed.

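Now, as described above, I need to do the same for the other lists, so that the items corresponding to those removed from the main list (flight_ID) are removed as well.

Note: although the main list contains duplicates, the other lists do not.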

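If you are going to do more processing on data formatted the way you describe, I'd suggest using Pandas, since it makes operations such as removing duplicates painless: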

import pandas as pd
# Make a DataFrame
flight_ID = ['1064662221', '1064617390', ...]
flight_number = ['1827', '1585', '8409', ...]
airline_Code = ['TK', 'AY', 'DL', ...]
df = pd.DataFrame({'flight_ID': flight_ID,
                   'flight_number': flight_number,
                   'airline_Code': airline_Code})
# Remove duplicates - just one line!
df.drop_duplicates('flight_ID', inplace=True)

You'll get a DataFrame that looks like this:

     flight_ID flight_number airline_Code
0   1064662221          1827           TK
1   1064617390          1585           AY
2   1064614152          8409           DL
4   1064775880            30           FX
5   1064645826          9188           UA
7   1064664535          3760           U2
8   1064659772           579           SK
10  1064614050          1259           AF
12  1064614286          6566           VS
19  1064646536           335           LO

First, change the representation so that related items are linked together, instead of using parallel lists.

flight_list = zip(flight_ID, flight_number, airline_Code)

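This makes it much easier to remove the three related items together.

Now remove the duplicates using any standard method. In each of them you build a new list: as many posts on this site document, mutating the object you are iterating over is a bad idea. Keeping it at the programming level you demonstrated: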

unique_flight = []
found_ID = set()
for flight in flight_list:
    if flight[0] not in found_ID:
        found_ID.add(flight[0])
        unique_flight.append(flight)

for flight in unique_flight:
    print(flight)

Output:

('1064662221', '1827', 'TK')
('1064617390', '1585', 'AY')
('1064614152', '8409', 'DL')
('1064775880', '30', 'FX')
('1064645826', '9188', 'UA')
('1064664535', '3760', 'U2')
('1064659772', '579', 'SK')
('1064614050', '1259', 'AF')
('1064614286', '6566', 'VS')
('1064646536', '335', 'LO')
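
If the rest of the program still expects three separate lists, the deduplicated tuples can be split back into columns. The following is a minimal sketch, not part of the original answer; it uses only the first five entries of the question's data to stay short, and it assumes at least one flight remains (unpacking an empty `zip(*...)` would fail).

```python
# First five entries of the lists from the question (shortened for illustration)
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', '1064775880']
flight_number = ['1827', '1585', '8409', '1465', '30']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

# Keep the first tuple seen for each flight ID
unique_flight = []
found_ID = set()
for flight in zip(flight_ID, flight_number, airline_Code):
    if flight[0] not in found_ID:
        found_ID.add(flight[0])
        unique_flight.append(flight)

# zip(*rows) transposes the list of rows back into three columns
flight_ID, flight_number, airline_Code = (list(col) for col in zip(*unique_flight))

print(flight_ID)
print(flight_number)
print(airline_Code)
```

The three lists stay aligned because every row was kept or dropped as a whole.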

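There are several approaches here, but I would consider using a class to represent this kind of data (similar to how the namedtuple example works).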

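Add each flight_ID to a dictionary as a key, which makes it unique, with its index as the value. Dictionaries keep insertion order in Python 3.7+, so the IDs stay in first-seen order, but later duplicates overwrite earlier values, so each ID ends up paired with the index of its last occurrence: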

flight_ID_inds = {f: i for i, f in enumerate(flight_ID)}
flight_ID = list(flight_ID_inds.keys())
flight_number = [flight_number[i] for i in flight_ID_inds.values()]
airline_Code = [airline_Code[i] for i in flight_ID_inds.values()]

The same idea, but with the values being tuples of the data from the other lists instead of indices:

dic = {fid: (fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)}
flight_ID = list(dic.keys())
flight_number = [x[0] for x in dic.values()]
airline_Code = [x[1] for x in dic.values()]
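
A dict comprehension keeps the data of the last duplicate for each ID. If the first occurrence should win instead (which matches the output of the other answers), `dict.setdefault` can be used, since it only stores a value when the key is not present yet. This is a small sketch under that assumption, using the first five entries of the question's data; the name `first_seen` is just illustrative.

```python
# First five entries of the lists from the question (shortened for illustration)
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', '1064775880']
flight_number = ['1827', '1585', '8409', '1465', '30']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

first_seen = {}
for fid, fn, ac in zip(flight_ID, flight_number, airline_Code):
    # setdefault leaves an existing entry untouched, so the first occurrence is kept
    first_seen.setdefault(fid, (fn, ac))

flight_ID = list(first_seen)
flight_number = [fn for fn, _ in first_seen.values()]
airline_Code = [ac for _, ac in first_seen.values()]

print(flight_ID)
print(flight_number)
print(airline_Code)
```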

Using named tuples (a list of dicts would work as well):

from collections import namedtuple

flight_nt = namedtuple("Flight", "flight_ID, flight_number, airline_Code")
flights = [flight_nt(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]

uniq_ids = set()
uniq_flights = []
for f in flights:
    if f.flight_ID not in uniq_ids:
        uniq_ids.add(f.flight_ID)
        uniq_flights.append(f)

flight_ID = [x.flight_ID for x in uniq_flights]
flight_number = [x.flight_number for x in uniq_flights]
airline_Code = [x.airline_Code for x in uniq_flights]

For this kind of problem I recommend an object-oriented approach (a class or a dataclass):

class Flight:
    def __init__(self, flight_id, flight_number, airline_code):
        self.flight_id = flight_id
        self.flight_number = flight_number
        self.airline_code = airline_code

    # Flights hash and compare by flight_id only
    def __hash__(self):
        return hash(self.flight_id)

    def __eq__(self, other):
        return other.flight_id == self.flight_id

flights = [Flight(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]
uniq_flights = set(flights)  # note: a set does not keep the original order
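
The dataclass variant mentioned above could look roughly like the following sketch. It is an illustration rather than the answerer's code: `field(compare=False)` is used so that only `flight_id` takes part in equality and hashing, and `dict.fromkeys` replaces `set` so that the first-seen order is kept. The data is the first five entries from the question.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Flight:
    flight_id: str
    # compare=False excludes these fields from __eq__ and __hash__
    flight_number: str = field(compare=False)
    airline_code: str = field(compare=False)

# First five entries of the lists from the question (shortened for illustration)
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', '1064775880']
flight_number = ['1827', '1585', '8409', '1465', '30']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

flights = [Flight(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]

# dict keys are unique and keep insertion order, so the first flight per ID survives
uniq_flights = list(dict.fromkeys(flights))

for f in uniq_flights:
    print(f.flight_id, f.flight_number, f.airline_code)
```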

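@Prune has a better solution, but you can always use enumerate(). Iterate in reverse over a snapshot of the indices, so that deleting an entry does not shift the positions still to be visited:

for index, fid in reversed(list(enumerate(flight_ID))):
    # drop this entry if the same ID appears again later in the list
    if fid in flight_ID[index + 1:]:
        del flight_ID[index]
        del flight_number[index]
        del airline_Code[index]

Note that this keeps the last occurrence of each ID rather than the first, so the first-seen order is not preserved. If you want that, you have to find the index of the value in the slice and delete that entry instead.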

You can first determine which elements to keep or remove, and then use itertools.compress to drop them:

import itertools as it

keep = []
seen = set()
for x in flight_ID:
    keep.append(x not in seen)
    seen.add(x)

flight_ID = list(it.compress(flight_ID, keep))
flight_number = list(it.compress(flight_number, keep))
airline_Code = list(it.compress(airline_Code, keep))
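
The same mask-and-compress idea can be wrapped in a small helper that handles any number of parallel lists. `dedupe_parallel` is a hypothetical name introduced here for illustration, and the sketch assumes all lists have the same length; it returns new lists and leaves the inputs unchanged.

```python
from itertools import compress

def dedupe_parallel(keys, *others):
    """Return copies of keys and each parallel list, keeping only the
    positions where a key appears for the first time."""
    seen = set()
    keep = []
    for k in keys:
        keep.append(k not in seen)  # True only for the first occurrence
        seen.add(k)
    return [list(compress(seq, keep)) for seq in (keys, *others)]

# First five entries of the lists from the question (shortened for illustration)
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', '1064775880']
flight_number = ['1827', '1585', '8409', '1465', '30']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

ids, numbers, codes = dedupe_parallel(flight_ID, flight_number, airline_Code)
print(ids)
print(numbers)
print(codes)
```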

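However, since these pieces of data seem to belong together logically, it is probably a good idea to create a dedicated container class for them, for example via namedtuple: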

from collections import namedtuple

# namedtuple needs a type name followed by the field names
FlightData = namedtuple('FlightData', 'id number code')
data = [FlightData(*x) for x in zip(flight_ID, flight_number, airline_Code)]

Another approach is then to use itertools.groupby:

import operator as op

# groupby only merges adjacent items, so the data has to be sorted first
unique_data = [next(g) for k, g in it.groupby(sorted(data), key=op.itemgetter(0))]
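
One detail of the groupby approach is worth spelling out: sorting whole tuples also orders the entries within each ID by number and code, so the entry that survives is not necessarily the first one from the input. A sketch of a variant that sorts by ID only is shown below; because Python's sort is stable, the first occurrence of each ID then comes first in its group. The result is ordered by ID, not by original position. The data is the first five entries from the question.

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

FlightData = namedtuple('FlightData', 'id number code')

# First five entries of the lists from the question (shortened for illustration)
flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152', '1064775880']
flight_number = ['1827', '1585', '8409', '1465', '30']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

data = [FlightData(*x) for x in zip(flight_ID, flight_number, airline_Code)]

by_id = attrgetter('id')
# Stable sort by ID only, then take the first entry of each group
unique_data = [next(group) for _, group in groupby(sorted(data, key=by_id), key=by_id)]

for row in unique_data:
    print(row)
```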
