Comparing lists and removing duplicates

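I am trying to write a script that takes lists of different sizes as input and outputs only the longest lists, that is, the ones that already contain the characters of the shorter lists.

I put the lists in a DataFrame and use a script that iterates over all of its rows to check whether the characters of one list also appear in another list; when there is a match, it should print the longest list.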

import pandas as pd

lists = [['a','b','g'], ['a','c','d','e','g'], ['a','b'], ['b', 'd', 'f'], ['a', 'c']]
df = pd.DataFrame(lists)

Define the number of rows:

nber_rows=len(df.index)

Loop over the DataFrame to find matches between the lists:

listnorep = []
for f in range(nber_rows):
    row1 = df.iloc[f].dropna().tolist()
    list_intersection = []
    for g in range(nber_rows):
        row2 = df.iloc[g].dropna().tolist()
        # True when every element of row1 also appears in row2
        check = all(elem in row2 for elem in row1)
        if check:
            list_intersection.append(row2)
    if list_intersection:
        listnorep.append(list_intersection)
    else:
        listnorep.append(row1)
listnorep

The expected output for this example is:

a b g None None
a c d e g
b d f

You can use set operations. If one set is a proper subset (`<`) of another, we do not select it:

# aggregate as set (after stacking to drop the NaNs)
s = df.stack().groupby(level=0).agg(set)
# keep rows whose set is not a proper subset of any other row's set
df[[not any(a<b for b in s) for a in s]]

Output:

   0  1  2     3     4
0  a  b  g  None  None
1  a  c  d     e     g
3  b  d  f  None  None
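
The same proper-subset test can also be sketched without pandas, which may be easier to follow if the data starts out as plain lists. This is only an illustration of the idea above, not part of the original answer; the helper name `drop_contained` is hypothetical.

```python
def drop_contained(lists):
    """Keep only the lists whose items are not a proper subset of another list."""
    sets = [set(items) for items in lists]
    # a < b is True only when a is a proper subset of b (a is contained in b and a != b)
    return [
        items
        for items, a in zip(lists, sets)
        if not any(a < b for b in sets)
    ]


lists = [['a', 'b', 'g'], ['a', 'c', 'd', 'e', 'g'], ['a', 'b'], ['b', 'd', 'f'], ['a', 'c']]
result = drop_contained(lists)
print(result)
# [['a', 'b', 'g'], ['a', 'c', 'd', 'e', 'g'], ['b', 'd', 'f']]
```

Because `<` is a strict comparison, two lists with exactly the same items do not eliminate each other, so exact duplicates would both be kept. If those should collapse to one entry, they would need to be deduplicated separately first.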
