Remove duplicate values from a column of JSON objects in a Pandas DataFrame



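I am trying to remove the duplicates in the previous_location column, which currently holds JSON objects (lists of dicts). I want each row to remain a JSON object after deduplication.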

Initialize a list of lists:

import pandas as pd

data = [['tom', 20, [{"location":"USA", "State":"CA"}, {"location":"USA", "State":"CA"}, {"location":"USA", "State":"TX"}]],
        ['nick', 35, [{"location":"USA", "State":"PA"}, {"location":"USA", "State":"PA"}, {"location":"USA", "State":"ME"}]],
        ['julie', 29, [{"location":"USA", "State":"WA"}, {"location":"USA", "State":"WA"}, {"location":"USA", "State":"HI"}]]]

Create the pandas DataFrame:

df = pd.DataFrame(data, columns=['name', 'age', 'previous_location'])

Print the DataFrame:

print(df)
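name   age  previous_location
tom    20   [{"location":"USA", "State":"CA"}, {"location":"USA", "State":"CA"}, {"location":"USA", "State":"TX"}]
nick   35   [{"location":"USA", "State":"PA"}, {"location":"USA", "State":"PA"}, {"location":"USA", "State":"ME"}]
julie  29   [{"location":"USA", "State":"WA"}, {"location":"USA", "State":"WA"}, {"location":"USA", "State":"HI"}]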

Try:

df["previous_location"] = df["previous_location"].apply(
    lambda x: [dict(d) for d in set(tuple(sorted(d.items())) for d in x)]
)
print(df)

Prints:

name  age                                                         previous_location
0    tom   20  [{'State': 'CA', 'location': 'USA'}, {'State': 'TX', 'location': 'USA'}]
1   nick   35  [{'State': 'ME', 'location': 'USA'}, {'State': 'PA', 'location': 'USA'}]
2  julie   29  [{'State': 'WA', 'location': 'USA'}, {'State': 'HI', 'location': 'USA'}]
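
The one-liner above works because each dict is turned into a hashable tuple of sorted (key, value) pairs before it goes into a set. The following is a minimal sketch of that idea in plain Python, without pandas; the names records and to_key are made up for illustration and are not part of the answer. It also shows why the order of the result can differ from the input: iteration order of a set is not guaranteed.

```python
# Dicts are unhashable, so they cannot be placed in a set directly.
records = [
    {"location": "USA", "State": "CA"},
    {"State": "CA", "location": "USA"},  # same content, different key order
    {"location": "USA", "State": "TX"},
]


def to_key(d):
    """Return a hashable, key-order-independent form of a flat dict (hypothetical helper)."""
    return tuple(sorted(d.items()))


# The set drops duplicates; dict() rebuilds each unique record.
unique_keys = {to_key(d) for d in records}
unique_records = [dict(k) for k in unique_keys]

print(len(unique_records))  # 2
```

Note that this only works when every value in the dicts is itself hashable, as the strings here are.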

Edit: to keep the dicts in their original order, track the ones already seen explicitly:

def fn(x):
    out, seen = [], set()
    for dct in x:
        # hashable, key-order-independent form of the dict
        t = tuple(sorted(dct.items()))
        if t not in seen:
            # first occurrence: keep the original dict
            out.append(dct)
            seen.add(t)
    return out

df["previous_location"] = df["previous_location"].apply(fn)
print(df)

Prints:

name  age                                                         previous_location
0    tom   20  [{'location': 'USA', 'State': 'CA'}, {'location': 'USA', 'State': 'TX'}]
1   nick   35  [{'location': 'USA', 'State': 'PA'}, {'location': 'USA', 'State': 'ME'}]
2  julie   29  [{'location': 'USA', 'State': 'WA'}, {'location': 'USA', 'State': 'HI'}]
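
If the dicts ever contain nested lists or dicts, tuple(sorted(dct.items())) can no longer be hashed and adding it to the set raises TypeError. One possible variant, sketched here as an assumption and not something the answer above proposes, uses json.dumps with sort_keys=True as the deduplication key. The function name dedupe_json and the "cities" field are hypothetical.

```python
import json


def dedupe_json(items):
    """Drop duplicate dicts from a list, keeping first occurrences in order."""
    out, seen = [], set()
    for dct in items:
        # sort_keys=True makes the string independent of key order and
        # also handles nested dicts and lists, which are not hashable.
        key = json.dumps(dct, sort_keys=True)
        if key not in seen:
            seen.add(key)
            out.append(dct)
    return out


# Hypothetical sample with a nested list value.
sample = [
    {"location": "USA", "State": "CA", "cities": ["LA", "SF"]},
    {"State": "CA", "location": "USA", "cities": ["LA", "SF"]},
    {"location": "USA", "State": "TX", "cities": []},
]

result = dedupe_json(sample)
print([d["State"] for d in result])  # ['CA', 'TX']
```

It would be applied the same way as fn, for example df["previous_location"].apply(dedupe_json). This assumes every value is JSON-serializable; list order inside a value still matters when comparing.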
