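I am trying to remove duplicates from the previous_location column, which currently holds a list of JSON objects in each row. I want the column to stay as JSON objects after the duplicates are removed.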
import pandas as pd

# Initialize the list of lists
data = [['tom', 20, [{"location":"USA", "State":"CA"}, {"location":"USA", "State":"CA"}, {"location":"USA", "State":"TX"}]],
['nick', 35, [{"location":"USA", "State":"PA"}, {"location":"USA", "State":"PA"}, {"location":"USA", "State":"ME"}]],
['julie', 29, [{"location":"USA", "State":"WA"}, {"location":"USA", "State":"WA"}, {"location":"USA", "State":"HI"}]]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['name', 'age', 'previous_location'])
# Print the DataFrame
print(df)
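name | age | previous_location |
---|---|---|
tom | 20 | [{"location":"USA", "State":"CA"}, {"location":"USA", "State":"CA"}, {"location":"USA", "State":"TX"}] |
nick | 35 | [{"location":"USA", "State":"PA"}, {"location":"USA", "State":"PA"}, {"location":"USA", "State":"ME"}] |
julie | 29 | [{"location":"USA", "State":"WA"}, {"location":"USA", "State":"WA"}, {"location":"USA", "State":"HI"}] |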
Try:
df["previous_location"] = df["previous_location"].apply(
    lambda x: [dict(d) for d in set(tuple(sorted(d.items())) for d in x)]
)
print(df)
Prints:
name age previous_location
0 tom 20 [{'State': 'CA', 'location': 'USA'}, {'State': 'TX', 'location': 'USA'}]
1 nick 35 [{'State': 'ME', 'location': 'USA'}, {'State': 'PA', 'location': 'USA'}]
2 julie 29 [{'State': 'WA', 'location': 'USA'}, {'State': 'HI', 'location': 'USA'}]
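As a brief illustration of why the lambda converts each dict to a sorted tuple of items: dicts are not hashable, so they cannot be put into a set directly, while a sorted tuple of (key, value) pairs can, and two dicts with the same content produce the same tuple regardless of key order. A minimal sketch (the variable names here are only for illustration and are not part of the code above):

```python
a = {"location": "USA", "State": "CA"}
b = {"State": "CA", "location": "USA"}  # same content, different key order

# Dicts cannot be members of a set because they are unhashable.
try:
    {a, b}
    hashable = True
except TypeError:
    hashable = False

# A sorted tuple of items is hashable and independent of key order.
key_a = tuple(sorted(a.items()))
key_b = tuple(sorted(b.items()))

unique = {key_a, key_b}               # the two keys collapse into one entry
restored = [dict(t) for t in unique]  # convert the tuples back to dicts
```

Note that this only works when every value in the dict is itself hashable (strings, numbers, tuples), which is the case for the data in the question.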
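Edit: The set-based version above does not preserve the original order of the dictionaries. To keep the order: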
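def fn(x):
    # out keeps the unique dicts in first-seen order; seen holds their hashable keys
    out, seen = [], set()
    for dct in x:
        # A sorted tuple of items is hashable and ignores key order
        t = tuple(sorted(dct.items()))
        if t not in seen:
            out.append(dct)
            seen.add(t)
    return out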
df["previous_location"] = df["previous_location"].apply(fn)
print(df)
Prints:
name age previous_location
0 tom 20 [{'location': 'USA', 'State': 'CA'}, {'location': 'USA', 'State': 'TX'}]
1 nick 35 [{'location': 'USA', 'State': 'PA'}, {'location': 'USA', 'State': 'ME'}]
2 julie 29 [{'location': 'USA', 'State': 'WA'}, {'location': 'USA', 'State': 'HI'}]
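If the dictionaries might contain unhashable values such as lists or nested dicts, the tuple-based key would raise a TypeError. One possible variant, sketched here under the assumption that all values are JSON-serializable, uses `json.dumps` with `sort_keys=True` as the lookup key instead. The function name `dedupe_dicts` and the `cities` field are hypothetical and only serve the example:

```python
import json


def dedupe_dicts(items):
    """Remove duplicate dicts from a list, keeping first-seen order."""
    out, seen = [], set()
    for dct in items:
        # Serialize with sorted keys so key order does not matter;
        # unlike a tuple of items, this also handles list or dict values.
        key = json.dumps(dct, sort_keys=True)
        if key not in seen:
            seen.add(key)
            out.append(dct)
    return out


# Hypothetical sample rows with a nested list value
rows = [
    {"location": "USA", "State": "CA", "cities": ["LA", "SF"]},
    {"State": "CA", "location": "USA", "cities": ["LA", "SF"]},
    {"location": "USA", "State": "TX", "cities": []},
]
result = dedupe_dicts(rows)
```

It could be applied the same way as `fn`, for example `df["previous_location"].apply(dedupe_dicts)`. For flat dicts with string values like the ones in the question, the tuple-based `fn` is sufficient.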