How to merge multiple duplicate key names in a dictionary-like format using Python



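My data is formatted like a dictionary. It contains many keys that repeat multiple times, each with a list of strings as its value. I want to merge all keys that share the same name, together with their values. The data only happens to look like a dictionary; it is not an actual dictionary, and I call it one only because of how it is laid out.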

# The data I have looks like this:

"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]

I have 1000 repeated keys, with a mix of duplicate and unique values, and I want to merge or append the values by key.

# Expected output

"city":["New York", "Paris", "London", "New Delhi", "Tokio", "Wuhan"],
"country":["India", "France", "Italy", "Japan", "South Korea", "Germany"],
"organisation":["ITC", "Google", "Facebook", "TATA", "Amazon", "Ford"],

Can anyone suggest a way to do this?

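  • Since it has been established that this is not a dictionary, treat it as an LR(1) grammar that closely resembles the JSON grammar
  • Parse and tokenise it with an LR parser
  • https://lark-parser.readthedocs.io/en/latest/json_tutorial.html shows how to parse JSON
  • A small tweak is needed so that repeated keys work: treat a dict as a list of pairs (see the code)
  • pandas is used to take the parser output and reshape it as needed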
from lark import Lark, Transformer
import pandas as pd

# JSON grammar taken from the Lark JSON tutorial
json_parser = Lark(r"""
    ?value: dict
          | list
          | string
          | SIGNED_NUMBER      -> number
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    list : "[" [value ("," value)*] "]"
    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS
    """, start='value')

class TreeToJson(Transformer):
    def string(self, s):
        (s,) = s
        return s[1:-1]  # strip the surrounding quotes

    def number(self, n):
        (n,) = n
        return float(n)

    list = list
    pair = tuple
    dict = list  # deal with issue of repeating keys...

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False

# The data from the question, wrapped in braces so the grammar accepts it
js = """{
"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]
}"""

tree = json_parser.parse(js)
# The transformer yields a list of (key, list) tuples; explode the lists,
# then regroup by key and keep the unique values of each group
pd.DataFrame(TreeToJson().transform(tree), columns=["key", "list"]).explode(
    "list"
).groupby("key").agg({"list": lambda s: s.unique().tolist()}).to_dict()["list"]

Output
{'city': ['New York', 'Paris', 'London', 'New Delhi', 'Tokio', 'Wuhan'],
'country': ['India', 'France', 'Italy', 'Japan', 'South Korea', 'Germany'],
'organisation': ['ITC', 'Google', 'Facebook', 'TATA', 'Amazon', 'Ford']}
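
If the data can be turned into valid JSON text just by wrapping it in braces, as the answer above does, the same merge can probably be done with the standard library alone. The sketch below is not the Lark approach from the answer: it swaps in `json.loads` with its `object_pairs_hook` parameter, which receives every object as a list of (key, value) pairs before duplicate keys are collapsed. The helper name `merge_pairs` and the variable `raw` are made up for illustration, and the code assumes every value is a list and that there are no nested objects.

```python
import json
from collections import defaultdict


def merge_pairs(pairs):
    """Merge (key, value_list) pairs; hypothetical helper, not part of json."""
    merged = defaultdict(list)
    for key, values in pairs:
        for value in values:
            # Keep first-seen order and skip values already collected for this key
            if value not in merged[key]:
                merged[key].append(value)
    return dict(merged)


# The data from the question (assumed to have no trailing comma)
raw = '''
"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]
'''

# Assumption: wrapping the text in braces yields valid JSON apart from repeated keys
text = "{" + raw.strip().rstrip(",") + "}"

# object_pairs_hook is called with the full list of pairs, duplicates included
result = json.loads(text, object_pairs_hook=merge_pairs)
print(result)
```

The membership test on a list is linear, so for very long value lists a per-key set of already-seen values would likely be a better fit.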
