How to read an HTML JSON file and get the elements with a specific ID or class



I have the following function:

def html_dict_search(html_dict, selector):

where

 html_dict = json.load(f)

and selector is a class or ID name.

For selector = .headline-item, the function should return something like this:

     {
         "name": "li",
         "attrs": {
             "class": "headline-item"
         },
         "text": "one",
         "children": []
     },
     {
         "name": "li",
         "attrs": {
             "class": "headline-item"
         },
         "text": "two",
         "children": []
     }

I can't work out a way to do this, and I haven't found any good reading material on it. Any suggestions or ideas are welcome.

You can filter the collection with a list comprehension, as follows:

html_json = [{
     "name": "li",
     "attrs": {
         "class": "headline-item"
     },
     "text": "one",
     "children": []
 },
 {
     "name": "li",
     "attrs": {
         "class": "headline-item"
     },
     "text": "two",
     "children": []
 },
 {
     "name": "li",
     "attrs": {
         "class": "subtitle-item"
     },
     "text": "two",
     "children": []
 }]
# .get() avoids a KeyError for elements that have no "attrs" or no "class"
headline_items = [element for element in html_json if element.get("attrs", {}).get("class") == "headline-item"]

This gives the following value for headline_items:

[{'name': 'li',
  'attrs': {'class': 'headline-item'},
  'text': 'one',
  'children': []},
 {'name': 'li',
  'attrs': {'class': 'headline-item'},
  'text': 'two',
  'children': []}]
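
The list comprehension above only looks at the top-level elements. If the matching elements can be nested inside "children", a recursive search is one option. The following is a minimal sketch of html_dict_search, not a definitive implementation. It assumes that a selector starting with "." is a class name and one starting with "#" is an ID, that IDs are stored under attrs["id"], and that a class value may hold several space-separated names. The sample `page` data is made up for illustration.

```python
def html_dict_search(html_dict, selector):
    """Return all elements matching a '.class' or '#id' selector, in document order."""
    # Assumption: '.' prefix means class, '#' prefix means id.
    if selector.startswith("."):
        attr, wanted = "class", selector[1:]
    elif selector.startswith("#"):
        attr, wanted = "id", selector[1:]
    else:
        raise ValueError("selector must start with '.' or '#'")

    # Accept either a single element (dict) or a list of elements.
    nodes = [html_dict] if isinstance(html_dict, dict) else html_dict

    matches = []
    for node in nodes:
        value = node.get("attrs", {}).get(attr, "")
        # The value may be a list or a space-separated string of names.
        names = value if isinstance(value, list) else value.split()
        if wanted in names:
            matches.append(node)
        # Recurse into nested children, if any.
        matches.extend(html_dict_search(node.get("children", []), selector))
    return matches


# Hypothetical sample data with the same shape as in the question.
page = {
    "name": "ul",
    "attrs": {"id": "news"},
    "text": "",
    "children": [
        {"name": "li", "attrs": {"class": "headline-item"}, "text": "one", "children": []},
        {"name": "li", "attrs": {"class": "headline-item"}, "text": "two", "children": []},
        {"name": "li", "attrs": {"class": "subtitle-item"}, "text": "two", "children": []},
    ],
}

headlines = html_dict_search(page, ".headline-item")
by_id = html_dict_search(page, "#news")
```

With real data, `page` would instead be the result of json.load(f), as in the question.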
