How to generate a table from JSON? ValueError: Mixing dicts with non-Series may lead to ambiguous ordering



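I'm really a beginner in Python, but I'm trying to use IBM's sentiment analyzer to build a dataset. I get a JSON response that I want to put into a table. So far, what I have is: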

response = natural_language_understanding.analyze(
    text=df_text,
    features=Features(sentiment=SentimentOptions(targets=['Pericles']))).get_result()
print(json.dumps(response, indent=2))
respj = json.dumps(response['sentiment'])
respj

It prints

'{"targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}], "document": {"score": -0.903556, "label": "negative"}}'

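This is the point where I'd really like to turn this data into a pandas table. Ideally, I'd like all of the above laid out as -> Text | Text Score | Document Score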

I don't really need the positive/negative labels, but having them wouldn't hurt. How can I do this? Right now, when I try

json_df = pd.read_json(respj)
json_df.head()

I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-20-b06d8a1caf3f> in <module>
----> 1 json_df = pd.read_json(respj)
2 json_df.head()
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
212                 else:
213                     kwargs[new_arg_name] = new_arg_value
--> 214             return func(*args, **kwargs)
215 
216         return cast(F, wrapper)
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
606         return json_reader
607 
--> 608     result = json_reader.read()
609     if should_close:
610         filepath_or_buffer.close()
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in read(self)
729             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
730         else:
--> 731             obj = self._get_object_parser(self.data)
732         self.close()
733         return obj
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json)
751         obj = None
752         if typ == "frame":
--> 753             obj = FrameParser(json, **kwargs).parse()
754 
755         if typ == "series" or obj is None:
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in parse(self)
855 
856         else:
--> 857             self._parse_no_numpy()
858 
859         if self.obj is None:
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self)
1086 
1087         if orient == "columns":
-> 1088             self.obj = DataFrame(
1089                 loads(json, precise_float=self.precise_float), dtype=None
1090             )
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
433             )
434         elif isinstance(data, dict):
--> 435             mgr = init_dict(data, index, columns, dtype=dtype)
436         elif isinstance(data, ma.MaskedArray):
437             import numpy.ma.mrecords as mrecords
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in init_dict(data, index, columns, dtype)
252             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
253         ]
--> 254     return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
255 
256 
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in arrays_to_mgr(arrays, arr_names, index, columns, dtype)
62     # figure out the index, if necessary
63     if index is None:
---> 64         index = extract_index(arrays)
65     else:
66         index = ensure_index(index)
/opt/conda/envs/Python-3.8-main/lib/python3.8/site-packages/pandas/core/internals/construction.py in extract_index(data)
366 
367             if have_dicts:
--> 368                 raise ValueError(
369                     "Mixing dicts with non-Series may lead to ambiguous ordering."
370                 )
ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

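If anyone could give me some advice on how to build the table I'm after, I'd really appreciate it. Also, if someone could explain the error I'm getting, that would be great too. I think I get the basic premise: the JSON already contains two incompatible "tables". Thanks for your help.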
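As for the error itself: with the default orient="columns", read_json parses the string and passes the resulting dict straight to the DataFrame constructor (the DataFrame(loads(json, ...)) frame in the traceback). Each top-level key becomes a column. "targets" is a list, so pandas treats it as a plain array indexed by position, while "document" is a dict, so its keys (score, label) would have to become the row index. Pandas cannot line up list positions with dict keys, so it raises instead of guessing. A minimal reproduction, assuming the response['sentiment'] dict printed above is hard-coded as sentiment:

```python
import pandas as pd

# Stand-in for response['sentiment'] from the question (hard-coded for illustration)
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

error_message = None
try:
    # Roughly what read_json does internally with orient="columns":
    # "targets" is a list (positional), "document" is a dict (keyed by score/label)
    pd.DataFrame(sentiment)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

So the problem is not the JSON string itself but the shape of the dict: one value is a list of records and the other is a single nested object, and neither maps directly onto columns.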

If you just want to turn response['sentiment'] into a DataFrame, you don't need to dump it to a JSON string. Use pandas.json_normalize instead.

It looks like response['sentiment'] is something like

>>> response['sentiment']
{
"targets": [{"text": "Pericles", 
"score": -0.939436, 
"label": "negative"}], 
"document": {"score": -0.903556, 
"label": "negative"}
}

Then all you need is

df = pd.json_normalize(response['sentiment'],
                       record_path='targets',
                       meta=[['document', 'score'], ['document', 'label']])

Output

>>> df
text     score     label document.score document.label
0  Pericles -0.939436  negative      -0.903556       negative

Optionally, you can then rename the columns as needed with DataFrame.rename:

cols_mapping = {
'text': 'Text', 
'score': 'Text Score', 
'label': 'Text Label', 
'document.score': 'Document Score', 
'document.label': 'Document Label'
}
df = df.rename(columns=cols_mapping)
>>> df 
Text  Text Score Text Label Document Score Document Label
0  Pericles   -0.939436   negative      -0.903556       negative
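Putting the pieces together, here is a self-contained sketch that hard-codes the sample dict in place of response['sentiment'] (an assumption, so it runs without the IBM client) and keeps only the three columns from the layout asked for in the question (Text | Text Score | Document Score). Drop the final column selection if you want the labels as well:

```python
import pandas as pd

# Stand-in for response['sentiment'] from the question
sentiment = {
    "targets": [{"text": "Pericles", "score": -0.939436, "label": "negative"}],
    "document": {"score": -0.903556, "label": "negative"},
}

# One row per target; the document-level fields are repeated on every row
df = pd.json_normalize(
    sentiment,
    record_path="targets",
    meta=[["document", "score"], ["document", "label"]],
)

df = df.rename(columns={
    "text": "Text",
    "score": "Text Score",
    "document.score": "Document Score",
})

# Keep only the columns from the desired layout
table = df[["Text", "Text Score", "Document Score"]]
print(table)
```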

I believe this should work for you:

import pandas as pd

j = response['sentiment']  # the parsed dict, not the JSON string respj
targets = {k: [t[k] for t in j['targets']] for k in j['targets'][0].keys()}
doc_scores = [j['document']['score']] * len(j['targets'])
pd.DataFrame({'document_score': doc_scores, **targets})
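Both answers produce one row per target, with the single document score repeated on each row. To check that they agree when a request names more than one target, here is a sketch with a hypothetical second target ("Athens", with made-up scores) added to the sample payload:

```python
import pandas as pd

# Hypothetical payload with two targets; the "Athens" entry is made up for illustration
j = {
    "targets": [
        {"text": "Pericles", "score": -0.939436, "label": "negative"},
        {"text": "Athens", "score": 0.5, "label": "positive"},
    ],
    "document": {"score": -0.903556, "label": "negative"},
}

# Dict-comprehension approach: one list per target field, document score repeated
targets = {k: [t[k] for t in j["targets"]] for k in j["targets"][0].keys()}
doc_scores = [j["document"]["score"]] * len(j["targets"])
by_hand = pd.DataFrame({"document_score": doc_scores, **targets})

# json_normalize approach: record_path expands targets, meta repeats the document score
normalized = pd.json_normalize(j, record_path="targets", meta=[["document", "score"]])

print(by_hand)
print(normalized)
```

The comprehension version assumes every target has the same keys as the first one; json_normalize does not need that assumption, which makes it the more forgiving choice if the API ever omits a field.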
