如何将来自API的JSON转换为DATAFRAME pyspark



我试图读取API返回的Json并将其播放到DATAFRAME pyspar,但该文件只出现在一个名为_corrup_record:的字段中

我的JSON是这样的:

{
"title": "texto",
"nickname": "(app)",
"language": "en",
"folder_id": "0",
"category": "",
"question_count": 3,
"page_count": 3,
"response_count": 2636,
"date_created": "2021-01-27T19:22:00",
"date_modified": "2021-06-01T11:43:00",
"id": "00000",
"buttons_text": {
"next_button": "next",
"prev_button": "prev",
"done_button": "done",
"exit_button": ""
},
"is_owner": true,
"footer": false,
"custom_variables": {

},
"href": "https://api",
"analyze_url": "https://api",
"edit_url": "https://api",
"collect_url": "https://api",
"summary_url": "https://api",
"preview": "https://api",
"pages": [
{
"title": "",
"description": "",
"position": 1,
"question_count": 1,
"id": "00000",
"href": "https://api",
"questions": [
{
"id": "602406071",
"position": 1,
"visible": true,
"family": "matrix",
"subtype": "rating",
"layout": null,
"sorting": null,
"required": {
"text": "texto",
"type": "all",
"amount": "0"
},
"validation": null,
"forced_ranking": false,
"headings": [
{
"heading": "texto"
}
],
"href": "https://api",
"answers": {
"rows": [
{
"position": 1,
"visible": true,
"text": "",
"id": "00000"
}
],
"choices": [
{
"position": 1,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 2,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 3,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 4,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
},
{
"position": 5,
"visible": true,
"text": "texto",
"id": "00000",
"is_na": false,
"weight": 0,
"description": ""
}
]
},
"display_options": {
"show_display_number": true,
"display_type": "emoji",
"display_subtype": "star",
"left_label_id": null,
"left_label": "",
"right_label_id": null,
"right_label": "",
"middle_label_id": null,
"middle_label": "",
"custom_options": {
"color": "#f5a623",
"option_set": [

]
}
}
}
]
}
]
}

我需要它按列分隔,包括对齐的字段,我想按列分隔:

我正在尝试:

endpoint = f"my_API"
headers = {'Authorization': subscription_key}
request = requests.get(endpoint, headers=headers)
aall_df = json.loads(request.content)
rdd = sc.parallelize([aall_df])
df1 = spark.read.json(rdd)
df1.show(truncate=False)

预期输出:

/tr>n>td style="text-align:right;">nan 7…/td>
titledate_createddate_modifiedid
texto(应用程序(02021-027…0下一个prev

将字符串/未解析的json传递给rdd,而不是解析的json

aall_df = request.content # json.loads(request.content) 
rdd = sc.parallelize([aall_df])
df1 = spark.read.json(rdd)
df1.show(truncate = False)

打印:

+-----------+--------------------+--------+-----------+-------------------+-------------------+-----------+---------+------+-----------+-----+--------+--------+--------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+--------------+-----------+-----+
|analyze_url|buttons_text        |category|collect_url|date_created       |date_modified      |edit_url   |folder_id|footer|href       |id   |is_owner|language|nickname|page_count|pages                                                                                                                                                                                                                                                                                                                                                                                                                    |preview    |question_count|response_count|summary_url|title|
+-----------+--------------------+--------+-----------+-------------------+-------------------+-----------+---------+------+-----------+-----+--------+--------+--------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+--------------+-----------+-----+
|https://api|{done, , next, prev}|        |https://api|2021-01-27T19:22:00|2021-06-01T11:43:00|https://api|0        |false |https://api|00000|true    |en      |(app)   |3         |[{, https://api, 00000, 1, 1, [{{[{, 00000, false, 1, texto, true, 0}, {, 00000, false, 2, texto, true, 0}, {, 00000, false, 3, texto, true, 0}, {, 00000, false, 4, texto, true, 0}, {, 00000, false, 5, texto, true, 0}], [{00000, 1, , true}]}, {{#f5a623, []}, star, emoji, , null, , null, , null, true}, matrix, false, [{texto}], https://api, 602406071, null, 1, {0, texto, all}, null, rating, null, true}], }]|https://api|3             |2636          |https://api|texto|
+-----------+--------------------+--------+-----------+-------------------+-------------------+-----------+---------+------+-----------+-----+--------+--------+--------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+--------------+--------------+-----------+-----+

df1.printSchema()

打印

root
|-- analyze_url: string (nullable = true)
|-- buttons_text: struct (nullable = true)
|    |-- done_button: string (nullable = true)
|    |-- exit_button: string (nullable = true)
|    |-- next_button: string (nullable = true)
|    |-- prev_button: string (nullable = true)
|-- category: string (nullable = true)
|-- collect_url: string (nullable = true)
|-- date_created: string (nullable = true)
|-- date_modified: string (nullable = true)
|-- edit_url: string (nullable = true)
|-- folder_id: string (nullable = true)
|-- footer: boolean (nullable = true)
|-- href: string (nullable = true)
|-- id: string (nullable = true)
|-- is_owner: boolean (nullable = true)
|-- language: string (nullable = true)
|-- nickname: string (nullable = true)
|-- page_count: long (nullable = true)
|-- pages: array (nullable = true)
|    |-- element: struct (containsNull = true)
|    |    |-- description: string (nullable = true)
|    |    |-- href: string (nullable = true)
|    |    |-- id: string (nullable = true)
|    |    |-- position: long (nullable = true)
|    |    |-- question_count: long (nullable = true)
|    |    |-- questions: array (nullable = true)
|    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |-- answers: struct (nullable = true)
|    |    |    |    |    |-- choices: array (nullable = true)
|    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |-- description: string (nullable = true)
|    |    |    |    |    |    |    |-- id: string (nullable = true)
|    |    |    |    |    |    |    |-- is_na: boolean (nullable = true)
|    |    |    |    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |    |    |    |-- text: string (nullable = true)
|    |    |    |    |    |    |    |-- visible: boolean (nullable = true)
|    |    |    |    |    |    |    |-- weight: long (nullable = true)
|    |    |    |    |    |-- rows: array (nullable = true)
|    |    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |    |-- id: string (nullable = true)
|    |    |    |    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |    |    |    |-- text: string (nullable = true)
|    |    |    |    |    |    |    |-- visible: boolean (nullable = true)
|    |    |    |    |-- display_options: struct (nullable = true)
|    |    |    |    |    |-- custom_options: struct (nullable = true)
|    |    |    |    |    |    |-- color: string (nullable = true)
|    |    |    |    |    |    |-- option_set: array (nullable = true)
|    |    |    |    |    |    |    |-- element: string (containsNull = true)
|    |    |    |    |    |-- display_subtype: string (nullable = true)
|    |    |    |    |    |-- display_type: string (nullable = true)
|    |    |    |    |    |-- left_label: string (nullable = true)
|    |    |    |    |    |-- left_label_id: string (nullable = true)
|    |    |    |    |    |-- middle_label: string (nullable = true)
|    |    |    |    |    |-- middle_label_id: string (nullable = true)
|    |    |    |    |    |-- right_label: string (nullable = true)
|    |    |    |    |    |-- right_label_id: string (nullable = true)
|    |    |    |    |    |-- show_display_number: boolean (nullable = true)
|    |    |    |    |-- family: string (nullable = true)
|    |    |    |    |-- forced_ranking: boolean (nullable = true)
|    |    |    |    |-- headings: array (nullable = true)
|    |    |    |    |    |-- element: struct (containsNull = true)
|    |    |    |    |    |    |-- heading: string (nullable = true)
|    |    |    |    |-- href: string (nullable = true)
|    |    |    |    |-- id: string (nullable = true)
|    |    |    |    |-- layout: string (nullable = true)
|    |    |    |    |-- position: long (nullable = true)
|    |    |    |    |-- required: struct (nullable = true)
|    |    |    |    |    |-- amount: string (nullable = true)
|    |    |    |    |    |-- text: string (nullable = true)
|    |    |    |    |    |-- type: string (nullable = true)
|    |    |    |    |-- sorting: string (nullable = true)
|    |    |    |    |-- subtype: string (nullable = true)
|    |    |    |    |-- validation: string (nullable = true)
|    |    |    |    |-- visible: boolean (nullable = true)
|    |    |-- title: string (nullable = true)
|-- preview: string (nullable = true)
|-- question_count: long (nullable = true)
|-- response_count: long (nullable = true)
|-- summary_url: string (nullable = true)
|-- title: string (nullable = true)

最新更新