如何理解这种数据提取并将其转换为pandas数据框架?



我使用python API在线提取一些数据。

import urllib.request
url_path = "https://www.censtatd.gov.hk/api/get.php?id=660-69001&lang=en&param=N4KABGBEDGBukC4yghSBJAcgEUWA2uKmgJYB2AJgPoCKAQgErpUDK6kANEcZOdfU1a4uxUpSoAWAKyduqXuIkA2WaKh9JAdlWiF-RswkBOKlIAMpgIymATDZ08Nxh-I0DmAdQAqLsfsEAgj4iuhpSNpa+6uJSAMz2IY4xMomuMSqpflQAUlF6VEqRmdHUUkZFcllKEpY2BTWxeRrVEvUmSoUFSo3F+e5UAGIBTeL9ASwj1EpSXQmVJbQGVAAyk1QAsmuYa-1eAPI7SwDCB3IAukQAviGQAM7wSCjyBniEapAMAIYA7lSWFAAHKgkCgADyoZEg5yuNwBAFMAE4kAD2FDwTzQtwALp8EVi8JAbGYzOZIjCiLw0UhIB0zABaJRGYkVNAAG0+ZAA5gS4ZCQJcgA"
with urllib.request.urlopen(url_path) as url:
s = url.read()

如何使这个数据转储成有用的熊猫数据框架格式?

返回的数据是JSON文档。这将转换JSON数据中的数据集,但data中还有其他元数据:

import urllib.request
import json
import pandas as pd
url_path = "https://www.censtatd.gov.hk/api/get.php?id=660-69001&lang=en&param=N4KABGBEDGBukC4yghSBJAcgEUWA2uKmgJYB2AJgPoCKAQgErpUDK6kANEcZOdfU1a4uxUpSoAWAKyduqXuIkA2WaKh9JAdlWiF-RswkBOKlIAMpgIymATDZ08Nxh-I0DmAdQAqLsfsEAgj4iuhpSNpa+6uJSAMz2IY4xMomuMSqpflQAUlF6VEqRmdHUUkZFcllKEpY2BTWxeRrVEvUmSoUFSo3F+e5UAGIBTeL9ASwj1EpSXQmVJbQGVAAyk1QAsmuYa-1eAPI7SwDCB3IAukQAviGQAM7wSCjyBniEapAMAIYA7lSWFAAHKgkCgADyoZEg5yuNwBAFMAE4kAD2FDwTzQtwALp8EVi8JAbGYzOZIjCiLw0UhIB0zABaJRGYkVNAAG0+ZAA5gS4ZCQJcgA"
with urllib.request.urlopen(url_path) as url:
s = url.read()
data = json.loads(s)
df = pd.DataFrame(data['dataSet'])
print(df)

我更喜欢第三方requests模块:

import requests
import pandas as pd
url_path = "https://www.censtatd.gov.hk/api/get.php?id=660-69001&lang=en&param=N4KABGBEDGBukC4yghSBJAcgEUWA2uKmgJYB2AJgPoCKAQgErpUDK6kANEcZOdfU1a4uxUpSoAWAKyduqXuIkA2WaKh9JAdlWiF-RswkBOKlIAMpgIymATDZ08Nxh-I0DmAdQAqLsfsEAgj4iuhpSNpa+6uJSAMz2IY4xMomuMSqpflQAUlF6VEqRmdHUUkZFcllKEpY2BTWxeRrVEvUmSoUFSo3F+e5UAGIBTeL9ASwj1EpSXQmVJbQGVAAyk1QAsmuYa-1eAPI7SwDCB3IAukQAviGQAM7wSCjyBniEapAMAIYA7lSWFAAHKgkCgADyoZEg5yuNwBAFMAE4kAD2FDwTzQtwALp8EVi8JAbGYzOZIjCiLw0UhIB0zABaJRGYkVNAAG0+ZAA5gS4ZCQJcgA"
response = requests.get(url_path)
if response.ok:
data = response.json()
df = pd.DataFrame(data['dataSet'])
print(df)
else:
print(response)

输出:

IND                                       INDDesc  ... figure sd_value
0          ind_45                           Import/export trade  ...   73.4         
1          ind_46                                     Wholesale  ...   61.8         
2          ind_47                                        Retail  ...     43         
3          ind_49                          <i>Land transport<i>  ...   60.1         
4         ind_521                       Warehousing and storage  ...   35.4         
...           ...                                           ...  ...    ...      ...
2155  ind_QBRI_CO  Computer and information technology services  ...  106.1        p
2156  ind_QBRI_FA  <i>Financial markets and asset management<i>  ...  113.6        p
2157   ind_QBRI_L                                   Real estate  ...  130.4        p
2158  ind_QBRI_TO   Tourism, convention and exhibition services  ...     17        p
2159  ind_QBRI_WT                         <i>Water transport<i>  ...  162.8        p
[2160 rows x 8 columns]

最新更新