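I am trying to extract certain values from the JSON output I get back from a Google API. From the whole output, I want to extract specific values based on defined criteria. The table I'm working with looks like this: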
| index | value |
|-------|-------|
| 1 | {city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, market_id: 644, market_tx: Southern NH, metro_id: 37} |
| 2 | {city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, market_id: 541, metro_id: 57} |
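A minimal sketch, assuming the raw API response is valid JSON text (the rows above are shown as rendered, not as strict JSON: keys are unquoted and True/False are Python-style). The string raw is hypothetical and simply re-encodes the two table rows; WANTED is an assumed example criterion, since the question does not state the real one. It mirrors the columns the Python answer below ends up keeping.

```python
import json

# Hypothetical raw response text: the two table rows written as valid JSON
# (quoted keys, lowercase true/false).
raw = '''[
  {"city_display_city_name_fg": true, "subMarket_id": 1631,
   "subMarket_tx": "Southern NH", "market_id": 644,
   "market_tx": "Southern NH", "metro_id": 37},
  {"city_display_city_name_fg": false, "subMarket_id": 2464,
   "subMarket_tx": "north NH", "market_id": 541, "metro_id": 57}
]'''

# Assumed example criterion: keep only these keys from each record.
WANTED = ("market_id", "market_tx", "metro_id")

records = json.loads(raw)  # list of dicts; JSON true/false become True/False
# dict.get returns None for a key a record lacks (row 2 has no market_tx).
extracted = [{key: rec.get(key) for key in WANTED} for rec in records]

for row in extracted:
    print(row)
```

Using dict.get rather than rec[key] keeps the loop from raising KeyError on records that omit a field.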
Here is a working solution using Python.
Logic:
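- Convert your API output to a DataFrame with pd.DataFrame()
- Split the "value" column into separate columns with apply(pd.Series)
- Merge the split columns back onto the original DataFrame with concat()
- Remove the unneeded columns with drop()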
Code:
import pandas as pd
api_output = {'index':[1,2] ,
'value': [{'city_display_city_name_fg': True, 'subMarket_id': 1631, 'subMarket_tx': 'Southern NH', 'market_id': 644, 'market_tx': 'Southern NH', 'metro_id': 37}, {'city_display_city_name_fg': False, 'subMarket_id': 2464, 'subMarket_tx': 'north NH', 'market_id': 541, 'metro_id': 57}]}
# convert entire API output into pandas df
api_df = pd.DataFrame(api_output)
# split "value" to columns, concat with previous api_df, drop useless columns
final_df = pd.concat([api_df, api_df['value'].apply(pd.Series)], axis=1).drop(['city_display_city_name_fg','subMarket_id','subMarket_tx'], axis=1)
Result:
| index | value | market_id | market_tx | metro_id |
|-------|-------|-----------|-----------|----------|
| 1 | {city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, market_id: 644, market_tx: Southern NH, metro_id: 37} | 644 | Southern NH | 37 |
| 2 | {city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, market_id: 541, metro_id: 57} | 541 | NaN | 57 |
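An alternative sketch, not part of the original answer: pandas.json_normalize flattens the list of dicts in one call, and selecting the wanted columns avoids having to list every unwanted one in drop(). The column list keep is an assumption about which fields the criteria require, taken from the result above.

```python
import pandas as pd

api_output = {'index': [1, 2],
              'value': [{'city_display_city_name_fg': True, 'subMarket_id': 1631,
                         'subMarket_tx': 'Southern NH', 'market_id': 644,
                         'market_tx': 'Southern NH', 'metro_id': 37},
                        {'city_display_city_name_fg': False, 'subMarket_id': 2464,
                         'subMarket_tx': 'north NH', 'market_id': 541, 'metro_id': 57}]}

# Flatten the list of dicts into one column per key; a key missing from a
# record (market_tx in row 2) becomes NaN.
flat = pd.json_normalize(api_output['value'])

# Select the wanted columns instead of dropping the unwanted ones,
# then attach the original index values as the first column.
keep = ['market_id', 'market_tx', 'metro_id']
result = flat[keep].assign(index=api_output['index'])[['index'] + keep]
print(result)
```

Selecting by name also means the result does not change if the API later adds extra fields you don't need.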
You can extract all the key-value pairs from the string and then keep only the columns you need.
library(dplyr)
library(tidyr)
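df %>%
  mutate(value = gsub('[{}]', '', value)) %>%            # strip the braces
  separate_rows(value, sep = ',\\s*') %>%                # one row per "key: value" pair
  separate(value, c('name', 'value'), sep = ':\\s*') %>% # split each pair into name and value
  pivot_wider(names_from = name, values_from = value)    # one column per key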
# index city_display_city_name_fg subMarket_id subMarket_tx market_id market_tx metro_id
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 True 1631 Southern NH 644 Southern NH 37
#2 2 False 2464 north NH 541 NA 57
It is easier to help if you provide your data in a reproducible format:
df <- structure(list(index = 1:2, value = c("{city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, market_id: 644, market_tx: Southern NH, metro_id: 37}",
"{city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, market_id: 541, metro_id: 57}"
)), row.names = c(NA, -2L), class = "data.frame")
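For readers working in Python rather than R, the same string-parsing steps can be mirrored with the standard re module and pandas. This is a sketch, not part of the original answer: parse_pairs is a hypothetical helper, and like the R pipeline it assumes no key or value contains a comma or colon. As in the R output, every parsed value stays a string.

```python
import re
import pandas as pd

# The string-valued column from the reproducible R data above.
df = pd.DataFrame({
    'index': [1, 2],
    'value': ['{city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, '
              'market_id: 644, market_tx: Southern NH, metro_id: 37}',
              '{city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, '
              'market_id: 541, metro_id: 57}'],
})

def parse_pairs(text):
    """Hypothetical helper: turn '{k1: v1, k2: v2}' into a dict of strings.

    Assumes neither keys nor values contain commas or colons.
    """
    body = text.strip().strip('{}')   # like gsub('[{}]', '', value)
    pairs = re.split(r',\s*', body)   # like separate_rows(sep = ',\\s*')
    # like separate(sep = ':\\s*'): split each pair once into name and value
    return dict(re.split(r':\s*', p, maxsplit=1) for p in pairs)

# One column per key (like pivot_wider); a missing key becomes NaN.
wide = pd.DataFrame([parse_pairs(v) for v in df['value']])
wide.insert(0, 'index', df['index'])

# Example selection (an assumption): keep only the columns you need.
subset = wide[['index', 'market_id', 'market_tx', 'metro_id']]
print(subset)
```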