R: Extract string(s) from JSON output based on criteria



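I am trying to extract certain values from the JSON output returned by a Google API. Out of the whole output, I want to extract specific values based on defined criteria. The table I am working with looks like this:

index  value
1      {city_display_city_name_fg: True, "subMarket_id": 1631, "subMarket_tx": "Southern NH", "market_id": 644, "market_tx": "Southern NH", "metro_id": 37}
2      {city_display_city_name_fg: False, "subMarket_id": 2464, "subMarket_tx": "north NH", "market_id": 541, "metro_id": 57}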

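Here is a working solution using Python.

Logic:

  • Convert your API output with pd.DataFrame()

  • Split the "value" column into separate columns with apply(pd.Series)

  • Use concat() to merge the split columns back onto the original frame

  • Use drop() to remove the columns you do not need

Code: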

import pandas as pd

api_output = {'index':[1,2] ,
'value': [{'city_display_city_name_fg': True, 'subMarket_id': 1631, 'subMarket_tx': 'Southern NH', 'market_id': 644, 'market_tx': 'Southern NH', 'metro_id': 37}, {'city_display_city_name_fg': False, 'subMarket_id': 2464, 'subMarket_tx': 'north NH', 'market_id': 541, 'metro_id': 57}]}

# convert entire API output into pandas df
api_df = pd.DataFrame(api_output)

# split "value" to columns, concat with previous api_df, drop useless columns
final_df = pd.concat([api_df, api_df['value'].apply(pd.Series)], axis=1).drop(['city_display_city_name_fg','subMarket_id','subMarket_tx'], axis=1)

Result:

index  value                                                                                                                                                  market_id  market_tx    metro_id
1      {city_display_city_name_fg: True, "subMarket_id": 1631, "subMarket_tx": "Southern NH", "market_id": 644, "market_tx": "Southern NH", "metro_id": 37}  644        Southern NH  37
2      {city_display_city_name_fg: False, "subMarket_id": 2464, "subMarket_tx": "north NH", "market_id": 541, "metro_id": 57}                                541        NaN          57
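As a variation on the steps above, the dictionaries can also be expanded with pd.json_normalize and the wanted columns selected by name instead of dropping the unwanted ones. This is only a sketch: it assumes the "value" column holds Python dicts, and the `wanted` list is a hypothetical selection chosen to match the result shown above.

```python
import pandas as pd

api_output = {
    "index": [1, 2],
    "value": [
        {"city_display_city_name_fg": True, "subMarket_id": 1631,
         "subMarket_tx": "Southern NH", "market_id": 644,
         "market_tx": "Southern NH", "metro_id": 37},
        {"city_display_city_name_fg": False, "subMarket_id": 2464,
         "subMarket_tx": "north NH", "market_id": 541, "metro_id": 57},
    ],
}
api_df = pd.DataFrame(api_output)

# Expand each dict into its own columns; keys missing from a record become NaN.
expanded = pd.json_normalize(api_df["value"].tolist())

# Hypothetical selection: keep only the columns of interest.
wanted = ["market_id", "market_tx", "metro_id"]
final_df = pd.concat([api_df[["index"]], expanded[wanted]], axis=1)
print(final_df)
```

Selecting by name keeps the code stable if the API later adds keys that would otherwise have to be added to the drop() list.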

In R, you can extract all the values from the string and then keep only the columns you need.

library(dplyr)
library(tidyr)
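df %>%
  # strip the surrounding braces
  mutate(value = gsub('[{}]', '', value)) %>%
  # one row per "name: value" pair
  separate_rows(value, sep = ',\\s*') %>%
  # split each pair into its name and its value
  separate(value, c('name', 'value'), sep = ':\\s*') %>%
  # back to one row per index, one column per name
  pivot_wider(names_from = name, values_from = value)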
#  index city_display_city_name_fg subMarket_id subMarket_tx market_id market_tx   metro_id
#  <int> <chr>                     <chr>        <chr>        <chr>     <chr>       <chr>   
#1     1 True                      1631         Southern NH  644       Southern NH 37      
#2     2 False                     2464         north NH     541       NA          57      

It is easier to help if you provide the data in a reproducible format.

df <- structure(list(index = 1:2, value = c("{city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, market_id: 644, market_tx: Southern NH, metro_id: 37}", 
"{city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, market_id: 541, metro_id: 57}"
)), row.names = c(NA, -2L), class = "data.frame")
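For comparison, the same string-splitting idea can be sketched in Python with pandas. This assumes the "value" column holds raw strings exactly like the ones above, and that no key or value contains a comma; `parse_record` is a hypothetical helper written for this illustration, not part of any library.

```python
import pandas as pd

df = pd.DataFrame({
    "index": [1, 2],
    "value": [
        "{city_display_city_name_fg: True, subMarket_id: 1631, subMarket_tx: Southern NH, "
        "market_id: 644, market_tx: Southern NH, metro_id: 37}",
        "{city_display_city_name_fg: False, subMarket_id: 2464, subMarket_tx: north NH, "
        "market_id: 541, metro_id: 57}",
    ],
})

def parse_record(text):
    """Turn '{k1: v1, k2: v2}' into a dict; all values are kept as strings."""
    record = {}
    for pair in text.strip("{}").split(","):
        key, val = pair.split(":", 1)  # split on the first colon only
        record[key.strip()] = val.strip()
    return record

# One column per key; keys missing from a row become NaN.
parsed = pd.DataFrame(df["value"].map(parse_record).tolist())
wide = pd.concat([df[["index"]], parsed], axis=1)
print(wide)
```

As in the R version, every extracted value is text, so numeric columns would still need converting (for example with pd.to_numeric) before doing arithmetic on them.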
