Strange JSON-like data format inside a CSV



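I have to work with a strange JSON-like format that crashes parsers because it is not quite JSON (no quotes, equals signs instead of colons, and so on).

Has anyone seen a data format like this before? If so, what is it?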

"[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16},..."

It is nested inside a CSV structure, so I wonder whether that has something to do with it.

Edit

Full example

[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]

Running this through Python as YAML gives the following:

import yaml
dct = yaml.safe_load(body)
dct
[{'location_type=1': None,
'location_fullname=Papua New Guinea': None,
'location_countrycode=PP': None,
...

The format seems regular enough that a few string operations are all it takes to convert it to YAML or JSON.
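On the CSV nesting: the outer double quotes in the question's sample look like ordinary CSV quoting, which is needed because the value itself contains commas. A minimal sketch, assuming a hypothetical two-column file (the column names `id` and `locations` are made up for illustration), suggests that Python's `csv` module hands the cell back as one plain string, ready for either conversion below:

```python
import csv
import io

# Hypothetical CSV content; the column names "id" and "locations" are
# assumptions made for this illustration, not taken from the real file.
raw_csv = (
    'id,locations\n'
    '1,"[{location_type=1, location_fullname=Papua New Guinea, character_offset=16}]"\n'
)

# csv.DictReader removes the CSV quoting and keeps the embedded commas intact.
rows = list(csv.DictReader(io.StringIO(raw_csv)))
cell = rows[0]["locations"]
print(cell)
```

If that holds for the real file, the CSV layer only explains the surrounding quotes, not the key=value syntax itself.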

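1. YAML conversion

This is the simplest way to load the string into a list of dictionaries. If your Python project already depends on YAML, there is no reason not to use this solution: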

import yaml
body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'
dct = yaml.safe_load( body.replace('=',': ') )

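2. JSON conversion

If you would rather avoid adding an external dependency (YAML) to your project, you can use this solution. Most people do not care about that, but I do. Note that converting the values to proper data types requires knowing something about the data structure (the conv function is a slightly modified version of the one in @dawg's answer):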

import re
import json
body = '[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'
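def conv(s):
    # Try int first, then float; empty strings become None,
    # anything else is kept as a string.
    try:
        return int(s)
    except ValueError:
        pass
    try:
        return float(s)
    except ValueError:
        return None if s == '' else s

# Quote every key=value pair so the text becomes valid JSON,
# then convert the (string) values to numbers where possible.
dct = [
    {k: conv(v) for k, v in d.items()}
    for d in json.loads(
        re.sub(
            r'([^\s\[{,=]+)=([^,}]*)([,}])',
            r'"\1":"\2"\3',
            body
        )
    )
]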

Both solutions yield:
# dct
[
{
'location_type': 1,
'location_fullname': 'Papua New Guinea',
'location_countrycode': 'PP',
'location_adm1code': 'PP',
'location_adm2code': None,
'location_latitude': -6,
'location_longitude': 147,
'location_featureid': 'PP',
'character_offset': 16
},
{
'location_type': 1,
'location_fullname': 'Papua New Guinea',
'location_countrycode': 'PP',
'location_adm1code': 'PP',
'location_adm2code': None,
'location_latitude': -6,
'location_longitude': 147,
'location_featureid': 'PP',
'character_offset': 290
},
{
'location_type': 1,
'location_fullname': 'Indonesia',
'location_countrycode': 'ID',
'location_adm1code': 'ID',
'location_adm2code': None,
'location_latitude': -5,
'location_longitude': 120,
'location_featureid': 'ID',
'character_offset': 676
},
{
'location_type': 1,
'location_fullname': 'North Korea',
'location_countrycode': 'KN',
'location_adm1code': 'KN',
'location_adm2code': None,
'location_latitude': 40,
'location_longitude': 127,
'location_featureid': 'KN',
'character_offset': 748
},
{
'location_type': 1,
'location_fullname': 'British Indian Ocean Territory',
'location_countrycode': 'IO',
'location_adm1code': 'IO',
'location_adm2code': None,
'location_latitude': -6,
'location_longitude': 71.5,
'location_featureid': 'IO',
'character_offset': 892
}
]
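Since the JSON route depends on knowing the data structure, one alternative is to spell that knowledge out. The following is only a sketch using the standard library: `SCHEMA` and `parse_records` are hypothetical names, the chosen field types are my assumption, and it assumes values never contain `, ` or `}, {` and keys never contain `=`.

```python
# Hypothetical schema: field name -> converter. The types are assumptions;
# any field not listed here is kept as a string.
SCHEMA = {
    "location_type": int,
    "location_latitude": float,
    "location_longitude": float,
    "character_offset": int,
}


def parse_records(text, schema=SCHEMA):
    """Parse '[{k=v, k=v}, {k=v}]' into a list of dicts.

    Assumes values never contain ', ' or '}, {' and keys never contain '='.
    """
    inner = text.strip()
    if not (inner.startswith("[{") and inner.endswith("}]")):
        raise ValueError("expected text of the form [{...}, {...}]")
    records = []
    # Drop the outer '[{' and '}]', then split into one chunk per record.
    for chunk in inner[2:-2].split("}, {"):
        record = {}
        for pair in chunk.split(", "):
            key, _, value = pair.partition("=")
            if value == "":
                record[key] = None  # empty field, e.g. location_adm2code=
            else:
                record[key] = schema.get(key, str)(value)
        records.append(record)
    return records


sample = (
    "[{location_type=1, location_fullname=Papua New Guinea, "
    "location_adm2code=, location_latitude=-6, character_offset=16}, "
    "{location_type=1, location_fullname=Indonesia, "
    "location_adm2code=, location_latitude=-5, character_offset=676}]"
)
parsed = parse_records(sample)
print(parsed[0])
```

Unlike conv, this always returns coordinates as floats (-6.0 rather than -6), which may or may not be what you want.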

I would work with what yaml already gives you, since it is almost there:

import yaml 
ex='''
[{location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=16}, {location_type=1, location_fullname=Papua New Guinea, location_countrycode=PP, location_adm1code=PP, location_adm2code=, location_latitude=-6, location_longitude=147, location_featureid=PP, character_offset=290}, {location_type=1, location_fullname=Indonesia, location_countrycode=ID, location_adm1code=ID, location_adm2code=, location_latitude=-5, location_longitude=120, location_featureid=ID, character_offset=676}, {location_type=1, location_fullname=North Korea, location_countrycode=KN, location_adm1code=KN, location_adm2code=, location_latitude=40, location_longitude=127, location_featureid=KN, character_offset=748}, {location_type=1, location_fullname=British Indian Ocean Territory, location_countrycode=IO, location_adm1code=IO, location_adm2code=, location_latitude=-6, location_longitude=71.5, location_featureid=IO, character_offset=892}]'''
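def conv(s):
    # Try int, then float; otherwise keep the original string.
    try:
        return int(s)
    except ValueError:
        pass

    try:
        return float(s)
    except ValueError:
        return s

# yaml parses each "key=value" item as a dict key with a None value,
# so split every key on the first '=' to recover the real pairs.
# safe_load is used here so the code does not depend on the optional C loader.
res = [{x: conv(y) for x, y in (s.split('=', 1) for s in di)}
       for di in yaml.safe_load(ex)]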
>>> res
[{'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 16}, {'location_type': 1, 'location_fullname': 'Papua New Guinea', 'location_countrycode': 'PP', 'location_adm1code': 'PP', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 147, 'location_featureid': 'PP', 'character_offset': 290}, {'location_type': 1, 'location_fullname': 'Indonesia', 'location_countrycode': 'ID', 'location_adm1code': 'ID', 'location_adm2code': '', 'location_latitude': -5, 'location_longitude': 120, 'location_featureid': 'ID', 'character_offset': 676}, {'location_type': 1, 'location_fullname': 'North Korea', 'location_countrycode': 'KN', 'location_adm1code': 'KN', 'location_adm2code': '', 'location_latitude': 40, 'location_longitude': 127, 'location_featureid': 'KN', 'character_offset': 748}, {'location_type': 1, 'location_fullname': 'British Indian Ocean Territory', 'location_countrycode': 'IO', 'location_adm1code': 'IO', 'location_adm2code': '', 'location_latitude': -6, 'location_longitude': 71.5, 'location_featureid': 'IO', 'character_offset': 892}]
