使用Miller将CSV转换为JSON时,如何清理空字段



我有几个CSV文件的项目数据,我正在处理一个游戏,我需要将其转换为JSON以供消费。数据可能是非常不规则的,每个记录有几个空字段,这导致了某种丑陋的JSON输出。

具有伪值的示例:

Id,Name,Value,Type,Properties/1,Properties/2,Properties/3,Properties/4
01:Foo:13,Foo,13,ACME,CanExplode,IsRocket,,
02:Bar:42,Bar,42,,IsRocket,,,
03:Baz:37,Baz,37,BlackMesa,CanExplode,IsAlive,IsHungry,

转换输出:

[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket", ""]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": "",
"Properties": ["IsRocket", "", ""]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

到目前为止,我已经非常成功地使用了Miller。我已经设法从CSV中删除了完全空的列,并将Properties/X列聚合到一个数组中。

但现在我想再做两件事来改进输出格式,使JSON更容易使用:

  • Properties数组中删除空字符串""
  • 将其他空字符串""(例如第二条记录的Type(替换为null

所需输出:

[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket"]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": null,
"Properties": ["IsRocket"]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

有没有办法用Miller实现这一点?

我当前的命令是:

  • mlr -I --csv remove-empty-columns file.csv清理列
  • 转换用mlr --icsv --ojson --jflatsep '/' --jlistwrap cat file.csv > file.json

这可能不是你想要的方式。我也使用jq。

运行

mlr --c2j  --jflatsep '/' --jlistwrap remove-empty-columns then cat input.csv | 
jq '.[].Properties|=map(select(length > 0))' | 
jq '.[].Type|=(if . == "" then null else . end)'

你会有

[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": [
"CanExplode",
"IsRocket"
]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Type": null,
"Properties": [
"IsRocket"
]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": [
"CanExplode",
"IsAlive",
"IsHungry"
]
}
]

使用Miller,您可以"过滤掉";每个记录的空字段带有:

mlr --c2j --jflatsep '/' --jlistwrap put '
$* = select($*, func(k,v) {return v != ""})
' file.csv

备注:实际上,我们正在构建一个包含非空字段的新记录,而不是从记录中删除空字段;最终的结果是等价的:

[
{
"Id": "01:Foo:13",
"Name": "Foo",
"Value": 13,
"Type": "ACME",
"Properties": ["CanExplode", "IsRocket"]
},
{
"Id": "02:Bar:42",
"Name": "Bar",
"Value": 42,
"Properties": ["IsRocket"]
},
{
"Id": "03:Baz:37",
"Name": "Baz",
"Value": 37,
"Type": "BlackMesa",
"Properties": ["CanExplode", "IsAlive", "IsHungry"]
}
]

最新更新