我有一个CSV,看起来像这样:
created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7
我想解析它,所以我有以下结果,将id设置为列并按日期分组:
created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7
缺失值设置为nan,每个id在每个日期最多出现一次
我该怎么做?如果jq
更方便的话,我还有JSON格式的CSVJSON由类似的元素组成:
{
"created": "2022-12-16 09:15",
"value": "10.4",
"id": "60"
}
使用伟大的米勒(版本>= 6),运行
mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv
你
created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7
这里的核心命令是reshape -s id,value
,用于将您的输入从长结构转换为宽结构。
基于JSON输入流,我将在jq中这样做:
reduce inputs as {$created, $value, $id} ({head: [], body: {}};
.head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
)
| (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
.body | to_entries[] | [.key, .value[$head[]]]
)
然后,要么使用内置的@csv
,它将值包装在引号中,并为缺失的组合生成空值:
jq -nr '
⋮
| @csv
'
"created","2","3","10","11","50","55","58","59"
"2022-12-16 11:55","6",,"3",,,"4","2","5"
"2022-12-16 11:50",,"12",,"9","10",,"8","11"
演示或者通过map
ping和join
ping来手动生成nan
和,
:
jq -nr '
⋮
| map(. // "nan") | join(",")
'
created,2,3,10,11,50,55,58,59
2022-12-16 11:55,6,nan,3,nan,nan,4,2,5
2022-12-16 11:50,nan,12,nan,9,10,nan,8,11
演示