Bash按值分组csv数据



我有一个CSV,看起来像这样:

created,id,value
2022-12-16 11:55,58,10
2022-12-16 11:55,59,2
2022-12-16 11:50,58,11
2022-12-16 11:50,59,3
2022-12-16 11:50,60,7

我想解析它,所以我有以下结果,将id设置为列并按日期分组:

created,58,59,60
2022-12-16 11:55,10,2,nan
2022-12-16 11:50,11,3,7

缺失值设置为nan,每个id在每个日期最多出现一次

我该怎么做?如果jq

更方便的话,我还有JSON格式的CSVJSON由类似的元素组成:

{ 
"created": "2022-12-16 09:15", 
"value": "10.4", 
"id": "60" 
} 

使用伟大的米勒(版本>= 6),运行

mlr --csv reshape -s id,value then unsparsify then fill-empty -v "nan" input.csv

created,58,59,60
2022-12-1611:55,10,2,nan
2022-12-1611:50,11,3,7

这里的核心命令是reshape -s id,value,用于将您的输入从长结构转换为宽结构。

基于JSON输入流,我将在jq中这样做:

reduce inputs as {$created, $value, $id} ({head: [], body: {}};
.head |= (.[index($id) // length] = $id) | .body[$created][$id] = $value
)
| (.head | sort_by(tonumber)) as $head | ["created", $head[]], (
.body | to_entries[] | [.key, .value[$head[]]]
)

然后,要么使用内置的@csv,它将值包装在引号中,并为缺失的组合生成空值:

jq -nr '
⋮
| @csv
'
"created","2","3","10","11","50","55","58","59"
"2022-12-16 11:55","6",,"3",,,"4","2","5"
"2022-12-16 11:50",,"12",,"9","10",,"8","11"

演示或者通过mapping和joinping来手动生成nan,:

jq -nr '
⋮
| map(. // "nan") | join(",")
'
created,2,3,10,11,50,55,58,59
2022-12-16 11:55,6,nan,3,nan,nan,4,2,5
2022-12-16 11:50,nan,12,nan,9,10,nan,8,11

演示

相关内容

  • 没有找到相关文章

最新更新