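I have a CSV file that I'm trying to parse in bash. The first field of each line is a timestamp in the format yyyy-mm-dd hh:mm:ss. Six lines are written every 10 minutes; a small sample is below.
What I want is the first 6 lines of each day. The first entry of a day can occur anywhere between 00:00:xx and 00:10:xx, so grepping for "00:0" doesn't work.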
2010-04-23 00:04:43, 0.0, 0, 4666724, 3217665, 28866, 28866, 0.92, 65,
2010-04-23 00:04:43, 0.1, 0, 4666724, 3217663, 20832, 20832, 0.62, 65,
2010-04-23 00:04:43, 0.2, 0, 4666724, 3217662, 14702, 14702, 0.46, 65,
2010-04-23 00:04:43, 0.3, 0, 4666724, 3217664, 27739, 27739, 0.92, 65,
2010-04-23 00:04:43, 0.4, 0, 4666724, 3217664, 25105, 25105, 0.77, 65,
2010-04-23 00:04:43, 0.5, 0, 4666724, 3217664, 24546, 24546, 0.77, 65,
2010-04-23 00:14:43, 0.0, 0, 4666724, 3217665, 29226, 29226, 0.92, 65,
2010-04-23 00:14:43, 0.1, 0, 4666724, 3217663, 21552, 21552, 0.62, 65,
2010-04-23 00:14:43, 0.2, 0, 4666724, 3217662, 15422, 15422, 0.46, 65,
2010-04-23 00:14:43, 0.3, 0, 4666724, 3217664, 28459, 28459, 0.92, 65,
2010-04-23 00:14:43, 0.4, 0, 4666724, 3217664, 25825, 25825, 0.77, 65,
2010-04-23 00:14:43, 0.5, 0, 4666724, 3217664, 25266, 25266, 0.77, 65,
2010-04-23 00:24:43, 0.0, 0, 4666724, 3217665, 29586, 29586, 0.92, 65,
2010-04-23 00:24:43, 0.1, 0, 4666724, 3217663, 22272, 22272, 0.77, 65,
and so on, up to
2010-04-24 00:05:02, 0.0, 0, 4666724, 3217701, 71388, 71388, 2.31, 65,
2010-04-24 00:05:02, 0.1, 0, 4666724, 3217701, 70264, 70264, 2.31, 65,
2010-04-24 00:05:02, 0.2, 0, 4666724, 3217700, 61254, 61254, 2.00, 65,
2010-04-24 00:05:02, 0.3, 0, 4666724, 3217701, 71011, 71011, 2.31, 65,
2010-04-24 00:05:02, 0.4, 0, 4666724, 3217701, 68111, 68111, 2.15, 65,
2010-04-24 00:05:02, 0.5, 0, 4666724, 3217702, 69904, 69904, 2.31, 65,
Any thoughts or ideas? Bob
An awk version of eugene y's answer:
awk '
  $1 != date { count = 0; date = $1 }   # new day ($1 is the date): reset the counter
  ++count <= 6 { print }                # print the first 6 lines of the day
' filename
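The awk version above relies on all of a day's lines being consecutive, which holds for a log written in time order. If that might not be the case, here is a sketch that counts lines per date in an associative array instead, so the order doesn't matter (lines are still printed in input order). The helper name first_n_per_day is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the first N lines for each date without assuming that a
# day's lines are consecutive. first_n_per_day is a hypothetical name.
first_n_per_day() {
  local n="$1"
  # awk splits on whitespace by default, so $1 is the yyyy-mm-dd date.
  # seen[date] counts the lines seen so far for that date; the pattern
  # alone (no action) prints the line while that count is <= n.
  awk -v n="$n" '++seen[$1] <= n'
}

# Usage: first_n_per_day 6 < file.csv
```

The trade-off is one array entry per distinct date, which is negligible for a daily log.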
It can be as simple as grep with two patterns:
grep -e " 00:0" -e " 00:10" myFIle.csv
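The first pattern matches 00:00 through 00:09, and the second matches 00:10. Note that this can return 12 lines for a day whose first batch falls between 00:00 and 00:09 and whose next batch lands at 00:10.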
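To see that limitation of the two-pattern grep concretely, here is a small hypothetical day whose first batch is logged at 00:00:30 (one line per batch for brevity). The batch ten minutes later, at 00:10:30, also matches " 00:10", so both batches come back.

```shell
#!/usr/bin/env bash
# Hypothetical sample: one line per 10-minute batch, first batch at 00:00:30.
sample='2010-04-25 00:00:30, 0.0, 0,
2010-04-25 00:10:30, 0.0, 0,
2010-04-25 00:20:30, 0.0, 0,'

# Prints the 00:00:30 and the 00:10:30 lines, i.e. two batches for one day.
printf '%s\n' "$sample" | grep -e " 00:0" -e " 00:10"
```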
This should be easy with Perl:
perl -ane '$l = 0 if $F[0] ne $d; print if $l++ < 6; $d = $F[0]' file
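The following uses read with a custom IFS (Input Field Separator) setting to split each input line into the date-time field and the rest of the line, then uses bash's substring expansion to extract the date from the ISO date-time, and prints the first N lines seen for each date. In place of the echo you will probably want to do whatever processing you need on the result, since read + echo does not preserve the input exactly.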
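function first_n_of_each_day() {
  local N="$1"
  local lastDate=""
  local I=0
  local DATETIME OTHER DATE
  # Split at the first comma: DATETIME gets the timestamp, OTHER the rest.
  # -r keeps backslashes in the input from being treated as escapes.
  while IFS=',' read -r DATETIME OTHER ; do
    DATE="${DATETIME:0:10}"        # "yyyy-mm-dd" part of the timestamp
    if [ "$DATE" != "$lastDate" ] ; then
      # A new day has started: reset the per-day counter.
      I=0
      lastDate="$DATE"
    fi
    if [ "$I" -lt "$N" ] ; then
      I=$((I + 1))
      # Line is among the first N of its day:
      echo "$DATETIME,$OTHER"
    fi
  done
}

first_n_of_each_day 6 < file.csv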
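If the selected lines need to come through unchanged, a variant can read each whole line with an empty IFS and -r, take the date from its first 10 characters, and print the original line with printf. This is a sketch; the name first_n_of_each_day_exact is hypothetical.

```shell
#!/usr/bin/env bash
# Variant that keeps each line as-is: read the whole line (IFS= and -r),
# take the date from its first 10 characters, print it with printf.
# first_n_of_each_day_exact is a hypothetical name.
first_n_of_each_day_exact() {
  local n="$1" line date last="" i=0
  # The "|| [ -n "$line" ]" part also handles a last line with no newline.
  while IFS= read -r line || [ -n "$line" ]; do
    date="${line:0:10}"              # yyyy-mm-dd
    if [ "$date" != "$last" ]; then
      i=0                            # new day: reset the counter
      last="$date"
    fi
    if [ "$i" -lt "$n" ]; then
      i=$((i + 1))
      printf '%s\n' "$line"
    fi
  done
}

# Usage: first_n_of_each_day_exact 6 < file.csv
```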