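A CSV file can contain data with embedded newlines, and they can appear in any column. Some rows have no embedded newlines at all, so the solution should work in every case.
Sample input: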
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2
I am using the following command to read the first 5 records of the CSV:
awk -v RS='("[^"]*")?\r?\n' 'NF{ORS = gensub(/\r?\n(.)/, "\\\\n\\1", "g", RT); ++n; print} n==5{exit}' file.csv
Actual output:
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2
Desired output:
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
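With your shown samples only, could you please try the following awk code. Written and tested with GNU awk. It sets RS (the record separator) to match each double-quoted field, globally replaces the newlines inside RT with a literal \n, and then prints the lines accordingly.
awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file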
To get the first 10 records, try the following:
awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file | head -10
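For awk implementations that lack GNU awk's RT variable, one portable sketch is to track whether a double-quoted field is still open by counting the quotes on each physical line, and to join lines until the quotes balance. The function name first_records is hypothetical, and the sketch assumes that quotes inside a field are doubled, as in conventional CSV. It is an illustration under those assumptions, not part of the answer above.

```shell
# first_records N FILE (hypothetical helper): print the first N logical
# CSV records, rejoining physical lines with a literal \n while a
# double-quoted field is still open.
first_records() {
    awk -v limit="$1" '
    {
        sub(/\r$/, "")                           # tolerate CRLF line endings
        rec = (inq ? rec "\\n" : "") $0          # append to the open record, or start a new one
        tmp = $0
        if (gsub(/"/, "", tmp) % 2) inq = !inq   # an odd quote count toggles the open state
        if (inq) next                            # record continues on the next physical line
        print rec
        if (++n == limit) exit                   # stop after N complete records
    }
    ' "$2"
}
```

Calling first_records 5 file.csv would then count logical records rather than physical lines, so no separate head step is needed.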
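Warning: shameless self-promotion ahead!
I wrote an awk-like utility, tawk, which uses Tcl as its scripting language and has a mode for reading CSV data, so you don't have to resort to regular expressions to handle records with embedded newlines and quotes (that feature was actually my main inspiration for it).
Using it:
$ tawk -csv 'line {$NR <= 5} { puts [regsub -all {\n+} $F(0) "\\n"]; if {$NR == 5} exit }' input.csv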
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2