Print the first N rows of a CSV where quoted fields can contain newlines



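The CSV file can have newlines inside its data, and they can appear in any column. Some rows have no embedded newlines at all, so the solution should work in both cases.

Sample input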

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect
Thanks for your time!
With Joy.
Test",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2

I am using the following command to read the first 5 records of the CSV:

awk -v RS='("[^"]*")?\r?\n' 'NF{ORS = gensub(/\r?\n(.)/, "\\\\n\\1", "g", RT);  ++n; print} n==5{exit}' file.csv

Actual output:

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111113,TestUser,1234567891,test1,hello msg1,Address test2,City test2
11111114,TestUser,1234567891,test1,hello msg1,Address test2,City test2

Desired output:

ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.Test",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2

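Going by your shown samples only, could you try the following awk code? It was written and tested with GNU awk. It sets RS so that each double-quoted field acts as the record separator, globally replaces the newlines in RT (the text that matched RS) with a literal \n, assigns the result to ORS, and then prints the records so that the escaped field is written back in place.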

awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file

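To get the first 10 records, pipe the result through head. This works because every record occupies a single line once the embedded newlines are escaped:

awk -v RS='"[^"]*"' '{gsub(/\n/,"\\n",RT);ORS=RT} 1' Input_file | head -10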
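
RT is a GNU awk extension, so the commands above may not behave the same way in other awk implementations. As a hedged alternative, here is a minimal sketch that should run in any POSIX awk. It counts double quotes on each physical line: a line with an odd number of quotes opens or closes a quoted field. The function name first_csv_records and its two arguments (record count, file name) are made up for illustration. It assumes LF line endings and that double quotes are used only for CSV quoting (doubled "" escapes keep the parity intact).

```shell
# Hypothetical helper: print the first N CSV records of a file, one per line,
# with newlines inside quoted fields written as a literal backslash-n.
# Usage: first_csv_records N FILE
first_csv_records() {
  awk -v max="$1" '
    {
      # Append this physical line to the pending record; if we are still
      # inside a quoted field, join with a literal \n instead of a newline.
      rec = (open ? rec "\\n" : "") $0

      # gsub returns the number of quotes it removed from the copy.
      # An odd count means this line toggles the in-quotes state.
      tmp = $0
      if (gsub(/"/, "", tmp) % 2) open = !open
    }
    # Outside quotes the record is complete: print it and stop after max.
    !open { print rec; if (++n == max) exit }
  ' "$2"
}
```

For the sample in the question, first_csv_records 5 file.csv would be the call corresponding to the five desired records. Unlike the NF test in the original command, this sketch does not skip blank lines.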

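Warning: self-promotion ahead!

I wrote an awk-like utility, tawk, that uses tcl as its scripting language and has a mode for reading CSV data, so you don't have to use regular expressions to handle records with embedded newlines and quotes (this feature was actually my main inspiration for writing it).

Using it: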

$ tawk -csv 'line {$NR <= 5} { puts [regsub -all {\n+} $F(0) "\\n"]; if {$NR == 5} exit }' input.csv
ID,username,mobile,city,Message,Address,city
'11111111',TestUSer,1234567890,test,"Hi how are you? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111116,TestUser,1234567891,test,hello msg,Address test1,City test1
'111111167',TestUSer,1234567890,test,"Hi how are you one? Well: we will connnect\nThanks for your time!\nWith Joy.\nTest",Address test,City test
11111112,TestUser,1234567891,test1,hello msg1,Address test2,City test2
