Sed/Awk solution for paragraph formatting



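I need to create paragraphs from text that has been run together; in most cases the carriage returns and/or line feeds have been removed. Dialogue is interspersed throughout the text, so what I would like is to insert an empty line after the second (closing) quotation mark. It looks like these quotations will set off the reconstructed paragraphs. I added the forward slashes (they are not in the text) because I don't know the convention for quoting code on this site. Here is an example:

From this: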

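Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon "I want bacon." chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet "I want bacon." mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip "I want bacon."steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. "I want bacon." Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim "I want bacon." "I want bacon."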

To this:

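Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."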

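awk -v RS='"' '{
if (NR % 2 == 1) {
    # odd records: text outside the quotes; skip whitespace-only gaps
    if (/[^[:space:]]/) printf "%s%s\n\n", (NR==1? "" : "\n"), $0
} else {
    # even records: quoted text; put the quotation marks back
    printf "\"%s\"\n", $0
}}' file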

Output:

Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon 

"I want bacon."

 chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet 

"I want bacon."

 mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip 

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye. 

"I want bacon."

 Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim 

"I want bacon."
"I want bacon."

Try this (the regular-expression RS needs an awk that supports it, such as gawk or mawk):

awk 'BEGIN{RS=" ?\" ?"; ORS="\n\n"}
     NR%2==0{print "\""$0"\"";next;}
     {}1' inputFile

This inserts a paragraph break before and after every quotation ("..."). However, it leaves the last paragraphs looking like this:

"I want bacon."

"I want bacon."

To remove the empty paragraph between the two "I want bacon." quotations:

awk 'BEGIN{RS=" ?\" ?"; ORS="\n\n"}
     NR%2==0{print "\""$0"\"";next;}
     ($0!=""){print $0}' inputFile
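
For an awk without regular-expression RS support, a similar result can probably be had with a single-character RS and explicit trimming. This is only a minimal sketch, not a drop-in replacement: the function name split_quotes and the sample line are made up for illustration, and it assumes the text uses balanced, straight double quotes.

```shell
# Hypothetical helper (not from the answers here): reads text on stdin and
# prints each quoted passage and each stretch of plain text as its own
# paragraph, separated by exactly one blank line.
split_quotes() {
    awk -v RS='"' '
        NR % 2 == 1 {                          # text outside the quotes
            gsub(/^[ \t\n]+|[ \t\n]+$/, "")    # trim surrounding whitespace
            if ($0 == "") next                 # nothing left: skip the gap
            para = $0
        }
        NR % 2 == 0 { para = "\"" $0 "\"" }    # quoted text: restore the marks
        { printf "%s%s\n", (n++ ? "\n" : ""), para }
    '
}

# Made-up sample input
printf '%s\n' 'one two "hi there." three four "bye." "again."' | split_quotes
```

Because whitespace-only gaps are dropped before anything is printed, two quotations in a row come out as two paragraphs with a single blank line between them.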

sed might be easier:

$ sed 's/"[^"]*" /\n\n&\n\n/g' bacon

Example:

$ echo "bla bla bla \"This is bacon.\" Starts a new paragraph" | sed 's/"[^"]*" /\n\n&\n\n/g'
bla bla bla

"This is bacon."

Starts a new paragraph
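
One caveat: GNU sed understands \n in the replacement text, but POSIX leaves that escape unspecified, so other sed implementations may not treat it as a newline. A sketch of a more portable spelling, using a backslash followed by a literal newline; the wrapper name split_quotes_sed is made up for illustration:

```shell
# Hypothetical wrapper: the same substitution, written so that it does not
# rely on sed interpreting \n in the replacement text.
split_quotes_sed() {
    sed 's/"[^"]*" /\
\
&\
\
/g'
}

printf '%s\n' 'bla bla bla "This is bacon." Starts a new paragraph' | split_quotes_sed
```

As with the one-line version, the pattern only matches a quotation that is followed by a space, and that space stays at the end of the quotation's line.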

Using GNU awk for multi-char RS and gensub():

$ awk -v RS='^$' -v ORS= '{$0=gensub(/\s*("[^"]+")\s*/,"\n\n\\1\n\n","g"); gsub(/\n+/,"\n\n")}1' file
Bacon ipsum dolor amet pastrami chuck venison swine, salami prosciutto shank pork belly. Filet mignon beef ribs ham hock, bacon ground round porchetta alcatra. Beef bacon biltong bresaola short loin filet mignon

"I want bacon."

chuck brisket landjaeger jerky prosciutto ham leberkas pork loin doner. Shoulder tongue meatball tail jerky pork loin filet

"I want bacon."

mignon shank chuck shankle flank pig. Short loin pork loin hamburger corned beef ribeye tri-tip doner ham hock landjaeger t-bone swine. Swine pork belly frankfurter, t-bone ham hock bacon pastrami. Biltong beef chuck ham hock pork loin shoulder strip

"I want bacon."

steak short loin tail cupim rump alcatra.Shoulder beef cupim rump ground round. Beef sirloin cupim meatball ham ribeye.

"I want bacon."

Venison tail ribeye, pastrami tongue pig beef ribs kielbasa bresaola doner. Shankle filet mignon pig, shoulder ball tip pork belly jowl sausage fatback boudin. Prosciutto venison capicola bacon, short loin andouille salami shank tongue corned beef. Sirloin biltong boudin tenderloin brisket tri-tip pancetta kielbasa strip steak leberkas short ribs flank filet mignon ham hock pork. Tri-tip cupim

"I want bacon."

"I want bacon."
