I have data in the following format:
>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_b0ha81-5 Movie= Thor actor= chris hemsworth Genere=Action Length=321 Credits=20 pe=0 summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace
>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_d0fc78-2 Movie= Joker actor= Phoenix Genere=thriller Length=341 Credits=35 pe=2 summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver
>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
I want to extract every entry that contains pe=1 (each entry starts with a > symbol), like this:
>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors
and also put a few of the values into a table like this:
Name Length
ab:xy_a0by98-2 234
ab:xy_c0ma65-1 251
ab:xy_e0ra81-2 254
I tried grep "pe=1" input.txt > output.txt, but it only pulls out the first line of each entry, not the description. Any help is appreciated.
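This sed command should do the job for the table rows (-E enables extended regular expressions, so the capture groups need no backslashes):
sed -nE 's/^>([^[:blank:]]+).* Length=([0-9]+).* pe=1( .*)?$/\1 \2/p' file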
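The table in the question also starts with a Name Length header line, which a bare sed substitution does not print. A minimal sketch that adds it, assuming a sed that accepts -E (GNU and BSD sed both do) and that Length= appears before pe= on every header line, as in the sample data; the function name pe1_table is made up for illustration:

```shell
# Hypothetical helper: print a "Name Length" table for entries whose header has pe=1.
pe1_table() {
  echo "Name Length"
  # Group 1: the ID right after ">"; group 2: the digits after " Length=".
  # " pe=1( .*)?$" requires pe=1 to be a whole field, so pe=10 is not matched.
  sed -nE 's/^>([^[:blank:]]+).* Length=([0-9]+).* pe=1( .*)?$/\1 \2/p' "$@"
}

# Usage: pe1_table input.txt > table.txt
```

Run on the sample data, this should give the header followed by the three rows shown in the question.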
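1st solution (using GNU awk):
With your shown samples, please try the following awk code, written and tested in GNU awk. Simple explanation: check whether the line starts with > and contains pe=1, AND use the match function with the regex \<Length=([0-9]+) to capture the Length value into an array named arr. If both conditions are true, print the first field (without its leading >) followed by the first element of arr.
awk '/^>.*\<pe=1 / && match($0,/\<Length=([0-9]+)/,arr){print substr($1,2),arr[1]}' Input_file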
2nd solution (using any awk):
With any version of awk, please try the following code, a slight tweak of the 1st solution.
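awk '
/^>.* pe=1 / && match($0,/ Length=[0-9]+/){
  val=substr($0,RSTART,RLENGTH)   # matched text, e.g. " Length=234"
  sub(/.*=/,"",val)               # keep only the digits after "="
  print substr($1,2),val          # ID without the leading ">" plus the length
}
' Input_file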
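The commands above build the table but do not print the description lines, which the question also asks for. A minimal sketch for that part, assuming every entry begins with a > header line and that all lines up to the next > belong to it; the function name extract_pe1 is made up for illustration:

```shell
# Hypothetical helper: print complete entries (header plus description lines)
# whose header line contains the field pe=1.
extract_pe1() {
  awk '
    /^>/ { keep = ($0 ~ / pe=1( |$)/) }  # on each header, decide whether to keep the entry
    keep                                  # print every line while the current entry is kept
  ' "$@"
}

# Usage: extract_pe1 input.txt > output.txt
```

This works where grep does not because the decision made on the header line is remembered in keep and applied to the lines that follow it. It uses no GNU extensions, so it should run in any POSIX awk.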