How to extract a subset of entries based on a specific condition



I have data in the following format.

>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_b0ha81-5 Movie= Thor actor= chris hemsworth Genere=Action Length=321 Credits=20 pe=0 summry=(chris|Action|321)
Thor embarks on a journey unlike anything he's ever faced a quest for inner peace
>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_d0fc78-2 Movie= Joker actor= Phoenix Genere=thriller Length=341 Credits=35 pe=2 summry=(phoenix|thriller|341)
Joker is a 2019 American psychological thriller film directed and produced by Todd Phillips
who co-wrote the screenplay with Scott Silver
>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

I want to extract all entries that contain pe=1. Each entry starts with a > symbol, so the result should look like this:

>ab:xy_a0by98-2 Movie= top gun actor= Tom Genere=Action Length=234 Credits=30 pe=1 summry=(Tom|action|234)
Top Gun is a 1986 American action drama film directed by Tony Scott, and produced by Don Simpson and Jerry Bruckheimer
>ab:xy_c0ma65-1 Movie= Batman actor= Bale Genere=Action Length=251 Credits=30 pe=1 summry=(Bale|Action|251)
From American Psycho to Batman Begins to Vice, Christian Bale is a bonafide A-list star
But he missed out on plenty of huge roles along the way.
>ab:xy_e0ra81-2 Movie= Superman actor= henry cavill Genere=Action Length=254 Credits=28 pe=1 summry=(cavill|action|254)
Henry William Dalgliesh Cavill is a British actor
He is known for his portrayal of Charles Brandon in Showtime's The Tudors

I also want to format a few of the values as a table:

Name            Length
ab:xy_a0by98-2  234
ab:xy_c0ma65-1  251
ab:xy_e0ra81-2  254

I tried grep "pe=1" input.txt > output.txt, but it only pulls out the first line of each entry, not the description. Any help is appreciated.

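This sed command should do the job for the table. The -E flag enables extended regular expressions, so the parentheses act as capture groups, and \1 and \2 in the replacement refer to the name and the length:

sed -nE 's/^>([^[:blank:]]+).*[[:blank:]]Length=([0-9]+).*[[:blank:]]pe=1([[:blank:]]|$).*/\1 \2/p' file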
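The sed command only produces the name and length columns. For the first part of the question (keeping each matching entry together with its description lines), one possible approach is to set a flag in awk on every > line and print lines while the flag is set. This is a minimal sketch, assuming that pe=1 always appears as its own whitespace-separated field on the header line, as in the sample; extract_pe1 is a hypothetical helper name, not something from the original post.

```shell
# extract_pe1 FILE
# Hypothetical helper: print every entry whose ">" header line has a pe=1 field,
# together with the description lines that follow it.
extract_pe1() {
  awk '
    /^>/ {                               # a new entry starts here
      keep = 0
      for (i = 2; i <= NF; i++)          # look for a field that is exactly pe=1
        if ($i == "pe=1") keep = 1
    }
    keep                                 # print the line while the current entry is kept
  ' "$1"
}
```

It could be used as extract_pe1 input.txt > output.txt. Comparing whole fields, rather than searching for the substring pe=1, avoids accidentally matching values such as pe=10.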

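First solution (using GNU awk): With your shown samples, please try the following awk code. It was written and tested in GNU awk. Brief explanation: check whether the line starts with > and contains pe=1, and (&&) use the match function with the regex Length=([0-9]+) to store the captured value in an array named arr. If both conditions are true, print the first field (with its leading > removed, to match the requested table) followed by the first element of arr.

awk '/^>/ && /[[:space:]]pe=1([[:space:]]|$)/ && match($0, /[[:space:]]Length=([0-9]+)/, arr) {print substr($1, 2), arr[1]}' Input_file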


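Second solution (using any awk): With any version of awk, please try the following code, a slight adjustment of the first solution. Without the three-argument form of match, the matched text is taken from RSTART and RLENGTH instead.

awk '
/^>/ && /[[:space:]]pe=1([[:space:]]|$)/ && match($0, /[[:space:]]Length=[0-9]+/) {
  val = substr($0, RSTART, RLENGTH)   # the matched text, e.g. " Length=234"
  sub(/.*=/, "", val)                 # drop everything up to "=", keeping the digits
  print substr($1, 2), val            # name without the leading ">", then the length
}
' Input_file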
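Neither solution prints the Name/Length header row from the requested table. A minimal sketch that adds it and pads the first column with printf could look like this, assuming that names are at most 15 characters long and that Length= and pe=1 are whitespace-separated fields on the header line, as in the sample. pe1_table is a hypothetical helper name.

```shell
# pe1_table FILE
# Hypothetical helper: print a Name/Length table for entries whose header has pe=1.
pe1_table() {
  awk '
    BEGIN { printf "%-15s %s\n", "Name", "Length" }      # header row
    /^>/ {
      name = substr($1, 2)                               # drop the leading ">"
      len = ""; hit = 0
      for (i = 2; i <= NF; i++) {
        if ($i == "pe=1") hit = 1                        # exact field match
        if ($i ~ /^Length=[0-9]+$/) len = substr($i, 8)  # text after "Length="
      }
      if (hit && len != "") printf "%-15s %s\n", name, len
    }
  ' "$1"
}
```

Calling pe1_table input.txt writes the table to standard output; the column width of 15 is only a guess based on the sample names and can be changed in both printf calls.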
