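I want to add strings from one text file (file1) to a second text file (file2). The strings from file1 should be added to file2 in order, one after each greater-than sign (>). There are 9 greater-than signs in file2 and 9 strings in file1. File1 contains 9 different strings in column 1, on lines 1-9, like this: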
...
sctC_
sctJ_
sctV_
...
This is the while loop with sed that I tried in order to add the strings to file2:
while IFS=$'\t' read f1 f2 ; do sed "s/^>/&$f2/" ; done < <(paste file2 file1)
However, only the first string is added to file2, and the first line is stripped from file2:
MRNVLYAFLLTLYRGFCWSTVLLGMLPMAHAVTPPEWNKGAYAYSAEQTLLSTILIDFANSHGVELVMDN sctJ_
LKDTLVEAKIRAETPAAFLDRLALEHRFQWFVYNHTLYVSSQDTQASIRLEISPDAAPDLKQALSGIGLL sctV_
DPRFGWGELPEEGVVLVTGPQTYIDLIRNFSQQREKQDERRKVMIFPLRFASVSDRTLQYRDQRIVIPGV sctN_
ATILSELMDGQRPPPTGASGPTDAVPDSAMEAMRENTRAMLTRLATRNNPARSTDENGRLVLNGRISADV sctQ_
RNNALLVRDDEKRREEYQQLVEQIDVPQNLVNIDAIILDVDRTALSRLEANWQGTLGNVSAGSTMMMGRS sctR_
TLFVSDFKRFFADIQALEGEGTASIVANPSVLTLENQPAIVDFSRTAFITATGERVAQIQPITAGTSLQV sctS_
TPRVVGQDGPRSIQLVIDIEDGRVETGRDGEATGVKRGTVSTQALIGENRALVLGGFHVEESGDRDHRIP sctT_
LLGDIPWLGRLFTSTRHEVSRRERLFILTPHLIGDQTDPTRYVSAENRHQINDVMNRVSQRNGKHDLYSL sctU_
VENALRDLAGKQLPAGFQSETRGTRLSEVCRSQPGLVYDSNRYQWYGNGSIRLTVGVVRNSGTRIQRFDE
SVCGSNRTLAVAAWPKTTLAPGESTEVFLALQTLSSTAPPRRSLLASY
>sctC_12a_02741 hypothetical protein
MKTDLRALFLLLSLLLMGCGDPIELNRGLSENDANEVIAALGRYQIAAEKRVDKTGVTLIIDAKNMERAV
NILNAAGLPRQSRTNLGEVFQKSGVISTPLEERARYIYALSQEVEATLTQIDGVLVARVHVVLPERIAPG
EPVQPASAAVFIKYQPELEPDSVEPRIRRMVASSIPGLSGKNDKDLSIVFVPAEPYQDTIPVVTLGPFTL
TPQEMVRWQWTAGLMGALIIGLLAWRLGKPYMRQWQQNRADARQQR
>sctC_12a_02750 Invasion protein InvA
MNLVIIWLNRIALSAMQRSEVVGAVIVMSIVFMMIIPLPTSLIDVLIAFNICVSSLLIVLAMYLPKPLAF
STFPAVLLLTTMFRLALSISTTRQILLQQDGGHIVEAFGNYVVGGNLAVGLVIFLILTVVNFLVITKGSE
RVAEVAARFTLDAMPGKQMSIDSDLRAGLIEAHQARQRRDNLAKESQLFGAMDGAMKFVKGDAIAGLVIV
FINMIGGFAIGVLQHGMSAADAMHVYSVLTIGDGLIAQIPALLISLTAGMIITRVSAEGQPLDANIGREI
AEQLTSQPKAWIISALGMFGFALLPGMPSMVFMVISLASFSSGVFQLWRIKQQGILTHSQAEADNQPAEQ
NGHQDLRRFNPTRAYLLQFHPSMQGNPATLSLVQHIRRLRNRLVYQFGMTLPSFDIEFSDRLDEDEFQFG
VYEIPYVKATFVTERLAVHRSSFDQGELEDAIAGSTLRDEADWLWVSPMHPLLEQETCPRWAAGELILMR
MENAIHRSGAQFIGLQETKSILTWLESEQPELAQELQRIMPLSRFAGVLQRLASERIPLRSVRPIAEALI
EIGQHERDVHALTDYVRLALKAQICHQYSQQNTLHVWLLTPETEELLRDSLRQTQNETFFALTQDYAATL
LGQLRRAFPPSLPSTGQILVAQDLRTPLRVLLQEEFHHVPVLSFSELESHLSINVLGRFDLYEENTPFSA
>sctC_12a_02752 Type III secretion ATP synthase HrcN
MQTQAAIDFPLMTRWFQQQRRRLSDFAPVDLKGRIIGISGILLECSLPRARIGDLCLVERQDGSQVMAEV
VGFSPRNTFLSALGALDGIAQGAAVAPLYQPHCIQVSDRLFGSVLDGFGRALEDGGESAFVQPGELHGNA
QPVLGDAPPPTARPRIATPLPTGLRAIDGLLTLGQGQRVGIFAGAGCGKTTLLAELARNTPCDAIVFGLI
GERGRELREFLDHELDDDLRRRTVLVCSTSDRSSMERARAAFTATAIAEAYRAAGKQVLLIIDSLTRFAR
AQREIGLALGEPQGRGGLPPSVYTLLPRLVERAGQTQTGAITALYSVLIEQDSMNDPVADEVRSLIDGHI
VLTRRLAEQGHYPAIDVLASLSRTMSNVVDDGHNRHAGAVRRLMAAYKQVEMLIRLGEYQSGHDALTDSA
VNAQQDITRFLRQAMRDPMAYDDIQQQLAEVSAHAP
How can I take the strings from file1 and add them, one after another, after each greater-than sign in file2?
Thanks,
JD-
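I'm not sure I fully understand what you need, but this should be easy to handle in Perl. Read the first file into an array, then iterate over the second file and use the array to add the missing information.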
perl -we 'push @s, scalar <> until eof;
chomp @s;
s/(?<=^>)/shift @s/e, print while <>;
' file1 file2
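- <> is a shorter form of readline; in scalar context it reads one line from the file
- eof returns true when the file is exhausted
- chomp removes the trailing newline from each element of the array
- (?<=...) is a lookbehind; here it matches the position right after a > at the beginning of a line
- the /e modifier of the substitution operator s/// evaluates the replacement as code; shift removes and returns the first element of the array @s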
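If Perl is not at hand, the same idea (store the strings from file1, then consume one per header line of file2) can be sketched with awk. This is only a rough sketch, not part of the answer above: prefix_headers is a made-up function name, and it assumes file1 is not empty and that each string sits in the first whitespace-separated column of its line.

```shell
# prefix_headers NAMES_FILE FASTA_FILE
# Hypothetical helper (name chosen for illustration only).
# Prints FASTA_FILE with the next string from NAMES_FILE inserted
# right after the leading ">" of each header line.
prefix_headers() {
  awk '
    # First file: remember column 1 of every line, in order.
    NR == FNR { names[++n] = $1; next }

    # Second file: on a header line, splice in the next stored string.
    # Headers beyond the number of stored strings are left unchanged.
    /^>/ && i < n { $0 = ">" names[++i] substr($0, 2) }

    # Print every line of the second file, modified or not.
    { print }
  ' "$1" "$2"
}
```

It would be called as prefix_headers file1 file2 > file2.new, writing the result to a new file instead of editing file2 in place. The header is rebuilt with substr rather than sub so that characters such as & in a string are not treated specially.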