Breaking lines in bash with sed: problem with a regular expression

Hi all, my data looks like this:

  samplename 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
  samplename2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...

I would like it to look like this:

  >samplename
  0 1 1 1 1 1 1 1 1 1 
  1 0 0 0 0 0 0 0 0 ...
  >samplename2 
  0 0 0 0 0 1 1 1 1 1 
  1 1 1 1 1 1 0 0 0 ...

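[Note: I am showing a line break after every 10 digits; I actually want one after every 200, but I realized that showing lines that long would not be helpful.]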


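I can do this with regular expressions in a text editor, but I want to use the sed command in bash because I have to do it several times, and I need 200 characters per line.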

I tried this and got an error:

sed -e "s/(>\w+)\s([0-9]+)/\1\n\2" < myfile > myfile2

sed: 1: "s/(>\w+)\s([0-9]+)/...

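One more note: I am doing this on a Mac, and I know that sed on the Mac differs a bit from GNU sed. If you can give me a solution that works on the Mac, that would be great.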

Thanks in advance.

Following the request to add a newline after every 200 digits, here is a solution using awk.

echo "hello 1 2 3 4" | awk '{print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'

prints

>hello
1 2 
3 4

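If you only want to apply this to lines that start with hello, you can modify it to

echo "hello 1 2 3 4" | awk '/^hello / {print ">"$1; for(i=2; i<=NF; i++) {printf("%d ",$i); if((i+1)%2 == 0) printf("\n");}}'

(The regular expression between / / says "only do this on lines that match this expression".)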

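You can change the test if ((i+1) % 2 == 0) to if ((i-1) % 100 == 0) to get a newline after every 100 digits. Use i-1 there rather than i+1: the numbers start at field 2, so i-1 is the count of numbers printed so far (for a group size of 2 the two tests happen to agree). I only showed it for 2 because the printed output is more readable.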

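Update: to make everything cleaner, do the following.

Create a file called breakIt with the following contents (leave out /^hello/ if you do not want to select only the lines starting with "hello"; but keep the {} around the code, that is important):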

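/^hello/ { print ">"$1;                  # header line: '>' plus the name
   for(i=2; i<=NF; i++)                  # the numbers start at field 2
   {
      printf("%d ",$i);
      if((i-1)%100 == 0) printf("\n");   # newline after every 100th number
   }
   print "";                             # finish the last (partial) line
}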

Now you can issue the command

awk -f breakIt inputFile > outputFile

This says "use the contents of breakIt as the commands to process inputFile, and put the result in outputFile".

That should do the trick for you.
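As a rough sketch of the same idea with the group size passed in rather than hard-coded, the loop can take the width from an awk variable. The function name break_fields and the variable name width are made up for this illustration; fields are printed with %s so that non-numeric placeholders survive, and no trailing space is left at the end of a line.

```shell
# break_fields: hypothetical helper, not part of the answer above.
# Usage: break_fields WIDTH < inputFile > outputFile
break_fields() {
    awk -v width="$1" '{
        print ">" $1                      # header line: ">" plus the name
        for (i = 2; i <= NF; i++) {
            # i - 1 is the position of this number, counting from 1;
            # end the line after every width-th number and after the last one
            sep = ((i - 1) % width == 0 || i == NF) ? "\n" : " "
            printf "%s%s", $i, sep
        }
    }'
}

printf 'hello 1 2 3 4 5\n' | break_fields 2
```

For the real data this would be called as break_fields 100 (or 200, depending on whether the limit is counted in digits or in characters).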

Edit: just in case you really do want a sed solution, here is a nice one (I think so, anyway). Copy the following into a file called sedSplit:

s/^([A-Za-z]+ )/>\1\
/g
s/([0-9 ]{10})/\1\
/g
s/$/\
/g

This contains three consecutive sed commands; each one is really a single line, but because they insert newlines, they appear to take up six lines.

s/^                  - substitute, starting from the beginning of the line
([A-Za-z]+ )/        - match the first word (letters only) plus a space, replacing it with
>\1\
/g                   - the literal '>', then the first match, then a newline, as often as needed (g)
s/([0-9 ]{10})/      - match a run of 10 characters, each of them a digit or a space
\1\
/g                   - replace with itself, followed by a newline, as often as needed
s/$/\
/g                   - replace the 'end of line' with a newline

You invoke this sed script like this:

sed -E -f sedSplit < inputFile > outputFile

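This uses

the -E flag (use extended regular expressions, so there is no need to escape parentheses, etc.)

the -f flag ("get the instructions from this file")

It makes the whole thing cleaner, and gives you the output you asked for on the Mac (although an extra newline separates the groups; if you do not want that, leave out the last two lines).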
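To try the script quickly without creating the file by hand, something like the following sketch could be used. It writes the first two commands to a temporary file through a quoted here-document (which keeps the backslashes intact) and runs them on whatever arrives on standard input. The function name run_sed_split is invented for this example, and the third command is left out so that no blank line is added. Note that a name containing digits, such as samplename2, would not match [A-Za-z]+ and would need the character class widened.

```shell
# run_sed_split: hypothetical wrapper around the sedSplit commands.
run_sed_split() {
    script=$(mktemp)              # temporary script file; the path is arbitrary
    # The quoted 'EOF' stops the shell from touching the backslashes.
    cat > "$script" <<'EOF'
s/^([A-Za-z]+ )/>\1\
/g
s/([0-9 ]{10})/\1\
/g
EOF
    sed -E -f "$script"           # reads the data from standard input
    rm -f "$script"
}

printf 'hello 1 2 3 4 5 6 7 8 9 0\n' | run_sed_split
```

With this input the name ends up on its own line, followed by one line of ten characters ("1 2 3 4 5 ") and the remaining nine characters; for the real data the 10 in the script would become 200.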

$ awk '{print ">" $1; for (i=2;i<=NF;i++) printf "%s%s", $i, ((i-1)%10 ? FS : RS)}' file
>samplename
0 1 1 1 1 1 1 1 1 1
1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1
1 1 1 1 1 1 0 0 0 ...

fold is your friend:

sed 's/\([^ ]*\) /\1\n/' input | fold -w 100

Plain bash:

while read -r name values; do
    printf ">%s\n%s\n" "$name" "$values"
done <<END
samplename 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
samplename2 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...
END

>samplename
0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ...
>samplename2
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 0 0 0 ...

This assumes the sample names do not contain whitespace.
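The loop above only separates the name from the values; it does not wrap the values. One possible way to add the wrapping, sketched here under the assumption that every value is a single digit followed by one space, is to send each values string through fold. The function name wrap_samples and its width argument are made up for this illustration.

```shell
# wrap_samples: hypothetical helper combining the read loop with fold.
# Usage: wrap_samples WIDTH < myfile > myfile2
wrap_samples() {
    width=$1    # characters per output line, e.g. 200 = 100 digits + 100 spaces
    while read -r name values; do
        printf '>%s\n' "$name"                       # header line
        printf '%s\n' "$values" | fold -w "$width"   # wrap at WIDTH columns
    done
}

printf 'samplename 0 1 1 1 1 1\n' | wrap_samples 6
```

fold counts columns, not fields, so this only lines up when all values have the same width; with multi-digit values a field-based approach such as awk would be the safer choice.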

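Inside double quotes, the shell interprets backslashes. Either of these should work.

sed -e 's/\(>\w+\)\s\([0-9]+\)/\1\n\2/' < myfile > myfile2
sed -e "s/\\(>\\w+\\)\\s\\([0-9]+\\)/\\1\\n\\2/" < myfile > myfile2

PS: I added the terminating slash. You had s/.../... instead of s/.../.../

PS: Looking at your error message, sed is complaining that the command is not terminated. Try this:

sed -e 's/^\(\w+\)\s+/>\1\n/' < myfile > myfile2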

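Mac version, with a 200-character limit (100 single digits and 100 spaces):

sed -Ee 's/^([a-zA-Z0-9]+) />\1\
/' < myfile | sed -Ee 's/(([0-9] ){99}[0-9]) /\1\
/g' > myfile2

The first command separates the name from the digits; the second splits the lines.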
