Search and replace with regular expressions in bash



I have seen this example:

hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//[0-9]/}

which follows this syntax: ${variable//pattern/replacement}

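Unfortunately, the pattern field doesn't seem to support the full regex syntax (if I use . or \s, for example, it tries to match the literal characters).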
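As an illustration of that behaviour: the pattern in ${variable//pattern/replacement} is a shell glob, the same syntax used for filename matching, so . is an ordinary character and ? and * are the wildcards. A minimal sketch (the variable s is just sample input):

```shell
#!/bin/bash
s="a.c abc"
# In a glob, "." is an ordinary character, so only the literal "a.c" matches
echo "${s//a.c/X}"    # prints: X abc
# "?" is the glob wildcard for exactly one character
echo "${s//a?c/X}"    # prints: X X
# A single slash replaces only the first match
echo "${s/a?c/X}"     # prints: X abc
```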

How can I search and replace in a string using the full regex syntax?

Using sed:

MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo "$MYVAR" | sed -e 's/[a-zA-Z]/X/g' -e 's/[0-9]/N/g'
# prints XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

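Note that the -e expressions are processed in order. Also, the g flag on each expression makes it replace every occurrence on each input line, not just the first.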
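Both points can be checked with a couple of throwaway inputs (the sample strings below are arbitrary):

```shell
#!/bin/bash
# Without g, s/// replaces only the first match on each line
echo "aaa" | sed 's/a/b/'                      # prints: baa
# With g, every match on the line is replaced
echo "aaa" | sed 's/a/b/g'                     # prints: bbb
# -e expressions run in order, so the second one sees the first one's output
echo "abc" | sed -e 's/a/b/g' -e 's/b/c/g'     # prints: ccc
```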

You can also use this approach with your favorite tool, e.g. Perl or Awk:

echo "$MYVAR" | perl -pe 's/[a-zA-Z]/X/g and s/[0-9]/N/g'

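This may let you do more creative matching. For example, in the snippet above the digit replacement is only applied if the first expression matched (because and short-circuits). And of course you have Perl's full language support to do your bidding...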

This can actually be done in pure bash:

hello=ho02123ware38384you443d34o3434ingtod38384day
re='(.*)[0-9]+(.*)'
while [[ $hello =~ $re ]]; do
  hello=${BASH_REMATCH[1]}${BASH_REMATCH[2]}
done
echo "$hello"

...yields:

howareyoudoingtodday
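The same loop idea generalizes to an arbitrary replacement string. Below is a minimal sketch; regex_replace_all is a hypothetical helper name, the regex is a POSIX ERE as understood by bash's =~, and the replacement is inserted literally (no $1-style backreferences). Unlike the loop above it works left to right, and it stops on an empty match so it cannot loop forever:

```shell
#!/bin/bash
# regex_replace_all TEXT REGEX REPLACEMENT  (hypothetical helper name)
# Replaces every non-overlapping match of the POSIX ERE REGEX, left to right.
# REPLACEMENT is inserted literally (no $1-style backreferences).
regex_replace_all() {
    local text=$1 re=$2 repl=$3 out="" rest
    # ($re) captures the match itself, (.*) captures everything after it
    while [[ $text =~ ($re)(.*) ]]; do
        rest=${BASH_REMATCH[${#BASH_REMATCH[@]}-1]}
        # stop on an empty match, otherwise the loop would never advance
        [[ -n ${BASH_REMATCH[1]} ]] || break
        # keep the text before the match, then append the replacement
        out+="${text%"${BASH_REMATCH[0]}"}$repl"
        text=$rest
    done
    printf '%s\n' "$out$text"
}

hello=ho02123ware38384you443d34o3434ingtod38384day
regex_replace_all "$hello" '[0-9]+' ''     # prints: howareyoudoingtodday
regex_replace_all "$hello" '[0-9]+' '#'    # prints: ho#ware#you#d#o#ingtod#day
```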

These examples also work in bash without using sed:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[a-zA-Z]/X} 
echo ${MYVAR//[0-9]/N}

You can also use character class expressions:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
MYVAR=${MYVAR//[[:alpha:]]/X} 
echo ${MYVAR//[[:digit:]]/N}

Output:

XXNNNNNXXXXNNNNNXXXNNNXNNXNNNNXXXXXXNNNNNXXX

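What @Lanaru actually wanted to know, if I understand the question correctly, is why the "full" or PCRE extensions such as \s, \S, \w, \W, \d and \D don't work the way they do in PHP, Ruby, Python, etc. These extensions come from Perl-compatible regular expressions (PCRE) and may not be compatible with other forms of shell-based regular expressions.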

These don't work:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo ${hello//\d/}

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | sed 's/\d//g'

Output (all literal "d" characters are removed):

ho02123ware38384you44334o3434ingto38384ay

But the following does work as expected:

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello | perl -pe 's/\d//g'

Output:

howareyoudoingtodday
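For sed, grep -E and bash's own =~, the portable substitutes for the PCRE shorthands are POSIX bracket expressions: \d is [[:digit:]], \s is [[:space:]], and \w is roughly [[:alnum:]_]. A sketch of the digit case, reusing the same sample string:

```shell
#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
# POSIX ERE has no \d; the bracket expression [[:digit:]] means the same thing
echo "$hello" | sed -E 's/[[:digit:]]+//g'                  # prints: howareyoudoingtodday
# The same class works in bash's own =~ operator
[[ $hello =~ [[:digit:]]+ ]] && echo "${BASH_REMATCH[0]}"   # prints: 02123
```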

Hope that clarifies things a bit more, but if you're still not confused, why not try this on Mac OS X, which has the REG_ENHANCED flag enabled:

#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
echo $MYVAR | grep -o -E '\d'

On most flavors of *nix you will only see the following output:

d
d
d

Enjoy!
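If the goal is to match digits in a way that behaves the same on GNU and BSD/macOS grep, a POSIX class avoids the \d question altogether. A small sketch using the same MYVAR:

```shell
#!/bin/bash
MYVAR=ho02123ware38384you443d34o3434ingtod38384day
# [[:digit:]] is standard POSIX ERE, so GNU and BSD/macOS grep -E agree on it
echo "$MYVAR" | grep -oE '[[:digit:]]+'
# prints one run of digits per line: 02123, 38384, 443, 34, 3434, 38384
```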

If you're making repeated calls and care about performance, this test shows that the bash method is about 15x faster than forking to sed:

hello=123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X123456789X
P1=$(date +%s)
for i in {1..10000}
do
   echo "$hello" | sed s/X//g > /dev/null
done
P2=$(date +%s)
echo $((P2 - P1))
for i in {1..10000}
do
   echo "${hello//X/}" > /dev/null
done
P3=$(date +%s)
echo $((P3 - P2))

Use [[:digit:]] (note the double brackets) as the pattern:

$ hello=ho02123ware38384you443d34o3434ingtod38384day
$ echo ${hello//[[:digit:]]/}
howareyoudoingtodday
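If you want to replace whole runs of digits rather than individual characters with ${...//...}, bash's extended globs can help. This sketch assumes bash with the extglob option turned on via shopt; +(...) means "one or more of":

```shell
#!/bin/bash
# extglob must be enabled on a line before the pattern that uses it is parsed
shopt -s extglob
hello=ho02123ware38384you443d34o3434ingtod38384day
# Each run of digits is replaced by a single N
echo "${hello//+([[:digit:]])/N}"    # prints: hoNwareNyouNdNoNingtodNday
```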

Just wanted to summarize the answers (especially @nickl-'s https://stackoverflow.com/a/a/22226134/2916086).

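I know this is an ancient thread, but it was my first hit on Google, and I wanted to share the following resub that I put together, which adds support for multiple backreferences such as $1, $2, etc.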

#!/usr/bin/env bash
############################################
###  resub - regex substitution in bash  ###
############################################
resub() {
    local match="$1" subst="$2" tmp line rest
    if [[ -z $match ]]; then
        echo "Usage: echo \"some text\" | resub '(.*) (.*)' '\$2 me \${1}time'" >&2
        return 1
    fi
    ### First, convert "$1" / "${1}" to "${BASH_REMATCH[1]}" and shell-quote the literal text for later eval-ing...
    ### Utility function to shell-quote a list of strings (via printf %q)
    squot() { local a=() i; for i in "$@"; do a+=( "$(printf '%q' "$i")" ); done; printf '%s\n' "${a[*]}"; }
    ### Matches the rightmost $N or ${N} in the replacement
    local ref_re='(.*)[$][{]?([0-9]+)[}]?(.*)'
    tmp=""
    while [[ $subst =~ $ref_re ]]; do
        tmp="\${BASH_REMATCH[${BASH_REMATCH[2]}]}$(squot "${BASH_REMATCH[3]}")${tmp}"
        subst="${BASH_REMATCH[1]}"
    done
    subst="$(squot "${subst}")${tmp}"
    ### Now start (globally) substituting, one input line at a time
    while IFS= read -r line || [[ -n $line ]]; do
        tmp=""
        while [[ $line =~ $match(.*) ]]; do
            rest="${BASH_REMATCH[$(( ${#BASH_REMATCH[@]} - 1 ))]}"
            ### Stop on an empty match, which would otherwise loop forever
            (( ${#BASH_REMATCH[0]} > ${#rest} )) || break
            eval tmp='"${tmp}${line%"${BASH_REMATCH[0]}"}"'"${subst}"
            line="$rest"
        done
        echo "${tmp}${line}"
    done
}
resub "$@"
##################
###  EXAMPLES  ###
##################
###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub quick slow
###    The slow brown fox jumps slowly over the lazy dog
###  % echo "The quick brown fox jumps quickly over the lazy dog" | resub 'quick ([^ ]+) fox' 'slow $1 sheep'
###    The slow brown sheep jumps quickly over the lazy dog
###  % animal="sheep"
###  % echo "The quick brown fox 'jumps' quickly over the \"lazy\" \$dog" | resub 'quick ([^ ]+) fox' "\"\$low\" \${1} '$animal'"
###    The "$low" brown 'sheep' 'jumps' quickly over the "lazy" $dog
###  % echo "one two three four five" | resub "one ([^ ]+) three ([^ ]+) five" 'one $2 three $1 five'
###    one four three two five
###  % echo "one two one four five" | resub "one ([^ ]+) " 'XXX $1 '
###    XXX two XXX four five
###  % echo "one two three four five one six three seven eight" | resub "one ([^ ]+) three ([^ ]+) " 'XXX $1 YYY $2 '
###    XXX two YYY four five XXX six YYY seven eight

h/t to @charles duffy re: (.*)$match(.*)

Set the variable:

hello=ho02123ware38384you443d34o3434ingtod38384day

Then echo the variable with the pattern replacement applied:

echo ${hello//[[:digit:]]/}

This will print:

howareyoudoingtodday

Extra: if you need the opposite (keep only the digit characters):

echo ${hello//[![:digit:]]/}

This will print:

021233838444334343438384

This example takes the input hello ugly world, searches it for the regex bad|ugly, and replaces the match with nice:

#!/bin/bash
# THIS FUNCTION NEEDS THREE PARAMETERS
# arg1 = input              Example:  hello ugly world
# arg2 = search regex       Example:  bad|ugly
# arg3 = replace            Example:  nice
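function regex_replace()
{
  # $1 = hello ugly world
  # $2 = bad|ugly
  # $3 = nice
  # REGEX: bash's =~ uses POSIX ERE, which has no lazy ".*?";
  # the leading (.*) is greedy, so the LAST match is the one replaced
  local re="(.*)($2)(.*)"
  if [[ $1 =~ $re ]]; then
    # if there is a match
    # ${BASH_REMATCH[0]} = hello ugly world
    # ${BASH_REMATCH[1]} = hello 
    # ${BASH_REMATCH[2]} = ugly
    # last element       =  world (index 3 unless $2 has its own groups)
    local last=$(( ${#BASH_REMATCH[@]} - 1 ))
    # hello + nice + world, quoted so whitespace is preserved
    echo "${BASH_REMATCH[1]}$3${BASH_REMATCH[last]}"
  else
    # if no match return original input  hello ugly world
    echo "$1"
  fi
}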
# prints 'hello nice world'
regex_replace 'hello ugly world' 'bad|ugly' 'nice'
# to save output to a variable
x=$(regex_replace 'hello ugly world' 'bad|ugly' 'nice')
echo "output of replacement is: $x"
exit
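regex_replace above inserts its replacement literally. When the replacement should reuse capture groups, one option is to assemble it from BASH_REMATCH yourself. A minimal sketch; swap_date and the YYYY-MM-DD input format are made-up examples:

```shell
#!/bin/bash
# swap_date (hypothetical example): reorder capture groups in the replacement
swap_date() {
    local re='^([0-9]{4})-([0-9]{2})-([0-9]{2})$'
    if [[ $1 =~ $re ]]; then
        # YYYY-MM-DD -> DD/MM/YYYY, built from the individual groups
        printf '%s\n' "${BASH_REMATCH[3]}/${BASH_REMATCH[2]}/${BASH_REMATCH[1]}"
    else
        # no match: return the input unchanged
        printf '%s\n' "$1"
    fi
}
swap_date 2024-01-31    # prints: 31/01/2024
```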

You can use Python. It won't be efficient, but it gets the job done with a more flexible syntax.

Applying it to a file

The following Python script replaces "FROM" (but not "notFROM") with "TO".

regex_replace.py

import sys
import re
for line in sys.stdin:
    line = re.sub(r'(?<!not)FROM', 'TO', line)
    sys.stdout.write(line)

You can apply it to a text file, for example:

$ cat test.txt
bla notFROM
FROM FROM
bla bla
FROM bla
bla  notFROM FROM
bla FROM
bla bla

$ cat test.txt | python regex_replace.py
bla notFROM
TO TO
bla bla
TO bla
bla  notFROM TO
bla TO
bla bla

Applying it to a variable

#!/bin/bash
hello=ho02123ware38384you443d34o3434ingtod38384day
echo $hello
PYTHON_CODE=$(cat <<END
import sys
import re
for line in sys.stdin:
    line = re.sub(r'[0-9]', '', line)
    sys.stdout.write(line)
END
)
echo $hello | python -c "$PYTHON_CODE"

Output:

ho02123ware38384you443d34o3434ingtod38384day
howareyoudoingtodday
