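Regular expression for word repetition

I need a regular expression for sed (sed only, please) that tells me whether some word appears 3 times in a line, and if so prints that line.

Suppose this is the file: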

abc abc gh abc
abcabc abc
 ab ab cd ab xx ab
ababab cc ababab
abab abab cd abab

So the output should be:

abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This is what I tried:

sed -n '/\([^ ]\+\)[ ]\+\1\1\1/p' $1

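It doesn't work... :/ What am I doing wrong?

It doesn't matter whether the word is at the start of the line, and the occurrences don't need to be consecutive.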

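You need to add .* between the \1 back-references, so that other text may sit between the repetitions:

$ sed -n '/\b\([^ ]\+\)\b.*\b\1\b.*\b\1\b/p' file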
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

I'm assuming your input contains only spaces and word characters.
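
For comparison, the same pattern can be written with extended regular expressions, which removes most of the backslashes. This is only a sketch: it assumes GNU sed (\b is a GNU extension, and back-references in -E mode are not guaranteed by every sed), and the function name triple_words is made up for illustration.

```shell
# Hypothetical helper; assumes GNU sed.
# \b([^ ]+)\b  captures one whole space-delimited word,
# .*\b\1\b     then requires the same whole word again, twice.
triple_words() {
    sed -nE '/\b([^ ]+)\b.*\b\1\b.*\b\1\b/p' "$@"
}

# Sample input from the question, fed on stdin.
printf '%s\n' \
    'abc abc gh abc' \
    'abcabc abc' \
    ' ab ab cd ab xx ab' \
    'ababab cc ababab' \
    'abab abab cd abab' | triple_words
```

The \b anchors are what keep "abcabc abc" from matching: without them, abc would be found three times as a substring.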

I know the question asks for sed, but every system I've seen that has sed also has awk, so here is an awk solution:

awk -F"[^[:alnum:]]" '{delete a;for (i=1;i<=NF;i++) a[$i]++;for (i in a) if (a[i]>2) {print $0;next}}' file
abc abc gh abc
 ab ab cd ab xx ab
abab abab cd abab

This may be easier to understand than the regex solution.

# Set the field separator to anything other than alphanumeric characters.
awk -F"[^[:alnum:]]" '{
    delete a                # Clear array "a" for the new line
    for (i=1;i<=NF;i++)     # Loop through the words one by one
        a[$i]++             # Count the occurrences of each word in array "a"
    for (i in a)            # Loop through the array "a"
        if (a[i]>2) {       # If a word occurs more than two times:
            print $0        # Print the line
            next            # Skip to the next line, so it is not printed twice if another word also occurs three times
        }
}' file                     # Read the file
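
One caveat with the one-liner above: because the field separator matches a single non-alphanumeric character, a run of several spaces produces empty fields, and three or more of those would be counted as a repeated "word". A possible variant that avoids this is sketched below. The function name triple_words_awk and the array name seen are hypothetical, and the + in the separator plus the length() check are my additions, not part of the original answer.

```shell
# Hypothetical variant of the awk answer.
# The "+" collapses runs of separators into one; length($i) skips the
# empty field that a leading or trailing separator still produces.
triple_words_awk() {
    awk -F'[^[:alnum:]]+' '{
        split("", seen)                      # portable way to empty the array
        for (i = 1; i <= NF; i++)
            if (length($i) && ++seen[$i] == 3) {
                print                        # third occurrence: print the line once
                next                         # and move on to the next line
            }
    }' "$@"
}

# The first line has four spaces in a row and no repeated word.
printf '%s\n' 'a    b' ' ab ab cd ab xx ab' 'abcabc abc' | triple_words_awk
```

Printing at the third occurrence also removes the need for the second loop over the array.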
