How to delete all lines containing a specific string, but only when the following character is a CJK character



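I need to delete every line of a file that contains a match for read (symbol), where (symbol) is any CJK character. However, if read (symbol) is immediately preceded by A-Z or a-z, the line should not be deleted. Here are some sample lines and the expected results: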

Do you like to read books? (not deleted)
Can you read 书? (deleted)
.read 书. (deleted)
This is some thread 线. (not deleted)

How do I delete only the lines that match (not A-Z or a-z)read (CJK symbol)?

I'm not entirely sure how to match CJK characters, but matching non-ASCII characters may give you the result you want:

LC_ALL=C grep -vP "[^A-Za-z]read [\x80-\xFF]" file.txt
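A quick way to check the byte-range idea on the sample lines from the question is sketched below. It assumes GNU grep built with PCRE (-P) support; sample.txt is a hypothetical file name. Running under LC_ALL=C matters: in a UTF-8 locale grep -P puts PCRE in UTF mode, where \x80-\xFF means the code points U+0080 to U+00FF rather than raw bytes, so CJK characters would not match. The sketch also widens [^A-Za-z] to (^|[^A-Za-z]) so that a line beginning with "read 书" is caught as well.

```shell
#!/usr/bin/env bash
# Sketch: byte-range filter on the question's sample lines.
# Assumes GNU grep with -P support; sample.txt is a hypothetical file name.
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 书?' \
  '.read 书.' \
  'This is some thread 线.' > sample.txt

# LC_ALL=C makes grep work on bytes, so [\x80-\xFF] matches any non-ASCII
# byte; every byte of a UTF-8 encoded CJK character falls in that range.
# (^|[^A-Za-z]) also covers lines where "read" is the very first word.
kept=$(LC_ALL=C grep -vP '(^|[^A-Za-z])read [\x80-\xFF]' sample.txt)
printf '%s\n' "$kept"
```

Note that this deletes lines where "read " is followed by any non-ASCII character (accented Latin letters included), not only CJK.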

In theory, you should be able to do:

grep -vP "[^A-Za-z]read [\x{2E80}-\x{9FBB}]+" file.txt

However, when I tested it, I got this error:

grep: character value in \x{...} sequence is too large

http://en.wikipedia.org/wiki/List_of_Unicode_characters#CJK_unified_ideographs
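The error most likely means grep was running in a non-UTF-8 locale: there PCRE works on single bytes and rejects \x{...} values above \xFF. In a UTF-8 locale grep -P switches PCRE to UTF mode, where \x{...} denotes a Unicode code point, so the range above should be accepted. The sketch below assumes GNU grep with PCRE support and an installed C.UTF-8 locale (any UTF-8 locale should behave the same); sample.txt is hypothetical. The upper bound is raised from 9FBB to 9FFF, the end of the CJK Unified Ideographs block in current Unicode versions.

```shell
#!/usr/bin/env bash
# Sketch: match CJK code points directly, with PCRE in UTF mode.
# Assumptions: GNU grep with -P support and a C.UTF-8 locale installed.
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 书?' \
  '.read 书.' \
  'This is some thread 线.' > sample.txt

# U+2E80..U+9FFF spans, among other blocks, CJK radicals, CJK punctuation,
# kana and the CJK Unified Ideographs (including Extension A). Hangul
# syllables (U+AC00..U+D7AF) and supplementary-plane extensions are not covered.
kept=$(LC_ALL=C.UTF-8 grep -vP '(^|[^A-Za-z])read [\x{2E80}-\x{9FFF}]' sample.txt)
printf '%s\n' "$kept"
```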

Edit:

LC_ALL="POSIX" sed -r '/[^A-Za-z]read [\o200-\o377]+/d' file.txt

Result:

Do you like to read books? (not deleted)
This is some thread 线. (not deleted)
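To delete the lines in the file itself instead of printing the remaining ones, sed's -i option can be added. A minimal sketch, assuming GNU sed (both -i without a backup suffix and the \oNNN escape are GNU extensions) and the hypothetical file sample.txt; like the grep variant, it uses (^|[^A-Za-z]) so that a line beginning with "read" is handled too.

```shell
#!/usr/bin/env bash
# Sketch: in-place deletion with GNU sed.
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 书?' \
  '.read 书.' \
  'This is some thread 线.' > sample.txt

# LC_ALL=C makes sed match single bytes; \o200-\o377 (0x80-0xFF) are the
# bytes UTF-8 uses for every non-ASCII character, CJK included.
# Use -i.bak instead of -i to keep a backup copy of the original file.
LC_ALL=C sed -E -i '/(^|[^A-Za-z])read [\o200-\o377]/d' sample.txt
cat sample.txt
```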

See also:

How to delete all CJK text that appears after a specific symbol?

awk '$0~/ read [a-zA-Z]+/' your_file
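The awk line above prints only the lines in which " read " is followed by Latin letters, so for this question's rule it would also drop lines that should be kept, such as "This is some thread 线." and any line without "read" at all. If you prefer awk here, the same byte-range idea can be sketched as follows, assuming an awk that accepts \ddd octal escapes in regular expressions (POSIX awk specifies them) and the hypothetical file sample.txt.

```shell
#!/usr/bin/env bash
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 书?' \
  '.read 书.' \
  'This is some thread linked 线.' > /dev/null  # placeholder not used below
printf '%s\n' \
  'Do you like to read books?' \
  'Can you read 书?' \
  '.read 书.' \
  'This is some thread 线.' > sample.txt

# LC_ALL=C makes awk compare bytes; [\200-\377] is any non-ASCII byte.
# Print only lines with no "read " + non-ASCII byte, unless "read" is
# directly preceded by a Latin letter.
kept=$(LC_ALL=C awk '!/(^|[^A-Za-z])read [\200-\377]/' sample.txt)
printf '%s\n' "$kept"
```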
