Identify and replace selective spaces inside a given text file



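I am new to sed and how it works. I need to selectively replace spaces with "," in a file whose content is shown below. I do not want to replace the spaces inside "" but all other spaces need to be replaced.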

File content

my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"

Pattern used with sed to replace spaces with ",": 's/ /,/g'

Actual output

my,data,"this,is,my,very,first,encounter,with,sed",,"valuable",-,-,"c,l,e,a,r"

Expected output

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

The following sed script, with comments, fed a here-string from bash:

<<<'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"' sed -E '
# Split input with each character on its own line
s/./&\n/g;
# Add a newline on the end to separate output from input
s/$/\n/;
# Each line has one character
# Add a leading character that stores "state"
# There are two states available - in quoting or not in quoting
# The state character is space when we are not in quotes
# The state character is double quote when we are in quotes
s/^/ /;
# For each character in input
:again; {
    # Substitute a space that is not in quotes for a comma
    s/^  / ,/
    # When quotes is encountered and we are not in quotes
    /^ "/{
        # Change state to quotes
        s//""/
        b removed_quotes
    } ; {
        # When quotes is encountered and we are in quotes
        # then we are no longer in quotes
        s/^""/ "/
    } ; : removed_quotes
    # Preserve state as the first character
    # Add the parsed character to the output on the end
    # Preserve the rest
    s/^(.)(.)\n(.*)/\1\3\2/;
    # If end of input was not reached, then parse another character.
    /^.\n/!b again;
};
# Remove the leading state character with the newline
s///;
'

Output:

my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

And a oneliner, because who reads these comments:

sed -E 's/./&\n/g;s/$/\n/;s/^/ /;:a;s/^  / ,/;/^ "/{s//""/;bq;};s/^""/ "/;:q;s/^(.)(.)\n(.*)/\1\3\2/;/^.\n/!ba;s///'

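I think that a newline \n in the replacement string of the s command is an extension not required by POSIX. Another unique character could be used instead of a newline to separate the input while parsing. Anyway, I tested it with GNU sed.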
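For comparison, the same two-state idea (inside quotes or not) can be sketched without relying on \n in a sed replacement. This is only a minimal sketch in plain awk, not the script above; the function name spaces_to_commas and the variable names are made up for illustration, and it assumes quotes are balanced and never escaped.

```shell
# Hypothetical helper (not from the answers above): turn spaces outside
# double quotes into commas by walking the line one character at a time.
spaces_to_commas() {
  awk '{
    out = ""
    in_quotes = 0                          # state: 1 while inside "..."
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c == "\"") {
        in_quotes = !in_quotes             # a quote toggles the state
      } else if (c == " " && !in_quotes) {
        c = ","                            # unquoted space becomes a comma
      }
      out = out c
    }
    print out
  }'
}

printf '%s\n' 'my data "this is my very first encounter with sed"  "valuable" - - "c l e a r"' | spaces_to_commas
# prints: my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"
```

The state lives in an ordinary variable here instead of in the first character of the pattern space, which is the main difference from the sed version.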

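As mentioned in the comments, this is a better fit for an actual CSV parser than for trying to kludge something together with regular expressions, especially sed's rather basic regular expressions.

A perl one-liner using the useful Text::AutoCSV module (install it through your OS package manager or favorite CPAN client):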

$ perl -MText::AutoCSV -e 'Text::AutoCSV->new(sep_char=>" ", out_sep_char=>",")->write' < input.txt
my,data,"this is my very first encounter with sed",,valuable,-,-,"c l e a r"

Using GNU awk for FPAT:

$ awk -v FPAT='[^ ]*|"[^"]+"' -v OFS=',' '{$1=$1} 1' file
my,data,"this is my very first encounter with sed",,"valuable",-,-,"c l e a r"

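Your input is a CSV where the "separator" character is a blank instead of the traditional "comma", and where the problematic character inside the fields is a blank, and you are just trying to convert it to a comma-separated CSV. See "What's the most robust way to efficiently parse CSV using awk?" for more information on what the above does and on parsing CSV with awk.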

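awk 'BEGIN {RS=ORS="\""} NR%2 {gsub(" ",",")} {print}' file
  • In BEGIN, set the double quote as both the input and output record separator
  • For odd-numbered records, i.e. those outside quotes, globally replace every space with a comma
  • Print every record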
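
To see why odd-numbered records are the ones outside quotes, it can help to print how awk splits a line when the record separator is a double quote. This is a small sketch; the function name show_records and the short sample input are made up for illustration.

```shell
# Hypothetical helper: show each record awk sees when RS is a double quote,
# labelled by whether it sits outside or inside a quoted string.
show_records() {
  awk 'BEGIN {RS = "\""} {printf "%d %s [%s]\n", NR, (NR % 2 ? "outside" : "inside"), $0}'
}

# No trailing newline on the input, to keep the last record tidy.
printf '%s' 'a b "c d" e' | show_records
# 1 outside [a b ]
# 2 inside [c d]
# 3 outside [ e]
```

Because print appends ORS after every record, the one-liner above also writes a double quote after the final record, so the end of the output may need trimming.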

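This might work for you (GNU sed):

sed -E ':a;s/^((("[^"]*")*[^" ]*)*) /\1,/;ta' file

Replace a group made of zero or more double-quoted strings followed by zero or more characters that are neither a space nor a double quote, repeated zero or more times, followed by a space, with the group followed by a comma, and repeat until the substitution fails.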
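
The "repeat until it fails" part can be watched by printing the pattern space on every pass through the loop. This is a sketch under the same assumption of balanced quotes; the function name trace_unquoted_spaces and the short sample input are made up, and the commands are split across several -e options only so that the label does not depend on how a given sed treats ";" after ":a".

```shell
# Hypothetical helper: same substitution loop as above, but -n plus p
# prints the line before each attempt, so every intermediate step is visible.
trace_unquoted_spaces() {
  sed -E -n -e ':a' -e 'p' -e 's/^((("[^"]*")*[^" ]*)*) /\1,/' -e 'ta'
}

printf '%s\n' 'a b "c d" e' | trace_unquoted_spaces
# a b "c d" e
# a,b "c d" e
# a,b,"c d" e
# a,b,"c d",e
```

Each pass replaces only the first remaining space outside quotes, which is why the loop is needed: the pattern is anchored at the start of the line.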
