How to create a function for split that uses only awk and accepts any string as input and delimiter



The proposal is a function that splits a string using only awk, accepting any string as the delimiter and any string as the input.


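There are many suggestions for splitting strings with bash commands (see this example), but all of them only work in specific cases, not in the general way we are proposing.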

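We decided to use our own code as an example. Although it is fully functional, we think several points could be improved, adjusted or corrected.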

Example function (f_split)

F_PRESERVE_BLANK_LINES_R=""
f_preserve_blank_lines() {
: 'Remove "single quotes" used to prevent blank lines being erroneously removed.
The "single quotes" are used at the beginning and end of the strings to prevent
blank lines with no other characters in the sequence being erroneously removed.
We do not know the reason for this side effect. This problem occurs, for example,
in commands that involve "awk".
Args:
STR_TO_TREAT_P (str): String to be treated.
Returns:
F_PRESERVE_BLANK_LINES_R (str): String treated.
'
F_PRESERVE_BLANK_LINES_R=""
STR_TO_TREAT_P=$1
STR_TO_TREAT_P=${STR_TO_TREAT_P%?}
F_PRESERVE_BLANK_LINES_R=${STR_TO_TREAT_P#?}
}
F_SPLIT_R=()
f_split() {
: 'Splits a given string and returns an array.
Args:
TARGET_P (str): Target string to "split".
DELIMITER_P (Optional[str]): Delimiter used to "split". If not given, the
split is done on spaces.
Returns:
F_SPLIT_R (array): Array with the provided string separated by the given
delimiter.
'
F_SPLIT_R=()
TARGET_P=$1
DELIMITER_P=$2
if [ -z "$DELIMITER_P" ] ; then
DELIMITER_P=" "
fi
REMOVE_N=1
if [ "$DELIMITER_P" == "\n" ] ; then
REMOVE_N=0
fi
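# PROBLEM: So far this is the only delimiter that has caused trouble (awk treats a
# multi-character FS as a regular expression, so "./" matches any character
# followed by "/"). There are probably others. Maybe a scheme using "sed" would
# solve the problem...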
if [ "$DELIMITER_P" == "./" ] ; then
DELIMITER_P="[.]/"
fi
if [ ${REMOVE_N} -eq 1 ] ; then
# PROBLEM: We read the output of the awk split into the array line by line, so
# a line break (\n) is used to separate the pieces. Because of this, we
# temporarily replace the line breaks in the string with a marker and restore
# them afterwards. Without this, a line break in the given string would be
# lost, that is, erroneously removed from the output...
# printf "%s" keeps a "%" in the data from being read as a format.
TARGET_P=$(awk 'BEGIN {RS="dn"} {gsub("\n", "3F2C417D448C46918289218B7337FCAF"); printf "%s", $0}' <<< "${TARGET_P}")
fi
# PROBLEM: Replacing "\n" with "3F2C417D448C46918289218B7337FCAF" results in one
# more marker than there were "\n" in the string (an extra one at the end),
# because the here-string (<<<) appends a newline to awk's input. The line below
# removes that trailing marker. Unlike cutting 32 characters, it leaves the
# string untouched when REMOVE_N=0 and no marker was added...
TARGET_P=${TARGET_P%3F2C417D448C46918289218B7337FCAF}
SPLIT_NOW=$(awk -F "$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")
while IFS= read -r LINE_NOW ; do
if [ ${REMOVE_N} -eq 1 ] ; then
LN_NOW_WITH_N=$(awk 'BEGIN {RS="dn"} {gsub("3F2C417D448C46918289218B7337FCAF", "\n"); printf "%s", $0}' <<< "'${LINE_NOW}'")
# PROBLEM: It would be perfect if we didn't need to use the function below...
f_preserve_blank_lines "$LN_NOW_WITH_N"
LN_NOW_WITH_N="$F_PRESERVE_BLANK_LINES_R"
F_SPLIT_R+=("$LN_NOW_WITH_N")
else
F_SPLIT_R+=("$LINE_NOW")
fi
done <<< "$SPLIT_NOW"
}
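The comments in f_split describe three effects as unexplained. All three follow from documented shell and awk behaviour, as this short sketch tries to show. The variable names are only illustrative. First, a here-string (`<<<`) always appends a newline, which is where the extra marker at the end comes from. Second, command substitution (`$(...)`) strips every trailing newline, which is why f_preserve_blank_lines has to wrap each piece in quotes. Third, awk treats a multi-character FS as an extended regular expression, so `./` means "any character followed by /".

```shell
#!/usr/bin/env bash
# 1. A here-string appends a newline: "abc" reaches cat as 4 characters.
#    The trailing "." protects that newline from command substitution.
herestring_copy=$(cat <<< "abc"; printf '.')
herestring_copy=${herestring_copy%.}
echo "here-string length: ${#herestring_copy}"        # 4 ("abc" + newline)

# 2. Command substitution removes all trailing newlines.
subst=$(printf 'abc\n\n\n')
echo "command substitution length: ${#subst}"         # 3

# 3. A multi-character FS is a regular expression: "./" is "any char, then /".
n_regex=$(printf '%s\n' 'ab/c./d' | awk -F './' '{ print NF }')
n_literal=$(printf '%s\n' 'ab/c./d' | awk -F '[.]/' '{ print NF }')
echo "fields with ./: $n_regex, with [.]/: $n_literal"  # 3 and 2
```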

Usage

read -r -d '' FILE_CONTENT << 'HEREDOC'
BEGIN
15

It may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – 
fbicknel
 Aug 18, 2017 at 15:57
4

Great answer (+1). If you change your awk to awk '{ gsub(/,[ ]+|$/,"\0"); print }' ./  and eliminate that concatenation of the final ", " then you don't have to go through the gymnastics on eliminating the final record. So: readarray -td '' a < <(awk '{ gsub(/,[ ]+/,"\0"); print; }' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray – 
dawg
 Nov 26, 2017 at 22:28 
10

Wow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! – 
artfulrobot
 May 14, 2018 at 11:32
11

I'd move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) – 
paxdiablo
 Jan 9, 2020 at 12:31
44

This is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"
END
HEREDOC
FILE_CONTENT="${FILE_CONTENT:6:-3}"
DELIMITER_P="int }' ./  and eliminate"
f_split "$FILE_CONTENT" "$DELIMITER_P"
LENGTH=${#F_SPLIT_R[*]}
for ((i=0;i<=$(($LENGTH-1));i++)); do
echo ">>>>>>>>>>"
echo "${F_SPLIT_R[$i]}"
echo "<<<<<<<<<<"
done

A "universal" function for splitting strings, that is, one that accepts any string as the delimiter and any string as the input (the "target").
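For that goal, here is a minimal sketch of a different awk-only approach. It rests on two assumptions: Bash 4.4+ (for `mapfile -d ''`), and an awk whose `printf "%c", 0` writes a NUL byte (GNU awk does). The function name `awk_split` and the `AWK_SPLIT_*` environment variables are hypothetical. The strings reach awk through `ENVIRON`, so no here-string newline needs trimming and no `-v` escape processing applies. `index()` matches the delimiter literally, so neither the marker nor the `./` special case is needed. The trade-off is that the delimiter can no longer be a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: awk_split RESULT_ARRAY_NAME TARGET [DELIMITER]
# Splits TARGET on every literal occurrence of DELIMITER (default: one space,
# as in f_split) and stores the pieces in the array named RESULT_ARRAY_NAME.
awk_split() {
  local delim="${3:- }"
  # The strings go in through the environment, so newlines, quotes and
  # backslashes reach awk unchanged; awk ends every piece with a NUL byte,
  # and mapfile -d '' -t splits on NUL and drops it.
  mapfile -d '' -t "$1" < <(
    AWK_SPLIT_TARGET=$2 AWK_SPLIT_DELIM=$delim awk 'BEGIN {
      s = ENVIRON["AWK_SPLIT_TARGET"]
      d = ENVIRON["AWK_SPLIT_DELIM"]
      # index() is a plain substring search: "./" or "*" need no escaping.
      while ((i = index(s, d)) > 0) {
        printf "%s%c", substr(s, 1, i - 1), 0
        s = substr(s, i + length(d))
      }
      printf "%s%c", s, 0
    }'
  )
}

# Blank lines and the trailing newline survive, and "./" is matched literally.
awk_split parts $'a\n\nb./c\n' './'
declare -p parts
```

With the question's data, the call would be `awk_split F_SPLIT_R "$FILE_CONTENT" "$DELIMITER_P"`.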

A solution based on zero (NUL) bytes that should work most of the time:

mysplit() {
mapfile -t -d '' "$2" < <(sed "s/$1/\x00/g")
}
mysplit SEPARATOR result <<<"string SEPARATOR anotherstring SEPARATORstring"
declare -p result

Output:

declare -a result=([0]="string " [1]=" anotherstring " [2]=$'string\n')
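The `\n` in the last element comes from the here-string (`<<<`), which appends a newline, not from sed. If that is unwanted, one option is to feed the input with `printf '%s'`, which adds no newline. This is a sketch that repeats mysplit so it runs on its own. It assumes GNU sed, both for `\x00` in the replacement and for leaving a missing final newline missing.

```shell
#!/usr/bin/env bash
# mysplit as in the answer: sed turns each separator into a NUL byte, and
# mapfile -d '' splits on NUL into the array named by $2.
mysplit() {
  mapfile -t -d '' "$2" < <(sed "s/$1/\x00/g")
}
# printf '%s' appends no newline, so the last element is just "string".
mysplit SEPARATOR result < <(printf '%s' "string SEPARATOR anotherstring SEPARATORstring")
declare -p result
```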

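The user has to be aware that the separator is passed to sed as a regular expression. That is nice, since it allows regexes and sed escapes, or you can add the code from "Escape a string for a sed replace pattern" to escape it.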
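A minimal sketch of such escaping, going a bit beyond the crude `[./]` escape used below. The helper names `sed_escape_pattern` and `mysplit_literal` are hypothetical, and GNU sed is assumed. The helper puts a backslash in front of `] / $ * . ^ [`, which is enough for a literal basic-regex match in most cases. It does not handle backslashes in the separator. Because sed works line by line, a separator containing a newline still cannot match.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prefix BRE metacharacters and the "/" delimiter with "\".
sed_escape_pattern() {
  printf '%s\n' "$1" | sed -e 's/[]\/$*.^[]/\\&/g'
}
# Hypothetical literal variant of mysplit: mysplit_literal SEPARATOR ARRAY_NAME
mysplit_literal() {
  mapfile -t -d '' "$2" < <(sed "s/$(sed_escape_pattern "$1")/\x00/g")
}
# "a*b" is matched literally, so "aaab" is not treated as a separator.
mysplit_literal 'a*b' parts <<< 'xa*byaaabz'
declare -p parts
```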

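The code below runs mysplit, with a very crude escaping of the sed pattern, and the f_split presented in the question on the presented input, and compares the results: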

# f_preserve_blank_lines, f_split, FILE_CONTENT and DELIMITER_P exactly as in the question above
f_split "$FILE_CONTENT" "$DELIMITER_P"
echo "f_split result:"
declare -p F_SPLIT_R

mysplit() {
mapfile -t -d '' "$2" < <(sed "s/$1/\x00/g")
}
mysplit "$(printf "%s" "$DELIMITER_P" | sed 's~[./]~\\&~g')" F_SPLIT_R2 < <(printf "%s" "$FILE_CONTENT")
echo "mysplit result:"
declare -p F_SPLIT_R2
echo "difference:"
diff <(printf "%s\n" "${F_SPLIT_R[@]}") <(printf "%s\n" "${F_SPLIT_R2[@]}") && echo none

Output:

f_split result:
declare -a F_SPLIT_R=([0]=$'15\n\nIt may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – \nfbicknel\n Aug 18, 2017 at 15:57\n4\n\nGreat answer (+1). If you change your awk to awk \'{ gsub(/,[ ]+|$/,"\\0"); pr' [1]=$' that concatenation of the final ", " then you don\'t have to go through the gymnastics on eliminating the final record. So: readarray -td \'\' a < <(awk \'{ gsub(/,[ ]+/,"\\0"); print; }\' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray – \ndawg\n Nov 26, 2017 at 22:28 \n10\n\nWow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! – \nartfulrobot\n May 14, 2018 at 11:32\n11\n\nI\'d move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) – \npaxdiablo\n Jan 9, 2020 at 12:31\n44\n\nThis is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"\n')
mysplit result:
declare -a F_SPLIT_R2=([0]=$'15\n\nIt may also be helpful to note (though understandably you had no room to do so) that the -d option to readarray first appears in Bash 4.4. – \nfbicknel\n Aug 18, 2017 at 15:57\n4\n\nGreat answer (+1). If you change your awk to awk \'{ gsub(/,[ ]+|$/,"\\0"); pr' [1]=$' that concatenation of the final ", " then you don\'t have to go through the gymnastics on eliminating the final record. So: readarray -td \'\' a < <(awk \'{ gsub(/,[ ]+/,"\\0"); print; }\' <<<"$string") on Bash that supports readarray. Note your method is Bash 4.4+ I think because of the -d in readarray – \ndawg\n Nov 26, 2017 at 22:28 \n10\n\nWow, what a brilliant answer! Hee hee, my response: ditched the bash script and fired up python! – \nartfulrobot\n May 14, 2018 at 11:32\n11\n\nI\'d move your right answers up to the top, I had to scroll through a lot of rubbish to find out how to do it properly :-) – \npaxdiablo\n Jan 9, 2020 at 12:31\n44\n\nThis is exactly the kind of thing that will convince you to never code in bash. An astoundingly simple task that has 8 incorrect solutions. Btw, this is without a design constraint of, "Make it as obscure and finicky as possible"\n')
difference:
none
