How to split a camelCase string into an array in awk



How can I split a camelCase string into an array in awk using the split function?

Input:

STRING="camelCasedExample"

Expected result:

WORDS[1]="camel"
WORDS[2]="Cased"
WORDS[3]="Example"

Failed attempt:

split(STRING, WORDS, /([a-z])([A-Z])/);

Incorrect result:

WORDS[1]="came"
WORDS[2]="ase"
WORDS[3]="xample"
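
The letters go missing because split() throws away whatever the separator regexp matches, and here the regexp matches one lowercase and one uppercase letter at every word boundary. The following is only a small sketch to make that visible; the shell function name show_consumed and the variable names are made up for this illustration. It walks the string with match() and prints each piece that split() would have treated as a separator.

```shell
# Hypothetical helper (name chosen for this illustration only):
# print every substring that the regexp [a-z][A-Z] matches, i.e. the
# characters that split() would discard as "separators".
show_consumed() {
  awk -v s="$1" 'BEGIN {
    rest = s
    while (match(rest, /[a-z][A-Z]/)) {
      print substr(rest, RSTART, RLENGTH)      # the consumed pair
      rest = substr(rest, RSTART + RLENGTH)    # continue after the match
    }
  }'
}

show_consumed camelCasedExample
```

For the sample string this prints lC and dE, which are exactly the characters missing from the incorrect result above.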

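You can't do this with split() alone, because split() discards whatever the separator regexp matches. That is why GNU awk has patsplit(), which matches the fields themselves instead of the separators: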

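$ awk 'BEGIN {
    # patsplit() is GNU awk only; it returns the number of fields found
    n = patsplit("camelCasedExample", words, /(^|[[:upper:]])[[:lower:]]+/)
    # loop by index, since the order of "for (i in words)" is unspecified
    for ( i = 1; i <= n; i++ ) print words[i]
}'
camel
Cased
Example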

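With your shown samples, please try the following. It was written and tested in GNU awk and should work in any awk. It creates an array named words whose values can be accessed at indexes 1, 2, 3 and so on. I am printing the values here as output; you can use the array later however you like.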

awk -F'=|"' -v s1="\"" '
{
  gsub(/[A-Z]/,"\n&",$3)
  val=(val?val ORS:"")$3
}
END{
  num=split(val,words,ORS)
  for(i=1;i<=num;i++){
    if(words[i]!=""){
      print "WORDS[" ++count "]=" s1 words[i] s1
    }
  }
}
' Input_file

Explanation: a detailed, line-by-line explanation of the awk code above.

awk -F'=|"' -v s1="\"" '                          ##Starting the awk program, setting the field separator to = OR " and setting s1 to " here.
{
  gsub(/[A-Z]/,"\n&",$3)                          ##Using gsub to globally replace each capital letter in the 3rd field with a newline followed by the letter itself.
  val=(val?val ORS:"")$3                          ##Creating val, which holds $3, and appending each new value to val itself.
}
END{                                              ##Starting the END block of this program from here.
  num=split(val,words,ORS)                        ##Splitting val into the array words with ORS as the delimiter.
  for(i=1;i<=num;i++){                            ##Running a for loop from 1 up to num here.
    if(words[i]!=""){                             ##Checking that the words item is NOT empty, then doing the following.
      print "WORDS[" ++count "]=" s1 words[i] s1  ##Printing WORDS[ then the value of count, then ]= followed by s1, the value of words[i] and s1.
    }
  }
}
'  Input_file                                     ##Mentioning the Input_file name here.
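
If the string is already in a variable instead of a file, the same insert-a-marker-then-split idea can be applied to it directly. What follows is a minimal sketch, not part of the answer above: the shell function name split_camel_portable is hypothetical, and the choice of SUBSEP (a control character that is unlikely to occur in the input) as the marker is an assumption made here in place of the newline.

```shell
# Hypothetical wrapper (name is an assumption for this sketch).
split_camel_portable() {
  awk -v s="$1" 'BEGIN {
    # Put a marker in front of every capital letter; "&" is the matched text.
    gsub(/[A-Z]/, SUBSEP "&", s)
    # Split on the marker; a leading capital yields an empty first element.
    n = split(s, words, SUBSEP)
    for (i = 1; i <= n; i++)
      if (words[i] != "")
        print "WORDS[" (++count) "]=\"" words[i] "\""
  }'
}

split_camel_portable camelCasedExample
```

The empty-element check matters for input such as CamelCase, where the marker lands at the very start of the string.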

Here is an awk solution that works in any version of awk:

s='camelCasedExample'
awk '{
   while (match($0, /(^|[[:upper:]])[[:lower:]]+/)) {
      wrd = substr($0,RSTART,RLENGTH)
      print wrd
      # you can also store it in array
      arr[++n] = wrd
      $0 = substr($0,RSTART+RLENGTH)
   }
}' <<< "$s"
camel
Cased
Example
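
The match() loop above consumes $0 as it goes. If the record is still needed afterwards, the loop can be moved into a user-defined function that works on a copy of the string. This is only a sketch of one way to do that: the names camel_words and camel_split are hypothetical, and it reads its input from a pipe rather than the bash-only here-string.

```shell
# Hypothetical wrapper and awk function names, chosen for this sketch.
camel_words() {
  awk '
  # Fill arr with the words found in str and return how many there are.
  # str is passed by value, so the caller'"'"'s $0 is left untouched.
  function camel_split(str, arr,    n) {
    split("", arr)                       # portable way to empty an array
    while (match(str, /(^|[[:upper:]])[[:lower:]]+/)) {
      arr[++n] = substr(str, RSTART, RLENGTH)
      str = substr(str, RSTART + RLENGTH)
    }
    return n + 0
  }
  {
    cnt = camel_split($0, w)
    for (i = 1; i <= cnt; i++) print i, w[i]
  }'
}

printf '%s\n' camelCasedExample | camel_words
```

For the sample string this prints the index and word on each line: 1 camel, 2 Cased, 3 Example.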
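
A deliberately obfuscated mawk variant of the marker-and-split idea: it inserts the marker 2022 before every character in the range > to [ (which covers A-Z) and then splits on that marker:

echo 'camelCasedExample' |
mawk '{ for (_=(____=split($((_=_<_) * gsub("[>-[]",
(___)"&")), __, ___) )^_; _<=____; _++) {
print "","__["(_)"]",__[_] } }' OFS=' :: ' FS='^$' ___='2022'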
