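I have the sample CSV file below.
- Remove everything that follows .com in the first column (Host).
- If the first column (Host) contains a port number after .com, print that number in the second column (Port).
- Remove http:// and https://.
- The complete file is almost 12 KB; only a sample is attached here.
Sample CSV file: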
Host Port
https://abcd03.face.op.api.example.com/v1/authent/token?grant_type,443
https://defghu04.core.op.api.example.com/hello1/v4/tokens,443
https://abcdo3.xyz.def.tata.com/v1/xyz/accesstoken?grant_type,443
https://abcdef.clever.api.sell.com/samsung/v1/managements/autoPayments,443
https://abcdefe.orsd.api.ssample.com/auth/v1/customer-management/interacting,443
http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80
http://century.test.ext.extra.com:6102/ABC2/proxy/sales,80
http://century.test.ext.sell.com:6550/commerce/1.x/transactionProcessor,80
https://century.test.ext.basic.com:6446/tokenize,443
https://sell.test.ext.state.com:6446/transfer,443
https://century.test.ext.sell.com:6446/delete,443
Expected output:
abcd03.face.op.api.example.com,443
defghu04.core.op.api.example.com,443
abcdo3.xyz.def.tata.com,443
abcdef.clever.api.sell.com,443
abcdefe.orsd.api.ssample.com,443
century.test.ext.sample.com,6102
century.test.ext.extra.com,6102
century.test.ext.sell.com,6550
century.test.ext.basic.com,6446
sell.test.ext.state.com,6446
century.test.ext.sell.com,6446
Thanks in advance for your help.
Assuming your actual Input_file has the same format as the samples shown, could you please try the following.
awk '
BEGIN{
  FS=OFS=","
}
match($0,/\/\/.*\.com:[0-9]+/){
  val=substr($0,RSTART+2,RLENGTH-2)
  sub(/:/,",",val)
  print val
  next
}
match($0,/\/\/.*\.com[^\/]*/){
  val=substr($0,RSTART+2,RLENGTH-2)
  print val,$NF
}
' Input_file
Explanation: here is the same code with a detailed explanation added.
awk '                                   ##Starting awk program from here.
BEGIN{                                  ##Starting BEGIN section from here.
  FS=OFS=","                            ##Setting FS and OFS to comma here.
}
match($0,/\/\/.*\.com:[0-9]+/){         ##Matching from // up to .com, a colon and digits; if a match is found then do the following.
  val=substr($0,RSTART+2,RLENGTH-2)     ##Creating val, the matched text without the leading //.
  sub(/:/,",",val)                      ##Substituting the colon with a comma in val.
  print val                             ##Printing val (host,port) here.
  next                                  ##next skips all further statements for this line.
}
match($0,/\/\/.*\.com[^\/]*/){          ##Matching from // up to .com and any following non-slash characters.
  val=substr($0,RSTART+2,RLENGTH-2)     ##Creating val, the matched text without the leading //.
  print val,$NF                         ##Printing val and the last field (the port column) here.
}
' Input_file                            ##Mentioning Input_file name here.
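The patterns above only match hosts that end in .com. As a hedged sketch of how the same idea could work without that assumption, the function below strips the scheme and the path from the first field and then looks for a trailing :port. The name extract_host_port is hypothetical, and the sketch assumes the URL itself contains no commas.

```shell
# Hypothetical helper: reads "url,port" lines on stdin, prints "host,port".
# Assumption: the URL never contains a comma, so $1 is the whole URL.
extract_host_port() {
  awk -F, -v OFS=, '
    $1 ~ /^https?:\/\// {
      url = $1
      sub(/^https?:\/\//, "", url)    # drop the scheme
      sub(/\/.*/, "", url)            # drop the path and query string
      port = $NF                      # default: port from the last column
      if (match(url, /:[0-9]+$/)) {   # an explicit port in the URL wins
        port = substr(url, RSTART + 1)
        url  = substr(url, 1, RSTART - 1)
      }
      print url, port
    }
  '
}

# Usage with one of the sample lines:
printf '%s\n' 'http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80' | extract_host_port
```

Lines that do not start with http:// or https://, such as the header, are skipped, which matches the expected output in the question.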
With bash, would you please try:
while IFS=, read -r url port; do
    if [[ $url =~ https?://([^/:]+)(:([0-9]+))? ]]; then
        # if the port number is included in the url, replace the 2nd field with it
        [[ -n ${BASH_REMATCH[3]} ]] && port="${BASH_REMATCH[3]}"
        echo "${BASH_REMATCH[1]},$port"
    fi
done < file.csv
Output:
abcd03.face.op.api.example.com,443
defghu04.core.op.api.example.com,443
abcdo3.xyz.def.tata.com,443
abcdef.clever.api.sell.com,443
abcdefe.orsd.api.ssample.com,443
century.test.ext.sample.com,6102
century.test.ext.extra.com,6102
century.test.ext.sell.com,6550
century.test.ext.basic.com,6446
sell.test.ext.state.com,6446
century.test.ext.sell.com,6446
This might work for you (GNU sed):
sed -E 's#^https?://##;s#/[^,]*##;s/:([^,]*).*/,\1/' file
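Remove the leading http:// or https://.
Remove the middle string (the path, from the first / up to the comma).
If the URL contains a port, replace the second column with it.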
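To make the three steps easier to follow, here is a small sketch that applies each substitution separately to one sample line and prints the intermediate results. The variable names line, step1, step2 and step3 are only for illustration, and the last substitution is written with the back-reference \1.

```shell
line='http://century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80'

# Step 1: remove the leading scheme.
step1=$(printf '%s\n' "$line" | sed -E 's#^https?://##')
# Step 2: remove the path, from the first / up to (not including) the comma.
step2=$(printf '%s\n' "$step1" | sed -E 's#/[^,]*##')
# Step 3: if a :port is present, keep it and drop the original second column.
step3=$(printf '%s\n' "$step2" | sed -E 's/:([^,]*).*/,\1/')

printf '%s\n' "$step1" "$step2" "$step3"
# century.test.ext.sample.com:6102/ABC1/Genereate/CreditSale,80
# century.test.ext.sample.com:6102,80
# century.test.ext.sample.com,6102
```

For a line without a port in the URL, step 3 finds no colon and leaves the line unchanged, so the original second column is kept.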
Alternative:
sed -E 's#^https?://(([^:]*):([^/]*).*(,).*|([^/]*)/.*(,.*))#\2\4\3\5\6#' file