Replacing characters with a regular expression



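I create a CSV file in Go and have to wrap every column in double quotes ("). I added the quotes, but now the CSV output ends up with extra (doubled) quotes in the comment column (when the column contains a comma).

My CSV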

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""

I need the CSV to look like this (no doubled quotes in the comment column):

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

My Golang Code

RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
fmt.Println("error: ", err)
}

Output
"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)

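You can match the last column while capturing everything between the outer quotes, and then use a backreference in the replacement argument of ReplaceAllString to put that part back: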

package main

import (
	"fmt"
	"regexp"
)

func main() {
	CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
	RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
	result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
	fmt.Println(result)
}

See the Go demo. Output:

comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"

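See the regex demo. Details:

  • (?m) - multiline mode on, so $ matches at the end of a line
  • ," - a comma and a "
  • ("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more characters other than ", then zero or more sequences of "" followed by zero or more non-" characters (so escaped quotes inside the comment column stay intact), and then a closing "
  • "$ - a " at the end of a line.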
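
To make the role of the (?:""[^"]*)* part concrete, here is a minimal sketch that runs the same pattern over two rows, one of which has escaped quotes inside the comment. The helper name stripOuterQuotes and the sample rows are illustrative assumptions, not part of the original answer, and the sketch assumes "\n" line endings.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastColumn is the pattern explained in the list above.
var lastColumn = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)

// stripOuterQuotes (hypothetical helper) removes the extra pair of quotes
// around the last column of every line and keeps Group 1 unchanged.
func stripOuterQuotes(csv string) string {
	return lastColumn.ReplaceAllString(csv, `,$1`)
}

func main() {
	// Made-up rows: the second one contains escaped quotes ("") in the comment.
	in := `"1","a",""plain comment, with comma""` + "\n" +
		`"2","b",""He said ""hi"" to me""`
	fmt.Println(stripOuterQuotes(in))
}
```

Only the outermost extra quotes are dropped; the "" pairs inside the second comment are part of Group 1 and are written back as they were.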

You can get the described behavior with ReplaceAllStringFunc():

f := func(s string) string {
return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/1NqTyN1hs1J

Alternatively, with ReplaceAllString():

RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)

https://play.golang.org/p/tY8zGWTbLLB
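
The two snippets above can be compared side by side. The following is a self-contained sketch only: the function names withFunc and withGroup are made up for illustration, and the first variant needs the strings package in addition to regexp. Both patterns rely on [^,], so a comment that begins with a comma is left untouched.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	reFunc  = regexp.MustCompile(`",""[^,].+""`)
	reGroup = regexp.MustCompile(`,""([^,].+)""`)
)

// withFunc (hypothetical name) collapses every "" inside each match to ".
func withFunc(s string) string {
	return reFunc.ReplaceAllStringFunc(s, func(m string) string {
		return strings.ReplaceAll(m, `""`, `"`)
	})
}

// withGroup (hypothetical name) re-wraps the captured comment in one pair of quotes.
func withGroup(s string) string {
	return reGroup.ReplaceAllString(s, `,"$1"`)
}

func main() {
	row := `"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""`
	fmt.Println(withFunc(row))
	fmt.Println(withGroup(row))
}
```

For the sample row both functions give the same line. They differ when the comment itself contains escaped quotes: withFunc also collapses those inner "" pairs, while withGroup writes the captured text back unchanged.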
