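I create a CSV file in Go and have to wrap every column in quotes ("). I added them, but now the CSV output gets an extra pair of (double) quotes in the comment column (when the column contains a comma).
My CSV: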
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
I need the CSV to look like this (without the doubled quotes in the comment column):
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
My Golang Code
RegContent := regexp.MustCompile(`",""[A-Za-z0-9]`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `","`)
fmt.Println("PLAY: ", newRegexp)
err = ioutil.WriteFile(path, []byte(newRegexp), 0)
if err != nil {
fmt.Println("error: ", err)
}
Output:
"son likes this video, good job" //(Missing My)
"don't like this video, it may be better" //(Missing I)
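You can match the last column while capturing everything between the outer quotes, and then restore that part with a backreference in the replacement argument of ReplaceAllString: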
package main

import (
	"fmt"
	"regexp"
)

func main() {
	CSV_Contents := `
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30",""My son likes this video, good job""
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32",""I don't like this video, it may be better""
`
	RegContent := regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)
	result := RegContent.ReplaceAllString(CSV_Contents, `,$1`)
	fmt.Println(result)
}
See the Go demo. Output:
comment_ID","post_ID","product_SKU","comment_author","author_mail","author_location","date","comment"
"100","60574","VID17","Jordi","","","2021-06-02 16:20:30","My son likes this video, good job"
"101","60574","VID18","Scarlett,"","","2020-12-29 23:06:32","I don't like this video, it may be better"
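See the regex demo. Details:
(?m) - multiline mode on, so $ matches at the end of each line
," - a comma and a " character
("[^"]*(?:""[^"]*)*") - Group 1 ($1): a ", then zero or more characters other than ", then zero or more repetitions of a "" pair followed by zero or more non-" characters (so escaped quotes inside the comment column stay intact), then a closing "
"$ - a " at the end of a line.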
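To illustrate the point about escaped quotes, here is a minimal sketch that applies the same pattern to a row whose comment contains a "" pair. The helper name unwrapLastColumn and the sample row are assumptions made up for this illustration; they are not part of the question's data.

```go
package main

import (
	"fmt"
	"regexp"
)

// lastColumn is the pattern from the answer: a final column that is
// wrapped in one extra pair of quotes.
var lastColumn = regexp.MustCompile(`(?m),"("[^"]*(?:""[^"]*)*")"$`)

// unwrapLastColumn (hypothetical helper) drops the extra outer quotes
// around the last column and keeps Group 1 unchanged.
func unwrapLastColumn(s string) string {
	return lastColumn.ReplaceAllString(s, `,$1`)
}

func main() {
	// Hypothetical row: the comment itself contains the escaped quotes ""wow"".
	row := `"102","60574","VID19","Ann","","","2021-01-01 10:00:00",""She said ""wow"", nice""`
	fmt.Println(unwrapLastColumn(row))
	// prints: "102","60574","VID19","Ann","","","2021-01-01 10:00:00","She said ""wow"", nice"
}
```

Only the outermost extra pair is removed; the "" pairs inside the comment are part of Group 1 and are written back as they were.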
You can get the described behavior with ReplaceAllStringFunc() (the snippet also needs the strings package imported):
f := func(s string) string {
return strings.ReplaceAll(s, `""`, `"`)
}
RegContent := regexp.MustCompile(`",""[^,].+""`)
newRegexp := RegContent.ReplaceAllStringFunc(CSV_Contents, f)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/1NqTyN1hs1J
Alternatively, with ReplaceAllString():
RegContent := regexp.MustCompile(`,""([^,].+)""`)
newRegexp := RegContent.ReplaceAllString(CSV_Contents, `,"$1"`)
fmt.Println("PLAY: ", newRegexp)
https://play.golang.org/p/tY8zGWTbLLB
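One possible way to sanity-check the cleaned text is to parse it with the standard encoding/csv package and look at the comment field. The sketch below does that for the ReplaceAllString() variant; the helper name cleanAndParse and the shortened three-column sample are assumptions for illustration only, and the sample header is deliberately well-formed so that the parser accepts it.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"regexp"
	"strings"
)

// doubledQuotes is the pattern from the ReplaceAllString() variant above.
var doubledQuotes = regexp.MustCompile(`,""([^,].+)""`)

// cleanAndParse (hypothetical helper) removes the doubled quotes and then
// parses the result with encoding/csv to confirm that it is valid CSV.
func cleanAndParse(raw string) ([][]string, error) {
	cleaned := doubledQuotes.ReplaceAllString(raw, `,"$1"`)
	return csv.NewReader(strings.NewReader(cleaned)).ReadAll()
}

func main() {
	// Hypothetical, shortened sample with a well-formed header.
	raw := `"comment_ID","date","comment"
"100","2021-06-02 16:20:30",""My son likes this video, good job""
`
	records, err := cleanAndParse(raw)
	if err != nil {
		fmt.Println("error: ", err)
		return
	}
	// The comment is one field, even though it contains a comma.
	fmt.Println(records[1][2])
	// prints: My son likes this video, good job
}
```

Because the pattern requires the first character of the comment to be something other than a comma, a comment that begins with a comma would not be matched by this variant.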