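I'm trying to find the best way, using PowerShell, to move certain lines of text to the end of the line above them. The script reads the contents of a CSV and looks for errors where someone hit the Return key while typing.
Here is what the content looks like, with two slightly different problems. Every row should be five columns long. You can see that two of the rows were split in the middle: in one case the first half ends with a double quote, and in the other it does not.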
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS" <--Line should be moved to the end of the line above.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS" <--Line should be moved to the end of the line above AND it needs to throw out one of the double quotes.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
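I've posted the code I use to tidy up the CSV below. The first line trims each line and ensures that any single quotes are switched to double quotes and that no whitespace is left at the end of a line. We receive many oddly formatted CSVs with a mix of single and double quotes, and some with a lot of trailing whitespace. The second line is supposed to find the patterns (NEWLINE)"," and "(NEWLINE)"," and replace each with "," so that the text correctly trails the line above it.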
(Get-Content $File).trim() -replace("','",'","') -replace("^'|'$", '"') | Set-Content $File
(Get-Content $File -Raw) -replace("`"[`r`n]`",`"", '","') -replace("[`r`n]`",`"", '","') | Set-Content $File
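The first line of code works fine on its own.
The second replace on the second line of code seems to work, as long as I don't run the first line of code before it. That is a problem, because I need to make sure everything is trimmed and uses double quotes before the second line runs.
I haven't been able to get the first replace on the second line of code to work at all. The only way I got anything to work was by escaping every double quote and putting the newline characters in square brackets. Is there a way to make all of this work together correctly? Thanks in advance for any help you can provide.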
Your -replace operation is flawed; try the following instead:
$fileContent = @'
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
'@
$fileContent -replace '(?:"|(.))\r?\n","', '$1","'
Result:
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
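The regex operand, '(?:"|(.))\r?\n","':
- (?:"|(.)) is a non-capturing group ((?:...)) that matches either a single " or (|) any other non-newline character (.), with the latter wrapped in a nested capturing group ((...)).
- \r?\n matches either a CRLF (\r\n) or an LF-only (\n) newline, by optionally (?) matching \r.
  - As an optimization, if you know that only CRLF sequences are present, you can use \r\n; if you know that only LF newlines are present, you can use \n.
- "," matches that string as-is (verbatim).
The substitution operand, '$1","':
- $1 refers to whatever the first (and only) capture group matched: nothing if the line ended in ", otherwise the line's last character. Following it with a verbatim "," rejoins the two halves, which effectively removes the newline.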
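To apply the same replacement to a file rather than an in-memory string, something along these lines should work. This is a minimal sketch: the temp-file path and the three-row sample are made up for illustration, and it assumes PowerShell 5 or later (for -NoNewline).

```powershell
# Hypothetical path; a temp file stands in for the real CSV here.
$File = Join-Path ([System.IO.Path]::GetTempPath()) 'broken-sample.csv'

# Write a small sample in which the second row is split across two lines.
$sample = @(
    '"A","B","C","D","E"'
    '"A","B","C'
    '","D","E"'
) -join "`r`n"
Set-Content -LiteralPath $File -Value $sample

# -Raw reads the whole file as ONE string, so the regex can see the newlines.
$fixed = (Get-Content -LiteralPath $File -Raw) -replace '(?:"|(.))\r?\n","', '$1","'

# -NoNewline keeps Set-Content from appending an extra trailing newline.
Set-Content -LiteralPath $File -Value $fixed -NoNewline

# Show the repaired rows.
Get-Content -LiteralPath $File
```

Without -Raw, Get-Content returns an array of lines with the newlines already stripped, so a pattern containing \r?\n would never match.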
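As for what you tried:
Assuming your file has CRLF newlines, the problem with your -replace operation is the subexpression "[`r`n]": it matches only a single character from the set inside [...], i.e. either a CR ("`r") or an LF ("`n"), so it can never consume both characters of a CRLF sequence.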
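The difference between a character class and a newline sequence can be seen on a short test string. The sketch below uses made-up variable names and sample strings purely for illustration:

```powershell
$crlf = "one`r`ntwo"

# [`r`n] is a character class: each match consumes exactly ONE character,
# so the CR and the LF are replaced separately.
$perChar = $crlf -replace "[`r`n]", '|'

# \r?\n consumes the whole line break (CRLF or bare LF) in a single match.
$perBreak = $crlf -replace '\r?\n', '|'

$perChar    # one||two
$perBreak   # one|two

# Between the closing quote and the following "," a CRLF file has TWO characters,
# so a pattern that allows only one of them there cannot match.
$broken = "x`"`r`n`",`"y"
$broken -match "`"[`r`n]`",`""    # False
$broken -match '"\r?\n","'         # True
```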
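Note that the solution above uses the regex escape sequences \r and \n for CR and LF, respectively. That allows a single-quoted string ('...') to be used as the regex operand, which avoids confusing what PowerShell's string interpolation interprets up front with what the regex engine ultimately sees.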
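A small illustration of why the quoting style matters. This sketch assumes strict mode is off and that no variable named 1 has been defined in the session; the sample string is invented for the example:

```powershell
$text = "a`r`nb"

# Single-quoted operands: PowerShell passes \r?\n and $1 through untouched,
# so the regex engine sees exactly what is written.
$single = $text -replace '(a)\r?\n', '$1-'

# Double-quoted replacement: PowerShell expands $1 as a (normally undefined)
# variable BEFORE the regex engine runs, so the back-reference is lost.
$double = $text -replace '(a)\r?\n', "$1-"

$single   # a-b
$double   # -b
```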
This solution works on your sample:
$lines = @'
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS
","WORDS","WORDS" <--Line should be moved to the end of the line above.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS"
","WORDS","WORDS" <--Line should be moved to the end of the line above AND it needs to throw out one of the double quotes.
"WORDS","WORDS","WORDS","WORDS","WORDS"
"WORDS","WORDS","WORDS","WORDS","WORDS"
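'@ -split '\r?\n'   # split on CRLF or bare LF, whichever the here-string contains
# Emit each complete row; rejoin a row that was split across two lines.
for ($i = 0; $i -lt $lines.Count; $i++) {
    if (($lines[$i] -split '","').Count -ne 5) {
        # Incomplete row: only the continuation half (starting with ",) produces output.
        if ($lines[$i].StartsWith('",')) {
            # Normalize to exactly one closing quote, then append the continuation.
            $lines[$i-1].TrimEnd('"') + '"' + $lines[$i].TrimStart('"')
        }
    }
    else {
        # Complete five-column row: pass it through unchanged.
        $lines[$i]
    }
}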