PowerShell - Get the difference between two files without consuming a lot of memory - is there an alternative C# or C++ API?



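I have two large files to compare (over 10 GB). The commands below work fine for small files, but on these files they seem to use up all the RAM on my machine.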

How can I get the difference between two files without consuming a lot of memory?

Any ideas would be appreciated.

robocopy.exe C:\Folder C:\Folder /l /nocopy /is /e /fp /ns /nc /njh /njs /tee  /log:c:\temp\FolderList.txt
$path = 'C:\Folder'
$pattern = [regex]::Escape($path)
$newContent = @()
Get-Content -Path "c:\temp\FolderList.txt" | ForEach-Object {$newContent += $_ -replace $pattern, ''}
Set-Content -Path "c:\temp\FolderList.txt" -Value $newContent
(Get-Content C:\temp\FolderList.txt).Trim() -ne '' | Set-Content C:\temp\FolderList.txt
robocopy.exe C:\Folder2 C:\Folder2 /l /nocopy /is /e /fp /ns /nc /njh /njs /tee  /log:c:\temp\FolderList2.txt
$path = 'C:\Folder2'
$pattern = [regex]::Escape($path)
$newContent = @()
Get-Content -Path "c:\temp\FolderList2.txt" | ForEach-Object {$newContent += $_ -replace $pattern, ''}
Set-Content -Path "c:\temp\FolderList2.txt" -Value $newContent
(Get-Content C:\temp\FolderList2.txt).Trim() -ne '' | Set-Content C:\temp\FolderList2.txt
Compare-Object -ReferenceObject (Get-Content c:\temp\FolderList.txt) -DifferenceObject (Get-Content c:\temp\FolderList2.txt)

Last update

Folderlist.txt

C:\Folder\Data2\Documents
C:\Folder\Data2\Documents\1.txt
C:\Folder\Data2\Documents\2.txt
C:\Folder\Data2\Documents\3.txt
C:\Folder\Data2\Documents\4.txt
C:\Folder\Data2\Documents\5.txt

CompareLog_1.txt

\Data2\Documents
C:\Folder\Data2\Documents
\Data2\Documents\1.txt
C:\Folder\Data2\Documents\1.txt
\Data2\Documents\2.txt
C:\Folder\Data2\Documents\2.txt
\Data2\Documents\3.txt
C:\Folder\Data2\Documents\3.txt
\Data2\Documents\4.txt
C:\Folder\Data2\Documents\4.txt
\Data2\Documents\5.txt
C:\Folder\Data2\Documents\5.txt

Expected output:

\Data2\Documents
\Data2\Documents\1.txt
\Data2\Documents\2.txt
\Data2\Documents\3.txt
\Data2\Documents\4.txt
\Data2\Documents\5.txt

Update 2:

Output:

\Data2\Documents
C:\Folder\Data2\Documents
\Data2\Documents\1.txt
C:\Folder\Data2\Documents\1.txt
\Data2\Documents\2.txt
C:\Folder\Data2\Documents\2.txt
\Data2\Documents\3.txt
C:\Folder\Data2\Documents\3.txt
\Data2\Documents\4.txt
C:\Folder\Data2\Documents\4.txt
\Data2\Documents\5.txt
C:\Folder\Data2\Documents\5.txt

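First of all, adding elements to an array with `+=` is a known memory hog. Arrays have a fixed length, so every time you add an element, the complete array has to be rebuilt in memory.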
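To put a number on that cost, here is a small C++ model (C++14 or later; this is not PowerShell, and the function names are made up for illustration). It counts element copies in two cases: an array rebuilt at exactly size+1 on every append, which is what `$array += $item` does, and a container that doubles its capacity when it runs full. The second case is similar in spirit to a .NET `List[T]` or a `std::vector`. The doubling model is a simplification and not the exact growth policy of any particular library.

```cpp
#include <cstddef>

// Element copies needed to append n items when the array is reallocated to
// exactly size+1 on every append (models `$array += $item` in PowerShell).
constexpr std::size_t copiesRebuildEachAppend(std::size_t n) {
    std::size_t copies = 0;
    for (std::size_t size = 0; size < n; ++size) {
        copies += size + 1;  // copy the `size` existing items, then write the new one
    }
    return copies;
}

// Element copies when the capacity doubles whenever it runs out
// (simplified geometric growth, as used by growable lists).
constexpr std::size_t copiesDoublingCapacity(std::size_t n) {
    std::size_t copies = 0;
    std::size_t capacity = 0;
    for (std::size_t size = 0; size < n; ++size) {
        if (size == capacity) {
            copies += size;  // move the existing items into the larger buffer
            capacity = (capacity == 0) ? 1 : capacity * 2;
        }
        copies += 1;  // write the new item
    }
    return copies;
}
```

For 1,000 appends the first model performs 500,500 copies and the second only 2,023. The first grows quadratically with the number of lines, which is why a `$newContent += ...` loop becomes so expensive for multi-GB logs.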

So for the replacements and the removal of empty lines in each log file, I would suggest this:

robocopy.exe C:\Folder C:\Folder /l /nocopy /is /e /fp /ns /nc /njh /njs /tee  /log:c:\temp\FolderList.txt
robocopy.exe C:\Folder2 C:\Folder2 /l /nocopy /is /e /fp /ns /nc /njh /njs /tee  /log:c:\temp\FolderList2.txt
$path    = 'C:\Folder'
$newFile = 'C:\temp\CompareLog_1.txt'  # have it create a new file instead of gathering all 10 GB in memory
$pattern = [regex]::Escape($path)
# use 'switch' to parse the log file line-by-line
# and write the processed lines to the new file.
# this will be lean on memory, but takes a lot of disk write actions.
switch -Regex -File 'C:\temp\FolderList.txt' {
    $pattern { Add-Content $newFile -Value ($_ -replace $pattern).Trim() }
    default  { if ($_ -match '\S') { Add-Content $newFile -Value $_.Trim() }}  # skip empty and whitespace-only lines
}
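Since the question also asks about C++ alternatives, here is a rough C++ sketch of the same streaming idea. It reads one line at a time, strips the folder prefix, trims the line and writes only non-empty lines. `normalizeListing` is a hypothetical name. Note that the prefix match is a plain, case-sensitive substring match, whereas PowerShell's `-replace` is case-insensitive.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Trim leading/trailing whitespace (spaces, tabs, CR, LF) from a line.
static std::string trim(const std::string& s) {
    const char* ws = " \t\r\n";
    const auto first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";  // empty or whitespace-only
    const auto last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Stream `inPath` line by line: remove every occurrence of `prefix`
// (plain, case-sensitive match), trim, and write non-empty lines to `outPath`.
// Only the current line is kept in memory.
// Returns false if either file cannot be opened.
bool normalizeListing(const std::string& inPath, const std::string& outPath,
                      const std::string& prefix) {
    assert(!prefix.empty() && "prefix must not be empty");
    std::ifstream in(inPath);
    if (!in) return false;
    std::ofstream out(outPath);  // opened only after the input was found
    if (!out) return false;
    std::string line;
    while (std::getline(in, line)) {
        // The empty() check also protects release builds, where assert is compiled out.
        for (auto pos = line.find(prefix); !prefix.empty() && pos != std::string::npos;
             pos = line.find(prefix, pos)) {
            line.erase(pos, prefix.size());
        }
        const std::string trimmed = trim(line);
        if (!trimmed.empty()) out << trimmed << '\n';
    }
    return true;
}
```

For example, `normalizeListing("C:\\temp\\FolderList.txt", "C:\\temp\\CompareLog_1.txt", "C:\\Folder")` corresponds to the first `switch` block.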

For the second log file:

$path    = 'C:\Folder2'
$newFile = 'C:\temp\CompareLog_2.txt'
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList2.txt' {
    $pattern { Add-Content $newFile -Value ($_ -replace $pattern).Trim() }
    default  { if ($_ -match '\S') { Add-Content $newFile -Value $_.Trim() }}
}

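Next, you need to compare the new files CompareLog_1.txt and CompareLog_2.txt. I suspect these files may still be very large, so I agree with Zilog80 that you are better off using dedicated software.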
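One option is to sort both normalized files first, in the same byte-wise order. That sorting step is an assumption and is not shown; for 10 GB it would have to be an external, disk-based sort. Once sorted, the files can be compared with constant memory by walking them side by side, like a merge. Below is a C++ sketch. `diffSortedFiles` is a hypothetical helper, and the `<=`/`=>` markers only imitate Compare-Object's SideIndicator.

```cpp
#include <cassert>
#include <fstream>
#include <ostream>
#include <string>

// Read the next line into `line`. In debug builds, also check the
// precondition that the file is sorted (each line >= the previous one).
static bool nextSortedLine(std::ifstream& in, std::string& line) {
    const std::string previous = line;
    if (!std::getline(in, line)) return false;
    assert(previous <= line && "input file is not sorted");
    return true;
}

// Compare two files whose lines are sorted in the same byte-wise order.
// Lines only in the left file are written as "<= line", lines only in the
// right file as "=> line". Only the current line of each file is held, so
// memory use does not depend on file size. Returns the number of differing
// lines, or -1 if a file cannot be opened.
long long diffSortedFiles(const std::string& leftPath,
                          const std::string& rightPath, std::ostream& out) {
    std::ifstream left(leftPath), right(rightPath);
    if (!left || !right) return -1;
    std::string l, r;
    bool hasL = nextSortedLine(left, l);
    bool hasR = nextSortedLine(right, r);
    long long diffs = 0;
    while (hasL || hasR) {
        if (hasL && (!hasR || l < r)) {         // l does not occur in the right file
            out << "<= " << l << '\n';
            ++diffs;
            hasL = nextSortedLine(left, l);
        } else if (hasR && (!hasL || r < l)) {  // r does not occur in the left file
            out << "=> " << r << '\n';
            ++diffs;
            hasR = nextSortedLine(right, r);
        } else {                                // same line on both sides
            hasL = nextSortedLine(left, l);
            hasR = nextSortedLine(right, r);
        }
    }
    return diffs;
}
```

The trade-off is that the order of the original listing is lost. You get the set of paths that exist on only one side, which is often what a folder comparison needs.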

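Depending on what you want to see as the result, you could also consider the good old fc.exe, which is fast and does not need much memory. Something like: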

fc.exe /C /N 'C:\temp\CompareLog_1.txt' 'C:\temp\CompareLog_2.txt'
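To show roughly what a line-numbered, case-insensitive comparison does at its core, here is a deliberately naive C++ sketch. It is not how fc.exe is implemented. It walks both files in lockstep, compares each pair of lines while ignoring ASCII case, and prints the 1-based line numbers that differ. `diffLockstep` is a hypothetical name. Unlike fc.exe, it does not try to resynchronize after an inserted or deleted line, so a single extra line near the top makes every following line count as different. That limitation is why fc.exe's resynchronization, or a dedicated diff tool, matters for listings that may have shifted.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <ostream>
#include <string>

// ASCII case-insensitive equality, in the spirit of fc.exe /C.
static bool equalsIgnoreCase(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}

// Walk both files in lockstep and write "N: left | right" for every line
// number N (1-based) where they differ. Only one line per file is in memory.
// Returns the number of differing lines, or -1 if a file cannot be opened.
long long diffLockstep(const std::string& aPath, const std::string& bPath,
                       std::ostream& out) {
    std::ifstream a(aPath), b(bPath);
    if (!a || !b) return -1;
    std::string la, lb;
    long long lineNo = 0, diffs = 0;
    while (true) {
        const bool hasA = static_cast<bool>(std::getline(a, la));
        const bool hasB = static_cast<bool>(std::getline(b, lb));
        if (!hasA && !hasB) break;  // both files exhausted
        ++lineNo;
        if (hasA != hasB || !equalsIgnoreCase(la, lb)) {
            ++diffs;
            out << lineNo << ": " << (hasA ? la : "<missing>") << " | "
                << (hasB ? lb : "<missing>") << '\n';
        }
    }
    return diffs;
}
```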

Instead of Add-Content, you can use a StreamWriter to write the files to compare much faster (this creates a UTF-8 file without BOM):

$path    = 'C:\Folder'
$newFile = 'C:\temp\CompareLog_1.txt'
$writer  = [System.IO.StreamWriter]::new($newFile)
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList.txt' {
    $pattern { $writer.WriteLine(($_ -replace $pattern).Trim()) }
    default  { if ($_ -match '\S') { $writer.WriteLine($_.Trim()) }}
}
# clean up
$writer.Flush()
$writer.Dispose()

$path    = 'C:\Folder2'
$newFile = 'C:\temp\CompareLog_2.txt'
$writer  = [System.IO.StreamWriter]::new($newFile)
$pattern = [regex]::Escape($path)
switch -Regex -File 'C:\temp\FolderList2.txt' {
    $pattern { $writer.WriteLine(($_ -replace $pattern).Trim()) }
    default  { if ($_ -match '\S') { $writer.WriteLine($_.Trim()) }}
}
# clean up
$writer.Flush()
$writer.Dispose()
