Convert a UTF-8 (with BOM) CSV file to ANSI / Windows-1251



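I want to create a batch file/macro that removes the first line of an automatically generated UTF-8 CSV and converts the file to Windows code page 1251 ("ANSI"). I have been searching online and tried many things, but I cannot find one that works…

Removing the first line is easy: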

@echo off
set "csv=test.csv"
more +1 "%csv%" >"%csv%.new"
move /y "%csv%.new" "export%csv%" >nul

After that I got lost. I tried using the DOS TYPE command:

cmd /a /c TYPE test.csv > ansi.csv

and many variations of it, but it either returns an empty CP1251 file or just another UTF file.

I also tried VBS, but that returned another UTF-8 file, now without the BOM:

Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
    Dim strText
    With CreateObject("ADODB.Stream")
        .Open
        .Type = adTypeBinary
        .LoadFromFile UTF8FName
        .Type = adTypeText
        .Charset = "utf-8"
        strText = .ReadText(adReadAll)
        .Position = 0
        .SetEOS
        .Charset = "_autodetect" 'Use current ANSI codepage.
        .WriteText strText, adWriteChar
        .SaveToFile ANSIFName, adSaveCreateOverWrite
        .Close
    End With
End Sub
UTF8toANSI "UTF8-wBOM.txt", "ANSI1.txt"
UTF8toANSI "UTF8-noBOM.txt", "ANSI2.txt"
MsgBox "Complete!", vbOKOnly, WScript.ScriptName

EDIT 1: Tried converting to iso-8859-1 instead of cp1251 with VBS:

Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName)
  Dim strText
  With CreateObject("ADODB.Stream")
    .Open
    .Type = adTypeBinary
    .LoadFromFile UTF8FName
    .Type = adTypeText
    .Charset = "utf-8"
    strText = .ReadText(adReadAll)
    .Position = 0
    .SetEOS
    .Charset = "iso-8859-1"
    .WriteText strText, adWriteChar
    .SaveToFile ANSIFName, adSaveCreateOverWrite
    .Close
  End With
End Sub
UTF8toANSI WScript.Arguments(0), WScript.Arguments(1)

However, this did not work either.

EDIT 2: I found a way to convert the file from UTF to ANSI using stringconverter.exe (downloaded from http://www.computerperformance.co.uk/ezine/tools.htm):

Setlocal
Set _source=C:\Users\lloyd.EVD\delFirstBat\import
Set _dest=C:\Users\lloyd.EVD\delFirstBat\export
For /F "Tokens=*" %%I In ('dir /b /a-d "%_source%\*.CSV"') Do stringconverter "%_source%\%%~nxI" "%_dest%\%%~nxI" /ANSI

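Now, when I remove the first line of the file (before or after the conversion, it makes no difference), it turns back into a UTF-8 file without a BOM.

So what I need now is a script that removes the first line without changing the character set.

The following VBScript could help: the procedure UTF8toANSI converts a UTF-8 encoded text file to another encoding.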

Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeBinary = 1
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Sub UTF8toANSI(ByVal UTF8FName, ByVal ANSIFName, ByVal ANSICharSet)
  Dim strText
  With CreateObject("ADODB.Stream")
    .Type = adTypeText
    .Charset = "utf-8"
    .Open
    .LoadFromFile UTF8FName
    strText = .ReadText(adReadAll)
    .Close
    .Charset = ANSICharSet
    .Open
    .WriteText strText, adWriteChar
    .SaveToFile ANSIFName, adSaveCreateOverWrite
    .Close
  End With
End Sub
'UTF8toANSI WScript.Arguments(0), WScript.Arguments(1)
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1250.csv", "windows-1250"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1251.csv", "windows-1251"
UTF8toANSI "D:\test\SO\38835837utf8.csv", "D:\test\SO\38835837ansi1253.csv", "windows-1253"

For a list of the character set names known to the system, see the subkeys of HKEY_CLASSES_ROOT\MIME\Database\Charset in the Windows registry:

for /F "tokens=5* delims=\" %# in ('reg query HKCR\MIME\Database\Charset') do @echo "%#"

Data (38835837utf8.csv file):

1st Line    1250    852 čeština (Čechie)
2nd Line    1251    966 русский (Россия)
3rd Line    1253    737 ελληνικά (Ελλάδα)

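The output shows that characters which cannot be converted to a given character set are converted using character decomposition mapping (č => c, š => s, Č => C etc.); where that is not applicable, they are converted to a ? question mark (the common replacement character):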

==> chcp 1250
Active code page: 1250
==> type D:\test\SO\38835837ansi1250.csv
1st Line        1250    852     čeština (Čechie)
2nd Line        1251    966     ??????? (??????)
3rd Line        1253    737     ???????? (??????)
==> chcp 1251
Active code page: 1251
==> type D:\test\SO\38835837ansi1251.csv
1st Line        1250    852     cestina (Cechie)
2nd Line        1251    966     русский (Россия)
3rd Line        1253    737     ???????? (??????)
==> chcp 1253
Active code page: 1253
==> type D:\test\SO\38835837ansi1253.csv
1st Line        1250    852     cestina (Cechie)
2nd Line        1251    966     ??????? (??????)
3rd Line        1253    737     ελληνικά (Ελλάδα)
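
The question also asks for the first line to be dropped. Both steps could probably be combined in a single pass, because an ADODB.Stream opened as text can skip a line before the rest is read. The following is a minimal sketch under stated assumptions, not a tested solution: the procedure name SkipFirstLineAndConvert is made up for illustration, and the input is assumed to use CRLF line endings (for LF-only files, LineSeparator would have to be adLF = 10 instead).

```vbscript
Option Explicit
Private Const adReadAll = -1
Private Const adSaveCreateOverWrite = 2
Private Const adTypeText = 2
Private Const adWriteChar = 0
Private Const adCRLF = -1

' Hypothetical helper (not part of the code above): reads a UTF-8 file,
' skips its first line and saves the remainder in the requested charset.
Private Sub SkipFirstLineAndConvert(ByVal SrcFName, ByVal DstFName, ByVal DstCharSet)
  Dim strText
  With CreateObject("ADODB.Stream")
    .Type = adTypeText
    .Charset = "utf-8"
    .Open
    .LoadFromFile SrcFName
    .LineSeparator = adCRLF         ' assumption: CRLF line endings
    .SkipLine                       ' discard everything up to the first line break
    strText = .ReadText(adReadAll)  ' read the rest of the file
    .Close
    .Charset = DstCharSet           ' e.g. "windows-1251"
    .Open
    .WriteText strText, adWriteChar
    .SaveToFile DstFName, adSaveCreateOverWrite
    .Close
  End With
End Sub

' Usage: source file and target file are passed on the command line.
SkipFirstLineAndConvert WScript.Arguments(0), WScript.Arguments(1), "windows-1251"
```

Saved under a hypothetical name such as skipconvert.vbs, it could be called from a batch file with cscript //nologo skipconvert.vbs test.csv ansi.csv. Because the first line is removed from the decoded text before it is written in the target charset, no separate pass with more +1 is needed afterwards.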
