Python writing to a text file only in ASCII values



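I am trying to write a program that will let me compare SQL files against each other, and I have started by writing a complete SQL file out to a text file. The text file is generated successfully, but there are blocks at the end of each line, as in the following example: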

SET ANSI_NULLS ON਍ഀ
GO਍ഀ
SET QUOTED_IDENTIFIER ON਍ഀ 
GO਍ഀ
CREATE TABLE [dbo].[CDR](਍ഀ

Here is the code that generates the text file:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import os

r = open('master_lines.txt', 'w')
directory = "E:\\"  # file directory (path anonymized in the original post)
master = directory + "master"
databases = ["\\1", "\\2", "\\3", "\\4"]
file_types = ["\\StoredProcedure", "\\Table", "\\UserDefinedFunction", "\\View"]
servers = []
server_number = []
master_lines = []
for file in os.listdir("E:\\"):  # adds server paths to an array
    servers.append(file)
for num in range(0, len(servers)):
    for file in os.listdir(directory + servers[num]):  # adds all the servers and paths to an array
        server_number.append(servers[num] + "\\" + file)
master = directory + server_number[server_number.index("master")]
master_var = master + databases[0]
tmp = master_var + file_types[1]
for file in os.listdir(tmp):
    with open(file) as tmp_file:
        line = tmp_file.readlines()
        for num in range(0, len(line)):
            r.write(line[num])
r.close()

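I have tried changing the encoding to latin1 and utf-8; the current text file is the most successful attempt, since ascii and latin1 produced Chinese and Arabic characters respectively.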

Here is the SQL file in text form:

/****** Object:  Table [dbo].[CDR]    Script Date: 2017-01-12 02:30:49 PM ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CDR](
[calldate] [datetime] NOT NULL,
[clid] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[src] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dst] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dcontext] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[channel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[dstchannel] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastapp] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[lastdata] [varchar](80) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[duration] [int] NOT NULL,
[billsec] [int] NOT NULL,
[disposition] [varchar](45) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[amaflags] [int] NOT NULL,
[accountcode] [varchar](20) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[userfield] [varchar](255) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[uniqueid] [varchar](64) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL,
[cdr_id] [int] NOT NULL,
[cost] [real] NOT NULL,
[cdr_tag] [varchar](10) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
[importID] [bigint] IDENTITY(-9223372036854775807,1) NOT NULL,
CONSTRAINT [PK_CDR_1] PRIMARY KEY CLUSTERED 
(
[uniqueid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
) ON [ReadPartition]
GO
SET ANSI_PADDING ON
GO
/****** Object:  Index [Idx_Dst_incl_uniqueId]    Script Date: 2017-01-12 02:30:50 PM ******/
CREATE NONCLUSTERED INDEX [Idx_Dst_incl_uniqueId] ON [dbo].[CDR]
(
[dst] ASC
)
INCLUDE (   [uniqueid]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90) ON [ReadPartition]
GO

A hex dump to show what is going on (not part of the question above):

ff fe 2f 00 2a 00 2a 00 2a 00 2a 00 2a 00 2a 00 
20 00 4f 00 62 00 6a 00 65 00 63 00 74 00 3a 00 
20 00 20 00 54 00 61 00 62 00 6c 00 65 00 20 00 
5b 00 64 00 62 00 6f 00 5d 00 2e 00 5b 00 43 00 
44 00 52 00 5d 00 20 00 20 00 20 00 20 00 53 00 
63 00 72 00 69 00 70 00 74 00 20 00 44 00 61 00 
74 00 65 00 3a 00 20 00 32 00 30 00 31 00 37 00 
2d 00 30 00 31 00 2d 00 31 00 32 00 20 00 30 00 
32 00 3a 00 33 00 30 00 3a 00 34 00 39 00 20 00 
50 00 4d 00 20 00 2a 00 2a 00 2a 00 2a 00 2a 00 
2a 00 2f 00 0d 00 0a 00 53 00 45 00 54 00 20 00 
41 00 4e 00 53 00 49 00 5f 00 4e 00 55 00 4c 00 
4c 00 53 00 20 00 4f 00 4e 00 0d 00 0a 00 47 00 
4f 00 0d 00 0a 00 53 00 45 00 54 00 20 00 51 00 
55 00 4f 00 54 00 45 00 44 00 5f 00 49 00 44 00 

Hex dump result:

../.*.*.*.*.*.*.
.O.b.j.e.c.t.:.
. .T.a.b.l.e. .
[.d.b.o.]...[.C.
D.R.]. . . . .S.
c.r.i.p.t. .D.a.
t.e.:. .2.0.1.7.
-.0.1.-.1.2. .0.
2.:.3.0.:.4.9. .
P.M. .*.*.*.*.*.
*./.....S.E.T. .
A.N.S.I._.N.U.L.
L.S. .O.N.....G.
O.....S.E.T. .Q.
U.O.T.E.D._.I.D.

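Your problem is that the original file is encoded in UTF-16 with an initial byte order mark (BOM); the `ff fe` at the start of the hex dump is that mark. On Windows this is usually transparent, because almost every file editor detects the encoding automatically thanks to the initial BOM.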
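As a rough illustration of how an editor might recognise such a file, the sketch below checks the first bytes for a known BOM. The helper name `sniff_bom_encoding` and its `default` parameter are made up for this example; only the `codecs.BOM_*` constants come from the standard library.

```python
import codecs

def sniff_bom_encoding(first_bytes, default="utf-8"):
    """Return a codec name based on a leading BOM, or `default` if none is found."""
    # Check UTF-32 first: the UTF-32-LE BOM begins with the UTF-16-LE BOM bytes.
    if first_bytes.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if first_bytes.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if first_bytes.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default

# First six bytes of the hex dump in the question: ff fe 2f 00 2a 00
head = bytes.fromhex("fffe2f002a00")
print(sniff_bom_encoding(head))   # utf-16
print(head.decode("utf-16"))      # /*
```

The generic `utf-16` codec uses the BOM to pick the byte order and drops it from the decoded text, which is why the second line prints only the two visible characters.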

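But the conversion is not automatic in a Python script! This means each character is read as the character itself followed by a null. That is almost transparent, because the null is simply written back out and again forms a normal UTF-16 character, except at the line ends. There, the `\n` is no longer directly preceded by its original `\r` but by a null, and because you write in text mode, Python replaces it with a `\r\n` pair. The bytes around the line end then no longer form the intended UTF-16 characters, which is what shows up as blocks.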
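The effect can be reproduced in memory. The sketch below assumes Python 3 reading in text mode with an 8-bit codec (cp1252 is used here as a typical Windows default, which is an assumption about the asker's machine) and simulates Windows text-mode output with `newline="\r\n"`. The function name `round_trip_as_8bit` is invented for the example.

```python
import codecs
import io

def round_trip_as_8bit(data, codec="cp1252"):
    """Read bytes as 8-bit text and write them back with Windows newlines."""
    # newline=None enables universal newlines: '\r', '\n' and '\r\n' all become '\n'.
    reader = io.TextIOWrapper(io.BytesIO(data), encoding=codec, newline=None)
    text = reader.read()
    sink = io.BytesIO()
    # newline="\r\n" mimics text-mode output on Windows: every '\n' becomes '\r\n'.
    writer = io.TextIOWrapper(sink, encoding=codec, newline="\r\n")
    writer.write(text)
    writer.flush()
    return sink.getvalue()

# "ON", CRLF, "GO" stored as UTF-16-LE with a BOM, like the SQL export.
original = codecs.BOM_UTF16_LE + "ON\r\nGO".encode("utf-16-le")
damaged = round_trip_as_8bit(original)

print(original.hex(" "))
print(damaged.hex(" "))
print(ascii(damaged.decode("utf-16")))   # 'ON\u0a0d\u0d00\nGO'
```

The two code points U+0A0D and U+0D00 are the same pair that appears after `ON` and `GO` in the damaged output shown in the question.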

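The fix is trivial: just declare the UTF-16 encoding when reading the file. The plain `utf_16` codec uses the BOM to pick the byte order and strips it, whereas `utf_16_le` would leave the BOM in the text as a U+FEFF character:

for file in os.listdir(tmp):
    with open(file, encoding='utf_16') as tmp_file: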

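Alternatively, if you want to keep the UTF-16 encoding, you can also open the master file with it. By default, Python encodes the output with the platform's preferred encoding (often cp1252 on Windows, UTF-8 on most other systems). But my advice, if you want to process the output file later, is to go back to an 8-bit encoded file to avoid further problems.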
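Putting the pieces together, a minimal sketch of the concatenation step could look like the following. It assumes every input file is UTF-16 with a BOM and that UTF-8 output is acceptable; the function name `concat_text_files` and its parameters are illustrative and not part of the original script.

```python
import os

def concat_text_files(src_dir, out_path, src_encoding="utf-16", out_encoding="utf-8"):
    """Append every file in src_dir to out_path, transcoding as it goes."""
    count = 0
    # newline="" on both sides passes line endings through unchanged.
    with open(out_path, "w", encoding=out_encoding, newline="") as out:
        for name in sorted(os.listdir(src_dir)):
            # os.listdir returns bare names, so join them with the directory.
            path = os.path.join(src_dir, name)
            if not os.path.isfile(path):
                continue
            with open(path, encoding=src_encoding, newline="") as src:
                out.write(src.read())
            count += 1
    return count
```

Called as `concat_text_files(tmp, 'master_lines.txt')`, this would replace the final loop of the script. Passing `out_encoding="utf-16"` instead keeps the output in UTF-16, as described above.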
