Importing a CSV into SAS: "WARNING: A character that could not be transcoded has been replaced in record XXX"



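When I import a large CSV file into SAS, the log always shows "WARNING: A character that could not be transcoded has been replaced in record XXXXX". What should I do about it?

Thanks in advance.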

/**********************************************************************
* PRODUCT: SAS
* VERSION: 9.4
* CREATOR: External File Interface
* DATE: 06MAR18
* DESC: Generated SAS Datastep Code
* TEMPLATE SOURCE: (None Specified.)
***********************************************************************/
data WORK.Companies ;
    %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
    infile 'E:PATSTATCompanies.csv' delimiter = ',' MISSOVER DSD lrecl=13106 firstobs=2 ;
    informat person_id best32. ;
    informat person_name $46. ;
    ...
    informat nuts3 $5. ;
    informat nuts3_name $30. ;
    format person_id best12. ;
    format person_name $46. ;
    ...
    format nuts3 $5. ;
    format nuts3_name $30. ;
    input
    ...
    nuts3 $
    nuts3_name $
    ;
    if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
run;
NOTE: A byte-order mark in the file "E:PATSTATCompanies.csv" (for fileref "#LN00025") indicates that the data is encoded in "utf-8". This encoding will be used to process the file.
NOTE: The infile 'E:PATSTATCompanies.csv' is: Filename=E:PATSTATCompanies.csv, RECFM=V, LRECL=52424, File Size (bytes)=228293377, Last Modified=03 March 2018 19:12:47 o'clock, Create Time=27 November 2017 14:10:57 o'clock
WARNING: A character that could not be transcoded has been replaced in record 775.
WARNING: A character that could not be transcoded has been replaced in record 857.
...
WARNING: A character that could not be transcoded has been replaced in record 10881.
NOTE: Limit set by ERRORS= option reached. Further warnings of this type will not be printed.
NOTE: 1048575 records were read from the infile 'E:PATSTATCompanies.csv'.
The minimum record length was 103.
The maximum record length was 680.
NOTE: The data set WORK.COMPANIES has 1048575 observations and 26 variables.
NOTE: DATA statement used (Total process time): real time 7.28 seconds cpu time 3.19 seconds
1048575 rows created in WORK.Companies from E:PATSTATCompanies.csv.
NOTE: WORK.COMPANIES data set was successfully created.
NOTE: The data set WORK.COMPANIES has 1048575 observations and 26 variables.

You need to start SAS with Unicode support in order to read UTF-8 characters.
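Before restarting anything, it may help to confirm which encoding the current session is actually using. The following is a minimal sketch; it relies only on the `ENCODING` system option and the `SYSENCODING` automatic macro variable, and the wording of the `%put` message is purely illustrative.

```sas
/* Show the session encoding. It is set when SAS starts and
   cannot be changed from inside a running session. */
proc options option=encoding;
run;

/* The same value is exposed in the automatic macro variable SYSENCODING. */
%put NOTE: Session encoding is &sysencoding;
```

If this reports something like WLATIN1 rather than UTF-8, the session is a single-byte one, which would explain the warnings. On Windows, SAS 9.4 installations usually include a "SAS 9.4 (Unicode Support)" shortcut; alternatively, starting SAS with `-encoding utf-8` on the command line should have a similar effect, although the exact setup depends on the installation.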

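You could try setting ENCODING=ANY on the INFILE or FILENAME statement in your current SAS session. The encoding should not matter for numeric fields. However, if the file does contain UTF-8 characters that cannot be transcoded into single-byte WLATIN1 characters, you may run into trouble when using those strings.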
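As a rough sketch of that suggestion, the generated DATA step from the question could be adapted as shown here. The file path and the two columns are copied verbatim from the question; the output data set name `WORK.Companies_raw` is a hypothetical name chosen for illustration, and the remaining columns are left out for brevity.

```sas
data WORK.Companies_raw;   /* hypothetical name, so the original data set is not overwritten */
    /* ENCODING=ANY asks SAS to read the bytes as they are, without transcoding. */
    infile 'E:PATSTATCompanies.csv' delimiter=',' MISSOVER DSD
           lrecl=13106 firstobs=2 encoding=any;
    informat person_id best32. person_name $46.;
    /* Only the first two columns are read in this sketch. */
    input person_id person_name $;
run;
```

Note that this only suppresses the transcoding step. In a WLATIN1 session the character variables would then hold raw UTF-8 bytes, so accented characters may display incorrectly, and a multi-byte character counts as more than one byte toward the `$46.` length, which can truncate long names.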
