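In my database, a field of a table holds this value:
CIÓN
with an accent on the vowel "Ó". When I import the table's data into R through a DBI connection, the character is displayed incorrectly, like this: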
CI\xe0N
I have already tried setting other encodings, such as "latin1" and "windows-1252", but they all failed.
Code attempt:
library(DBI)
con <- dbConnect(odbc::odbc(), Driver = "{MariaDB ODBC 3.1 Driver}",
Server = "{server}", database = "database", UID = "user",
PWD = rstudioapi::askForPassword("password"),
Port = "port"
, encoding = "windows-1252")
# , encoding = "latin1") #I've tried "latin1" too
# , encoding = "UTF-8") #I've tried "UTF-8" too
sql_r <- "select field from table"
res <- dbGetQuery(con,sql_r)
Encodings tried:
LATIN1 display -> IàN
LATIN2 display -> IŕN
WINDOWS-1252 display -> IàN
UTF-8 display -> I\xe0N (the default display)
In case it helps, here are my sessionInfo() (locale settings) and information about the table on the SQL server.
sessionInfo()
locale:
[1] LC_COLLATE=Spanish_Mexico.utf8 LC_CTYPE=Spanish_Mexico.utf8 LC_MONETARY=Spanish_Mexico.utf8
[4] LC_NUMERIC=C LC_TIME=Spanish_Mexico.utf8
Collation of the table in the DB:
utf8mb3_general_ci
You are facing a case of mojibake. The following R code can help detect the correct value for the encoding argument of the dbConnect function:
x <- "CI\xe0N"
c(x, uchardet::detect_str_enc(x))
# [1] "CI\xe0N" "IBM852"
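To see why a DOS code page is suggested, the raw bytes can be decoded directly. This is only a minimal Python sketch; it assumes the field reaches R as the single byte 0xE0 between "CI" and "N", which is what the \xe0 escape in the question suggests.

```python
# Raw bytes as the client received them
# (assumption: the accented letter is the single byte 0xE0).
raw = b"CI\xe0N"

# Decoding with the detected DOS code page (IBM852 is cp852 in Python)
# recovers the intended text.
fixed = raw.decode("cp852")
print(fixed)

# Decoding the same bytes as Latin-1 reproduces the mojibake from the question.
wrong = raw.decode("latin1")
print(wrong)

# Strict UTF-8 decoding fails on a lone 0xE0 byte, which is consistent
# with R showing the escaped byte instead of a character.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```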
In fact, this particular mojibake case of Ó (U+00D3, Latin Capital Letter O With Acute) matches more than one encoding (example in Python, for its universal intelligibility):
['Ó'.encode(e).decode('LATIN1') for e in ['cp775','cp850','cp852','cp857','cp858']]
# ['à', 'à', 'à', 'à', 'à']
['Ó'.encode(e).decode('LATIN2') for e in ['cp775','cp850','cp852','cp857','cp858']]
# ['ŕ', 'ŕ', 'ŕ', 'ŕ', 'ŕ']
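The same check can be turned around to ask which code pages store Ó as the byte 0xE0. The sketch below is illustrative only: the candidate list and the helper name encodings_mapping are assumptions made for this example, not something taken from the question, and the list is not exhaustive.

```python
def encodings_mapping(char, byte, candidates):
    """Return the candidate codecs that encode `char` as exactly `byte`."""
    matches = []
    for name in candidates:
        try:
            if char.encode(name) == byte:
                matches.append(name)
        except UnicodeEncodeError:
            # The character does not exist in this code page.
            continue
    return matches

# Hypothetical shortlist of encodings to test.
candidates = ["cp437", "cp775", "cp850", "cp852", "cp857", "cp858",
              "cp1252", "latin1", "iso8859_2", "utf-8"]

matches = encodings_mapping("\u00d3", b"\xe0", candidates)
print(matches)

# A string that was already mis-decoded as Latin-1 can be repaired by
# reversing that step and decoding with one of the matching code pages.
repaired = "CI\u00e0N".encode("latin1").decode("cp850")
print(repaired)
```

Any of the matching code pages decodes this particular byte correctly, so one value alone cannot tell them apart; other accented characters in the same column would be needed to narrow the choice down.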