Unable to read SQL table in Python correctly: varchar columns imported as comma-separated characters/tuples

I connect to an Oracle database with the following code:

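import jpype
import jaydebeapi
jar = "path/to/ojdbc8.jar"    # placeholder: path to ojdbc8.jar
jvm_path = "path/to/jvm.dll"  # placeholder: path to jvm.dll
# url, user and password are defined elsewhere
args = '-Djava.class.path=%s' % jar
jpype.startJVM(jvm_path, args)
con = jaydebeapi.connect("oracle.jdbc.driver.OracleDriver", url, [user, password], jar)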

The connection works fine, but the data comes back in this odd format.

pd.read_sql("SELECT * FROM table1", con)

yields

+---+-----------------+-----------------+-----------------+
|   | (C,O,L,U,M,N,1) | (C,O,L,U,M,N,2) | (C,O,L,U,M,N,3) |
+---+-----------------+-----------------+-----------------+
| 1 | (t,e,s,t)       | (t,e,s,t,2)     | 1               |
+---+-----------------+-----------------+-----------------+
| 2 | (f,o,o)         | (b,a,r)         | 100             |
+---+-----------------+-----------------+-----------------+

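Numbers and dates are imported correctly, but the varchar columns are not. I have tried different tables, and they all show the same problem.

I haven't seen anything like this anywhere. I hope you can help.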

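This seems to be an issue when using jaydebeapi together with jpype. I can reproduce it the same way you do when connecting to an Oracle database (Oracle 11gR2 in my case, but since you are using ojdbc8.jar, I assume it happens with other versions as well).

There are different ways to solve this:

Change the connection

Since the error seems to occur only with a specific combination of packages, the most sensible approach is to avoid that combination and thereby avoid the error altogether.

Alternative 1: Use jaydebeapi without jpype:

As mentioned, I only observed this when using jaydebeapi together with jpype. In my case, however, jpype is not needed at all. I have the .jar file locally, and my connection works fine without it: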

import jaydebeapi as jdba
import pandas as pd
import os

db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'
jar = os.getcwd() + '/ojdbc6.jar'

conn = jdba.connect(
    'oracle.jdbc.driver.OracleDriver',
    'jdbc:oracle:thin:@' + db_host + ':' + str(db_port) + ':' + db_sid,
    {'user': 'USERNAME', 'password': 'PASSWORD'},
    jar
)
df_jay = pd.read_sql('SELECT * FROM YOURSID.table1', conn)
conn.close()

In my case this works fine and creates the dataframe.
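The string concatenation used for the JDBC URL can be factored into a small helper. This is only a sketch: the function name oracle_thin_url is made up for illustration, and it assumes the same host:port:SID form of the thin-driver URL used in the connection code here.

```python
def oracle_thin_url(host, port, sid):
    """Build a JDBC thin-driver URL in the host:port:SID form."""
    # port may be an int; the f-string converts it to text
    return f"jdbc:oracle:thin:@{host}:{port}:{sid}"


# Placeholder values, same as in the connection example
url = oracle_thin_url('db.host.com', 1521, 'YOURSID')
print(url)
```

The resulting string can be passed as the second argument to jaydebeapi.connect in place of the inline concatenation.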

Alternative 2: Use cx_Oracle instead:

The problem also does not occur if I connect to the Oracle database using cx_Oracle:

import cx_Oracle
import pandas as pd
import os
db_host = 'db.host.com'
db_port = 1521
db_sid = 'YOURSID'
dsn_tns = cx_Oracle.makedsn(db_host, db_port, db_sid)
cx_conn = cx_Oracle.connect('USERNAME', 'PASSWORD', dsn_tns)
df_cxo = pd.read_sql('SELECT * FROM YOURSID.table1', con=cx_conn)
cx_conn.close()

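Note: For cx_Oracle to work, you must have Oracle Instant Client installed and set up correctly (see, for example, the cx_Oracle documentation for Ubuntu).

Fix the dataframe afterwards:

If for some reason you cannot use the connection options above, you can also convert the dataframe.

Alternative 3: Join the tuple entries:

You can use ''.join() to convert the tuples to strings. You need to do this for both the entries and the column names.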

# for all entries that are not None, join the tuples
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].apply(lambda x: ''.join(x) if x is not None else x)
# also rename the column headings in the same way
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
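Object columns can also hold values that are not character sequences (for example dates or decimals), and ''.join() raises a TypeError on those. A slightly more defensive variant of the lambda might look like the sketch below. The helper name join_chars is hypothetical, and the plain Python tuples are only stand-ins for whatever sequence type the driver actually returns.

```python
def join_chars(value):
    """Join a sequence of characters into one str; leave other values as they are."""
    # None and real strings need no fixing
    if value is None or isinstance(value, (str, bytes)):
        return value
    try:
        return ''.join(str(ch) for ch in value)
    except TypeError:
        # not iterable (numbers, dates, ...): return unchanged
        return value


# Stand-in values mimicking one broken row and its column headings
row = (('t', 'e', 's', 't'), ('t', 'e', 's', 't', '2'), 1)
headings = (tuple('COLUMN1'), tuple('COLUMN2'), tuple('COLUMN3'))

print([join_chars(v) for v in row])
print([join_chars(h) for h in headings])
```

With a dataframe, the same function could be passed to df[col].apply(join_chars) and df.rename(columns=join_chars) instead of the lambdas.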

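Alternative 4: Change the dtype of the columns:

Changing the dtype of the affected columns from object to string converts all entries as well. Note that this can have unwanted side effects, such as None values turning into the pandas missing-value marker, displayed as <NA>. In addition, you have to rename the column headings separately, as shown above.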
```
for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].astype('string')
# again, rename headings
df.rename(columns=lambda x: ''.join(x) if x is not None else x, inplace=True)
```

All of these should end up producing more or less the same df (apart from the dtypes and possibly replaced None values):

+---+---------+---------+---------+
|   | COLUMN1 | COLUMN2 | COLUMN3 |
+---+---------+---------+---------+
| 1 | test    | test2   | 1       |
+---+---------+---------+---------+
| 2 | foo     | bar     | 100     |
+---+---------+---------+---------+

Try starting the JVM with the convertStrings argument.

jpype.startJVM(jvm_path, args, convertStrings=True)

See the section on string conversion in the JPype user guide for details: https://jpype.readthedocs.io/en/latest/userguide.html
