Error when converting an astropy Table to a pandas DataFrame and writing it to an HDF file



I am trying to get some data from the Gaia catalogue, convert the astropy Table to a pandas DataFrame, and then store it in an HDF5 file. I cannot store the astropy Table (the query result) directly in the HDF5 file, because I need to do some processing on it first.

The problem is that when I try to store the DataFrame in the HDF file, I get this error:

Traceback (most recent call last):
  File "C:/Users/Administrateur.UTILISA-D5U7HV7/Documents/MEGA/ipsa/cours/aero4/stage/working_directory/python/tests/stackoverflow_issue/1_panda_to_hdf/tohdf.py", line 8, in <module>
    pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\core\generic.py", line 2505, in to_hdf
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 282, in to_hdf
    f(store)
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 274, in <lambda>
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1042, in put
    errors=errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4143, in write
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()

At first I thought my processing was causing the problem, but I get the error even without it.

Here is my code:

  • First, you need to get a file containing the query results by running the following command (the query can take a few minutes):
from astroquery.gaia import Gaia

# Launch the asynchronous query and dump the result to a VOTable file on disk.
job3 = Gaia.launch_job_async("""SELECT *
FROM gaiadr1.gaia_source
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1
AND abs(pmra_error/pmra)<0.10
AND abs(pmdec_error/pmdec)<0.10
AND pmra IS NOT NULL AND abs(pmra)>0
AND pmdec IS NOT NULL AND abs(pmdec)>0
AND pmra BETWEEN 15 AND 25
AND pmdec BETWEEN -55 AND -40;""", dump_to_file=True)
print(job3)
p = job3.get_results()
  • Then, you can run the code below, which produces the error shown above. Mind the file name passed to the Table.read() function, since the query will not give you the same name as in this example.
from astropy.table import Table
import pandas as pd
table = Table.read("async_20200611171019.vot", format='votable')
pd_table = table.to_pandas()
print(pd_table)
pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")
hdf_table = pd.DataFrame(pd.read_hdf("test.h5"))
print(hdf_table)

Does anyone have an idea where this problem might come from? Thanks!

It looks like the phot_variable_flag column has object dtype, i.e. it is a numpy array of Python objects. It is also a masked column:

In [30]: table['phot_variable_flag'].dtype                                                                                                                    
Out[30]: dtype('O')
In [31]: type(table['phot_variable_flag'])                                                                                                                    
Out[31]: astropy.table.column.MaskedColumn
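
One way to confirm this, and to check whether any other columns share the same problem, is to scan the table for object-dtype columns before converting it. A minimal sketch, assuming the same VOTable file name as in the question:

from astropy.table import Table

table = Table.read("async_20200611171019.vot", format='votable')

# Columns whose numpy dtype kind is 'O' (object) are the ones that
# pandas/PyTables cannot map to a fixed-width storage type.
object_cols = [name for name in table.colnames if table[name].dtype.kind == 'O']
print(object_cols)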

When I drop that column, pandas writes the table to HDF5 successfully.
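
If you would rather keep the column than drop it, one possible workaround (a sketch, not verified against every Gaia column) is to cast all object-dtype columns to plain strings before calling to_hdf, so that PyTables can store them as fixed-width string atoms:

from astropy.table import Table

table = Table.read("async_20200611171019.vot", format='votable')
pd_table = table.to_pandas()

# Cast every object-dtype column to str; masked entries become the string 'nan'
# and bytes values become their repr, so decode or clean them afterwards if needed.
for col in pd_table.columns:
    if pd_table[col].dtype == object:
        pd_table[col] = pd_table[col].astype(str)

pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True,
                mode="w", encoding="utf-8")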
