How to convert an astropy.table.table.Table to a pandas.core.frame.DataFrame

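I have 10 files, all of type astropy.table.table.Table, each with the same six columns (mjd, filter, flux, flux_error, zp, zpsys) but different lengths. First, I want to convert each file to a pandas.core.frame.DataFrame so that I can add them all to a list and use pd.concat to combine all 10 into one large pandas.core.frame.DataFrame. I tried this: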

import numpy as np
import pandas as pd
from astropy.table import Table
n=10
li=[]
for i in range(0,n):
    file = "training_data/%s.dat"%i # This way I can call each file automatically
    data = Table.read(file, format="ascii")
    data = pd.read_table(file) # I convert the file to pandas compatible
    li.append(data) # I add the file into the empty list above
# now I have my list ready so I compress it into 1 file
all_data = pd.concat(li)

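The problem with this approach is that, for some reason, all six columns get squashed into a single column, which prevents me from doing the rest of my work.

When I check the shape of all_data, I get (879, 1). It looks like this: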

all_data.head()
mjd filter flux flux_error zp zpsys
0   0.0 desg -4.386 4.679 27.5 ab
1   0.011000000005878974 desr -0.5441 2.751 27.5 ab
2   0.027000000001862645 desi 0.4547 4.627 27.5 ab
3   0.043000000005122274 desz -1.047 4.462 27.5 ab
4   13.043000000005122 desg -4.239 4.366 27.5 ab

So how can I build a file like this while keeping my columns as separate columns?

Here is a sample of the data in file 0:

mjd     filter  flux   flux_error zp    zpsys
float64     str4    float64 float64 float64 str2
0.0       desg      -4.386  4.679   27.5    ab
0.0110000 desr  -0.5441 2.751   27.5    ab
0.0270000 desi  0.4547  4.627   27.5    ab
0.0430000 desz  -1.047  4.462   27.5    ab
13.043000 desg  -4.239  4.366   27.5    ab
13.050000 desr  4.695   3.46    27.5    ab
13.058000 desi  6.291   6.248   27.5    ab
13.074000 desz  6.412   5.953   27.5    ab
21.050000 desg  1.588   2.681   27.5    ab
21.058000 desr  -0.6124 2.171   27.5    ab

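It may be that Table.read() cannot guess the format/delimiter of the data. I was able to read the included sample (the data from file 0) into a table with 6 columns using Table.read(file, format='ascii', data_start=2), but I'm not sure whether I captured the whitespace correctly.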

I suspect the sample data from file 0 is not literally what you are reading in, because without data_start=2 the file would show row 1 as "float64 str4 float64 float64 float64 str2".

One thing you can try is Table.read(file, format='ascii', data_start=2, guess=False).

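The solution is to pass sep to data = pd.read_table() so that each column stays a separate column, setting sep to "\s+" (a regular expression matching one or more whitespace characters):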

import pandas as pd

n=10
li=[]
for i in range(0,n):
    file = "training_data/%s.dat"%i # This way I can call each file automatically
    data = pd.read_table(file, sep=r"\s+") # split columns on runs of whitespace
    li.append(data) # I add the file into the empty list above
# now I have my list ready so I compress it into 1 file
all_data = pd.concat(li)
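To check the effect of sep without touching the real files, here is a minimal self-contained sketch. SAMPLE_A and SAMPLE_B are hypothetical in-memory stand-ins for two of the .dat files, and ignore_index=True is an optional addition (not part of the answer above) that renumbers the rows of the combined frame.

```python
import io
import pandas as pd

# Hypothetical stand-ins for two whitespace-delimited .dat files.
SAMPLE_A = """mjd filter flux flux_error zp zpsys
0.0 desg -4.386 4.679 27.5 ab
0.011 desr -0.5441 2.751 27.5 ab
0.027 desi 0.4547 4.627 27.5 ab
"""
SAMPLE_B = """mjd filter flux flux_error zp zpsys
13.043 desg -4.239 4.366 27.5 ab
13.05 desr 4.695 3.46 27.5 ab
"""

frames = []
for text in (SAMPLE_A, SAMPLE_B):
    # sep=r"\s+" splits on one or more whitespace characters.
    frames.append(pd.read_table(io.StringIO(text), sep=r"\s+"))

# Stack the frames and renumber rows 0..N-1.
all_data = pd.concat(frames, ignore_index=True)
print(all_data.shape)
print(all_data.head())
```

Without sep, pd.read_table splits on tabs by default, which is consistent with the single-column result described in the question.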
