我正在尝试将我的csv文件转换为numpy数组,以便我可以操作数字然后绘制它们。我打印了我的csv文件并得到:
ra dec
0 15:09:11.8 -34:13:44.9
1 09:19:46.8 +33:44:58.452
2 05:15:43.488 +19:21:46.692
3 04:19:12.096 +55:52:43.32
.... 还有更多的代码(101 行 x 2 列(,但它只是数字。我想将 ra和 dec 数字从当前单位转换为度数,我认为我可以通过将每列变成一个 numpy 数组来做到这一点。但是当我编码它时:
import numpy as np
np_array = np.genfromtxt(r'C:UsersnstevDownloadsS190930t.csv',delimiter=".", skip_header=1, usecols=(4))
print(np_array)
我得到:
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan]
我一直在更改我的分隔符,我已将其更改为冒号并得到相同的东西,分号和加号,我收到一个错误,说它有 2 列而不是 1。我不知道如何更改它,以免我得到这套!请有人帮忙!
使用文件示例的复制粘贴:
In [208]: data = np.genfromtxt('stack59761369.csv',encoding=None,dtype=None,names=True)
In [209]: data
Out[209]:
array([('15:09:11.8', '-34:13:44.9'), ('09:19:46.8', '+33:44:58.452'),
('05:15:43.488', '+19:21:46.692'),
('04:19:12.096', '+55:52:43.32')],
dtype=[('ra', '<U12'), ('dec', '<U13')])
有了这个 dtype 和名称,我得到了一个结构化数组 1d,有 2 个字段。
In [210]: data['ra']
Out[210]:
array(['15:09:11.8', '09:19:46.8', '05:15:43.488', '04:19:12.096'],
dtype='<U12')
In [211]: np.char.split(data['ra'],':')
Out[211]:
array([list(['15', '09', '11.8']), list(['09', '19', '46.8']),
list(['05', '15', '43.488']), list(['04', '19', '12.096'])],
dtype=object)
此拆分给出了一个带有列表的对象 dtype 数组。 它们可以连接到一个 2D 数组中,vstack
:
In [212]: np.vstack(np.char.split(data['ra'],':'))
Out[212]:
array([['15', '09', '11.8'],
['09', '19', '46.8'],
['05', '15', '43.488'],
['04', '19', '12.096']], dtype='<U6')
并转换为浮点数:
In [213]: np.vstack(np.char.split(data['ra'],':')).astype(float)
Out[213]:
array([[15. , 9. , 11.8 ],
[ 9. , 19. , 46.8 ],
[ 5. , 15. , 43.488],
[ 4. , 19. , 12.096]])
In [214]: np.vstack(np.char.split(data['dec'],':')).astype(float)
Out[214]:
array([[-34. , 13. , 44.9 ],
[ 33. , 44. , 58.452],
[ 19. , 21. , 46.692],
[ 55. , 52. , 43.32 ]])
熊猫
In [256]: df = pd.read_csv('stack59761369.csv',delim_whitespace=True)
In [257]: df
Out[257]:
ra dec
0 15:09:11.8 -34:13:44.9
1 09:19:46.8 +33:44:58.452
2 05:15:43.488 +19:21:46.692
3 04:19:12.096 +55:52:43.32
In [258]: df['ra'].str.split(':',expand=True).astype(float)
Out[258]:
0 1 2
0 15.0 9.0 11.800
1 9.0 19.0 46.800
2 5.0 15.0 43.488
3 4.0 19.0 12.096
In [259]: df['dec'].str.split(':',expand=True).astype(float)
Out[259]:
0 1 2
0 -34.0 13.0 44.900
1 33.0 44.0 58.452
2 19.0 21.0 46.692
3 55.0 52.0 43.320
直接线路读取
In [279]: lines = []
In [280]: with open('stack59761369.csv') as f:
...: header=f.readline()
...: for row in f:
...: alist = row.split()
...: alist = [[float(i) for i in astr.split(':')] for astr in alist]
...: lines.append(alist)
...:
In [281]: lines
Out[281]:
[[[15.0, 9.0, 11.8], [-34.0, 13.0, 44.9]],
[[9.0, 19.0, 46.8], [33.0, 44.0, 58.452]],
[[5.0, 15.0, 43.488], [19.0, 21.0, 46.692]],
[[4.0, 19.0, 12.096], [55.0, 52.0, 43.32]]]
In [282]: np.array(lines)
Out[282]:
array([[[ 15. , 9. , 11.8 ],
[-34. , 13. , 44.9 ]],
[[ 9. , 19. , 46.8 ],
[ 33. , 44. , 58.452]],
[[ 5. , 15. , 43.488],
[ 19. , 21. , 46.692]],
[[ 4. , 19. , 12.096],
[ 55. , 52. , 43.32 ]]])
In [283]: _.shape
Out[283]: (4, 2, 3)
第一个维度是行数;第二个维度是 2 列,第三个维度是列中的 3 个值
转换为学位
In [285]: _282@[1,1/60,1/360]
Out[285]:
array([[ 15.18277778, -33.65861111],
[ 9.44666667, 33.8957 ],
[ 5.3708 , 19.4797 ],
[ 4.35026667, 55.987 ]])
哎呀,-34 度值是错误的;元素的所有项都必须具有相同的符号。
校正
识别具有负度的元素:
In [296]: mask = np.sign(_282[:,:,0])
In [297]: mask
Out[297]:
array([[ 1., -1.],
[ 1., 1.],
[ 1., 1.],
[ 1., 1.]])
相应地调整所有 3 个术语:
In [298]: x = np.abs(_282)*mask[:,:,None]
In [299]: x
Out[299]:
array([[[ 15. , 9. , 11.8 ],
[-34. , -13. , -44.9 ]],
[[ 9. , 19. , 46.8 ],
[ 33. , 44. , 58.452]],
[[ 5. , 15. , 43.488],
[ 19. , 21. , 46.692]],
[[ 4. , 19. , 12.096],
[ 55. , 52. , 43.32 ]]])
In [300]: x@[1, 1/60, 1/360]
Out[300]:
array([[ 15.18277778, -34.34138889],
[ 9.44666667, 33.8957 ],
[ 5.3708 , 19.4797 ],
[ 4.35026667, 55.987 ]])
nan
可能是NaN
(不是数字(。尝试将数据类型设置为"无"(dtype=None
(。
另外,请尝试省略delimiter
。默认情况下,任何连续的空格都充当分隔符。
不确定你期待什么,但也许这将是一个更好的起点......
import numpy as np
np_array = np.genfromtxt(r"C:UsersnstevDownloadsS190930t.csv", skip_header=1, dtype=None, encoding="utf-8", usecols=(1, 2))
print(np_array)
打印输出...
[['15:09:11.8' '-34:13:44.9']
['09:19:46.8' '+33:44:58.452']
['05:15:43.488' '+19:21:46.692']
['04:19:12.096' '+55:52:43.32']]
免责声明:我不使用 numpy。我的答案基于 https://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html