Python NumPy: how to convert a text file of comma-separated float pairs into a multidimensional 'ndarray'



I am new to numpy, and I need to convert a text file containing this data

219062.60893,395935.54879 219332.52719,395961.82402 219301.47465,395688.32278 219036.33371,395677.57382

218761.63814,395494.84155 219164.12686,395438.70811 219086.49551,395244.03255 218758.05515,395308.52630

into a numpy ndarray of

[[[219062.60893,395935.54879],[219332.52719,395961.82402],[219301.47465,395688.32278],[219036.33371,395677.57382]],

[[218761.63814,395494.84155],[219164.12686,395438.70811],[219086.49551,395244.03255],[218758.05515,395308.52630]]]

What I tried is this

textLineArray = np.loadtxt(filePath, str, None, None, None, 0, None, False,0,'bytes',None)

which gives me

[['219062.60893,395935.54879' '219332.52719,395961.82402'
'219301.47465,395688.32278' '219036.33371,395677.57382'],
['218761.63814,395494.84155' '219164.12686,395438.70811'
'219086.49551,395244.03255' '218758.05515,395308.52630']]

and after splitting further on spaces with

spaceTextLineArray = np.char.split(textLineArray, ' ', maxsplit=None)

I get this

[[list(['219062.60893,395935.54879']) list(['219332.52719,395961.82402'])
list(['219301.47465,395688.32278']) list(['219036.33371,395677.57382'])],
[list(['218761.63814,395494.84155']) list(['219164.12686,395438.70811'])
list(['219086.49551,395244.03255']) list(['218758.05515,395308.52630'])]]

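This is close, but not quite what I need: the values are still strings, and I don't know how to get rid of the single quotes.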

First solution

Try this code:

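import numpy as np

data = []
with open('data.txt') as my_file:
    for line in my_file:
        # Each whitespace-separated token is an 'x,y' pair; turn it into [x, y].
        # split() with no argument also tolerates repeated or trailing spaces.
        data.append([list(map(float, x.split(','))) for x in line.split()])
arr_data = np.array(data)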

arr_data will contain your numpy array:

array([[[219062.60893, 395935.54879],
        [219332.52719, 395961.82402],
        [219301.47465, 395688.32278],
        [219036.33371, 395677.57382]],
       [[218761.63814, 395494.84155],
        [219164.12686, 395438.70811],
        [219086.49551, 395244.03255],
        [218758.05515, 395308.5263 ]]])

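Brief explanation:

  1. Read the file line by line
  2. Parse each line into a list of [x, y] float pairs and store it in a list
  3. Convert the nested list into a numpy array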
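The three steps above can be traced on a small in-memory sample. This is only a minimal sketch: `io.StringIO` stands in for the real file, the `sample` name is illustrative, and it assumes every line holds the same number of pairs (otherwise numpy cannot build a regular 3-D array).

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the text file from the question.
sample = io.StringIO(
    "219062.60893,395935.54879 219332.52719,395961.82402\n"
    "218761.63814,395494.84155 219164.12686,395438.70811\n"
)

data = []
for line in sample:                       # step 1: read line by line
    pairs = line.split()                  # whitespace-separated 'x,y' strings
    row = [[float(v) for v in p.split(',')] for p in pairs]  # step 2: parse
    data.append(row)

arr = np.array(data)                      # step 3: nested list -> ndarray
print(arr.shape)   # (2, 2, 2): lines, pairs per line, values per pair
print(arr.dtype)   # float64
```

Because the values are parsed with `float`, the resulting array holds numbers rather than quoted strings.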

Second solution

Another solution, without an outer for loop, which produces the same result:

arr_data = np.array([[list(map(float, a.split(','))) for a in s] for s in np.loadtxt('myData.csv', dtype=str)])
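A further variant, which is not one of the solutions proposed here and is offered only as a sketch, lets `np.loadtxt` do the float parsing itself: replace every comma with a space so each line becomes a flat row of numbers, then reshape. It assumes every line has the same number of pairs; `sample` is an illustrative stand-in for the file.

```python
import io
import numpy as np

# Hypothetical in-memory stand-in for the text file from the question.
sample = io.StringIO(
    "219062.60893,395935.54879 219332.52719,395961.82402\n"
    "218761.63814,395494.84155 219164.12686,395438.70811\n"
)

# np.loadtxt accepts an iterable of lines, so the commas can be replaced
# on the fly; ndmin=2 keeps a single-line file two-dimensional.
flat = np.loadtxt((line.replace(',', ' ') for line in sample), ndmin=2)

# Regroup each flat row into (pairs per line, 2).
arr = flat.reshape(flat.shape[0], -1, 2)
print(arr.shape)   # (2, 2, 2)
```

Whether this is faster than the loops above was not measured here.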

Execution time comparison

I used a file in the same format as yours, with 5000 lines, and obtained the following results:

  • First solution:

    # 41.4 ms ± 4.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

  • Second solution:

    # 84.6 ms ± 6.06 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

On this test, the first solution I proposed is about twice as fast.
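A comparison of this kind can be reproduced roughly as follows. This is a sketch using the standard `timeit` module rather than the exact setup behind the figures above; the function names, the repeated sample line, and the temporary file are all assumptions made for illustration, and absolute timings will vary by machine and NumPy version.

```python
import os
import tempfile
import timeit
import numpy as np

# One line in the question's format, repeated to build a 5000-line test file.
LINE = ("219062.60893,395935.54879 219332.52719,395961.82402 "
        "219301.47465,395688.32278 219036.33371,395677.57382\n")

def first_solution(path):
    data = []
    with open(path) as f:
        for line in f:
            data.append([list(map(float, x.split(','))) for x in line.split()])
    return np.array(data)

def second_solution(path):
    rows = np.loadtxt(path, dtype=str)   # 2-D array of 'x,y' strings
    return np.array([[list(map(float, a.split(','))) for a in s] for s in rows])

with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write(LINE * 5000)
    path = tmp.name

try:
    runs = 5
    t1 = timeit.timeit(lambda: first_solution(path), number=runs) / runs
    t2 = timeit.timeit(lambda: second_solution(path), number=runs) / runs
    same = np.array_equal(first_solution(path), second_solution(path))
finally:
    os.remove(path)   # clean up the temporary file

print(f"first: {t1 * 1e3:.1f} ms, second: {t2 * 1e3:.1f} ms, equal: {same}")
```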


Extra

If, on the other hand, your data is in standard csv format and you want to load it directly into a numpy array, you can do this:

from numpy import genfromtxt
arr_data = genfromtxt('file_data.csv',delimiter=',')

arr_data will contain your numpy array.
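As a small illustration of the csv case, the sketch below feeds `np.genfromtxt` an in-memory buffer instead of a file; `csv_text` is a hypothetical stand-in for `file_data.csv`, with one comma-separated pair per line.

```python
import io
import numpy as np

# Hypothetical standard-CSV content: one comma-separated row per line.
csv_text = io.StringIO(
    "219062.60893,395935.54879\n"
    "219332.52719,395961.82402\n"
)

arr = np.genfromtxt(csv_text, delimiter=',')
print(arr.shape)   # (2, 2): one row per line, one column per field
```

Note that this yields a 2-D array; the file in the question mixes spaces and commas, which is why it needs the extra parsing shown in the solutions above.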
