Python 3.2 - Numpy 1.9 genfromtxt



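I am trying to read a .csv file that contains several data formats. I am using Python 3.2 and Numpy 1.9, and I read the data with numpy's genfromtxt function. I would like to convert the data as I read it, so that it is stored appropriately rather than processed afterwards, and for that I am using converter functions via the converters option.

Using more than one converter function seems to cause a problem. The code, its input, and its output are listed below. As you can see, the first line of the output comes from a different column of the input file.

Has anyone used this feature before? Is there a bug somewhere in my code?

Code: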

import numpy as np
from datetime import datetime

converterfunc_time = lambda x: datetime.strptime(x.decode('UTF-8'), '%m/%d/%Y %I:%M:%S %p')

def converterfunc_lat(x):
    # Debug output: the raw bytes, then the decoded string
    print(x)
    print(x.decode('UTF-8'))
    #return float(x.decode('utf-8').split('N')[1])

def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')

class input_file_processing():
    def __init__(self):
        self.input_data = np.genfromtxt(
            'filename', skip_header=1, dtype=None,
            usecols=(0, 1, 6, 7, 8, 9, 10, 13),
            names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
            converters={0: converterfunc_time, 1: converterfunc_san, 6: converterfunc_lat},
            delimiter=',')

Input:

input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N33,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N33,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N33,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
4/2/2015 2:15:19 PM,DSS31967278,03-01-08,0010350817,0,0,N33,04.551,W118,02.359,3,11,1,127,105,77,6,21,2048,21157710
4/2/2015 2:16:08 PM,DSN001000822728,03-01-10,0010051377,0,0,N33,00.899,W118,00.132,3,11,1,116,95,61,6,19,2048,3526254

Output:

b'03-01-01'
03-01-01
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'

Thanks

I'm not quite sure what's going on. But this script runs:

import numpy as np
from datetime import datetime
txt = b"""input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N34,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N35,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N36,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
"""
txt = txt.splitlines()
#txt = txt[1:]
txt = txt[:3]
converterfunc_time = lambda x : (datetime.strptime(x.decode('UTF-8'),'%m/%d/%Y %I:%M:%S %p'))
def converterfunc_lat(x):
    print('lat ',x, x.decode('UTF-8'))
    x1 = x.decode('utf-8').split('N')
    if len(x1)>1:
        x1 = float(x1[1])
        print('float',x1)
        return x1
    else:
        print('error')
        return "error"
def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')
data = np.genfromtxt(txt, skip_header=1,
                    dtype=None,
                    usecols=(0,1,6,7,8,9,10,13),
                    names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
                    delimiter=',')
print(data)
print()
input_data=np.genfromtxt(txt,
            skip_header=1,
            dtype='O,a20,f',
            usecols=(0,1,6,), #(0,1,6,7,8,9,10,13),
            names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
            converters={0:converterfunc_time,
                        1:converterfunc_san,
                        6:converterfunc_lat},
            delimiter=',')
print(input_data)

which produces:

1552:~/mypy$ python3 stack30269235.py 
[ (b'4/2/2015 2:13:44 PM', b'DSN001000557867', b'N33', 0.546, b'W118', 0.638, 3, 104)
 (b'4/2/2015 2:13:55 PM', b'DSN001000861511', b'N34', 0.883, b'W118', 0.208, 3, 106)]
lat  b'03-01-01' 03-01-01
error
lat  b'N33' N33
float 33.0
lat  b'N34' N34
float 34.0
[(datetime.datetime(2015, 4, 2, 14, 13, 44), b'DSN001000557867', 33.0)
 (datetime.datetime(2015, 4, 2, 14, 13, 55), b'DSN001000861511', 34.0)]

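I had to fill in some pieces that were missing from your question.

I added an explicit dtype to make sure I get string and float columns.

I modified the lat converter so that it doesn't choke on the '03-01-01' input.


genfromtxt does some sort of test run of your converters: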

    # Find the value to test:
    if len(first_line):
        testing_value = first_values[i]
    else:
        testing_value = None
    converters[i].update(conv, locked=True,
                         testing_value=testing_value,
                         default=filling_values[i],
                         missing_values=missing_values[i],)
    uc_update.append((i, conv))

It looks like it takes the first line of data:

4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33

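splits it on the delimiter, and uses the 3rd string, 03-01-01, as the test value. That is, it uses the position of 6 within the usecols parameter (index 2) rather than column 6 of the file. It has trouble matching up usecols, converter ids, names and dtype.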
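The mix-up can be reproduced without genfromtxt. The following is only an illustration of the suspected indexing confusion, not NumPy's actual code; the variable names are invented for this sketch, and the first data line is copied from the question.

```python
# Column indices passed as usecols in the question.
usecols = (0, 1, 6, 7, 8, 9, 10, 13)

# First data line of the input file, split on the delimiter.
first_line = ("4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,"
              "N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587")
fields = first_line.split(",")

converter_key = 6  # key used in the converters dict

# What the converter was meant to receive: column 6 of the file.
intended_value = fields[converter_key]

# What the test run appears to pass instead: the column whose number equals
# the position of 6 inside usecols.
position_in_usecols = usecols.index(converter_key)
observed_test_value = fields[position_in_usecols]

print(position_in_usecols, intended_value, observed_test_value)
```

The last value printed is the same string that shows up as the unexpected first line of the converter's output in the question.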

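The purpose of that test value is to determine the dtype for the column. That is needed with dtype=None. I don't know how it is used if you specify the dtype; evidently it still runs.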

In my tests where I don't skip columns, there is no problem matching converters and test values.
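If skipping columns is required, one possible workaround is to bypass genfromtxt's converters and convert the fields yourself. The sketch below uses the standard csv module on two lines of the sample data; parse_row and the choice of columns 0, 1 and 6 are assumptions made for this example, not part of the original code.

```python
import csv
import io
from datetime import datetime

sample = """input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N34,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
"""

def parse_row(row):
    """Convert the columns of interest (hypothetical helper)."""
    return (
        datetime.strptime(row[0], "%m/%d/%Y %I:%M:%S %p"),  # Date
        row[1],                                             # SAN
        float(row[6].lstrip("N")),                          # LatDeg, e.g. 'N33' -> 33.0
    )

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header line
records = [parse_row(row) for row in reader if row]

for record in records:
    print(record)
```

This avoids the converter test run altogether, at the cost of producing a plain list of tuples rather than a structured array; the list can be passed to np.array with an explicit dtype afterwards if needed.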
