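I am trying to read a file that contains multiple data types in .csv format. I am using Python 3.2 and NumPy 1.9, and I read the data with NumPy's genfromtxt function. I would like to convert the data as I read it, so that it is stored appropriately instead of being processed later, and for that I use converter functions via the converters option.
Using multiple converter functions seems to cause a problem. The code, its input, and its output are listed below. As you can see, the first line of output comes from a different column of the input file than the rest.
Has anyone used this feature before? Is there a bug somewhere in my code?
Code: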
import numpy as np
from datetime import datetime

# Parse timestamps such as b'4/2/2015 2:13:44 PM'
converterfunc_time = lambda x: datetime.strptime(x.decode('UTF-8'), '%m/%d/%Y %I:%M:%S %p')

def converterfunc_lat(x):
    # Debug output: show the raw bytes and the decoded string
    print(x)
    print(x.decode('UTF-8'))
    #return float(x.decode('utf-8').split('N')[1])

def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')

class input_file_processing():
    def __init__(self):
        self.input_data = np.genfromtxt('filename', skip_header=1, dtype=None,
            usecols=(0,1,6,7,8,9,10,13),
            names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
            converters={0: converterfunc_time, 1: converterfunc_san, 6: converterfunc_lat},
            delimiter=',')
Input:
input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N33,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N33,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N33,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
4/2/2015 2:15:19 PM,DSS31967278,03-01-08,0010350817,0,0,N33,04.551,W118,02.359,3,11,1,127,105,77,6,21,2048,21157710
4/2/2015 2:16:08 PM,DSN001000822728,03-01-10,0010051377,0,0,N33,00.899,W118,00.132,3,11,1,116,95,61,6,19,2048,3526254
Output:
b'03-01-01'
03-01-01
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'
N33
b'N33'
Thanks.
I'm not entirely sure what is going on, but this script runs:
import numpy as np
from datetime import datetime
txt = b"""input, file, 1
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587
4/2/2015 2:13:55 PM,DSN001000861511,03-01-02,0010416164,0,0,N34,00.883,W118,00.208,3,11,1,106,102,88,6,18,2048,2792940
4/2/2015 2:14:44 PM,DSN001000871692,03-01-04,0010408734,0,0,N35,00.876,W118,00.110,3,11,1,105,102,80,6,18,2048,312623
4/2/2015 2:14:52 PM,DSN001000864906,03-01-05,0010055143,0,0,N36,08.000,W118,03.000,3,11,1,107,99,83,6,18,2048,3056425
4/2/2015 2:15:00 PM,DSN001000838651,03-01-06,0010265541,0,0,N33,09.749,W118,00.317,3,11,1,100,110,74,6,14,2048,3737937
4/2/2015 2:15:08 PM,DSN001000609313,03-01-07,0010152885,0,0,N33,05.854,W118,04.107,3,11,1,94,95,62,6,14,2048,8221318
"""
txt = txt.splitlines()
#txt = txt[1:]
txt = txt[:3]
converterfunc_time = lambda x : (datetime.strptime(x.decode('UTF-8'),'%m/%d/%Y %I:%M:%S %p'))
def converterfunc_lat(x):
    print('lat ',x, x.decode('UTF-8'))
    x1 = x.decode('utf-8').split('N')
    if len(x1)>1:
        x1 = float(x1[1])
        print('float',x1)
        return x1
    else:
        print('error')
        return "error"
def converterfunc_san(x):
    #print(x)
    return x.decode('UTF-8')
data = np.genfromtxt(txt, skip_header=1,
    dtype=None,
    usecols=(0,1,6,7,8,9,10,13),
    names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
    delimiter=',')
print(data)
print()
input_data=np.genfromtxt(txt,
    skip_header=1,
    dtype='O,a20,f',
    usecols=(0,1,6,), #(0,1,6,7,8,9,10,13),
    names="Date,SAN,LatDeg,LatMin,LonDeg,LonMin,Beam,EsNo",
    converters={0:converterfunc_time,
                1:converterfunc_san,
                6:converterfunc_lat},
    delimiter=',')
print(input_data)
producing:
1552:~/mypy$ python3 stack30269235.py
[ (b'4/2/2015 2:13:44 PM', b'DSN001000557867', b'N33', 0.546, b'W118', 0.638, 3, 104)
(b'4/2/2015 2:13:55 PM', b'DSN001000861511', b'N34', 0.883, b'W118', 0.208, 3, 106)]
lat b'03-01-01' 03-01-01
error
lat b'N33' N33
float 33.0
lat b'N34' N34
float 34.0
[(datetime.datetime(2015, 4, 2, 14, 13, 44), b'DSN001000557867', 33.0)
(datetime.datetime(2015, 4, 2, 14, 13, 55), b'DSN001000861511', 34.0)]
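I had to fill in some pieces that were missing from your question.
I added an explicit dtype to make sure I get the string and float columns.
I modified the lat converter so that it doesn't choke on the '03-01-01' input.
genfromtxt does some sort of test run of your converters: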
# Find the value to test:
if len(first_line):
    testing_value = first_values[i]
else:
    testing_value = None
converters[i].update(conv, locked=True,
                     testing_value=testing_value,
                     default=filling_values[i],
                     missing_values=missing_values[i],)
uc_update.append((i, conv))
It looks like it is taking the first data line:
4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,0,0,N33
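splitting it on the delimiter, and using the third string, 03-01-01, as the test value. That is, it uses the position of 6 within the usecols parameter (index 2) rather than 6 itself.
It has problems matching up usecols, the converter ids, names, and dtype.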
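The mismatch can be reproduced without NumPy. The snippet below is only a sketch of the lookup described above, not genfromtxt's actual code; the variable names pos, wrong_test_value, and right_test_value are made up for the illustration.

```python
# Illustration only: mimics the lookup described above, not NumPy's real code.
first_line = ("4/2/2015 2:13:44 PM,DSN001000557867,03-01-01,0010155818,"
              "0,0,N33,00.546,W118,00.638,3,11,1,104,102,82,6,18,2048,4039587")
usecols = (0, 1, 6, 7, 8, 9, 10, 13)

first_values = first_line.split(",")   # every column of the first data line
col = 6                                # column the lat converter is keyed on
pos = usecols.index(col)               # position of 6 within usecols

wrong_test_value = first_values[pos]   # value the test run appears to pick
right_test_value = first_values[col]   # value the lat converter should see

print(pos, wrong_test_value, right_test_value)
```

The first value printed is the position used for the lookup; the other two show which field gets tested versus which field the converter was written for.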
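The purpose of this test value is to determine the dtype of the column. That is needed in the dtype=None case. I don't know how it is used if you specify the dtype. Evidently it still runs.
In my tests where I don't skip columns, there is no problem matching converter and test value.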
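Since the test call can hand a converter a value from the wrong column, one defensive option is a converter that returns a placeholder instead of raising. This is a minimal sketch, not something genfromtxt requires: the name safe_lat, the NaN fallback, and the handling of an S prefix as negative are my own assumptions about what the data might look like.

```python
def safe_lat(x):
    """Convert b'N33' or 'N33' to 33.0; return NaN for anything unexpected."""
    if isinstance(x, bytes):            # genfromtxt on Python 3 may pass bytes
        x = x.decode("utf-8")
    x = x.strip()
    if len(x) > 1 and x[0] in "NS":
        try:
            value = float(x[1:])
        except ValueError:
            return float("nan")         # prefix present but no number after it
        # Assumption: southern latitudes are stored as negative numbers
        return value if x[0] == "N" else -value
    return float("nan")                 # e.g. the stray '03-01-01' test value

print(safe_lat(b"N33"), safe_lat("S12"), safe_lat(b"03-01-01"))
```

It could then be passed as converters={6: safe_lat}; returning a float in every case also keeps the column's inferred type consistent.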