(Python) Filling a list of data (different data types) with numpy.genfromtxt

I have a data.txt that looks like this:

16.37.235.153|119.222.242.130|38673|161|17|62|4646|
16.37.235.153|119.222.242.112|56388|161|17|62|4646|
16.37.235.200|16.37.235.153|59009|514|17|143|21271|

I want to get a list of the following form:

list=[['16.37.235.153','119.222.242.130',38673,161,17,62,4646],
['16.37.235.153','119.222.242.112',56388,161,17,62,4646],
['16.37.235.200','16.37.235.153',59009,514,17,143,21271]]

I tried using numpy.genfromtxt with dtype=None, but then I get:

VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.
list = numpy.genfromtxt('results.rw', dtype=None, delimiter = '|')

And this is what I get as the list:

[['8.254.200.14' 'False']
['8.254.200.14' 'False']
['8.254.200.46' 'False']
...
['217.243.224.144' 'False']
['217.243.224.144' 'False']
['217.243.224.144' 'False']]

Any help is appreciated, thanks in advance.

Regards :)

In [71]: txt = '''16.37.235.153|119.222.242.130|38673|161|17|62|4646|
...: 16.37.235.153|119.222.242.112|56388|161|17|62|4646|
...: 16.37.235.200|16.37.235.153|59009|514|17|143|21271|
...: '''

The encoding warning is annoying, but not important.

With dtype=None you should get a structured array, with one field per column:

In [74]: data = np.genfromtxt(txt.splitlines(), encoding=None, dtype=None,delimiter='|')
In [75]: data
Out[75]: 
array([('16.37.235.153', '119.222.242.130', 38673, 161, 17,  62,  4646, False),
('16.37.235.153', '119.222.242.112', 56388, 161, 17,  62,  4646, False),
('16.37.235.200', '16.37.235.153', 59009, 514, 17, 143, 21271, False)],
dtype=[('f0', '<U13'), ('f1', '<U15'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '<i8'), ('f6', '<i8'), ('f7', '?')])

This is a 1d array: each element is one record (row).
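As a small sketch of what "structured and 1d" means in practice, the snippet below rebuilds the same array from the sample text and reads one column by its field name. The names f0, f1, ... are the defaults genfromtxt assigns here; the variable name ports is only illustrative.

```python
import numpy as np

txt = """16.37.235.153|119.222.242.130|38673|161|17|62|4646|
16.37.235.153|119.222.242.112|56388|161|17|62|4646|
16.37.235.200|16.37.235.153|59009|514|17|143|21271|
"""

data = np.genfromtxt(txt.splitlines(), encoding=None, dtype=None, delimiter='|')

# One record per input line, so the shape is (3,), not (3, 8)
print(data.shape)

# A column is selected by field name; 'f2' is the third column
ports = data['f2']
print(ports.tolist())

# A single record can be indexed by position and then by field name
print(data[0]['f0'])
```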

And as a list (a list of tuples):

In [76]: data.tolist()
Out[76]: 
[('16.37.235.153', '119.222.242.130', 38673, 161, 17, 62, 4646, False),
('16.37.235.153', '119.222.242.112', 56388, 161, 17, 62, 4646, False),
('16.37.235.200', '16.37.235.153', 59009, 514, 17, 143, 21271, False)]

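It looks like it is filling the last field (the empty one after the final |) with a boolean False. It may be possible to change that with one of the filling parameters.

Or restrict the columns with usecols to omit it: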

In [77]: data = np.genfromtxt(txt.splitlines(), encoding=None, dtype=None, delimiter='|',
...: usecols=range(7))
In [78]: data
Out[78]: 
array([('16.37.235.153', '119.222.242.130', 38673, 161, 17,  62,  4646),
('16.37.235.153', '119.222.242.112', 56388, 161, 17,  62,  4646),
('16.37.235.200', '16.37.235.153', 59009, 514, 17, 143, 21271)],
dtype=[('f0', '<U13'), ('f1', '<U15'), ('f2', '<i8'), ('f3', '<i8'), ('f4', '<i8'), ('f5', '<i8'), ('f6', '<i8')])
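To get from that structured array to the list of lists asked for in the question, one option is to convert each record to a plain Python list. This is a minimal sketch using the sample data as an in-memory string; the variable name rows is only illustrative.

```python
import numpy as np

txt = """16.37.235.153|119.222.242.130|38673|161|17|62|4646|
16.37.235.153|119.222.242.112|56388|161|17|62|4646|
16.37.235.200|16.37.235.153|59009|514|17|143|21271|
"""

# usecols=range(7) drops the empty eighth column created by the trailing '|'
data = np.genfromtxt(txt.splitlines(), encoding=None, dtype=None,
                     delimiter='|', usecols=range(7))

# tolist() returns a list of tuples holding plain Python str/int values;
# turn each tuple into a list to match the requested layout
rows = [list(record) for record in data.tolist()]

for row in rows:
    print(row)
```

To read from a file instead, pass the file name in place of txt.splitlines().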

You might get closer with something like this:

a = np.genfromtxt('data.txt', dtype=['S16', 'S16', 'i8', 'i8', 'i8', 'i8','i8'], delimiter='|')

But you seem to be mixing strings and integers, so maybe you should use two arrays.
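A rough sketch of the two-array idea, assuming the first two columns are always IPv4 addresses and the next five are integers: read the input twice with different usecols. The names addrs and nums are made up for this example, and the sample text stands in for the file.

```python
import numpy as np

txt = """16.37.235.153|119.222.242.130|38673|161|17|62|4646|
16.37.235.153|119.222.242.112|56388|161|17|62|4646|
16.37.235.200|16.37.235.153|59009|514|17|143|21271|
"""
lines = txt.splitlines()

# String columns: 'U15' is wide enough for any dotted IPv4 address
addrs = np.genfromtxt(lines, dtype='U15', delimiter='|',
                      usecols=(0, 1), encoding=None)

# Integer columns: a plain 2d integer array
nums = np.genfromtxt(lines, dtype=np.int64, delimiter='|',
                     usecols=range(2, 7))

print(addrs.shape, nums.shape)
print(nums[:, 0])
```

Both results are ordinary 2d arrays, so row i of addrs and row i of nums describe the same input line.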

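Edit, regarding your further (unrelated) question:

One way to get the frequency of an item in a numpy array is to sum the boolean array produced by where or by an equality test.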

a = np.random.randint(1, 10, (20000000,2))
(a == 7).sum()
=> 4442874
(a[:,0] == 7).sum()
=> 2220661
(a[:,1] == 7).sum()
=> 2222213
etc.
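If you want the counts of all distinct values at once rather than one value at a time, np.unique with return_counts=True is another option. The small array below is made up so the result is easy to check by hand.

```python
import numpy as np

a = np.array([[7, 2],
              [3, 7],
              [7, 7],
              [1, 2]])

# Distinct values over the whole array, with how often each occurs
values, counts = np.unique(a, return_counts=True)
freq = dict(zip(values.tolist(), counts.tolist()))
print(freq)

# The same for a single column
col_values, col_counts = np.unique(a[:, 0], return_counts=True)
col_freq = dict(zip(col_values.tolist(), col_counts.tolist()))
print(col_freq)

# Agrees with the boolean-sum approach for one value
print((a == 7).sum())
```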

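Thanks guys, I have fixed it. I was passing the wrong file to genfromtxt. The file I used had only 1 column...

Another question: can someone tell me how to count the occurrences of values in a numpy ndarray?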
