I have the following loop:
# `results` is obtained from a MySQLdb query.
for row in results:
    print(row)
It prints tuples like this:
('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107)
('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883)
('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)
My question is: how can I create a NumPy array from these rows? The array should look like this:
array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107],
['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883],
['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837]])
In the end, the ndarray should have shape (4, 8).
Read the rows into a structured array:
In [30]:
import numpy as np
a=[('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)]
np.array(a, dtype='S10,S10,f4,f4,f4,f4,f4,f4')
Out[30]:
array([('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
('1A9N', 'RBP', 0.045626699924468994, 0.053926799446344376, 0.331932008266449, 0.04640309885144234, 4.413359874888556e-06, 0.5221070051193237),
('1AQ3', 'RBP', 0.044447898864746094, 0.20111200213432312, 0.26858100295066833, 0.004975699819624424, 1.2850499744171406e-12, 0.48088300228118896),
('1AQ4', 'RBP', 0.01772320084273815, 0.3637459874153137, 0.30899500846862793, 0.0016986100235953927, 0.0, 0.30783700942993164)],
dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<f4')])
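Note that a structured array holds one record per row, so its shape is (4,) rather than (4, 8), and columns are read by field name instead of by a second index. A minimal sketch, assuming Python 3; the field names (`id`, `kind`, `v0` to `v5`) and the `U10`/`f8` types are made up for illustration and are not from the question:

```python
import numpy as np

rows = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
    ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
    ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837),
]

# Hypothetical field names: two text fields (up to 10 characters each)
# followed by six float64 fields.
dt = np.dtype([('id', 'U10'), ('kind', 'U10')] +
              [('v%d' % i, 'f8') for i in range(6)])
arr = np.array(rows, dtype=dt)

print(arr.shape)   # one record per row, so the array is 1-D
print(arr['id'])   # a whole column, selected by field name
print(arr['v1'])   # a float column
```

If a plain 2-D float matrix is what you need, the numeric fields have to be pulled out separately, since the text and float fields cannot share one homogeneous dtype.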
You can also put everything into a single array with object dtype:
In [46]:
np.array(a, dtype=object)
Out[46]:
array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031,
4.41336e-06, 0.522107],
['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757,
1.28505e-12, 0.480883],
['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0,
0.307837]], dtype=object)
However, object dtype is not ideal for the float values, and it can lead to unwanted behavior:
In [48]:
b=np.array(a, dtype=object)
b[0]+b[1] #addition for float values and concatenation for string values
Out[48]:
array(['1A341A9N', 'RBPRBP', 0.0456267, 1.0539268, 0.331932, 0.0464031,
4.41336e-06, 0.522107], dtype=object)
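One way to sidestep this is to keep the text labels and the numbers in two separate arrays, so that arithmetic only ever touches floats. This is a sketch of that idea, not something from the original answer; it assumes the first two entries of every row are labels and the rest are numeric:

```python
import numpy as np

rows = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
    ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
    ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837),
]

# Assumption: columns 0-1 are text labels, columns 2-7 are numbers.
labels = np.array([row[:2] for row in rows])               # string array, shape (4, 2)
values = np.array([row[2:] for row in rows], dtype=float)  # float64 array, shape (4, 6)

# Arithmetic now involves only numbers, so no strings get concatenated.
row_sum = values[0] + values[1]
print(labels.shape, values.shape)
print(row_sum)
```

The two arrays stay aligned by row index, so `labels[i]` describes `values[i]`.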
pandas is another option:
In [43]:
import pandas as pd
print(pd.DataFrame(a))
      0    1         2         3         4         5             6         7
0  1A34  RBP  0.000000  1.000000  0.000000  0.000000  0.000000e+00  0.000000
1  1A9N  RBP  0.045627  0.053927  0.331932  0.046403  4.413360e-06  0.522107
2  1AQ3  RBP  0.044448  0.201112  0.268581  0.004976  1.285050e-12  0.480883
3  1AQ4  RBP  0.017723  0.363746  0.308995  0.001699  0.000000e+00  0.307837
In [44]:
pd.DataFrame(a).dtypes
Out[44]:
0 object
1 object
2 float64
3 float64
4 float64
5 float64
6 float64
7 float64
dtype: object
A DataFrame allows each column to have its own dtype.
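If you later need the numeric part as a plain NumPy array, it can be taken out of the DataFrame. A minimal sketch, assuming a pandas version that provides `DataFrame.to_numpy` (0.24 or later); the column names are invented for illustration:

```python
import pandas as pd

rows = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
    ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
    ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837),
]

# Hypothetical column names; the question does not name the columns.
columns = ['id', 'kind'] + ['v%d' % i for i in range(6)]
df = pd.DataFrame(rows, columns=columns)

# Select the six numeric columns by position and convert them to an ndarray.
values = df.iloc[:, 2:].to_numpy()
print(values.shape)
print(values.dtype)
```

The label columns remain available as `df['id']` and `df['kind']`, so nothing is lost by extracting only the floats.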