Convert a dictionary of lists of objects to a pandas DataFrame



I have a file whose data is organized like this:

```
Dictionary
    List
        Object
            Attribute
```

Concretely, it looks like this:

```python
dict = {
    '0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
    '3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}
```

The `TestObject` class is defined as:

```python
import random
class TestObject:
    def __init__(self):
        self.id = random.randint()
        self.date = random.randint()
        self.size = random.randint()
```

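The attributes in this example don't matter; they are just placeholders. What I care about is converting this data structure into a DataFrame. Specifically, I want the data organized like this: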

|key|  object  |  id  | date | size |
|-- |  ------  | ---- | ---- | ---- |
| 0 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 1 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 2 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
| 3 |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |
|   |TestObject| rand | rand | rand |

I found a way to convert a dictionary of lists into a DataFrame:

```python
pandas.DataFrame.from_dict(dictionary)
```
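With the default `orient='columns'`, `from_dict` turns each dictionary key into a column and each list element into a cell, so the cells hold the objects themselves rather than their attributes. A small sketch using a hypothetical `Point` class in place of `TestObject`:

```python
import pandas as pd

# Hypothetical minimal class standing in for TestObject.
class Point:
    def __init__(self, x):
        self.x = x

data = {'0': [Point(1), Point(2)], '1': [Point(3), Point(4)]}

# Each dict key becomes a column; each cell holds a Point instance, not its attributes.
df = pd.DataFrame.from_dict(data)
print(df.shape)
print(list(df.columns))
print(type(df.iloc[0, 0]).__name__)
```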

In this case, however, I want to extract the attributes of the objects stored in the lists.

You can use a list comprehension:

```python
import pandas as pd

df = pd.DataFrame(
    [(k, o, o.id, o.date, o.size) for k, l in dic.items() for o in l],
    columns=['key', 'object', 'id', 'date', 'size'],
)
print(df)
```
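To check the comprehension without random data, here is a self-contained sketch with a hypothetical `Record` class whose attribute values are passed in explicitly (an assumption for illustration; the question's `TestObject` takes no arguments):

```python
import pandas as pd

# Hypothetical deterministic stand-in for TestObject: fixed values instead of random ones.
class Record:
    def __init__(self, id, date, size):
        self.id = id
        self.date = date
        self.size = size

dic = {
    '0': [Record(1, 2, 3), Record(4, 5, 6)],
    '1': [Record(7, 8, 9)],  # the lists do not need to have the same length
}

# One tuple per object: the dict key, the object itself, then its attributes.
df = pd.DataFrame(
    [(k, o, o.id, o.date, o.size) for k, l in dic.items() for o in l],
    columns=['key', 'object', 'id', 'date', 'size'],
)
print(df[['key', 'id', 'date', 'size']])
```

Unlike `from_dict` on a dictionary of lists, which requires all lists to have the same length, the comprehension produces one row per object regardless of list sizes.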

First, you need to fix a few things in your initial code:

```python
import random

class TestObject:
    def __init__(self):
        self.id = random.randint(0, 1)    # randint has 2 mandatory parameters
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

# better to use "dic": "dict" is a Python builtin
# the entries also need commas between them
dic = {
    '0': [TestObject(), TestObject(), TestObject(), TestObject(), TestObject()],
    '1': [TestObject(), TestObject(), TestObject(), TestObject(), TestObject()],
    '2': [TestObject(), TestObject(), TestObject(), TestObject(), TestObject()],
    '3': [TestObject(), TestObject(), TestObject(), TestObject(), TestObject()],
}
```

Example output:

```
   key                                          object  id  date  size
0    0  <__main__.TestObject object at 0x7fc79371af10>   0     0     0
1    0  <__main__.TestObject object at 0x7fc79371aeb0>   1     0     1
2    0  <__main__.TestObject object at 0x7fc79371af70>   1     0     0
3    0  <__main__.TestObject object at 0x7fc79371c040>   1     0     1
4    0  <__main__.TestObject object at 0x7fc79371c0d0>   1     1     0
5    1  <__main__.TestObject object at 0x7fc79371c220>   1     1     1
6    1  <__main__.TestObject object at 0x7fc79371c1c0>   1     1     0
7    1  <__main__.TestObject object at 0x7fc79371c310>   0     1     0
8    1  <__main__.TestObject object at 0x7fc79371c400>   0     1     0
9    1  <__main__.TestObject object at 0x7fc79371c370>   0     0     1
10   2  <__main__.TestObject object at 0x7fc79371c4f0>   1     1     0
11   2  <__main__.TestObject object at 0x7fc79371c490>   0     0     1
12   2  <__main__.TestObject object at 0x7fc79371c5e0>   1     0     0
13   2  <__main__.TestObject object at 0x7fc79371c580>   1     0     1
14   2  <__main__.TestObject object at 0x7fc79371c640>   0     1     1
15   3  <__main__.TestObject object at 0x7fc79371c3d0>   0     1     1
16   3  <__main__.TestObject object at 0x7fc79371c730>   1     1     1
17   3  <__main__.TestObject object at 0x7fc79371c880>   1     0     1
18   3  <__main__.TestObject object at 0x7fc79371c850>   0     1     0
19   3  <__main__.TestObject object at 0x7fc79371c9a0>   0     1     1
```
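The `object` column shows Python's default representation (`<__main__.TestObject object at 0x...>`). pandas renders object cells with `str()`, which falls back to `__repr__` when no `__str__` is defined, so giving the class a `__repr__` makes that column readable. The `__repr__` below is a hypothetical addition, not part of the original code:

```python
import random
import pandas as pd

class TestObject:
    def __init__(self):
        self.id = random.randint(0, 1)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

    # Hypothetical addition: used when pandas prints the "object" column.
    def __repr__(self):
        return f"TestObject(id={self.id}, date={self.date}, size={self.size})"

df = pd.DataFrame({'key': ['0', '0'], 'object': [TestObject(), TestObject()]})
print(df)
```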

In Python, instances of ordinary classes have a `__dict__` attribute that maps each instance attribute name to its value:

```python
print(pd.DataFrame(TestObject().__dict__, index=[0]))
```

```
   id  date  size
0   0     0     0
```
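The built-in `vars(obj)` returns the same mapping as `obj.__dict__`. This approach relies on the instances actually having a `__dict__`; a class that declares `__slots__` has none. Both classes below are hypothetical illustrations, not part of the original code:

```python
# Hypothetical class with a regular instance __dict__.
class WithDict:
    def __init__(self):
        self.id = 1
        self.size = 2

# Hypothetical class using __slots__: its instances have no __dict__.
class WithSlots:
    __slots__ = ('id', 'size')

    def __init__(self):
        self.id = 1
        self.size = 2

print(vars(WithDict()))
print(hasattr(WithSlots(), '__dict__'))
```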

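With a dict comprehension you can reach your goal without listing every attribute by hand, and also add an `object` column holding the class name: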

```python
not_nested_dict = {
    (key, n): {'object': obj.__class__.__name__, **obj.__dict__}
    for key, value in dic.items()
    for n, obj in enumerate(value)
}
print(not_nested_dict)
```

```
{('0', 0): {'object': 'TestObject', 'id': 1, 'date': 0, 'size': 0}, ('0', 1): {'object': 'TestObject', 'id': 0, 'date': 1, 'size': 0}, ...
```
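In `{'object': ..., **obj.__dict__}` the unpacked attributes are merged after the `'object'` entry: key order follows the literal, and later keys win. If an object ever had an attribute literally named `object`, it would silently replace the class name. A plain-dict illustration (the values are made up):

```python
base = {'object': 'TestObject'}
attrs = {'id': 1, 'date': 0, 'size': 0}

# Same shape as {'object': ..., **obj.__dict__}: 'object' first, then the attributes.
row = {**base, **attrs}
print(row)

# A later duplicate key overwrites the earlier one.
clash = {'object': 'TestObject', **{'object': 'overwritten'}}
print(clash)
```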

Then pass the new dictionary to `pd.DataFrame.from_dict` and transpose the result:

```python
print(pd.DataFrame.from_dict(not_nested_dict).T)
```

```
         object id date size
0 0  TestObject  0    1    1
  1  TestObject  0    0    0
  2  TestObject  0    0    0
  3  TestObject  0    0    0
  4  TestObject  1    1    1
1 0  TestObject  0    0    0
  1  TestObject  0    1    1
  2  TestObject  0    0    1
  3  TestObject  0    1    1
  4  TestObject  1    0    0
2 0  TestObject  1    0    0
  1  TestObject  0    1    0
  2  TestObject  1    0    1
  3  TestObject  1    0    0
  4  TestObject  1    1    1
3 0  TestObject  0    0    0
  1  TestObject  1    1    1
  2  TestObject  1    0    0
  3  TestObject  1    1    1
  4  TestObject  1    1    0
```
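Building the frame through `.T` leaves every column with `object` dtype, because each column of the untransposed frame mixes the class-name string with integers. A sketch of an alternative (not from the original answers) that builds one flat record per object and sets the index afterwards, so `id`, `date` and `size` keep an integer dtype while the `(key, n)` index still prints each key once:

```python
import random
import pandas as pd

class TestObject:
    def __init__(self):
        self.id = random.randint(0, 1)
        self.date = random.randint(0, 1)
        self.size = random.randint(0, 1)

dic = {str(k): [TestObject() for _ in range(5)] for k in range(4)}

# One flat record per object; "key" and "n" become the two index levels.
records = [
    {'key': key, 'n': n, 'object': type(obj).__name__, **vars(obj)}
    for key, objs in dic.items()
    for n, obj in enumerate(objs)
]
df = pd.DataFrame(records).set_index(['key', 'n'])
print(df)
print(df.dtypes)
```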
