I have a file whose data is stored in the following nested structure:
Dictionary
List
Object
Attribute
Concretely, it looks like this:
dict = {
    '0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}
The TestObject class is defined as:
import random
class TestObject:
    def __init__(self):
        self.id = random.randint()
        self.date = random.randint()
        self.size = random.randint()
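The attributes in this example are unimportant; they are only placeholders. What I care about is converting this data format into a DataFrame. Specifically, I want the data organized as follows: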
|key| object | id | date | size |
|-- | ------ | ---- | ---- | ---- |
| 0 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 1 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 2 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| 3 |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
| |TestObject| rand | rand | rand |
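I found a way to convert a dictionary of lists into a DataFrame:
pandas.DataFrame.from_dict(dictionary)
But in this case, I am interested in extracting the attributes from the objects stored in the lists.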
You can use a list comprehension:
import pandas as pd
pd.DataFrame([(k, o, o.id, o.date, o.size)
              for k, l in dic.items() for o in l],
             columns=['key', 'object', 'id', 'date', 'size']
             )
You first need to fix a few things in the initial code:
import random
class TestObject:
    def __init__(self):
        self.id = random.randint(0,1)   # randint has 2 mandatory parameters
        self.date = random.randint(0,1) #
        self.size = random.randint(0,1) #
# better use "dic", "dict" is a python builtin
dic = {
    '0': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '1': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '2': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()],
    '3': [TestObject(),TestObject(),TestObject(),TestObject(),TestObject()]
}
Example output:
key object id date size
0 0 <__main__.TestObject object at 0x7fc79371af10> 0 0 0
1 0 <__main__.TestObject object at 0x7fc79371aeb0> 1 0 1
2 0 <__main__.TestObject object at 0x7fc79371af70> 1 0 0
3 0 <__main__.TestObject object at 0x7fc79371c040> 1 0 1
4 0 <__main__.TestObject object at 0x7fc79371c0d0> 1 1 0
5 1 <__main__.TestObject object at 0x7fc79371c220> 1 1 1
6 1 <__main__.TestObject object at 0x7fc79371c1c0> 1 1 0
7 1 <__main__.TestObject object at 0x7fc79371c310> 0 1 0
8 1 <__main__.TestObject object at 0x7fc79371c400> 0 1 0
9 1 <__main__.TestObject object at 0x7fc79371c370> 0 0 1
10 2 <__main__.TestObject object at 0x7fc79371c4f0> 1 1 0
11 2 <__main__.TestObject object at 0x7fc79371c490> 0 0 1
12 2 <__main__.TestObject object at 0x7fc79371c5e0> 1 0 0
13 2 <__main__.TestObject object at 0x7fc79371c580> 1 0 1
14 2 <__main__.TestObject object at 0x7fc79371c640> 0 1 1
15 3 <__main__.TestObject object at 0x7fc79371c3d0> 0 1 1
16 3 <__main__.TestObject object at 0x7fc79371c730> 1 1 1
17 3 <__main__.TestObject object at 0x7fc79371c880> 1 0 1
18 3 <__main__.TestObject object at 0x7fc79371c850> 0 1 0
19 3 <__main__.TestObject object at 0x7fc79371c9a0> 0 1 1
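Putting the pieces of this answer together, a minimal end-to-end sketch might look like the following. The fixed seed, the 0 to 100 value range, and the helper name `objects_to_frame` are assumptions added for illustration. Unlike the snippet above, the object column here stores the class name rather than the object itself, so the printed frame stays readable.

```python
import random

import pandas as pd


class TestObject:
    def __init__(self):
        # Placeholder attributes; the 0-100 range is an arbitrary choice.
        self.id = random.randint(0, 100)
        self.date = random.randint(0, 100)
        self.size = random.randint(0, 100)


def objects_to_frame(dic):
    """Flatten {key: [obj, ...]} into one row per object (hypothetical helper)."""
    rows = [
        (k, type(o).__name__, o.id, o.date, o.size)
        for k, objs in dic.items()
        for o in objs
    ]
    return pd.DataFrame(rows, columns=["key", "object", "id", "date", "size"])


random.seed(0)  # only to make the sketch reproducible
dic = {str(i): [TestObject() for _ in range(5)] for i in range(4)}
df = objects_to_frame(dic)
print(df.head())
```

If the key should act as the row label instead of a regular column, `df.set_index("key")` moves it into the index.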
In Python, instances of ordinary classes have a __dict__ attribute that maps each instance attribute name to its value:
print(pd.DataFrame(TestObject().__dict__, index=[0]))
id date size
0 0 0 0
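As a small illustration of that attribute, the built-in `vars()` returns the same mapping as `__dict__`. One caveat worth knowing is that a class declaring `__slots__` does not give its instances a `__dict__`. The fixed attribute values and the `SlottedObject` class below are assumptions made up for this sketch.

```python
class TestObject:
    def __init__(self):
        # Fixed values instead of random ones, to keep the sketch deterministic.
        self.id = 1
        self.date = 2
        self.size = 3


class SlottedObject:
    # Hypothetical counter-example: __slots__ suppresses the per-instance __dict__.
    __slots__ = ("id",)

    def __init__(self):
        self.id = 1


obj = TestObject()
attrs = vars(obj)  # the very same mapping object as obj.__dict__
print(attrs)

slotted_has_dict = hasattr(SlottedObject(), "__dict__")
print(slotted_has_dict)
```

For slotted classes, the attribute names would have to be read from `__slots__` or listed explicitly instead.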
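Using a dict comprehension, you can reach your goal without listing every attribute by hand, while also adding an object column that holds the class name:
# dict_example is the dictionary of lists from the question
not_nested_dict = {
    (key, n): {'object': obj.__class__.__name__, **obj.__dict__}
    for key, value in dict_example.items()
    for n, obj in enumerate(value)
}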
print(not_nested_dict)
{('0', 0): {'object': 'TestObject', 'id': 1, 'date': 0, 'size': 0}, ('0', 1): {'object': 'TestObject', 'id': 0, 'date': 1, 'size': 0}, ...
Then call pd.DataFrame.from_dict with your new dictionary and transpose the result:
print(pd.DataFrame.from_dict(not_nested_dict).T)
object id date size
0 0 TestObject 0 1 1
1 TestObject 0 0 0
2 TestObject 0 0 0
3 TestObject 0 0 0
4 TestObject 1 1 1
1 0 TestObject 0 0 0
1 TestObject 0 1 1
2 TestObject 0 0 1
3 TestObject 0 1 1
4 TestObject 1 0 0
2 0 TestObject 1 0 0
1 TestObject 0 1 0
2 TestObject 1 0 1
3 TestObject 1 0 0
4 TestObject 1 1 1
3 0 TestObject 0 0 0
1 TestObject 1 1 1
2 TestObject 1 0 0
3 TestObject 1 1 1
4 TestObject 1 1 0
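As a possible variation on this approach, the rows can be collected as a list of records, with the (key, position) pair moved into the index afterwards. Transposing a frame whose columns hold mixed types generally leaves every column with object dtype, whereas this sketch keeps the integer columns as integers. The column name `n`, the constructor parameters, and the fixed attribute values are assumptions made for illustration.

```python
import pandas as pd


class TestObject:
    def __init__(self, id_, date, size):
        # Fixed values instead of random ones, purely for a reproducible sketch.
        self.id = id_
        self.date = date
        self.size = size


dict_example = {
    "0": [TestObject(1, 2, 3), TestObject(4, 5, 6)],
    "1": [TestObject(7, 8, 9)],
}

# One flat record per object: its key, its position in the list,
# the class name, and every instance attribute found by vars().
records = [
    {"key": key, "n": n, "object": type(obj).__name__, **vars(obj)}
    for key, objs in dict_example.items()
    for n, obj in enumerate(objs)
]

# Move key and position into a two-level row index.
df = pd.DataFrame(records).set_index(["key", "n"])
print(df)
print(df.dtypes)
```

Individual rows can then be looked up with a tuple, for example `df.loc[("0", 1)]`.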