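I'll start by saying that I'm not a Python developer. However, I need synthetic data, and I'm trying to use the Synthetic Data Vault (https://github.com/sdv-dev/SDV).
I installed Python 3.7 on Windows. For now I'm doing this on my laptop while I learn how it works.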
python --version
Python 3.7.6
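I was able to install the sdv package with pip, and I can run the first few lines of the demo code to load and view the metadata and the demo tables. However, when I reach these lines in the demo: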
sdv = SDV()
sdv.fit(metadata, tables)
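I get the following error:
TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]
I haven't modified any of the code from git, and I haven't tried any code of my own. I'm really just trying to get the demo working as described in the README. I just installed the package and am working through the first example. Has anyone tried this and hit the same problem? Any ideas on what I can do to get past this error?
The full stack trace is: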
sdv.fit(metadata, tables)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:toolsPython3.7libsite-packagessdvsdv.py", line 69, in fit
self.modeler.model_database(tables)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 128, in model_database
self.cpa(table_name, tables)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 99, in cpa
child_table = self.cpa(child_name, tables, child_key)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 99, in cpa
child_table = self.cpa(child_name, tables, child_key)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 92, in cpa
extended = self.metadata.transform(table_name, table)
File "C:toolsPython3.7libsite-packagessdvmetadata.py", line 477, in transform
hyper_transformer.fit(data[fields])
File "C:toolsPython3.7libsite-packagesrdthyper_transformer.py", line 128, in fit
transformer.fit(column)
File "C:toolsPython3.7libsite-packagesrdttransformersdatetime.py", line 55, in fit
transformed = self._transform(data)
File "C:toolsPython3.7libsite-packagesrdttransformersdatetime.py", line 40, in _transform
integers = datetimes.astype(int).astype(float).values
File "C:toolsPython3.7libsite-packagespandascoregeneric.py", line 5691, in astype
**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsmanagers.py", line 531, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsmanagers.py", line 395, in apply
applied = getattr(b, f)(**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 534, in astype
**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 2139, in _astype
return super(DatetimeBlock, self)._astype(dtype=dtype, **kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 633, in _astype
values = astype_nansafe(values.ravel(), dtype, copy=True)
File "C:toolsPython3.7libsite-packagespandascoredtypescast.py", line 646, in astype_nansafe
to_dtype=dtype))
TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]
Here is the full output of my session:
from sdv import load_demo
metadata, tables = load_demo(metadata=True)
metadata.to_dict()
{
"tables": {
"users": {
"primary_key": "user_id",
"fields": {
"user_id": {
"type": "id",
"subtype": "integer"
},
"country": {
"type": "categorical"
},
"gender": {
"type": "categorical"
},
"age": {
"type": "numerical",
"subtype": "integer"
}
}
},
"sessions": {
"primary_key": "session_id",
"fields": {
"session_id": {
"type": "id",
"subtype": "integer"
},
"user_id": {
"ref": {
"field": "user_id",
"table": "users"
},
"type": "id",
"subtype": "integer"
},
"device": {
"type": "categorical"
},
"os": {
"type": "categorical"
}
}
},
"transactions": {
"primary_key": "transaction_id",
"fields": {
"transaction_id": {
"type": "id",
"subtype": "integer"
},
"session_id": {
"ref": {
"field": "session_id",
"table": "sessions"
},
"type": "id",
"subtype": "integer"
},
"timestamp": {
"type": "datetime",
"format": "%Y-%m-%d"
},
"amount": {
"type": "numerical",
"subtype": "float"
},
"approved": {
"type": "boolean"
}
}
}
}
}
>>> tables
{'users': user_id country gender age
0 0 USA M 34
1 1 UK F 23
2 2 ES None 44
3 3 UK M 22
4 4 USA F 54
5 5 DE M 57
6 6 BG F 45
7 7 ES None 41
8 8 FR F 23
9 9 UK None 30, 'sessions': session_id user_id device os
0 0 0 mobile android
1 1 1 tablet ios
2 2 1 tablet android
3 3 2 mobile android
4 4 4 mobile ios
5 5 5 mobile android
6 6 6 mobile ios
7 7 6 tablet ios
8 8 6 mobile ios
9 9 8 tablet ios, 'transactions': transaction_id session_id timestamp amount approved
0 0 0 2019-01-01 12:34:32 100.0 True
1 1 0 2019-01-01 12:42:21 55.3 True
2 2 1 2019-01-07 17:23:11 79.5 True
3 3 3 2019-01-10 11:08:57 112.1 False
4 4 5 2019-01-10 21:54:08 110.0 False
5 5 5 2019-01-11 11:21:20 76.3 True
6 6 7 2019-01-22 14:44:10 89.5 True
7 7 8 2019-01-23 10:14:09 132.1 False
8 8 9 2019-01-27 16:09:17 68.0 True
9 9 9 2019-01-29 12:10:48 99.9 True}
metadata.visualize()
<graphviz.dot.Digraph object at 0x00000196E8755488>
from sdv import SDV
sdv = SDV()
sdv.fit(metadata, tables)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:toolsPython3.7libsite-packagessdvsdv.py", line 69, in fit
self.modeler.model_database(tables)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 128, in model_database
self.cpa(table_name, tables)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 99, in cpa
child_table = self.cpa(child_name, tables, child_key)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 99, in cpa
child_table = self.cpa(child_name, tables, child_key)
File "C:toolsPython3.7libsite-packagessdvmodeler.py", line 92, in cpa
extended = self.metadata.transform(table_name, table)
File "C:toolsPython3.7libsite-packagessdvmetadata.py", line 477, in transform
hyper_transformer.fit(data[fields])
File "C:toolsPython3.7libsite-packagesrdthyper_transformer.py", line 128, in fit
transformer.fit(column)
File "C:toolsPython3.7libsite-packagesrdttransformersdatetime.py", line 55, in fit
transformed = self._transform(data)
File "C:toolsPython3.7libsite-packagesrdttransformersdatetime.py", line 40, in _transform
integers = datetimes.astype(int).astype(float).values
File "C:toolsPython3.7libsite-packagespandascoregeneric.py", line 5691, in astype
**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsmanagers.py", line 531, in astype
return self.apply('astype', dtype=dtype, **kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsmanagers.py", line 395, in apply
applied = getattr(b, f)(**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 534, in astype
**kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 2139, in _astype
return super(DatetimeBlock, self)._astype(dtype=dtype, **kwargs)
File "C:toolsPython3.7libsite-packagespandascoreinternalsblocks.py", line 633, in _astype
values = astype_nansafe(values.ravel(), dtype, copy=True)
File "C:toolsPython3.7libsite-packagespandascoredtypescast.py", line 646, in astype_nansafe
to_dtype=dtype))
TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]
Actually, I found a solution. I'm not a Python developer, so I'm not sure it's the best one, but it cleared the error.
In the datetime.py code, at line 41, I changed:
integers = datetimes.astype(int).astype(float).values
to
integers = datetimes.astype(np.int64).astype(float).values
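I assume there is a way to solve this without changing the project's code (this isn't my code, it's a package I downloaded), but I'm now able to continue my research.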
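For anyone wondering why the change helps: on Windows, older NumPy releases (before 2.0) map the built-in int to a 32-bit integer, so astype(int) asks pandas to squeeze nanosecond timestamps into int32, which it refuses to do. Spelling out np.int64 removes the platform dependence. The snippet below is only a minimal sketch of that conversion outside of SDV. The sample timestamps are copied from the demo transactions table, and the variable names are my own, not the library's.

```python
import numpy as np
import pandas as pd

# Two sample timestamps from the demo "transactions" table,
# stored explicitly with nanosecond resolution.
datetimes = pd.Series(
    np.array(["2019-01-01T12:34:32", "2019-01-01T12:42:21"], dtype="datetime64[ns]")
)

# What the built-in int resolves to on this machine. On Windows with
# NumPy older than 2.0 this is int32, which is what triggers the error.
print(np.dtype(int))

# Explicit 64-bit width: nanoseconds since the Unix epoch on every platform.
integers = datetimes.astype(np.int64).astype(float).values

# Converting back recovers the original timestamps.
restored = pd.to_datetime(integers.astype(np.int64))
print(restored)
```

If the first print shows int32 on your machine, that matches the [int32] in the error message.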