TF-Agents raises a malloc error during training



While trying to train a DQN with the TF-Agents library, I keep running into a malloc error.

Specs: M1 Mac, macOS 12, TF 2.6.2, TF-Agents 0.10.0, Python 3.8 (same results with 3.9). I use a custom environment wrapped into a TF environment; everything else is a stock TF-Agents component with no customization.
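
I'm not including environment_tf.Environment itself below, but roughly it is a py_environment.PyEnvironment subclass along these lines (the class name, spec shapes, and episode logic here are illustrative placeholders, not the real implementation):

import numpy as np
from tf_agents.environments import py_environment
from tf_agents.specs import array_spec
from tf_agents.trajectories import time_step as ts

class CustomEnv(py_environment.PyEnvironment):
    """Illustrative skeleton of a custom TF-Agents environment."""

    def __init__(self, bars=10):
        self._bars = bars
        # Discrete actions, e.g. hold / buy / sell.
        self._action_spec = array_spec.BoundedArraySpec(
            shape=(), dtype=np.int32, minimum=0, maximum=2, name='action')
        # One float feature per bar.
        self._observation_spec = array_spec.ArraySpec(
            shape=(bars,), dtype=np.float32, name='observation')
        self._step_count = 0

    def action_spec(self):
        return self._action_spec

    def observation_spec(self):
        return self._observation_spec

    def _reset(self):
        self._step_count = 0
        return ts.restart(np.zeros(self._bars, dtype=np.float32))

    def _step(self, action):
        self._step_count += 1
        obs = np.random.rand(self._bars).astype(np.float32)
        if self._step_count >= 100:  # arbitrary episode length
            return ts.termination(obs, reward=0.0)
        return ts.transition(obs, reward=0.0, discount=1.0)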

The error shows up in roughly 9 out of 10 runs; occasionally the training loop finishes successfully. When it fails, it always fails on the last line of the code, the call to agent_tf.train(experience).

Any advice is greatly appreciated!

Error:

python3(11957,0x307c33000) malloc: Incorrect checksum for freed object 0x7fc8a9875110: probably modified after being freed.
Corrupt value: 0x7fc8ce5c5a80
python3(11957,0x307c33000) malloc: *** set a breakpoint in malloc_error_break to debug
[1]    11957 abort      python3 main_loop.py
/Users/jankolnik/miniconda3/envs/ml_int/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 3 leaked semaphore objects to clean up at shutdown 

Code:

import tensorflow as tf
import tf_agents.networks.q_network
from tf_agents.agents.dqn.dqn_agent import DqnAgent
from tf_agents.drivers import dynamic_episode_driver
from tf_agents.environments import tf_py_environment
from tf_agents.metrics import tf_metrics
from tf_agents.replay_buffers import tf_uniform_replay_buffer
from tf_agents.utils import common
from tqdm import tqdm

# Project-specific modules (environment, environment_tf, params, train_data)
# are imported elsewhere in the project.

# ENV init
env_tf = environment_tf.Environment(
    n_actions=len(environment.Actions),
    random_start_on_close=params.random_start_on_close.value,
    bars=params.bars_count.value,
    data=train_data,
    reward_on_close_only=params.reward_on_close_only.value
)
env_tf = tf_py_environment.TFPyEnvironment(env_tf)

# NET init
net = tf_agents.networks.q_network.QNetwork(
    input_tensor_spec=env_tf.observation_spec(),
    action_spec=env_tf.action_spec(),
    fc_layer_params=(50, 2),
    activation_fn=tf.keras.activations.relu)
tgt_net = tf_agents.networks.q_network.QNetwork(
    input_tensor_spec=env_tf.observation_spec(),
    action_spec=env_tf.action_spec(),
    fc_layer_params=(50, 2),
    activation_fn=tf.keras.activations.relu)

# AGENT init
train_step_counter = tf.Variable(0)
global_step = tf.compat.v1.train.get_or_create_global_step()
epsilon = tf.compat.v1.train.polynomial_decay(
    params.epsilon_start.value,
    global_step,
    decay_steps=params.episodes.value * 5,
    end_learning_rate=params.epsilon_stop.value)
agent_tf = DqnAgent(
    action_spec=env_tf.action_spec(),
    gamma=params.gamma.value,
    target_update_period=params.target_net_sync.value,
    q_network=net,
    target_q_network=tgt_net,
    optimizer=tf.keras.optimizers.Adam(learning_rate=params.learning_rate.value),
    td_errors_loss_fn=common.element_wise_squared_loss,
    train_step_counter=train_step_counter,
    time_step_spec=env_tf.time_step_spec(),
    epsilon_greedy=epsilon
)
agent_tf.initialize()

# MEMORY init
memory = tf_uniform_replay_buffer.TFUniformReplayBuffer(
    batch_size=1,
    max_length=params.reply_size.value,
    data_spec=agent_tf.collect_data_spec
)

# DRIVER TF
train_metrics = [
    tf_metrics.NumberOfEpisodes(),
    tf_metrics.EnvironmentSteps(),
    tf_metrics.AverageReturnMetric(),
    tf_metrics.AverageEpisodeLengthMetric()
]
driver = dynamic_episode_driver.DynamicEpisodeDriver(
    num_episodes=1,
    env=env_tf,
    policy=agent_tf.collect_policy,
    observers=[memory.add_batch] + train_metrics)

# MEMORY -> DATASET
sample = memory.as_dataset(sample_batch_size=params.batch_size.value,
                           single_deterministic_pass=False,
                           num_parallel_calls=3,
                           num_steps=2).prefetch(3)
iterator = iter(sample)
agent_tf.train = common.function(agent_tf.train)

# MAIN LOOP
time_step = env_tf.reset()
driver.run(num_episodes=30, time_step=time_step)
for i in tqdm(range(params.episodes.value)):
    time_step, _ = driver.run(time_step=time_step)
    experience, _ = next(iterator)
    loss, _ = agent_tf.train(experience)

This appears to be a known bug with how MacBooks use the Metal API (similar to how Windows uses OpenGL, Vulkan, or DirectX). You won't be able to fix it on your end; it is a problem with how they implemented Metal support on Intel Macs and M1 Macs (a quick way to verify this on your machine is sketched after the links below).

https://github.com/apple/tensorflow_macos/issues/19

https://github.com/apple/tensorflow_macos/issues/177
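
A quick sanity check (not part of the linked issues, and it assumes the Metal device is registered as a 'GPU', as it is with the tensorflow-metal plugin) is to hide the GPU at the very top of main_loop.py and see whether the crash disappears when everything runs on the CPU:

import tensorflow as tf

# Hide the Metal GPU so everything runs on the CPU.
# If the malloc crash goes away, the Metal backend is the likely cause.
tf.config.set_visible_devices([], 'GPU')
print('Visible GPUs:', tf.config.get_visible_devices('GPU'))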

Thanks to anarchy, the poster of the issue you linked, who instructed me to use the binary Bazel installer and then run pip install tf-agents==0.10.0. Now I can use TF-Agents with TF on Apple silicon, and it is FAST!
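
Not part of the original fix, just a small sanity check after the reinstall (assuming the usual version and device attributes) to confirm the rebuilt TF, the pinned tf-agents, and the Metal GPU are all being picked up:

import tensorflow as tf
import tf_agents

print('TF version:       ', tf.__version__)
print('TF-Agents version:', tf_agents.__version__)
print('GPU devices:      ', tf.config.list_physical_devices('GPU'))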
