TicTacToe AI plays terribly! Possible bug in the minimax algorithm (CS50 AI)



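I'm currently taking CS50's Introduction to Artificial Intelligence, and I need to complete several functions to get the Tic-Tac-Toe game running. However, when I play against it, the AI plays terribly and usually picks the top-left square. I'm fairly sure this is caused by my minimax function. Debugging shows that the variables foo and bar (used for the maximizing player and the minimizing opponent to find the best value of min_value(result(s, a))) never change and stay at their initial values of -infinity and infinity. I don't understand why this happens. The code is below; any help would be great!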

```python
def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    #Checking if game is over
    if terminal(board):
        return None
    else:
        #Check whose turn it is
        turn = player(board)
        board_actions = actions(board)
        if turn == 'X':
            action_score_max = -math.inf
            return_value_min = board_actions[0]
            #return_value_max 
            for a in board_actions:
                foo = min_value(result(board, a))
                if foo > action_score_max:
                    action_score_max = foo
                    return_value_max = a

            return return_value_max
        else:
            action_score_min = math.inf
            return_value_min = board_actions[0]
            for a in board_actions:
                bar = max_value(result(board, a))
                if bar < action_score_min:
                    action_score_min = bar
                    return_value_min = a

            return return_value_min


def max_value(board):
    """
    Helper function for minimax (pick max value value of all routes)
    """
    v = -math.inf
    for action in actions(board):
        v = max(v, min_value(result(board, action)))

    return v

def min_value(board):
    """
    Helper function for minimax (pick min value value of all routes)
    """
    v = math.inf
    for action in actions(board):
        v = min(v, max_value(result(board, action)))
    return v
```
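A minimal way to see why the scores never move is to run the two helpers above, unchanged, on boards where the game is already over. The `player`, `actions` and `result` functions below are hypothetical minimal stand-ins for the CS50 ones (a 3x3 list of lists holding `X`, `O` or `EMPTY`), assumed only for this demonstration. Because there is no terminal check, a board that is already won keeps being played until it is full. A full board has no actions, so the loop never runs and the starting `-inf` or `inf` is returned unchanged. The resulting sign depends only on how many empty cells remain, not on who is actually winning.

```python
import math

# Hypothetical CS50-style board: 3x3 list of lists holding X, O or EMPTY
X, O, EMPTY = "X", "O", None


def player(board):
    """X moves first, so it is X's turn whenever the counts are equal."""
    x_count = sum(row.count(X) for row in board)
    o_count = sum(row.count(O) for row in board)
    return X if x_count == o_count else O


def actions(board):
    """All (i, j) cells that are still empty."""
    return [(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY]


def result(board, action):
    """Copy of board with the current player's mark placed at action."""
    i, j = action
    new_board = [row[:] for row in board]
    new_board[i][j] = player(board)
    return new_board


# The two helpers from the question, unchanged: no terminal()/utility() base case
def max_value(board):
    v = -math.inf
    for action in actions(board):
        v = max(v, min_value(result(board, action)))
    return v


def min_value(board):
    v = math.inf
    for action in actions(board):
        v = min(v, max_value(result(board, action)))
    return v


drawn = [[X, O, X],
         [X, O, O],
         [O, X, X]]             # full board, nobody won: should score 0
won = [[X, X, X],
       [O, O, EMPTY],
       [EMPTY, EMPTY, EMPTY]]   # X has already won: should score 1
almost_full = [[X, O, X],
               [X, O, O],
               [O, X, EMPTY]]   # only one empty cell left

print(max_value(drawn), min_value(drawn))  # -inf inf
print(max_value(won), min_value(won))      # -inf inf (4 empty cells, even)
print(max_value(almost_full))              # inf (1 empty cell, odd)
```

Since every move then receives the same infinite score, the comparisons in `minimax` either succeed only on the first action or never succeed at all, which is consistent with the AI always falling back to the same early square.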

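As the docstring of `minimax` says, its job is to return the optimal move for the current player. To do that you have two helper functions, `max_value` and `min_value`, and they are where the logic belongs: each one should work out and return the best score together with the move that achieves it. The piece your versions are missing is a base case. When `terminal(board)` is true they must return `utility(board)` instead of looping over `actions(board)`; otherwise no win or loss is ever scored, and every value that comes back up the recursion is just the starting `-inf` or `inf`.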

You could do something like this:

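```python
# X, O, player, actions, result, terminal and utility come from the rest of tictactoe.py

def minimax(board):
    """
    Returns the optimal action for the current player on the board.
    """
    if terminal(board):
        return None

    # O minimizes the score and X maximizes it; [1] picks the move out of [value, move]
    if player(board) == O:
        move = min_value(board)[1]
    else:
        move = max_value(board)[1]
    return move


def max_value(board):
    # Base case: a finished game is scored by utility(), and there is no move to make
    if terminal(board):
        return [utility(board), None]
    v = float('-inf')
    best_move = None
    for action in actions(board):
        # Assume the opponent answers with their best (minimizing) reply
        hypothetical_value = min_value(result(board, action))[0]
        if hypothetical_value > v:
            v = hypothetical_value
            best_move = action
    return [v, best_move]


def min_value(board):
    # Mirror image of max_value: keep the lowest score and the move that gives it
    if terminal(board):
        return [utility(board), None]
    v = float('inf')
    best_move = None
    for action in actions(board):
        hypothetical_value = max_value(result(board, action))[0]
        if hypothetical_value < v:
            v = hypothetical_value
            best_move = action
    return [v, best_move]
```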
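To try the answer end to end outside the course skeleton, here is a self-contained sketch. The helpers `player`, `actions`, `result`, `winner`, `terminal` and `utility` are hypothetical stand-ins for the functions CS50's `tictactoe.py` asks you to write (a 3x3 list of lists with `X`, `O` and `EMPTY` markers); your own versions may differ. The search functions follow the answer's logic but return `(value, move)` tuples instead of two-element lists.

```python
import math

# Hypothetical markers mirroring the CS50 skeleton
X, O, EMPTY = "X", "O", None


def player(board):
    """X moves first, so it is X's turn whenever the counts are equal."""
    x_count = sum(row.count(X) for row in board)
    o_count = sum(row.count(O) for row in board)
    return X if x_count == o_count else O


def actions(board):
    """Set of (i, j) cells that are still empty."""
    return {(i, j) for i in range(3) for j in range(3) if board[i][j] is EMPTY}


def result(board, action):
    """New board with the current player's mark at action; the input is not mutated."""
    i, j = action
    new_board = [row[:] for row in board]
    new_board[i][j] = player(board)
    return new_board


def winner(board):
    """X or O if that player owns a full row, column or diagonal, else None."""
    lines = [list(row) for row in board]
    lines += [[board[i][j] for i in range(3)] for j in range(3)]
    lines.append([board[i][i] for i in range(3)])
    lines.append([board[i][2 - i] for i in range(3)])
    for line in lines:
        if line[0] is not EMPTY and line.count(line[0]) == 3:
            return line[0]
    return None


def terminal(board):
    return winner(board) is not None or not actions(board)


def utility(board):
    """1 if X has won, -1 if O has won, 0 otherwise (called on terminal boards)."""
    return {X: 1, O: -1}.get(winner(board), 0)


def max_value(board):
    """(best score, best move) for X; the terminal check is the base case."""
    if terminal(board):
        return utility(board), None
    v, best_move = -math.inf, None
    for action in actions(board):
        score = min_value(result(board, action))[0]
        if score > v:
            v, best_move = score, action
    return v, best_move


def min_value(board):
    """(best score, best move) for O, the minimizing player."""
    if terminal(board):
        return utility(board), None
    v, best_move = math.inf, None
    for action in actions(board):
        score = max_value(result(board, action))[0]
        if score < v:
            v, best_move = score, action
    return v, best_move


def minimax(board):
    if terminal(board):
        return None
    if player(board) == O:
        return min_value(board)[1]
    return max_value(board)[1]


# O to move: X threatens the top row, so the only non-losing reply is (0, 2)
must_block = [[X, X, EMPTY],
              [EMPTY, O, EMPTY],
              [EMPTY, EMPTY, EMPTY]]
print(minimax(must_block))  # (0, 2)
```

There is no alpha-beta pruning here, so asking for a move on an empty board explores the whole game tree and can be noticeably slow in CPython.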