I am trying to train a neural network with an evolutionary algorithm using TensorFlow and Keras, but I think something is wrong with my implementation, because the models don't seem to improve at the task.
Currently, I am using the Tic-Tac-Toe game as the task. Essentially, my algorithm does the following:
It creates two models with the same architecture. In my case, each model consists of a positional embedding, a transformer block made up of an attention layer and a dense layer, and a second transformer block made up of an attention layer and an LSTM. Both transformer blocks use causal masking, and their outputs are normalized. Finally, the result passes through a dense layer before the output layer, which produces a probability distribution over the possible moves.
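For context, here is a minimal sketch of how I build each model (the sizes, head counts, and names like build_model are illustrative, not my exact values; use_causal_mask needs a reasonably recent TensorFlow):

    import tensorflow as tf
    from tensorflow.keras import layers

    class TokenAndPositionEmbedding(layers.Layer):
        # Standard token + positional embedding
        def __init__(self, seq_len, vocab_size, embed_dim):
            super().__init__()
            self.tok_emb = layers.Embedding(vocab_size, embed_dim)
            self.pos_emb = layers.Embedding(seq_len, embed_dim)

        def call(self, x):
            positions = tf.range(start=0, limit=tf.shape(x)[-1], delta=1)
            return self.tok_emb(x) + self.pos_emb(positions)

    def build_model(board_size=9, vocab_size=3, embed_dim=32, num_heads=2):
        inputs = layers.Input(shape=(board_size,), dtype="int32")
        x = TokenAndPositionEmbedding(board_size, vocab_size, embed_dim)(inputs)

        # Block 1: causal self-attention followed by a dense layer, both normalized
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
            x, x, use_causal_mask=True)
        x = layers.LayerNormalization()(x + attn)
        x = layers.LayerNormalization()(x + layers.Dense(embed_dim)(x))

        # Block 2: causal self-attention followed by an LSTM that collapses the sequence
        attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim)(
            x, x, use_causal_mask=True)
        x = layers.LayerNormalization()(x + attn)
        x = layers.LayerNormalization()(layers.LSTM(embed_dim)(x))

        # Dense head, then a probability distribution over the 9 squares
        x = layers.Dense(32, activation="relu")(x)
        outputs = layers.Dense(board_size, activation="softmax")(x)
        return tf.keras.Model(inputs, outputs)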
Once the "parent" models are created, they are saved as .h5 files. Each model predicts with .predict, receiving a one-dimensional array of the nine board positions, encoded so that 0 marks the current player's squares, 1 marks empty squares, and 2 marks the opponent's squares. At the end of each round, if there is a winner, its weights are saved with .save to an .h5 file and then adjusted randomly to create the next opponent. The adjustment is conditioned by a penalty score that punishes actions such as choosing an occupied square, but ultimately it is a random adjustment. If the game is a draw or exceeds 20 moves, both models are adjusted.
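As a small illustration of the encoding and the prediction step (the board layout shown is just an example):

    import numpy as np

    # Encoding: 0 = my square, 1 = empty, 2 = opponent's square
    board = np.array([[1, 1, 1,
                       1, 0, 1,
                       2, 1, 1]], dtype=np.int32)  # shape (1, 9): a batch of one position

    probs = model.predict(board)[0]  # probability distribution over the 9 squares
    move = int(np.argmax(probs))     # the square the model wants to play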
At first they seem to improve, but they plateau almost immediately. Below is the code I am using for the adjustment, in the hope that someone can tell me what is failing.
Code:
    import tensorflow as tf

    def training(model, penalization):
        # Default: one random offset drawn from N(0, 3), shared by every weight
        aleatory_value = tf.random.normal(shape=(1,), mean=0, stddev=3.0)
        learning_rate = 0.025
        if penalization <= 1:
            # Small penalty: smaller step and a narrower random offset
            learning_rate = 0.0025
            aleatory_value = tf.random.normal(shape=(1,), mean=0, stddev=1.0)
        if penalization >= 2:
            # Large penalty: much bigger step with the wide random offset
            learning_rate = 0.4
            aleatory_value = tf.random.normal(shape=(1,), mean=0, stddev=3.0)
        penalty = tf.cast(penalization, tf.float32)  # convert the penalty to float32
        # Shrink each Dense weight in proportion to the penalty, then add the random offset
        for layer in model.layers:
            if isinstance(layer, tf.keras.layers.Dense):
                for neuron in layer.weights:
                    current_weight = neuron.numpy()
                    adjusted_weight = (current_weight * (1 - learning_rate * penalty)
                                       + learning_rate * aleatory_value)
                    neuron.assign(adjusted_weight)
        model.save("model1.h5")

I'm not sure this method really works at all, since I built it from scratch. Keep in mind that for the real task it is impossible to obtain a labeled dataset, so supervised training is not an option.
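And roughly how the adjustment is used between rounds (the variable names here are illustrative; the file name matches the snippet above):

    # After a decisive game: perturb the winner's weights to create the next opponent
    training(winner_model, penalization=penalty_score)
    opponent_model = tf.keras.models.load_model("model1.h5")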