
I'm working on an AI system in Unity where the enemy alternates between patrolling, chasing the player, and hiding behind objects. I’m trying to implement a system where the AI not only decides whether to engage or hide but also dynamically chooses whether to run or walk based on the situation.

Here’s the main challenge I’m facing:

The AI should dynamically decide when to chase the player, hide behind an object, or continue patrolling.

The AI also needs to make a decision on whether to run or walk based on its current state (e.g., running when chasing or if the player is close, walking while patrolling).

Currently the decision-making feels too random, since I'm just using Random.Range, and I want to make it more intelligent and reactive.

Below is a snippet of my code, where the AI switches between patrolling, chasing, hiding, and adjusts movement speed accordingly:

using System.Collections;
using UnityEngine;

public class EnemyAI : MonoBehaviour
{
    public Transform[] patrolPoints;
    public Transform hideSpot;
    public float patrolSpeed = 3f;      // Walking speed during patrol
    public float chaseSpeed = 5f;       // Running speed while chasing
    public float detectionRange = 10f;
    public float engageRange = 5f;
    public Transform player;

    private int currentPatrolIndex;
    private bool isHiding = false;
    private bool isChasing = false;
    private float[] qTable = new float[2]; // Q-learning table

    private enum State { Patrol, Chase, Hide }
    private State currentState;

    void Start()
    {
        currentPatrolIndex = 0;
        currentState = State.Patrol;
    }

    void Update()
    {
        float distanceToPlayer = Vector3.Distance(transform.position, player.position);

        if (distanceToPlayer < detectionRange && !isHiding)
        {
            // AI makes a decision whether to chase or hide
            currentState = MakeDecision(distanceToPlayer);
        }

        switch (currentState)
        {
            case State.Patrol: PatrolBehavior(); break;
            case State.Chase:  ChaseBehavior(distanceToPlayer); break;
            case State.Hide:   HideBehavior(); break;
        }
    }

    // Q-learning decision-making (choosing between chasing and hiding)
    private State MakeDecision(float distanceToPlayer)
    {
        int action = 0;
        if (distanceToPlayer < engageRange)
        {
            action = Random.Range(0, 2); // Random decision for now
        }

        // Update Q-table and return the chosen action
        if (action == 0)
        {
            qTable[0] += 0.1f; // Preference for hiding
            return State.Hide;
        }
        else
        {
            qTable[1] += 0.1f; // Preference for chasing
            return State.Chase;
        }
    }

    // Patrol behavior (walking between points)
    void PatrolBehavior()
    {
        MoveTowards(patrolPoints[currentPatrolIndex].position, patrolSpeed);
        if (Vector3.Distance(transform.position, patrolPoints[currentPatrolIndex].position) < 1f)
        {
            currentPatrolIndex = (currentPatrolIndex + 1) % patrolPoints.Length;
        }
    }

    // Chase behavior (running or walking based on distance)
    void ChaseBehavior(float distanceToPlayer)
    {
        if (distanceToPlayer < engageRange)
        {
            MoveTowards(player.position, chaseSpeed);   // Run if close
        }
        else
        {
            MoveTowards(player.position, patrolSpeed);  // Walk if farther away
        }

        if (distanceToPlayer > detectionRange)
        {
            currentState = State.Patrol; // Return to patrol if player escapes
        }
    }

    // Hide behavior
    void HideBehavior()
    {
        MoveTowards(hideSpot.position, patrolSpeed); // Walk towards hiding spot
        if (Vector3.Distance(transform.position, hideSpot.position) < 1f)
        {
            StartCoroutine(StayHidden());
        }
    }

    void MoveTowards(Vector3 target, float speed)
    {
        Vector3 direction = (target - transform.position).normalized;
        transform.position += direction * speed * Time.deltaTime;
    }

    IEnumerator StayHidden()
    {
        yield return new WaitForSeconds(3f);
        currentState = State.Patrol;
    }
}

What I’m trying to achieve:

  1. Dynamic Decision-Making: I’d like the AI to intelligently choose between running or walking based on the situation. For example, it should run when chasing the player but walk during patrols or when it’s not in immediate danger.

  2. Cover System: The AI should use cover dynamically; right now it just moves towards a single hide spot (hideSpot), and I'd like to make this system more advanced in the future.

  3. Improved Learning: The AI uses a basic Q-learning approach right now, but I’m not sure how to enhance this to make the decisions feel more natural and based on player interaction.

The AI should also shoot at the player. This isn't included in the script yet, but it's essential to the intended behavior, which I'd like to feel similar to what is found in Call of Duty: Modern Warfare.

Any advice or suggestions on improving the AI's movement speed decision-making, cover usage, or enhancing the Q-learning system would be greatly appreciated.


1 Answer


Reinforcement Learning

You have not implemented Q-learning correctly: you increment values in the Q-table, but you never read them back when choosing an action. More fundamentally, Q-learning is a form of reinforcement learning, in which the agent learns how particular decisions in a given state lead it toward a "reward" (a positive outcome of some kind) in a future state. The only state information you track is the distance to the player, which is probably not enough to make meaningful decisions, and you currently have no way of modeling rewards (how the AI knows when it has made a good decision).

Let's think about some criteria that might help the enemy make better decisions about whether to chase or hide from the player (a rough sketch of gathering these into a single state snapshot follows the list):

  • Whether it can currently see the player, or how long it has been since it last saw the player.
  • How much health the enemy currently has.
  • Whether the player is in range of the enemy's current weapon.
  • Whether the AI is currently reloading (if applicable).
  • Whether there is any source of danger nearby (such as a grenade or explosive barrel).
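
None of these criteria exist in the posted script yet, but as a rough illustration, the state snapshot you would feed into the decision logic could be gathered into something like the struct below. Every field name here is hypothetical, standing in for whatever sensing code you already have or still need to write.

// Hypothetical snapshot of what the AI knows each time it makes a decision.
// None of these fields exist in the original script; they stand in for your
// own line-of-sight checks, health component, weapon logic, and danger sensing.
public struct EnemyPerception
{
    public bool  canSeePlayer;          // Result of a raycast toward the player
    public float timeSincePlayerSeen;   // Seconds since the last successful sighting
    public float healthFraction;        // currentHP / maxHP, in the range 0..1
    public bool  playerInWeaponRange;   // Is the player within the current weapon's range?
    public bool  isReloading;           // Are we mid-reload?
    public bool  dangerNearby;          // Grenade, explosive barrel, etc. within some radius
}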

You'll probably need a Search state for when the AI has lost track of the player but has a general idea of where the player is. You might also need separate states for Take Cover and Flee.

If you really want to implement reinforcement learning, make sure you spend enough time learning about this technique to fully understand how it works and how to implement it. Think about how you might measure "rewards" for when the AI has made a good decision.

For example, you might specify the following rewards (note some rewards are negative to discourage an outcome):

  • Shooting the player (1 point)
  • Killing the player (3 points)
  • Getting shot (-2 points)
  • Getting hit by a grenade (-3 points)
  • Dying (-5 points)

As you can probably imagine, to get optimal learning results, you'll need to properly balance the reward weights. If you want a more aggressive AI, assign more positive weight to harming the player than negative weight from taking damage. If you want a more defensive AI, do the opposite.
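
For reference, a real Q-learning agent reads the table when choosing an action and writes to it when a reward arrives, rather than blindly incrementing entries. Here's a minimal tabular sketch, assuming you can map the current situation to a small integer state index (built from the criteria listed earlier) and that the actions are 0 = Hide, 1 = Chase. Every name in it is illustrative, not part of your existing script.

using UnityEngine;

// Minimal tabular Q-learning sketch: Q[state, action] estimates the long-term
// value of taking an action in a state. All names are illustrative.
public class QLearner
{
    private readonly float[,] q;                // Q[state, action]
    private readonly float learningRate = 0.1f; // How quickly new experience overrides old
    private readonly float discount = 0.9f;     // How much future rewards matter
    private readonly float exploreChance = 0.1f;
    private readonly System.Random rng = new System.Random();

    public QLearner(int stateCount, int actionCount)
    {
        q = new float[stateCount, actionCount];
    }

    // Epsilon-greedy: usually pick the best known action, occasionally explore.
    public int ChooseAction(int state)
    {
        int actionCount = q.GetLength(1);
        if (rng.NextDouble() < exploreChance)
            return rng.Next(actionCount);

        int best = 0;
        for (int a = 1; a < actionCount; a++)
            if (q[state, a] > q[state, best]) best = a;
        return best;
    }

    // Called once the outcome is known, with reward values such as the weights
    // above (+1 for shooting the player, -2 for getting shot, and so on).
    public void Learn(int state, int action, float reward, int nextState)
    {
        float bestNext = q[nextState, 0];
        for (int a = 1; a < q.GetLength(1); a++)
            bestNext = Mathf.Max(bestNext, q[nextState, a]);

        // Standard Q-learning update: nudge the estimate toward
        // (immediate reward + discounted best future value).
        q[state, action] += learningRate * (reward + discount * bestNext - q[state, action]);
    }
}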

Basic Decision Making

Setting reinforcement learning aside entirely, there are various strategies for implementing decisions. You should research Behavior Trees, Goal-Oriented Action Planning (GOAP), and utility-based AI (utility systems).

I tend to use a GOAP-like system. I define a set of broad Goals for the AI, and a set of Actions that the AI can use to carry out goals. Each goal has its own "Priority" algorithm which takes current game state into account and determines how high-priority that goal currently is. The AI periodically checks each goal to see which goal currently has the highest priority, then activates that goal if it is not already active. Once the goal activates, its internal logic may consist of a finite state machine or another mechanism to choose and activate the actions that will help achieve the goal.
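
The skeleton of that arrangement is fairly small. Here's a rough sketch of it; the names are placeholders, and concrete goals are sketched in the next section.

using System.Collections.Generic;

// Rough skeleton of the goal-selection layer described above.
public interface IGoal
{
    float Priority();    // Re-evaluated periodically from current game state
    void Activate();     // Called when this goal becomes the active one
    void Deactivate();   // Called when another goal takes over
    void Tick();         // Per-frame logic: runs a state machine of Actions
}

public class GoalSelector
{
    private readonly List<IGoal> goals = new List<IGoal>();
    private IGoal active;

    public void Add(IGoal goal) => goals.Add(goal);

    // Call this on a timer (a few times per second is usually plenty).
    public void Think()
    {
        IGoal best = null;
        float bestPriority = float.MinValue;
        foreach (var goal in goals)
        {
            float priority = goal.Priority();
            if (priority > bestPriority)
            {
                bestPriority = priority;
                best = goal;
            }
        }

        // Only switch goals if the winner actually changed.
        if (best != active)
        {
            active?.Deactivate();
            active = best;
            active?.Activate();
        }
    }

    // Call this every frame to let the active goal do its work.
    public void Tick() => active?.Tick();
}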

Goals

For example, let's say we have a Pursue goal, an Attack goal, and a Flee goal. Their priority formulas might look like this (flawed) example:

  • Pursue: priority = distanceToTarget * pursueScaleFactor
  • Attack: priority = 100
  • Flee: priority = ((maxHP - currentHP) / maxHP) * fleeScaleFactor

In this simple example, the AI's desire to pursue increases with distance to the target, while the AI's desire to flee increases as the AI loses health. The desire to attack is fixed at 100, so we weight the other two formulas relative to 100. We could make a particular enemy more or less aggressive by tweaking the scale factor values.
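
Plugged into the IGoal sketch above, one of these goals might look like the following. EnemyStats is a hypothetical health component, not something from your script.

using UnityEngine;

// Hypothetical health component, included only so the example is self-contained.
public class EnemyStats : MonoBehaviour
{
    public float maxHP = 100f;
    public float currentHP = 100f;
}

// One concrete goal from the example above: desire to flee grows as health drops.
public class FleeGoal : IGoal
{
    private readonly EnemyStats stats;
    private readonly float fleeScaleFactor;

    public FleeGoal(EnemyStats stats, float fleeScaleFactor)
    {
        this.stats = stats;
        this.fleeScaleFactor = fleeScaleFactor;
    }

    public float Priority() =>
        ((stats.maxHP - stats.currentHP) / stats.maxHP) * fleeScaleFactor;

    public void Activate()   { /* pick a retreat destination here */ }
    public void Deactivate() { }
    public void Tick()       { /* run a "Move to destination" action here */ }
}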

I said this example was flawed; why is that? "Flee" generally means "run as far away as possible". However, with the above formulas, the desire to Pursue increases with distance, so the further the enemy flees, the more it wants to pursue again. Eventually it reaches an equilibrium point where it constantly switches back and forth between the two goals, causing it to keep turning around. You can observe this type of behavioral flaw even in some big-budget games.

Actions

Each Goal will have a corresponding set of one or more Actions. For the sake of example, let's say we create these actions:

  • Move to destination
  • Hold position
  • Take cover
  • Flank (circle around the target)

Each action will have some logic which dictates how the action is carried out, and under what criteria the action is considered complete. An action may have required parameters, such as the "destination" for the "Move to destination" action. The functionality for each goal then consists of a state machine or similar mechanism for choosing actions and supplying the necessary parameters.
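
As a sketch of that action layer, assuming the enemy is moved with a NavMeshAgent rather than the direct transform translation in your current script (the names are again placeholders):

using UnityEngine;
using UnityEngine.AI;

// Each action runs until it reports completion; goals decide which action runs.
public abstract class AIAction
{
    public abstract bool IsComplete { get; }
    public abstract void Tick();
}

public class MoveToDestinationAction : AIAction
{
    private readonly NavMeshAgent agent;

    // Required parameter: where to go. The owning goal updates this as needed,
    // e.g. the Pursue goal sets it to the target's position each think cycle.
    public Vector3 Destination { get; set; }

    public MoveToDestinationAction(NavMeshAgent agent)
    {
        this.agent = agent;
    }

    public override bool IsComplete =>
        !agent.pathPending && agent.remainingDistance <= agent.stoppingDistance;

    public override void Tick() => agent.SetDestination(Destination);
}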

The Pursue goal would consist only of the "Move to destination" action, with the destination periodically updated to be the target's position. Likewise, the Flee goal would consist only of the "Move to destination" action, but with the destination set as far away from any hostiles as possible. The Attack goal would use all of the actions, with simple logic to transition between actions:

  • If we don't have line-of-sight to target, use "Move to destination".
  • If we do have line-of-sight, use "Hold position" or randomly use "Flank".
  • If we need to reload, use "Take cover".

I personally like to keep the actual "use my weapons" logic separate from the movement logic, so that the AI can attack any time it is in range and has LOS. This way we don't need, for example, separate actions for "Hold position and attack" and "Flank and attack".
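
Putting those pieces together, the Attack goal's internals might look roughly like this. NeedsReload() and HasLineOfSight() are assumed helpers (a weapon query and a raycast respectively), the action fields are assumed to be wired up elsewhere, and firing itself is deliberately left to a separate weapon component.

using UnityEngine;

// Sketch of the Attack goal's action selection, following the rules above.
public class AttackGoal : IGoal
{
    // Assumed to be assigned in the enemy's setup code.
    private MoveToDestinationAction moveAction;
    private AIAction holdPositionAction, takeCoverAction, flankAction;
    private AIAction currentAction;
    private Vector3 lastKnownPlayerPosition;

    public float Priority() => 100f;   // Fixed, as in the example above
    public void Activate()   { currentAction = null; }
    public void Deactivate() { }

    public void Tick()
    {
        // Re-decide only when the current action has finished, so the goal
        // doesn't reshuffle its choice every frame. A real implementation would
        // also re-decide when conditions change (losing sight, needing to reload).
        if (currentAction == null || currentAction.IsComplete)
        {
            if (NeedsReload())
            {
                currentAction = takeCoverAction;
            }
            else if (!HasLineOfSight())
            {
                moveAction.Destination = lastKnownPlayerPosition;
                currentAction = moveAction;
            }
            else
            {
                // Mostly hold position, occasionally flank.
                currentAction = Random.value < 0.2f ? flankAction : holdPositionAction;
            }
        }

        currentAction.Tick();

        // No shooting here: a separate weapon component fires whenever the
        // target is in range and visible, regardless of which action is running.
    }

    // Assumed helpers: query the weapon component / raycast toward the player.
    private bool NeedsReload()    { return false; }
    private bool HasLineOfSight() { return true; }
}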

  • I'm not happy with how abstract and rambling this answer is, but I've spent too much time on it already so I'll have to leave it as-is. (Commented Oct 4, 2024 at 2:08)
