The key idea is that A* is a cost minimization algorithm. The standard use in path finding, seeks to minimize the total distance travelled.
In your case, you don't want to just get the shortest path, you want the shortest path with the least number of turns. You can accomplish that by including an additional cost for path decisions that introduce turns.
The most straight forward way to do this: when considering linking to a new node, compare its position to the node prior to the current node. If they differ in both x & y, there must be a turn, so a penalty is applied. If either the x or y is the same, no turn is added, so no additional cost would be applied.
For instance, say your path under consideration is: ... (5,7) (5,8) and you are considering linking to the nodes: (4,8) (5,9) (6,8).
The distance for each of these options is the same (it's 1). But (4 8) and (6 8) both require a turn whereas (5 9) maintains a straight path. So the effective distance for (4 8) and (6 8) would become 1 + penalty and the effective distance would remain unchanged.
What value do you assign to the penalty? Usually you can simply set it to something a few orders of magnitude smaller than your unit distance and things will be fine. So for the example above, I might start with a penalty value of 0.0001.
There's a potential problem though: there's no way to differentiate costs that arise from actual distance as opposed to costs that arise from turns. By taking the simple approach, they're both just numerical values that get summed together. For most situations, this is just a theoretical problem. Using the values above, it would take 10000 turns to equal a unit distance.
A more robust solution would be to implement cost in a way that separated distance & turns. A 2d value, a tuple, a pair, a struct with 2 fields, etc. Then when comparing two values, you'd have something like this:
if(v1.d < v2.d) // v1 is less because it is less distance than v2 else if(v2.d < v1.d) // v2 is less because it is less distance than v1 else if(v1.t < v2.t) // v1 is less because it has the same distance, but fewer turns than v2 else if(v2.t < v1.t) // v2 is less because it has the same distance, but fewer turns than v1 else // v1 & v2 have the same distance & the same number of turns
Note that in either approach, we are never decreasing the final weights of the edges between nodes in the graph (they stay the same for straight connections or increase slightly for turns). This means that your A* heuristic will still be admissible & doesn't require any changes. If instead of using a penalty for turns, we had tried to using a discount to straight connections, things wouldn't work out as nicely. We avoided that by working with rather than against the context of A* - it's a minimization function, so we used it to minimize the total distance & number of turns.