I think "functional composition" tends to be a bit confusing.
By "compose" what we mean is piping the output of one function to the input of another.
Most modern programming languages have some facility for evaluating expressions, and we are accustomed to seeing composition occur in the form of Sqrt(Add(2, 2)), where the output of 'Add' forms the input for 'Sqrt'.
What's notable about this familiar form of composition is that the operands which form the ultimate input (in this case, a pair of '2's) must also be specified at the same time as the composition. You can use variables in place of literals, but you still have to provide something for the operands, as part of specifying the composition.
However, in functional languages, the composition operator allows these two functions to be composed without specifying anything for the operands.
Evaluating AddAndSqrt = (Add ∘ Sqrt) does not call 'Add' or 'Sqrt'. Instead, their addresses are evaluated as function pointers, and those pointers are provided as the operands of the composition operator. The result is a new function pointer that takes two operands (effectively, the inputs to the 'Add' stage). Calling it as AddAndSqrt(2, 2) outputs the same result as Sqrt(Add(2, 2)).
Behind the scenes, the output of the 'Add' stage is arranged so as to be piped to the input of the 'Sqrt' stage. That is what the composition operator does.
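A minimal Python sketch of this idea follows. It assumes a hand-written helper `compose` (hypothetical; Python has no built-in composition operator) and lowercase stand-ins `add` and `sqrt` for 'Add' and 'Sqrt'. Python functions are first-class objects, so passing `add` without calling it plays the role of the function pointer, and `compose` uses the same left-to-right order as (Add ∘ Sqrt) above:

```python
import math

def add(x, y):
    """Stand-in for the answer's 'Add'."""
    return x + y

def sqrt(x):
    """Stand-in for the answer's 'Sqrt'."""
    return math.sqrt(x)

def compose(first, second):
    """Hypothetical left-to-right composition helper: 'first' runs first,
    and its output is piped into 'second'."""
    def composed(*args):
        # The operands go to the first stage; its result feeds the second.
        return second(first(*args))
    return composed

# No operands are supplied here: add and sqrt are passed as function
# objects, not called.
add_and_sqrt = compose(add, sqrt)

print(add_and_sqrt(2, 2))  # 2.0
print(sqrt(add(2, 2)))     # 2.0
```

The operands are only needed later, when `add_and_sqrt` is actually called.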
Now, composition is an associative operator simply because in the expression C(B(A(2, 2))) it doesn't matter whether you pipe A to B (yielding AB) then pipe AB to C (yielding ABC), or pipe B to C (yielding BC) then pipe A to BC (yielding ABC).
Or to put it another way, it doesn't matter if you write:
    Result1 = B(A(2, 2))
    Result2 = C(Result1)

OR

    Result1 = A(2, 2)
    Result2 = C(B(Result1))
In both cases, the chain of calls you end up with is equivalent to C(B(A(2, 2))).
That's all it means for the composition operator to be associative.
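The same point can be checked in Python, again assuming a hypothetical left-to-right `compose` helper and arbitrary stand-in functions `a`, `b` and `c` for A, B and C. Grouping AB first or BC first gives identical results; only changing the *order* of the stages (which is not what associativity is about) changes the answer:

```python
def compose(first, second):
    """Hypothetical left-to-right composition: run 'first', pipe into 'second'."""
    return lambda *args: second(first(*args))

def a(x, y):
    return x + y   # stand-in for A: takes two operands

def b(x):
    return x * 10  # stand-in for B

def c(x):
    return x - 3   # stand-in for C

# Pipe A to B (yielding AB), then pipe AB to C ...
ab_then_c = compose(compose(a, b), c)
# ... or pipe B to C (yielding BC), then pipe A to BC.
a_then_bc = compose(a, compose(b, c))

print(ab_then_c(2, 2), a_then_bc(2, 2), c(b(a(2, 2))))  # 37 37 37

# Reordering the stages (C before B) is a different chain altogether.
print(compose(compose(a, c), b)(2, 2))  # 10
```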
Operators in mathematics have a set of "properties", like associativity, that describe their behaviour under algebraic rearrangement: whether a given kind of rearrangement within an expression changes the result, or whether the result stays the same despite it. Associativity specifically concerns regrouping (where the brackets go) while the order of the operands stays fixed.
Has that answered the question?
Edit: a number of commenters have pointed out that the standard convention for the function composition operator ∘ is that the first-applied function goes on the right. So the equivalent of C(B(A(x, y))) would be written (C ∘ B ∘ A)(x, y) in typical functional languages, and certainly in general mathematics.
However I think that many programmers would readily prefer the idea that the sequencing of operations proceeds in English order left-to-right, so I'm going to leave the main body of the answer as it is.
I was also pleased to find that in F#, composition can be done left to right in accordance with my preference, although using a different symbol for the composition operator (>>): https://fsharpforfunandprofit.com/posts/function-composition/
So that C(B(A(x,y))) would become (A >> B >> C)(x, y).
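Both conventions can be sketched in Python with two hypothetical helpers built on `functools.reduce`: `compose` follows the mathematical right-to-left order of (C ∘ B ∘ A), and `pipe` follows the left-to-right order of F#'s (A >> B >> C). Neither is a standard library function, and `A`, `B` and `C` are arbitrary stand-ins:

```python
from functools import reduce

def compose(*fs):
    """Mathematical convention, as in (C ∘ B ∘ A)(x, y): the rightmost
    function is applied first. Hypothetical helper."""
    return reduce(lambda f, g: lambda *args: f(g(*args)), fs)

def pipe(*fs):
    """Left-to-right convention, in the spirit of F#'s (A >> B >> C)(x, y):
    the leftmost function is applied first. Hypothetical helper."""
    return reduce(lambda f, g: lambda *args: g(f(*args)), fs)

def A(x, y):
    return x + y

def B(x):
    return x * 10

def C(x):
    return x - 3

print(compose(C, B, A)(2, 2))  # 37
print(pipe(A, B, C)(2, 2))     # 37
print(C(B(A(2, 2))))           # 37
```

The two helpers build the same chain of calls; they differ only in which end of the argument list is applied first.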
(3+4)*(5+6), where the order of the two additions there remains unspecified by mathematics.

…monad word, but it's an excellent, practical example of associative function composition: map(map(array, f), g) should be equal to map(array, g ∘ f), where one may be more readable and the other more performant. You can choose the more readable path while your compiler is free to rewrite your code into something equivalent but faster.

…map is simply a functor operation).
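The map(map(array, f), g) = map(array, g ∘ f) equivalence can be checked with a small Python sketch. Note that Python's built-in `map` takes the function first, `map(func, iterable)`, and `compose`, `f`, `g` and `array` are hypothetical stand-ins. The sketch only shows that the two forms agree. It does not demonstrate any compiler rewriting one form into the other:

```python
def compose(g, f):
    """g ∘ f in the mathematical order: apply f first, then g (hypothetical helper)."""
    return lambda x: g(f(x))

def f(x):
    return x + 1

def g(x):
    return x * 2

array = [1, 2, 3]

# Map f over the array, then map g over the result.
nested = list(map(g, map(f, array)))
# Map the single composed function g ∘ f over the array.
fused = list(map(compose(g, f), array))

print(nested, fused)  # [4, 6, 8] [4, 6, 8]
```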