This is too long for a comment, but it is not a complete answer. Starting from the end:
- I don't know how (or whether it is possible) to distribute the operation in parallel computing.
- See this and this question for defining a commutator.
In particular, the second one has advice by Szabolcs on how to use better notation if you don't like the one below.
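- An anticommutator would be sufficiently defined by a rule, I think. Note that ordinary multiplication (`Times`) is commutative, so `l m + m l` is rewritten to `2 l m` before any rule sees it, and `HoldAll` is an attribute rather than a wrapper, so it does not prevent this. Writing the products with `**` (`NonCommutativeMultiply`) keeps the order of the factors, like so:

    l ** m + m ** l + l ** l + m ** m ** l //. {a_ ** b_ + b_ ** a_ :> Anticom[a, b], a_ ** a_ -> 0, ___ ** 0 ** ___ -> 0}

Here the last rule removes any product that contains a zero factor, and `//.` applies the rules repeatedly until nothing changes. You can then define `Anticom` on a case-by-case basis.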
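A minimal sketch of what such case-by-case definitions of `Anticom` might look like. The value assigned to the pair `l`, `m` is a placeholder assumed purely for illustration; it should be replaced by the relations of the algebra you actually work with.

```mathematica
ClearAll[Anticom];

(* The anticommutator is symmetric, so let the arguments be sorted automatically *)
SetAttributes[Anticom, Orderless];

(* Consistent with the rule that sends a repeated factor to 0 *)
Anticom[x_, x_] := 0;

(* Placeholder value, assumed for illustration only *)
Anticom[l, m] = 1;
```

With these definitions, `Anticom[m, l]` is sorted to `Anticom[l, m]` and so picks up the same value, which means each pair only needs to be specified once.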