Background: The image below is, in essence, a simplified schematic of the so-called asynchronous state machine (AM_fsm.v). The design has no clock input signal. It contains many SR latches and a lot of combinational logic clouds (OR/AND/MUX gates).
I would like to reduce the delay from the Q output of the SR latch to the output m1. Ultimately, I need to reduce the propagation delay from pk_in to m1. delay_A and delay_B are not critical, so in short I need to reduce delay_C as much as possible. The reason delays A and B are not critical is simple: the logic gates the synthesizer selected on those paths each have an input-to-output propagation delay between 15 and 80 ps. On delay_C, by contrast, the propagation delay per gate is between 150 and 350 ps, so there is still room for improvement. I have tried various commands to convince Genus (Cadence's synthesis tool) to size the gates so that the delay is reduced:
    set_max_delay -from [get_pins AM_fsm/SR_latch/Q] -to [get_ports m1] 0.2
    create_clock -name virtual_clk -period 3 -domain virtual
    set_max_delay 0.2 -from [get_ports pk_in] -to [get_ports m1]
    set_min_delay 0.0 -from [get_ports pk_in] -to [get_ports m1]

I even tried putting a large load on m1. It did help (slightly).
RESULT: None of the above reduced delay_C.
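For reference, the per-gate delays on delay_C can be inspected with a timing report restricted to that path. This is only a minimal sketch, assuming the same port names as in the constraints above and a design that has already been synthesized:

```tcl
# Report the worst path from pk_in to m1 (the delay_C path) after synthesis.
# The per-cell delay column shows which gates contribute the 150-350 ps each.
report_timing -from [get_ports pk_in] -to [get_ports m1]
```

If the path is reported without a slack target, the set_max_delay constraint is probably not being applied to it at all, which would explain why the gates are not upsized.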
Is there some other way to convince Genus to shorten delay_C?