You can't easily limit RMS voltage, because RMS voltage is by definition an average over time: the square root of the mean of the squared signal. That mean comes out differently for a square wave than for a sinusoid of the same peak amplitude, for example. To know how much attenuation would be necessary to avoid exceeding some "ceiling" RMS value, you would need to look into the future, to see the waveform as it will be over the next few milliseconds, calculate its average, and then set the appropriate attenuation level in the present.
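For reference, here is the standard definition and the two cases just mentioned:

$$V_\text{RMS} = \sqrt{\frac{1}{T}\int_0^T v(t)^2\,\mathrm{d}t}$$

For a sinusoid of peak amplitude $V_p$ this evaluates to $V_\text{RMS} = V_p/\sqrt{2} \approx 0.707\,V_p$, whereas a square wave of the same peak has $V_\text{RMS} = V_p$. The same peak limit therefore corresponds to two different RMS values.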
That's obviously absurd, but there are workarounds. You can do the averaging over a recent (past) interval of the input signal, and then begin attenuating, just a little late. That won't prevent sudden bursts of excessive power from getting through, but such a system can react quickly enough that the excess lasts no more than a few milliseconds. That's the approach suggested by Andy aka, sketched in code below.
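Here is a minimal Python sketch of that trailing-window idea. The window length, the 1.5 V RMS ceiling, and all the names here are my own illustrative choices, not taken from Andy aka's answer:

```python
import numpy as np

def limit_rms(signal, fs, window_ms=5.0, v_rms_max=1.5):
    """Attenuate samples whenever the RMS of the most recent
    `window_ms` of input exceeds `v_rms_max` (illustrative sketch)."""
    signal = np.asarray(signal, dtype=float)
    n = max(1, int(fs * window_ms / 1000.0))  # samples in the trailing window
    buf = np.zeros(n)                         # circular buffer of recent samples
    sq_sum = 0.0                              # running sum of squared samples
    out = np.empty_like(signal)
    for i, x in enumerate(signal):
        j = i % n
        sq_sum += x * x - buf[j] ** 2         # slide the window by one sample
        buf[j] = x
        rms = np.sqrt(max(sq_sum, 0.0) / n)   # RMS over the last n samples
        # The reaction is inherently a little late: a sudden burst still
        # slips through until the window average catches up.
        out[i] = x * (v_rms_max / rms) if rms > v_rms_max else x
    return out

# Usage: a 1 kHz sine burst at 5 V peak (about 3.54 V RMS) gets pulled
# down to roughly the 1.5 V RMS ceiling after the window fills.
fs = 48_000                                   # sample rate, Hz
t = np.arange(fs // 10) / fs                  # 100 ms of signal
burst = 5.0 * np.sin(2 * np.pi * 1000 * t)
limited = limit_rms(burst, fs)
```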
You absolutely can limit instantaneous voltage, because its value is known right now, and it can be "clamped" to some maximum, right now. The simplest approach is probably a voltage follower using a rail-to-rail-output op-amp:

[Schematic: a voltage follower built from a rail-to-rail-output op-amp powered from ±2.1 V supplies. Schematic created using CircuitLab.]
The op-amp cannot output voltages beyond its own supply rails, so the output here can never fall outside the range −2.1 V to +2.1 V, which is 4.2 V peak-to-peak. The hardest part of this design is finding or building a dual ±2.1 V power supply; that will be significantly more complex than the amplifier itself.
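In software the same instantaneous clamp is just a per-sample clip. Unlike the RMS limiter sketched earlier, it needs no history or lookahead at all (the ±2.1 V figure is the supply limit from above):

```python
import numpy as np

def clamp(signal, v_max=2.1):
    """Per-sample analogue of the op-amp hitting its supply rails:
    the output can never leave the range [-v_max, +v_max]."""
    return np.clip(signal, -v_max, v_max)
```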
Clamping like this causes clipping whenever the input signal is too large, which is of course heavy distortion. I suspect that will be a deal-breaker, so I'm not inclined to elaborate on this approach any further.