The movs* and cmps* instructions are quite handy as they let you perform such common tasks as copying data and comparing data.
The ins* and outs* instructions are similar in nature to movs*: they simply move data between memory and I/O devices. They are especially helpful for reading from or writing to a disk in complete sectors (typically 512 bytes). Of course, DMA has largely displaced them for this purpose, since DMA-based I/O is even more efficient, but back in the day DMA wasn't as common as it is today.
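To make the sector example concrete, here is a rough C sketch of what a repeated insw amounts to when the direction flag is clear. The function names and the fake port are made up for illustration; real code would execute an in instruction against an actual port (0x1F0 is the data port of the primary ATA channel on a classic PC).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device data port: each read returns the next
   value of a counter. Real code would execute an `in` instruction instead. */
static uint16_t fake_next_word = 0;

static uint16_t fake_port_read16(uint16_t port)
{
    (void)port;               /* the fake device ignores the port number */
    return fake_next_word++;
}

/* Roughly what `rep insw` does with DF clear: read `count` 16-bit words
   from a single I/O port and store them at consecutive memory locations. */
static void rep_insw_sketch(uint16_t *dst, uint16_t port, size_t count)
{
    while (count != 0) {                  /* rep: loop while (E)CX != 0 */
        *dst++ = fake_port_read16(port);  /* insw: [ES:DI] <- port, DI += 2 */
        count--;                          /* rep: decrement (E)CX */
    }
}
```

Reading one 512-byte sector would then be a call like rep_insw_sketch(buffer, 0x1F0, 256), which the real instruction does without a software loop.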
Simulating these instructions, especially their repeated forms (look up the rep prefix), would've required more code and would've been slower. Hence their existence.
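As a rough illustration of what such a simulation involves, the loops below spell out in C approximately what rep movsb and repe cmpsb do when the direction flag is clear. This is only a sketch with made-up function names; it ignores segments, flags other than the equality result, and the backward (DF set) direction.

```c
#include <stddef.h>
#include <stdint.h>

/* Roughly `rep movsb`: copy `count` bytes from src to dst, advancing
   both pointers after every byte. */
static void rep_movsb_sketch(uint8_t *dst, const uint8_t *src, size_t count)
{
    while (count != 0) {      /* rep: loop while (E)CX != 0 */
        *dst++ = *src++;      /* movsb: [ES:DI] <- [DS:SI], then SI++, DI++ */
        count--;              /* rep: decrement (E)CX */
    }
}

/* Roughly `repe cmpsb`: compare byte by byte and stop at the first
   mismatch or when the count runs out. Returns 0 if all `count` bytes
   matched, otherwise the difference a[i] - b[i] at the first mismatch. */
static int repe_cmpsb_sketch(const uint8_t *a, const uint8_t *b, size_t count)
{
    while (count != 0) {
        int diff = (int)*a++ - (int)*b++;   /* cmpsb: compare, advance both */
        count--;                            /* repe: decrement (E)CX */
        if (diff != 0)                      /* repe: stop once ZF is clear */
            return diff;
    }
    return 0;
}
```

Each of these loops is what a single prefixed instruction replaces, which is where the savings in code size (and, on the CPUs of the time, speed) came from.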
Btw, the xchg instruction and any other read-modify-write instruction (e.g. add) with the destination in memory are also effectively memory-to-memory instructions. Not all CPUs have these; many mainly offer instructions that either read from memory or write to memory, but not both (the exception being the instructions used to implement exclusive/atomic access to memory; on x86, think xchg, xadd, cmpxchg8b/cmpxchg16b). CPUs with such instruction sets are said to have load-store architectures.
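To show the difference, here is a small C sketch (function names are made up) of the separate steps a load-store machine has to go through for operations that x86 can express as a single instruction with a memory destination, such as add or xchg. The local variables play the role of registers.

```c
#include <stdint.h>

/* x86 can do `add [mem], value` in one instruction. On a load-store
   architecture the same effect takes three separate steps. */
static void add_to_memory_sketch(uint32_t *mem, uint32_t value)
{
    uint32_t reg = *mem;   /* load:  memory -> register */
    reg += value;          /* add:   register-only arithmetic */
    *mem = reg;            /* store: register -> memory */
}

/* Likewise for `xchg reg, [mem]`: swap a register with a memory location.
   Returns the old memory contents, i.e. the new register value.
   Note: unlike the real xchg, this sketch is NOT atomic. */
static uint32_t xchg_with_memory_sketch(uint32_t *mem, uint32_t reg)
{
    uint32_t old = *mem;   /* load the old value */
    *mem = reg;            /* store the register's value */
    return old;
}
```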
Also, the push and pop instructions may have their explicit operand designate a memory location. That's another form of memory-to-memory instruction, since the other operand is the stack, which is also in memory.
As for segments, nearly all instructions that read or write memory involve segments (some system instructions work differently), so segment management and its overhead are not something you could avoid by using other instructions in place of the ones you mention.
movsb is there to implement a memcpy and friends, and outsb is very helpful when doing I/O to dumb devices lacking sufficient helper hardware (which was the usual case in those days of old). Of course, it was not an original idea: before the advent of the RISC movement, this sort of instruction was rather common everywhere (apart from really small 8-bit "micros").