Pyth, 24 bytes
    f&q4lJcT\.!-J`M256jL\../

How it works
    ./Q         all partitions of input
    jL\.        join each on .
    f           filter for results T such that:
    cT\.          split T on .
    J             assign to J
    l             length
    q4            equals 4
    &           … and:
    -J`M256       J minus the list of representations of [0, …, 255]
    !             is false (empty)

Pyth, 17 bytes, very slow
    @FjLL\.,^U256 4./

Warning. Do not run. Requires approximately 553 GiB of RAM.

How it works
    ,           two-element list of:
    ^U256 4       all four-element lists of [0, …, 255]
    ./Q           all partitions of input
    jLL\.       join each element of both on .
    @F          fold intersection
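
For readers who do not know Pyth, here is a rough Python sketch of both ideas. It is an illustration rather than a translation: the names `partitions`, `ips_by_filter`, `ips_by_intersection` and the `limit` parameter are made up for this example, and sets stand in for Pyth's list operations. The `limit` parameter exists only so the intersection approach can be tried on a tiny range; with the real value of 256 it would build all 256^4 address strings, which is what makes the 17-byte program impractical.

```python
from itertools import combinations, product

def partitions(s):
    """Yield every way to cut s into contiguous, non-empty pieces
    (assumed here to play the role of Pyth's ./ on a string)."""
    n = len(s)
    for k in range(n):  # k = number of cut points, 0 .. n-1
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [s[a:b] for a, b in zip(bounds, bounds[1:])]

# Decimal representations of 0..255; "01" or "256" are not in this set.
OCTETS = {str(i) for i in range(256)}

def ips_by_filter(digits):
    """Idea of the 24-byte program: join each partition on '.', then
    keep results with exactly 4 parts that are all valid octets."""
    result = []
    for t in (".".join(p) for p in partitions(digits)):
        parts = t.split(".")
        if len(parts) == 4 and not set(parts) - OCTETS:
            result.append(t)
    return result

def ips_by_intersection(digits, limit=256):
    """Idea of the 17-byte program: intersect every dotted quadruple
    over range(limit) with every dotted partition of the input.
    Hypothetical `limit` lets you shrink the range; do NOT call this
    with the default 256 unless you have the memory for 2**32 strings."""
    all_addresses = {".".join(map(str, q))
                     for q in product(range(limit), repeat=4)}
    candidates = {".".join(p) for p in partitions(digits)}
    return sorted(all_addresses & candidates)
```

For example, `ips_by_filter("25525511135")` yields the two addresses `255.255.11.135` and `255.255.111.35`, while `ips_by_intersection("1230", limit=4)` finds `1.2.3.0` after building only 4^4 candidate quadruples.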