I'm attempting to do simple category recommendation on textual data. I have around 20k training samples of the form desc_String -> {tag1_String, tag2_String, ...}.
When given a natural language input (in my case a product description string), the trained NetChain should produce a probability distribution over the category tags.
Here's a minimal example of the issue:

```mathematica
data = {
  "Axel Skin Tight Leggings for the Outdoors" -> {"summer-clothes", "skinny"},
  "Velvet Mini Skirt" -> {"summer-clothes", "bottoms"},
  "Stretch Cotton Scarf" -> {"winter-clothes", "accessories"},
  "California Striped Colorblock Bomber Jacket" -> {"sale-outerwear", "tops", "winter-clothes"}
};
allLabels = Union@Flatten[data[[All, 2]]];
dim = Length@allLabels;
net = NetChain[{
    EmbeddingLayer[60],
    DropoutLayer[0.3],
    LongShortTermMemoryLayer[40],
    SequenceLastLayer[],
    LinearLayer[dim],
    SoftmaxLayer[]
  },
  "Input" -> NetEncoder[{"Tokens"}],
  "Output" -> NetDecoder[{"Class", allLabels}]
];
trained = NetTrain[net, data]
```

NetTrain complains that each sample's target is a list rather than a single string, but a list of tags is exactly what multi-label training calls for.
I'm sure NetTrain and En/decoders must support multi-labels, so what am I doing wrong here?
Updates:
It was suggested in the comments to convert my training data's labels into one-hot vectors. That clearly works for inputs (e.g. via UnitVectorLayer); however, I've never seen it done for target outputs.

That is, each sample's target would become a vector like {0, 1, 0, 0, 1}, with ones in the positions of that example's labels.
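For reference, here is a minimal sketch of what I understand that suggestion to mean (my assumption, not a verified solution): encode each target as a multi-hot vector over allLabels, replace SoftmaxLayer with an elementwise LogisticSigmoid so each label gets an independent probability, drop the "Class" decoder, and train with a binary cross-entropy loss.

```mathematica
(* Sketch of the suggested conversion; multiNet/toMultiHot are my own names *)
toMultiHot[tags_List] := Boole[MemberQ[tags, #]] & /@ allLabels;
vecData = #[[1]] -> toMultiHot[#[[2]]] & /@ data;

multiNet = NetChain[{
    EmbeddingLayer[60],
    DropoutLayer[0.3],
    LongShortTermMemoryLayer[40],
    SequenceLastLayer[],
    LinearLayer[dim],
    ElementwiseLayer[LogisticSigmoid] (* independent per-label probabilities *)
  },
  "Input" -> NetEncoder[{"Tokens"}]
];
trainedMulti =
  NetTrain[multiNet, vecData, LossFunction -> CrossEntropyLossLayer["Binary"]]
```

The output is then a length-dim probability vector rather than a single class, and labels can be read off by thresholding (e.g. Pick[allLabels, trainedMulti[desc], p_ /; p > 0.5]). Whether this is the intended idiomatic approach is exactly what I'm unsure about.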