I'm attempting to do simple category recommendation on textual data. I have around 20k training samples of the form desc_String -> {tag1_String, tag2_String, ...}.
When given a natural language input (in my case a product description string), the trained NetChain should produce a probability distribution over the category tags.
Here's a minimal example of the issue:

```mathematica
data = {
  "Axel Skin Tight Leggings for the Outdoors" -> {"summer-clothes", "skinny"},
  "Velvet Mini Skirt" -> {"summer-clothes", "bottoms"},
  "Stretch Cotton Scarf" -> {"winter-clothes", "accessories"},
  "California Striped Colorblock Bomber Jacket" -> {"sale-outerwear", "tops", "winter-clothes"}
};
allLabels = Union@Flatten[data[[All, 2]]];
dim = Length@allLabels;
net = NetChain[{
    EmbeddingLayer[60],
    DropoutLayer[0.3],
    LongShortTermMemoryLayer[40],
    SequenceLastLayer[],
    LinearLayer[dim],
    SoftmaxLayer[]
  },
  "Input" -> NetEncoder[{"Tokens"}],
  "Output" -> NetDecoder[{"Class", allLabels}]
];
trained = NetTrain[net, data]
```

NetTrain complains that each sample's target is a list rather than a single string, but a list of tags is exactly what multi-label training calls for.
I'm sure NetTrain and En/decoders must support multi-labels, so what am I doing wrong here?
Updates:
It was suggested in the comments to convert my training data's labels into one-hot vectors. That clearly works for inputs (e.g. via UnitVectorLayer); however, I've never seen it done for target outputs.

That is, each sample's target would become a vector like {0, 1, 0, 0, 1}, with ones in the positions of that example's labels.
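For reference, here is a minimal sketch of what I understand that suggestion to mean (my assumption, not a verified solution): encode each target as a multi-hot vector over allLabels, replace SoftmaxLayer with an elementwise LogisticSigmoid so each label gets an independent probability, drop the "Class" decoder, and train with a binary cross-entropy loss.

```mathematica
(* Sketch of the suggested conversion; multiNet/toMultiHot are my own names *)
toMultiHot[tags_List] := Boole[MemberQ[tags, #]] & /@ allLabels;
vecData = #[[1]] -> toMultiHot[#[[2]]] & /@ data;

multiNet = NetChain[{
    EmbeddingLayer[60],
    DropoutLayer[0.3],
    LongShortTermMemoryLayer[40],
    SequenceLastLayer[],
    LinearLayer[dim],
    ElementwiseLayer[LogisticSigmoid] (* independent per-label probabilities *)
  },
  "Input" -> NetEncoder[{"Tokens"}]
];
trainedMulti =
  NetTrain[multiNet, vecData, LossFunction -> CrossEntropyLossLayer["Binary"]]
```

The output is then a length-dim probability vector rather than a single class, and labels can be read off by thresholding (e.g. Pick[allLabels, trainedMulti[desc], p_ /; p > 0.5]). Whether this is the intended idiomatic approach is exactly what I'm unsure about.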