This library contains extensions to the Torch nn and cunn libraries. This is development code and is not fully tested.
- RNNs
  - oxnn.SequenceOfWords - Deep RNN class for sentences. It can handle batches where sentence lengths vary across and within batches, with output and loss appropriately masked.
  - Optimized LSTM cell: oxnn.ModelUtil.LSTMCell12cl, oxnn.LSTM12Part2.
  - oxnn.RecurrentPropagator - Module for executing custom computation graphs, useful for RNNs. It handles cloning and weight sharing of modules that are used multiple times. Each batch can have a different computation graph.
- NN
  - oxnn.LinearBlockDiagonal
  - oxnn.LinearCAddInplace
  - oxnn.LogSoftMaxInplace
  - oxnn.NarrowTable - multiple narrows on one tensor.
  - oxnn.NoAccGradParameters - do not train the parameters of a module (see the sketch after this list).
  - oxnn.VecsToVecs - maps vectors to vectors, given and returned as a table or a tensor of a batch of vectors for each time step.
- Text
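As an illustration of one of these utilities, the sketch below freezes a pretrained submodule with oxnn.NoAccGradParameters. This is a hypothetical usage sketch: the wrapper-style constructor is an assumption in the style of standard nn decorators, not confirmed by this README.

```lua
require 'oxnn'

-- Hypothetical sketch; the wrapper-style constructor is an assumption.
local embed  = nn.LookupTable(10000, 128)       -- e.g. pretrained embeddings
local frozen = oxnn.NoAccGradParameters(embed)  -- assumed API: wrap a module

local model = nn.Sequential()
   :add(frozen)             -- gradients flow through, but embed is not updated
   :add(nn.Linear(128, 10))
```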
Documentation can be found at the top of the files, and we provide some [examples](examples/).
We release this code under the BSD license (see the LICENSE file). Some of the files are modifications of files from nn or cunn. The file util/cloneManyTimes.lua has a separate license.
Clone this code into your $HOME directory and run the ./build.sh command.
Add the following to your .bashrc:
```bash
# substitute $HOME with the path to the cloned repository
export LUA_PATH="$HOME/?/init.lua;$HOME/?.lua;$LUA_PATH"
export LUA_CPATH="$HOME/oxnn/?.so;$HOME/?.so;$LUA_CPATH"
```

To test the installation, try running the tests and the examples.
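A quick smoke test from the Torch REPL (this only checks that the package loads from the configured paths):

```lua
require 'oxnn'       -- should load without errors if the paths are set
print(oxnn ~= nil)   -- true
```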
Two-layer LSTM:
```lua
lstm = oxnn.SequenceOfWords{
   lookuptable = nn.Sequential():add(nn.LookupTable(10, 128))
                                :add(nn.SplitTable(2)),
   recurrent = { oxnn.ModelUtil.LSTMCell12cl(128, true),    -- layer 1
                 oxnn.ModelUtil.LSTMCell12cl(128, true) },  -- layer 2
   output = nn.Sequential():add(nn.Linear(128, 10))
                           :add(oxnn.LogSoftMaxInplace(true,true)),
   loss = 'nllloss',
   layers = 2
}
pad = 10
-- batch of 2 sequences of lengths 4 and 3
input = {
   { { torch.zeros(2,128), torch.zeros(2,128) },   -- initial state layer 1
     { torch.zeros(2,128), torch.zeros(2,128) } }, -- initial state layer 2
   torch.Tensor{ { 1, 7, 9, 8 },     -- sentence 1
                 { 2, 3, 5, pad } }, -- sentence 2
   { 4, 3 }                          -- sentence lengths
}
print(lstm:forward(input))
```

```
{
  1 : {
        1 : {
              1 : DoubleTensor - size: 2x128   -- last recurrent state for layer 1
              2 : DoubleTensor - size: 2x128   -- not including the padding step
            }
        2 : {
              1 : DoubleTensor - size: 2x128   -- last recurrent state for layer 2
              2 : DoubleTensor - size: 2x128   -- not including the padding step
            }
      }
  2 : 2.3162038392024   -- NLL/(3+2) (per output token);
                        -- as loss gradOutput pass 0
}
```

For more advanced examples, see the [examples/](examples/) directory.
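Since the loss is part of the forward output, a single parameter update can follow the usual nn pattern. The sketch below assumes oxnn.SequenceOfWords implements the standard nn.Module interface, and that zero tensors are an acceptable gradOutput for the final recurrent states; the output comment above only states that 0 should be passed for the scalar loss.

```lua
-- Training-step sketch; interface details are assumptions, see above.
lstm:zeroGradParameters()
local out = lstm:forward(input)            -- out[2] is the per-token NLL
local gradStates = {
   { torch.zeros(2,128), torch.zeros(2,128) },  -- layer 1: dL/dh, dL/dc
   { torch.zeros(2,128), torch.zeros(2,128) },  -- layer 2: dL/dh, dL/dc
}
lstm:backward(input, { gradStates, 0 })    -- 0 for the scalar loss
lstm:updateParameters(0.1)                 -- vanilla SGD step
```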
We will implement a one-layer LSTM that takes batches of sequences. Sequences within a batch need to have the same length, but the length can change across batches.
```lua
local cuda = false
require 'oxnn'
if cuda then oxnn.InitCuda() end

-- We implement a simple LSTM:
--
--   Predict          B       C
--                    ^       ^
--                    |       |
--   init -> Cell -> Cell -> Cell -> representation
--            ^       ^       ^
--            |       |       |
--            A       B       C

lookuptable = nn.Sequential():add(nn.LookupTable(10, 128))
                             :add(nn.SplitTable(2))
recurrent = oxnn.ModelUtil.LSTMCell12cl(128, true)
output = nn.Sequential():add(nn.Linear(128, 10))
                        :add(oxnn.LogSoftMaxInplace(true,true))
criterion = oxnn.CriterionTable(nn.ClassNLLCriterion())
targets = nn.Sequential():add(nn.SplitTable(2))

rp = oxnn.RecurrentPropagator()
rp, mod_lt = rp:add(lookuptable)   -- modifies rp in place
rp, mod_rec = rp:add(recurrent)
rp, mod_out = rp:add(output)
rp, mod_crit = rp:add(criterion)
rp, mod_targ = rp:add(targets)

rp._cg = function(batch, type)
   -- This function creates a computation graph for the given batch (i.e.
   -- input); it is called for each input. The modules are not executed during
   -- the run of this function. type is the last type we used to type the
   -- RecurrentPropagator with.
   local edges = {}                      -- the edges of the CG
   local r = oxnn.RPUtil.CGUtil(edges)   -- helper functions

   -- We assume the batch has the same format as in the example above, but
   -- for one layer.
   local len = batch[2]:size(2)

   -- The outputs are stored in virtual stacks. We create a stack. The string
   -- argument shows only when debugging.
   -- (For debugging, set RecurrentPropagator.debug to 1 or 3.)
   --
   -- When storing, we need to specify the index, where inputs[1] would be the
   -- top of the stack.
   local inputs = r.S('inputs')
   -- Add the first edge to the edges table.
   r.E { r.i(2), mod_lt, inputs[{len,1}] }
   -- The output of the lookuptable is a table with an input for each time
   -- step. inputs[{len,1}] is equivalent to
   -- {inputs[len],inputs[len-1],...,inputs[1]}.
   -- (Note that we can store only at the top of the stack; however, when we
   -- are storing multiple values, we need to index them with contiguous
   -- decreasing indices down to 1. This is so that on the backward pass we
   -- can reverse the graph.)
   --
   -- Inputs from the batch are accessed similarly. The elements of the input
   -- table correspond to r.i(1), r.i(2),... Each input is also regarded as a
   -- stack and needs to be indexed, e.g. r.i(1)[2]; except in the special
   -- case when batch[j] is a tensor, then we can simply use r.i(j), as we
   -- did above.

   local rec_state = r.S('rec_state')
   r.E { r.i(1)[1], nn.Identity(), rec_state[1] }   -- initial LSTM state
   -- As a module in an edge we usually use the module string returned when
   -- adding a module to the RecurrentPropagator (here mod_lt,...); we can
   -- also use a new module, but such a module is created anew for each batch,
   -- which is particularly bad if it allocates memory. This is not an issue
   -- with nn.Identity().

   local outputs = r.S('outputs')
   for i=len,1,-1 do
      r.E { { rec_state[1], inputs[i] }, mod_rec, { rec_state[1], outputs[1] } }
      -- Each time mod_rec is used, a clone that shares the parameters of the
      -- original is used (and reused for subsequent batches).
      -- We take the appropriate input and store the output on the top of the
      -- stack. Note that rec_state[1] on both input and output is a table of
      -- two elements: h and c (hidden layers of the LSTM).
   end

   local after_out = r.S('after_out')
   for i=len,2,-1 do
      r.E { outputs[i], mod_out, after_out[1] }
   end

   local targets_all = r.S('targets_all')   -- to demonstrate r.Split
   r.E { r.i(2), mod_targ, targets_all[1] }
   local targets = r.Split(targets_all[1], len)
   -- This is equivalent to
   --   r.E { targets_all[1], nn.Identity(), targets[{len,1}] }
   -- where targets is a new stack. We could have "saved" to multiple stack
   -- places directly.

   local losses = r.S('loss')
   -- The target of the output at the first time step (after_out[len-1]) is
   -- the input at the second time step (targets[len-1]), and so forth.
   for i=len-1,1,-1 do
      r.E { { after_out[i], targets[i] }, mod_crit, losses[1] }
   end

   -- Some of the computed values are not used but still expect a gradient
   -- flowing back, so we put zero loss on them. The only module allowed
   -- without a gradient flowing back is oxnn.CriterionTable.
   r.E { {outputs[1],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }
   r.E { {rec_state[1],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }
   r.E { {targets[len],nil}, oxnn.CriterionTable(oxnn.ZeroLoss()), r.S('0')[1] }
   -- Ideally, we would add the above modules to the RecurrentPropagator,
   -- since they allocate memory; as written, they allocate it anew for each
   -- batch.

   local lengths = {}
   for i=1,batch[2]:size(1) do table.insert(lengths, len) end
   r.E { {losses[{len-1,1}], nil},   -- table of a table of losses, since
                                     -- CriterionTable expects a table
         oxnn.CriterionTable(oxnn.SumLosses(true, lengths)),  -- sum the losses
                                                              -- and average
         r.S('final output')[1] }
   -- The output of the RecurrentPropagator is the output of the last
   -- edge/module.
   return edges
end

-- batch of 2 sequences of lengths 4 and 4
input = {
   { { torch.zeros(2,128), torch.zeros(2,128) } },
   torch.Tensor{ { 1, 7, 9, 8 },    -- sentence 1
                 { 2, 3, 5, 6 } },  -- sentence 2
   { 4, 4 }                         -- sentence lengths
}
if cuda then
   rp:cuda()
   input[1][1][1] = input[1][1][1]:cuda()
   input[1][1][2] = input[1][1][2]:cuda()
   input[2] = input[2]:cuda()
end
print(rp:forward(input))
print(rp:backward(input, 0))   -- 0 since the output is only a number and the
                               -- loss does not require a gradient
```

For a more advanced example, see the implementation of oxnn.SequenceOfWords.
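To see the RecurrentPropagator mechanics in isolation, here is a minimal, untested sketch of a one-edge graph, using only the helpers introduced in the walkthrough above (r.S, r.E, r.i):

```lua
-- Minimal one-edge graph sketch (illustrative; assumes the helpers behave as
-- in the walkthrough above).
local rp2 = oxnn.RecurrentPropagator()
local mod_lin
rp2, mod_lin = rp2:add(nn.Linear(4, 4))   -- add modifies rp2 in place
rp2._cg = function(batch, type)
   local edges = {}
   local r = oxnn.RPUtil.CGUtil(edges)
   local out = r.S('out')
   r.E { r.i(1), mod_lin, out[1] }   -- batch[1] is a tensor, so r.i(1) works
   return edges                      -- output is the output of the last edge
end
print(rp2:forward({ torch.randn(2, 4) }))
```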