Minimal TF 2.0 implementation of Listen, Attend and Spell (https://arxiv.org/abs/1508.01211). For a better understanding of the naming of the model's variables, please see the paper above.
- Model architecture looks right to me. If you find an error in the code, please don't hesitate to open an issue 😊
- Implement data handling for easier training of the model.
- Train on LibriSpeech 100h
- Implement SpecAugment (previous SOTA on LibriSpeech) (https://arxiv.org/abs/1904.08779); see the masking sketch after this list.
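Since SpecAugment is still a TODO, here is a minimal sketch of its two masking operations (frequency and time masking; time warping is omitted). It assumes a single `(timesteps, f_1)` mel spectrogram as used in the example below; the function name and mask sizes are illustrative, not part of this repo:

```python
import numpy as np

def spec_augment(spec, max_freq_mask=15, max_time_mask=35, n_masks=2):
    """Apply SpecAugment-style frequency and time masking to a
    (timesteps, features) mel spectrogram. Time warping is omitted."""
    spec = spec.copy()
    timesteps, features = spec.shape
    for _ in range(n_masks):
        # Frequency mask: zero out a random band of feature channels.
        f = np.random.randint(0, max_freq_mask + 1)
        f0 = np.random.randint(0, max(1, features - f))
        spec[:, f0:f0 + f] = 0.0
        # Time mask: zero out a random span of timesteps.
        t = np.random.randint(0, max_time_mask + 1)
        t0 = np.random.randint(0, max(1, timesteps - t))
        spec[t0:t0 + t, :] = 0.0
    return spec
```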
The file model.py contains the model architecture. Example usage is shown below.
""" def LAS(dim, f_1, no_tokens): dim: Number of hidden neurons for most LSTM's. f_1: pBLSTM takes (Batch, timesteps, f_1) as input, f_1 is number of features of the mel spectrogram per timestep. Timestep is the width of the spectrogram. No_tokens: Number of unique tokens for input and output vector. """ model = LAS(256, 256, 16) model.compile(loss="mse", optimizer="adam") # x_1 should have shape (Batch-size, timesteps, f_1) x_1 = np.random.random((1, 550, 256)) # x_2 should have shape (Batch-size, no_prev_tokens, No_tokens). The token vector should be one-hot encoded. x_2 = np.zeros((1,12,16)) for n in range(12): x_2[0, n, np.random.randint(1, 16)] = 1 # By passing x_1 and x_2 the model will predict the 12th token # given by the spectogram and the prev predicted tokens model.predict([x_1, x_2])