Unlabelled Data Processors #87

Open
monica-sekoyan wants to merge 13 commits into main from ml_processors

@monica-sekoyan
Contributor

  • Added a processor for the Babel dataset (language independent)
  • Added a processor for the unlabelled subset of the VoxPopuli dataset (language independent)
  • Added a generic config for processing YODAS (or similar datasets)
  • Added processors for audio segmentation, untarring audio files, and emoji removal
  • Added corresponding new tests
  • Corrected the Armenian audio books test data in the S3 bucket (the test was failing because of an incorrect reference)
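
For reviewers unfamiliar with the emoji removal step, here is a minimal sketch of what such a text cleanup could look like. It is not the code from this PR: the function name `remove_emojis`, the pattern `EMOJI_PATTERN`, and the choice of Unicode ranges are all assumptions made for illustration, and the ranges are deliberately non-exhaustive.

```python
import re

# Hypothetical pattern (an assumption, not taken from this PR).
# It covers a few common emoji blocks and is not an exhaustive list.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "\uFE0F"                 # variation selector-16 (emoji presentation)
    "]+"
)


def remove_emojis(text: str) -> str:
    """Drop characters matched by EMOJI_PATTERN and collapse leftover whitespace."""
    cleaned = EMOJI_PATTERN.sub("", text)
    # split() with no arguments splits on any whitespace run, so joining with a
    # single space also removes the double space left where an emoji stood.
    return " ".join(cleaned.split())
```

Letters from scripts such as Armenian fall outside these ranges, so a language-independent pipeline could apply this kind of filter without touching the transcript text itself.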

P.S. All tests pass locally.

Signed-off-by: monica-sekoyan <msekoyan@nvidia.com>