Karpnv/cc#35

Open
karpnv wants to merge 137 commits into main from karpnv/cc

Conversation

karpnv (Collaborator) commented Nov 10, 2023

Common Crawl dataset preprocessing

karpnv and others added 30 commits September 12, 2023 04:28
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
karpnv and others added 30 commits March 19, 2024 09:32
* YouTube German config and new processors
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Added Merge Manifests processor
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Clean de.yaml pipeline config
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Fix Lang2Iso
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* fix typo
* fix empty list error - IndexError: list index out of range
* Added requirements.txt
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Fixed paths for audio TN
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Updated requirements.txt
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
---------
Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
* YouTube German config and new processors
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Added Merge Manifests processor
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Clean de.yaml pipeline config
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Fix Lang2Iso
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* fix typo
* fix empty list error - IndexError: list index out of range
* Added requirements.txt
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Fixed paths for audio TN
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Updated requirements.txt
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* New processors for calculating metrics WER, CER, edge CER, len diff ratio
  Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
* Update utils.py
* Update aggregate_segments.py
* Update aggregate_segments.py
* Update aggregate_segments.py
---------
Signed-off-by: Sasha Meister <sasha.meister.work@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Sasha Meister <ameister@nvidia.com> Co-authored-by: Sasha Meister <ameister@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Signed-off-by: Nikolay Karpov <karpnv@gmail.com>
Labels

None yet

3 participants