Keras 3, Kaggle, CLI, Streaming #295

Draft
wants to merge 170 commits into base: main

Changes from all commits (170 commits)
232a80c
feat(streaming): training, inference, gradient accumulation
nglehuy Jun 15, 2024
9ea9b87
fix: remove ds2 dropout on conv module
nglehuy Jun 18, 2024
7c21c1d
fix: add sync batch norm, remove wrong bn in ds2
nglehuy Jun 22, 2024
4e0e8f5
fix: only wrap tf.function in jit compile
nglehuy Jun 24, 2024
de38407
fix: use autograph do_not_convert for batchnorm sync to work
nglehuy Jun 24, 2024
43d6054
chore: config
nglehuy Jul 2, 2024
d55fd40
fix: update train/test step
nglehuy Jul 2, 2024
2f502bb
fix: nan to num
nglehuy Jul 2, 2024
7441c95
fix: update compute mask ds2
nglehuy Jul 2, 2024
f2241cf
fix: nan to num
nglehuy Jul 2, 2024
ebb6930
fix: ctc loss
nglehuy Jul 2, 2024
305ddab
fix: update train step
nglehuy Jul 3, 2024
8268afe
chore: config
nglehuy Jul 3, 2024
c9e4d38
chore: config
nglehuy Jul 3, 2024
9b46cbe
fix: ctc
nglehuy Jul 3, 2024
b5bfe92
fix: add custom batch norm to avoid tf.cond
nglehuy Jul 4, 2024
dc08e6a
fix: env utils
nglehuy Jul 4, 2024
cf37206
fix: log batch that cause invalid loss
nglehuy Jul 4, 2024
4596609
chore: buffer size
nglehuy Jul 4, 2024
ba9d6b2
fix: handle unknown dataset size with no metadata provided
nglehuy Jul 6, 2024
fe594ad
chore: add option use loss scale
nglehuy Jul 6, 2024
7aed458
fix: support log debug
nglehuy Jul 6, 2024
4330dec
fix: update gradient accumulation
nglehuy Jul 10, 2024
522f080
fix: ga
nglehuy Jul 14, 2024
90cabc2
feat: tf2.16 with keras 3
nglehuy Jul 14, 2024
8285f6d
fix: ga
nglehuy Jul 14, 2024
fd446d6
Merge branch 'tf2.16' into feat-streaming
nglehuy Jul 15, 2024
5037896
feat: fix layers, models to tf2.16 with keras 3
nglehuy Jul 15, 2024
a853eba
feat: update models to compatible with keras 3
nglehuy Jul 21, 2024
f8a7b91
fix: loss compute using add_loss, loss tracking
nglehuy Jul 28, 2024
03f0d60
fix: output shapes of models to log to summary
nglehuy Jul 28, 2024
8b0ed02
fix: contextnet
nglehuy Jul 28, 2024
77baaa5
fix: ds2
nglehuy Jul 28, 2024
3768878
fix: jasper
nglehuy Jul 28, 2024
f3cb239
fix: rnnt
nglehuy Jul 28, 2024
1d7e3a6
fix: transformer
nglehuy Jul 28, 2024
7be3eda
fix: update deps
nglehuy Jul 29, 2024
9b03b31
fix: requirements
nglehuy Aug 25, 2024
401180b
fix: super init
nglehuy Aug 25, 2024
786f5d4
fix: update regularizers
nglehuy Aug 25, 2024
a9d1733
fix: update regularizers
nglehuy Aug 25, 2024
689b366
fix: print shapes
nglehuy Nov 24, 2024
4e75c0f
fix: conformer ctc
nglehuy Nov 24, 2024
82d91c8
fix: add ctc tpu impl
nglehuy Nov 25, 2024
c915be3
fix: ctc tpu impl
nglehuy Nov 25, 2024
dda33b7
fix: save weights, tpu connect
nglehuy Nov 28, 2024
c667984
fix: save weights, tpu connect
nglehuy Nov 28, 2024
dc77b84
fix: update req
nglehuy Dec 3, 2024
d455ae1
fix: update req
nglehuy Dec 3, 2024
33394a2
fix: update req
nglehuy Dec 3, 2024
35160ce
fix: update req
nglehuy Dec 3, 2024
6ffb3b8
fix: update req
nglehuy Dec 3, 2024
bb732a7
fix: update req
nglehuy Dec 4, 2024
9179425
fix: strategy scope
nglehuy Dec 4, 2024
ace3887
fix: requirements
nglehuy Dec 4, 2024
67a8470
fix: update savings
nglehuy Dec 7, 2024
9824819
feat: bundle scripts inside package
nglehuy Dec 29, 2024
05b068b
feat: introduce chunk-wise masking for mha layer
nglehuy Dec 31, 2024
1edf16a
feat: introduce chunk-wise masking to conformer & transformer
nglehuy Dec 31, 2024
6338f55
chore: update install script
nglehuy Jan 1, 2025
e844f77
chore: add conformer small streaming
nglehuy Jan 1, 2025
0543c31
chore: add conformer small streaming
nglehuy Jan 1, 2025
a2e2022
fix: use history size instead of memory length
nglehuy Jan 1, 2025
4824929
chore: update logging
nglehuy Jan 1, 2025
eca664c
fix: streaming masking mha
nglehuy Jan 1, 2025
a302962
fix: conformer ctc configs
nglehuy Jan 7, 2025
ab83d87
feat: add kaggle backup and restore callback
nglehuy Jan 11, 2025
52de4c0
fix: support flash attention, update deps
nglehuy Jan 12, 2025
aaa06a5
chore: add conformer-ctc-small-streaming-kaggle
nglehuy Jan 12, 2025
2fd4f2b
fix: restore from kaggle model
nglehuy Jan 12, 2025
6e5c3b9
fix: restore from kaggle model
nglehuy Jan 12, 2025
f62fdf9
fix: ignore backup kaggle when nan loss occurs
nglehuy Jan 12, 2025
2947160
fix: only use tqdm when needed
nglehuy Jan 12, 2025
c532d5c
fix: deps
nglehuy Jan 23, 2025
0e5e826
fix: support static shape
nglehuy Jan 24, 2025
eb55a4b
fix: mha streaming mask
nglehuy Jan 24, 2025
12fbb85
fix: feature extraction mixed precision, configs
nglehuy Jan 25, 2025
2100d75
fix: expose relmha_causal, flash attention
nglehuy Jan 25, 2025
a5d1e84
fix: allow ctc to force use native tf impl
nglehuy Jan 25, 2025
5ff3163
chore: list devices
nglehuy Jan 25, 2025
3dbab33
fix: attention mask
nglehuy Feb 9, 2025
1545543
fix: general layers to show outputshape, invalid loss show outputs
nglehuy Feb 15, 2025
5f784b7
fix: models configs
nglehuy Feb 15, 2025
70ac41e
fix: config streaming
nglehuy Feb 20, 2025
eebc361
fix: configs
nglehuy Feb 23, 2025
6b0bec4
fix: configs
nglehuy Feb 23, 2025
2ac8e7f
Merge branch 'main' into feat-streaming
nglehuy Mar 9, 2025
7dcd145
fix: streaming masking mha
nglehuy Mar 9, 2025
e209040
fix: streaming masking mha
nglehuy Mar 9, 2025
5649fdd
fix: update mha attention mask
nglehuy Mar 13, 2025
26c4a5f
feat: add support for layer norm in conformer conv module
nglehuy Mar 13, 2025
4f77a52
chore: update configs
nglehuy Mar 13, 2025
c3ab865
fix: feature extraction layer dtype tf.float32 to ensure loss converg…
nglehuy Mar 17, 2025
778c1a2
fix: ctc loss tpu - case logits to float32
nglehuy Mar 17, 2025
e68ceee
fix: use auto mask
nglehuy Mar 19, 2025
f1e2a88
fix: pad logits length to label length
nglehuy Mar 20, 2025
f1a0ed6
fix: ctc loss tpu
nglehuy Mar 20, 2025
6f7f246
chore: config
nglehuy Mar 21, 2025
d538e69
fix: disable bias/activity regularizer as not needed
nglehuy Mar 21, 2025
7611ff8
chore: config
nglehuy Mar 23, 2025
0c8e7c1
chore: setup mxp
nglehuy Mar 24, 2025
56d2afa
chore: setup mxp
nglehuy Mar 24, 2025
454163c
fix: small kaggle
nglehuy Mar 25, 2025
a333bfc
chore: transformer-ctc streaming
nglehuy Mar 25, 2025
cf435a3
chore: config
nglehuy Mar 27, 2025
c35af45
fix: ctc-tpu clean label
nglehuy Mar 30, 2025
0556481
chore: configs
nglehuy Mar 30, 2025
3e88f65
chore: configs
nglehuy Mar 30, 2025
111f3ac
chore: configs
nglehuy Mar 30, 2025
8ee9813
fix: train step
nglehuy Mar 30, 2025
dde7760
fix: apply ga loss division before loss scaling
nglehuy Mar 30, 2025
aade071
fix: update train function with ga steps
nglehuy Mar 30, 2025
dc0c304
fix: update train step ga
nglehuy Mar 30, 2025
d541928
chore: configs
nglehuy Mar 30, 2025
91a39a2
chore: configs
nglehuy Mar 30, 2025
2a40da6
chore: configs
nglehuy Mar 30, 2025
8bcf0f3
chore: configs
nglehuy Mar 30, 2025
de58fed
fix: rnn kwargs
nglehuy Mar 30, 2025
a05494a
chore: update
nglehuy Mar 31, 2025
076a6f9
fix: update masking and layer
nglehuy Apr 5, 2025
51a5258
fix: update masking and layer
nglehuy Apr 5, 2025
a9218d9
fix: make function
nglehuy Apr 5, 2025
ce6752b
fix: use default make function
nglehuy Apr 5, 2025
90e6695
fix: soft device placement
nglehuy Apr 5, 2025
f4e459d
fix: option TF_CUDNN
nglehuy Apr 5, 2025
7473458
chore: logging
nglehuy Apr 5, 2025
57aef49
fix: configs
nglehuy Apr 5, 2025
a2eaf15
fix: feature extraction layer dtype
nglehuy Apr 5, 2025
5a1f054
fix: numeric stability with dtype compatible
nglehuy Apr 6, 2025
b70bdd6
chore: summary
nglehuy Apr 6, 2025
edf5eb2
fix: softmax numberic overflow with mask
nglehuy Apr 6, 2025
81a336c
chore: remove commented code
nglehuy Apr 6, 2025
6be5428
chore: config
nglehuy Apr 6, 2025
0d75686
chore: remove commented code
nglehuy Apr 7, 2025
f5886a5
fix: use keras-nightly
nglehuy Apr 8, 2025
e524af6
fix: requirements
nglehuy Apr 11, 2025
20d30c6
fix: requirements
nglehuy Apr 12, 2025
7f145f7
fix: requirements with docker
nglehuy Apr 12, 2025
2d557cc
fix: kaggle backup & restore callback
nglehuy Apr 14, 2025
6dc3410
fix: deps
nglehuy Apr 17, 2025
36f3cc0
fix: deps
nglehuy Apr 17, 2025
cea79ef
fix: deps
nglehuy Apr 17, 2025
5e31671
fix: deps
nglehuy Apr 17, 2025
6ca9c61
fix: deps
nglehuy Apr 17, 2025
63f6280
fix: file util
nglehuy Apr 22, 2025
4390d1b
fix: env util
nglehuy Apr 22, 2025
64aeb2f
fix: add setup.sh script, update make function
nglehuy Apr 22, 2025
65fdf46
fix: update kaggel callback
nglehuy Apr 22, 2025
c8d3614
fix: update gradient accumulation
nglehuy Apr 22, 2025
1eb2d15
fix: gradient accumulation
nglehuy Apr 23, 2025
1ae5a66
fix: deps
nglehuy Apr 23, 2025
e88e99a
fix: config
nglehuy Apr 23, 2025
38f641c
fix: backup and restore callback
nglehuy Apr 25, 2025
3bbfd77
chore: config
nglehuy Apr 25, 2025
6a90a51
fix: train script
nglehuy Apr 25, 2025
2e2d6e4
fix: configs, add gradn step
nglehuy Apr 25, 2025
aa15483
fix: configs
nglehuy Apr 25, 2025
d36085e
fix: gradn
nglehuy Apr 25, 2025
98390fc
fix: config
nglehuy Apr 25, 2025
c15e841
fix: config
nglehuy Apr 26, 2025
23dd668
fix: disable tqdm, logging for kagglehub
nglehuy Apr 27, 2025
af93c2b
chore: unittest
nglehuy Apr 27, 2025
dd17fb6
feat: refactor tokenizer, dataset with custom build vocabulary and
nglehuy May 1, 2025
bb3b848
fix: scripts
nglehuy May 5, 2025
7d3c954
fix: callback
nglehuy May 5, 2025
1dc854e
fix: deps
nglehuy May 5, 2025
c5024ac
fix: update vocab generator
nglehuy May 6, 2025
b325143
fix: update vocab generator
nglehuy May 6, 2025
7d2f029
fix: update vocab generator
nglehuy May 6, 2025
3a5b511
chore: vietbud500 metadata
nglehuy May 6, 2025
2 changes: 2 additions & 0 deletions .dockerignore
@@ -1,2 +1,4 @@
LibriSpeech
Models
.venv*
venv*
7 changes: 6 additions & 1 deletion .pylintrc
@@ -3,7 +3,7 @@
# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code.
extension-pkg-allow-list=pydantic,tensorflow
extension-pkg-allow-list=pydantic

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
@@ -120,6 +120,11 @@ disable=too-few-public-methods,
consider-using-f-string,
fixme,
unused-variable,
pointless-string-statement,
too-many-lines,
abstract-method,
too-many-ancestors,
import-outside-toplevel,

# Enable the message, report, category or checker with the given id(s). You can
# either give multiple identifier separated by comma (,) or put this option
66 changes: 30 additions & 36 deletions .vscode/settings.json
@@ -1,37 +1,31 @@
{
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter"
},
"autoDocstring.docstringFormat": "numpy",
"black-formatter.args": [
"--config",
"${workspaceFolder}/pyproject.toml"
],
"black-formatter.path": [
"${interpreter}",
"-m",
"black"
],
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
},
"editor.formatOnSave": true,
"isort.args": [
"--settings-file",
"${workspaceFolder}/pyproject.toml"
],
"pylint.args": [
"--rcfile=${workspaceFolder}/.pylintrc"
],
"pylint.path": [
"${interpreter}",
"-m",
"pylint"
],
"python.analysis.fixAll": [
"source.unusedImports",
"source.convertImportFormat"
],
"python.analysis.importFormat": "absolute"
}
"[python]": {
"editor.defaultFormatter": "ms-python.black-formatter",
"editor.tabSize": 4
},
"[markdown]": {
"editor.tabSize": 2,
"editor.indentSize": 2,
"editor.detectIndentation": false
},
"[json]": {
"editor.tabSize": 2
},
"[yaml]": {
"editor.tabSize": 2
},
"autoDocstring.docstringFormat": "numpy",
"black-formatter.args": ["--config", "${workspaceFolder}/pyproject.toml"],
"black-formatter.path": ["${interpreter}", "-m", "black"],
"editor.codeActionsOnSave": {
"source.fixAll": "explicit",
"source.organizeImports": "explicit"
},
"editor.formatOnSave": true,
"isort.args": ["--settings-file", "${workspaceFolder}/pyproject.toml"],
"pylint.args": ["--rcfile=${workspaceFolder}/.pylintrc"],
"pylint.path": ["${interpreter}", "-m", "pylint"],
"python.analysis.fixAll": ["source.unusedImports", "source.convertImportFormat"],
"python.analysis.importFormat": "absolute",
"markdown.extension.list.indentationSize": "inherit"
}
8 changes: 4 additions & 4 deletions Dockerfile
@@ -1,4 +1,4 @@
FROM tensorflow/tensorflow:2.3.2-gpu
FROM tensorflow/tensorflow:2.18.0-gpu

RUN apt-get update \
&& apt-get upgrade -y \
@@ -9,8 +9,8 @@ RUN apt-get update \
RUN apt clean && apt-get clean

# Install dependencies
COPY requirements.txt /
RUN pip --no-cache-dir install -r /requirements.txt
COPY requirements*.txt /
RUN pip --no-cache-dir install -r /requirements.txt -r /requirements.cuda.txt

# Install rnnt_loss
COPY scripts /scripts
@@ -21,4 +21,4 @@ RUN if [ "$install_rnnt_loss" = "true" ] ; \
&& ./scripts/install_rnnt_loss.sh \
else echo 'Using pure TensorFlow'; fi

RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> /root/.bashrc
RUN echo "export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}" >> /root/.bashrc
46 changes: 14 additions & 32 deletions README.md
@@ -34,7 +34,6 @@ TensorFlowASR implements some automatic speech recognition architectures such as
- [Installing from source (recommended)](#installing-from-source-recommended)
- [Installing via PyPi](#installing-via-pypi)
- [Installing for development](#installing-for-development)
- [Install for Apple Sillicon](#install-for-apple-sillicon)
- [Running in a container](#running-in-a-container)
- [Training \& Testing Tutorial](#training--testing-tutorial)
- [Features Extraction](#features-extraction)
@@ -74,62 +73,46 @@ TensorFlowASR implements some automatic speech recognition architectures such as

For training and testing, you should use `git clone` to install necessary packages from other authors (`ctc_decoders`, `rnnt_loss`, etc.)

**NOTE ONLY FOR APPLE SILICON**: TensorFlowASR requires python >= 3.12

See the `requirements.[extra].txt` files for extra dependencies

### Installing from source (recommended)

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
pip3 install -e . # or ".[cuda]" if using GPU
```

For anaconda3:
For **anaconda3**:

```bash
conda create -y -n tfasr tensorflow-gpu python=3.8 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow
conda create -y -n tfasr python=3.11 # tensorflow if using CPU, this makes sure conda install all dependencies for tensorflow
conda activate tfasr
pip install -U tensorflow-gpu # upgrade to latest version of tensorflow
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
# Tensorflow 2.x (with 2.x.x >= 2.5.1)
pip3 install ".[tf2.x]" # or ".[tf2.x-gpu]"
pip3 install -e . # or ".[cuda]" if using GPU
```

### Installing via PyPi
For **colab with TPUs**:

```bash
# Tensorflow 2.x (with 2.x >= 2.3)
pip3 install "TensorFlowASR[tf2.x]" # or pip3 install "TensorFlowASR[tf2.x-gpu]"
pip3 install -e ".[tpu]" -f https://storage.googleapis.com/libtpu-tf-releases/index.html
```

### Installing for development
### Installing via PyPi

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e ".[dev]"
pip3 install -e ".[tf2.x]" # or ".[tf2.x-gpu]" or ".[tf2.x-apple]" for apple m1 machine
pip3 install "TensorFlowASR" # or "TensorFlowASR[cuda]" if using GPU
```

### Install for Apple Sillicon

Because tensorflow-text is not built for Apple Sillicon, we need to install it using the prebuilt wheel file from [sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon](https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon)
### Installing for development

```bash
git clone https://github.com/TensorSpeech/TensorFlowASR.git
cd TensorFlowASR
pip3 install -e "." # or pip3 install -e ".[dev] for development # or pip3 install "TensorFlowASR[dev]" from PyPi
pip3 install tensorflow~=2.14.0 # change minor version if you want
```

Do this after installing TensorFlowASR with tensorflow above

```bash
TF_VERSION="$(python3 -c 'import tensorflow; print(tensorflow.__version__)')" && \
TF_VERSION_MAJOR="$(echo $TF_VERSION | cut -d'.' -f1,2)" && \
PY_VERSION="$(python3 -c 'import platform; major, minor, patch = platform.python_version_tuple(); print(f"{major}{minor}");')" && \
URL="https://github.com/sun1638650145/Libraries-and-Extensions-for-TensorFlow-for-Apple-Silicon" && \
pip3 install "${URL}/releases/download/v${TF_VERSION_MAJOR}/tensorflow_text-${TF_VERSION_MAJOR}.0-cp${PY_VERSION}-cp${PY_VERSION}-macosx_11_0_arm64.whl"
pip3 install -e ".[apple,dev]"
```

### Running in a container
@@ -139,7 +122,6 @@ docker-compose up -d
```



## Training & Testing Tutorial

- For training, please read [tutorial_training](./docs/tutorials/training.md)
11 changes: 5 additions & 6 deletions docs/tokenizers.md
@@ -1,27 +1,26 @@
# Tokenizers

- [Tokenizers](#tokenizers)
- [1. Character Tokenizer](#1-character-tokenizer)
- [2. Wordpiece Tokenizer](#2-wordpiece-tokenizer)
- [3. Sentencepiece Tokenizer](#3-sentencepiece-tokenizer)

# Tokenizers

## 1. Character Tokenizer

See [librispeech config](../examples/configs/librispeech/characters/char.yml.j2)
See [librispeech config](../examples/datasets/librispeech/characters/char.yml.j2)

This splits the text into characters and then maps each character to an index. Indices start from 1, and 0 is reserved for the blank token. This tokenizer is only suitable for languages that have a small number of characters, where each character is not a combination of other characters, for example English, Vietnamese, etc.
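
To make the mapping concrete, here is a minimal sketch of character-level encoding (illustrative only, using a toy vocabulary; this is not TensorFlowASR's actual API):

```python
# Toy character tokenizer: index 0 is reserved for the blank token,
# so character indices start at 1.
vocab = ["a", "b", "c", " "]  # hypothetical tiny vocabulary
char_to_index = {c: i + 1 for i, c in enumerate(vocab)}  # 0 = blank

def encode(text: str) -> list[int]:
    # Unknown characters are simply skipped in this sketch.
    return [char_to_index[c] for c in text if c in char_to_index]

print(encode("ab c"))  # [1, 2, 4, 3]
```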

## 2. Wordpiece Tokenizer

See [librispeech config](../examples/configs/librispeech/wordpiece/wp.yml.j2) for wordpiece split by whitespace
See [librispeech config](../examples/datasets/librispeech/wordpiece/wp.yml.j2) for wordpiece split by whitespace

See [librispeech config](../examples/configs/librispeech/wordpiece/wp_whitespace.yml.j2) for wordpiece where whitespace is a separate token
See [librispeech config](../examples/datasets/librispeech/wordpiece/wp_whitespace.yml.j2) for wordpiece where whitespace is a separate token

This splits the text into words and then splits each word into subwords, which are then mapped to indices. The blank token can be set to `<unk>` at index 0. This tokenizer works for languages that have a large number of words, where each word can be a combination of other words, so it can be applied to any language.
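
As an illustration, here is a greedy longest-match wordpiece split over a toy subword vocabulary (a sketch of the general idea only, not the exact algorithm or API used by this project):

```python
# Toy greedy longest-match wordpiece split; "##" marks word-internal pieces.
SUBWORDS = {"un", "##believ", "##able"}

def wordpiece_split(word: str) -> list[str]:
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end] if start == 0 else "##" + word[start:end]
            if piece in SUBWORDS:
                pieces.append(piece)
                break
            end -= 1
        if end == start:
            return ["<unk>"]  # no subword matched
        start = end
    return pieces

print(wordpiece_split("unbelievable"))  # ['un', '##believ', '##able']
```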

## 3. Sentencepiece Tokenizer

See [librispeech config](../examples/configs/librispeech/sentencepiece/sp.yml.j2)
See [librispeech config](../examples/datasets/librispeech/sentencepiece/sp.yml.j2)

This splits the whole sentence into subwords and then maps each subword to an index. The blank token can be set to `<unk>` at index 0. This tokenizer works for languages that have a large number of words, where each word can be a combination of other words, so it can be applied to any language.
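
For illustration, here is a greedy sketch of sentencepiece-style encoding, where whitespace is kept as a visible marker so the whole sentence can be split in one pass (real sentencepiece trains a subword model rather than using a fixed table; this toy version only shows the input/output shape):

```python
# Toy sentencepiece-style encoding: "▁" marks (leading) whitespace,
# and the whole sentence is encoded at once; 0 is <unk>/blank.
VOCAB = {"<unk>": 0, "▁hello": 1, "▁wor": 2, "ld": 3}

def sp_encode(sentence: str) -> list[int]:
    s = "▁" + sentence.replace(" ", "▁")
    ids, start = [], 0
    while start < len(s):
        for end in range(len(s), start, -1):  # greedy longest match
            if s[start:end] in VOCAB:
                ids.append(VOCAB[s[start:end]])
                start = end
                break
        else:
            ids.append(VOCAB["<unk>"])  # fall back to <unk> for one char
            start += 1
    return ids

print(sp_encode("hello world"))  # [1, 2, 3]
```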
18 changes: 13 additions & 5 deletions docs/tutorials/testing.md
@@ -1,3 +1,11 @@
- [Testing Tutorial](#testing-tutorial)
- [1. Install packages](#1-install-packages)
- [2. Prepare transcripts files](#2-prepare-transcripts-files)
- [3. Prepare config file](#3-prepare-config-file)
- [4. \[Optional\]\[Required if not exists\] Generate vocabulary and metadata](#4-optionalrequired-if-not-exists-generate-vocabulary-and-metadata)
- [5. Run testing](#5-run-testing)


# Testing Tutorial

These commands are examples for the LibriSpeech dataset, but similar steps apply to other datasets
@@ -16,14 +24,14 @@ pip install ".[tf2.x]"
This is an example of preparing transcript files for the LibriSpeech corpus

```bash
python scripts/create_librispeech_trans.py \
tensorflow_asr utils create_librispeech_trans \
--directory=/path/to/dataset/test-clean \
--output=/path/to/dataset/test-clean/transcripts.tsv
```

Do the same thing with `test-clean`, `test-other`

For other datasets, you must prepare your own Python script like `scripts/create_librispeech_trans.py`

For other datasets, you must prepare your own Python script like `tensorflow_asr/scripts/utils/create_librispeech_trans.py`

## 3. Prepare config file

@@ -38,7 +46,7 @@ The config file is the same as the config used for training
Use the same vocabulary file used in training

```bash
python scripts/prepare_vocab_and_metadata.py \
tensorflow_asr utils prepare_vocab_and_metadata \
--config-path=/path/to/config.yml.j2 \
--datadir=/path/to/datadir
```
@@ -48,12 +56,12 @@ The inputs, outputs and other options of vocabulary are defined in the config fi
## 5. Run testing

```bash
python examples/test.py \
tensorflow_asr test \
--config-path /path/to/config.yml.j2 \
--dataset_type slice \
--datadir /path/to/datadir \
--outputdir /path/to/modeldir/tests \
--h5 /path/to/modeldir/weights.h5
## See other params
python examples/test.py --help
tensorflow_asr test --help
```
4 changes: 2 additions & 2 deletions docs/tutorials/tflite.md
@@ -11,14 +11,14 @@
## Conversion

```bash
python3 examples/tflite.py \
tensorflow_asr tflite \
--config-path=/path/to/config.yml.j2 \
--h5=/path/to/weight.h5 \
--bs=1 \ # Batch size
--beam-width=0 \ # Beam width, set >0 to enable beam search
--output=/path/to/output.tflite
## See other params
python examples/tflite.py --help
tensorflow_asr tflite --help
```

## Inference
21 changes: 15 additions & 6 deletions docs/tutorials/training.md
@@ -1,3 +1,12 @@
- [Training Tutorial](#training-tutorial)
- [1. Install packages](#1-install-packages)
- [2. Prepare transcripts files](#2-prepare-transcripts-files)
- [3. Prepare config file](#3-prepare-config-file)
- [4. \[Optional\]\[Required if using TPUs\] Create tfrecords](#4-optionalrequired-if-using-tpus-create-tfrecords)
- [5. Generate vocabulary and metadata](#5-generate-vocabulary-and-metadata)
- [6. Run training](#6-run-training)


# Training Tutorial

These commands are examples for the LibriSpeech dataset, but similar steps apply to other datasets
@@ -16,14 +25,14 @@ pip install ".[tf2.x]"
This is an example of preparing transcript files for the LibriSpeech corpus

```bash
python scripts/create_librispeech_trans.py \
tensorflow_asr utils create_librispeech_trans \
--directory=/path/to/dataset/train-clean-100 \
--output=/path/to/dataset/train-clean-100/transcripts.tsv
```

Do the same thing with `train-clean-360`, `train-other-500`, `dev-clean`, `dev-other`, `test-clean`, `test-other`

For other datasets, you must prepare your own Python script like `scripts/create_librispeech_trans.py`

For other datasets, you must prepare your own Python script like `tensorflow_asr/scripts/utils/create_librispeech_trans.py`

## 3. Prepare config file

@@ -34,7 +43,7 @@ Please take a look in some examples for config files in `examples/*/*.yml.j2`
## 4. [Optional][Required if using TPUs] Create tfrecords

```bash
python scripts/create_tfrecords.py \
tensorflow_asr utils create_tfrecords \
--config-path=/path/to/config.yml.j2 \
--mode=\["train","eval","test"\] \
--datadir=/path/to/datadir
@@ -47,7 +56,7 @@ You can reduce the flag `--modes` to `--modes=\["train","eval"\]` to only create
This step requires defining path to vocabulary file and other options for generating vocabulary in config file.

```bash
python scripts/prepare_vocab_and_metadata.py \
tensorflow_asr utils prepare_vocab_and_metadata \
--config-path=/path/to/config.yml.j2 \
--datadir=/path/to/datadir
```
@@ -58,13 +67,13 @@ The inputs, outputs and other options of vocabulary are defined in the config fi
## 6. Run training

```bash
python examples/train.py \
tensorflow_asr train \
--mxp=auto \
--jit-compile \
--config-path=/path/to/config.yml.j2 \
--dataset-type=tfrecord \
--modeldir=/path/to/modeldir \
--datadir=/path/to/datadir
## See other params
python examples/train.py --help
tensorflow_asr train --help
```
@@ -1,5 +1,5 @@
{% set vocabsize = 29 %}
{% set vocabprefix = repodir ~ "/examples/configs/librispeech/characters/english" %}
{% set vocabprefix = repodir ~ "/examples/datasets/librispeech/characters/english" %}
{% set metadata = vocabprefix ~ ".metadata.json" %}

decoder_config:
Expand All @@ -9,6 +9,7 @@ decoder_config:
norm_score: True
lm_config: null
vocabulary: {{vocabprefix}}.vocab
vocab_size: {{vocabsize}}

{% import "examples/configs/librispeech/data.yml.j2" as data_config with context %}
{% import "examples/datasets/librispeech/config.yml.j2" as data_config with context %}
{{data_config}}