Commit @ac78ddffa3e7b012471704ec1d2be8cb6ed238e5 - yjyoon/whisper_streaming

Dominik Macháček 2024-01-02

Buffer trimming option; sentence segmenter no longer required

- both for whisper_online + server
- removed argparse code repetition
- README updated
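
The "removed argparse code repetition" item can be pictured with a reduced sketch. It assumes only three of the shared options and shortened help strings; the `--port` option on the server parser is a placeholder assumption, since the server's own options are not part of this diff.

```python
import argparse


def add_shared_args(parser):
    """Register options common to the simulation and the server.

    Reduced to three options for brevity; the commit registers more.
    """
    parser.add_argument('--min-chunk-size', type=float, default=1.0,
                        help='Minimum audio chunk size in seconds.')
    parser.add_argument('--buffer_trimming', type=str, default="segment",
                        choices=["sentence", "segment"],
                        help='Buffer trimming strategy.')
    parser.add_argument('--buffer_trimming_sec', type=float, default=15,
                        help='Buffer trimming length threshold in seconds.')


# Simulation entry point: its own arguments plus the shared ones.
sim_parser = argparse.ArgumentParser()
sim_parser.add_argument('audio_path', type=str)
add_shared_args(sim_parser)
sim_parser.add_argument('--offline', action="store_true", default=False)

# Server entry point: the same shared options, different extras.
server_parser = argparse.ArgumentParser()
server_parser.add_argument('--port', type=int, default=43007)  # placeholder assumption
add_shared_args(server_parser)

# Parse explicit argument lists instead of sys.argv so the sketch is reproducible.
sim_args = sim_parser.parse_args(['audio.wav', '--buffer_trimming', 'sentence'])
server_args = server_parser.parse_args([])

print(sim_args.buffer_trimming, server_args.buffer_trimming)
```

Because both parsers call the same function, a new shared option such as `--buffer_trimming` only has to be declared once to reach both entry points.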

d6ec999

ac78ddf

README.md

--- README.md

+++ README.md


 
 The backend is loaded only when chosen. The unused one does not have to be installed.
 
-3) Sentence segmenter (aka sentence tokenizer) 
+3) Optional, not recommended: sentence segmenter (aka sentence tokenizer) 
 
-It splits punctuated text to sentences by full stops, avoiding the dots that are not full stops. The segmenters are language specific.
-The unused one does not have to be installed. We integrate the following segmenters, but suggestions for better alternatives are welcome.
+Two buffer trimming options are integrated and evaluated. They have impact on
+the quality and latency. The default "segment" option performs better according
+to our tests and does not require any sentence segmentation installed. 
+
+The other option, "sentence" -- trimming at the end of confirmed sentences,
+requires sentence segmenter installed.  It splits punctuated text to sentences by full
+stops, avoiding the dots that are not full stops. The segmenters are language
+specific.  The unused one does not have to be installed. We integrate the
+following segmenters, but suggestions for better alternatives are welcome.
 
 - `pip install opus-fast-mosestokenizer` for the languages with codes `as bn ca cs de el en es et fi fr ga gu hi hu is it kn lt lv ml mni mr nl or pa pl pt ro ru sk sl sv ta te yue zh`
 

 
 - we did not find a segmenter for languages `as ba bo br bs fo haw hr ht jw lb ln lo mi nn oc sa sd sn so su sw tk tl tt` that are supported by Whisper and not by wtpsplit. The default fallback option for them is wtpsplit with unspecified language. Alternative suggestions welcome.
 
+In case of installation issues of opus-fast-mosestokenizer, especially on Windows and Mac, we recommend using only the "segment" option that does not require it.
 
 ## Usage
 
 ### Real-time simulation from audio file
 
 ```
-usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
-                         [--start_at START_AT] [--backend {faster-whisper,whisper_timestamped}] [--offline] [--comp_unaware] [--vad]
+usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large}] [--model_cache_dir MODEL_CACHE_DIR]
+                         [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}] [--start_at START_AT] [--backend {faster-whisper,whisper_timestamped}] [--vad]
+                         [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--offline] [--comp_unaware]
                          audio_path
 
 positional arguments:

 options:
   -h, --help            show this help message and exit
   --min-chunk-size MIN_CHUNK_SIZE
-                        Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.
-  --model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large}
+                        Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was
+                        received by this time.
+  --model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large}
                         Name size of the Whisper model to use (default: large-v2). The model is automatically downloaded from the model hub if not present in model cache dir.
   --model_cache_dir MODEL_CACHE_DIR
                         Overriding the default model cache dir where models downloaded from the hub are saved

   --start_at START_AT   Start processing audio at this time.
   --backend {faster-whisper,whisper_timestamped}
                         Load only this backend for Whisper processing.
+  --vad                 Use VAD = voice activity detection, with the default parameters.
+  --buffer_trimming {sentence,segment}
+                        Buffer trimming strategy -- trim completed sentences marked with punctuation mark and detected by sentence segmenter, or the completed segments returned by Whisper. Sentence segmenter
+                        must be installed for "sentence" option.
+  --buffer_trimming_sec BUFFER_TRIMMING_SEC
+                        Buffer trimming length threshold in seconds. If buffer length is longer, trimming sentence/segment is triggered.
   --offline             Offline mode.
   --comp_unaware        Computationally unaware simulation.
-  --vad                 Use VAD = voice activity detection, with the default parameters.
 ```
 
 Example:

 The code whisper_online.py is nicely commented, read it as the full documentation.
 
 
-This pseudocode describes the interface that we suggest for your implementation. You can implement e.g. audio from mic or stdin, server-client, etc.
+This pseudocode describes the interface that we suggest for your implementation. You can implement any features that you need for your application.
 
 ```
 from whisper_online import *

 # asr.set_translate_task()  # it will translate from lan into English
 # asr.use_vad()  # set using VAD
 
-tokenizer = create_tokenizer(tgt_lan)  # sentence segmenter for the target language
-
-online = OnlineASRProcessor(asr, tokenizer)  # create processing object
-
+online = OnlineASRProcessor(asr)  # create processing object with default buffer trimming option
 
 while audio_has_not_ended:   # processing loop:
 	a = # receive new audio chunk (and e.g. wait for min_chunk_size seconds first, ...)

 
 Contributions are welcome.
 
-### Tests
+### Performance evaluation
 
-[See the results in paper.](http://www.afnlp.org/conferences/ijcnlp2023/proceedings/main-demo/cdrom/pdf/2023.ijcnlp-demo.3.pdf)
+[See the paper.](http://www.afnlp.org/conferences/ijcnlp2023/proceedings/main-demo/cdrom/pdf/2023.ijcnlp-demo.3.pdf)
+
 
 ## Contact
 

whisper_online.py

--- whisper_online.py

+++ whisper_online.py


 
     SAMPLING_RATE = 16000
 
-    def __init__(self, asr, tokenizer=None, logfile=sys.stderr, buffer_trimming=("segment", 15)):
+    def __init__(self, asr, tokenizer=None, buffer_trimming=("segment", 15), logfile=sys.stderr):
         """asr: WhisperASR object
-        tokenizer: sentence tokenizer object for the target language. Must have a method *split* that behaves like the one of MosesTokenizer.
+        tokenizer: sentence tokenizer object for the target language. Must have a method *split* that behaves like the one of MosesTokenizer. It can be None, if "segment" buffer trimming option is used, then tokenizer is not used at all.
+        ("segment", 15)
+        buffer_trimming: a pair of (option, seconds), where option is either "sentence" or "segment", and seconds is a number. Buffer is trimmed if it is longer than "seconds" threshold. Default is the most recommended option.
         logfile: where to store the log. 
         """
         self.asr = asr

     return WtPtok()
 
 
-
+def add_shared_args(parser):
+    """shared args for simulation (this entry point) and server
+    parser: argparse.ArgumentParser object
+    """
+    parser.add_argument('--min-chunk-size', type=float, default=1.0, help='Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.')
+    parser.add_argument('--model', type=str, default='large-v2', choices="tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large".split(","),help="Name size of the Whisper model to use (default: large-v2). The model is automatically downloaded from the model hub if not present in model cache dir.")
+    parser.add_argument('--model_cache_dir', type=str, default=None, help="Overriding the default model cache dir where models downloaded from the hub are saved")
+    parser.add_argument('--model_dir', type=str, default=None, help="Dir where Whisper model.bin and other files are saved. This option overrides --model and --model_cache_dir parameter.")
+    parser.add_argument('--lan', '--language', type=str, default='en', help="Language code for transcription, e.g. en,de,cs.")
+    parser.add_argument('--task', type=str, default='transcribe', choices=["transcribe","translate"],help="Transcribe or translate.")
+    parser.add_argument('--start_at', type=float, default=0.0, help='Start processing audio at this time.')
+    parser.add_argument('--backend', type=str, default="faster-whisper", choices=["faster-whisper", "whisper_timestamped"],help='Load only this backend for Whisper processing.')
+    parser.add_argument('--vad', action="store_true", default=False, help='Use VAD = voice activity detection, with the default parameters.')
+    parser.add_argument('--buffer_trimming', type=str, default="segment", choices=["sentence", "segment"],help='Buffer trimming strategy -- trim completed sentences marked with punctuation mark and detected by sentence segmenter, or the completed segments returned by Whisper. Sentence segmenter must be installed for "sentence" option.')
+    parser.add_argument('--buffer_trimming_sec', type=float, default=15, help='Buffer trimming length threshold in seconds. If buffer length is longer, trimming sentence/segment is triggered.')
 
 ## main:
 

     import argparse
     parser = argparse.ArgumentParser()
     parser.add_argument('audio_path', type=str, help="Filename of 16kHz mono channel wav, on which live streaming is simulated.")
-    parser.add_argument('--min-chunk-size', type=float, default=1.0, help='Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.')
-    parser.add_argument('--model', type=str, default='large-v2', choices="tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large".split(","),help="Name size of the Whisper model to use (default: large-v2). The model is automatically downloaded from the model hub if not present in model cache dir.")
-    parser.add_argument('--model_cache_dir', type=str, default=None, help="Overriding the default model cache dir where models downloaded from the hub are saved")
-    parser.add_argument('--model_dir', type=str, default=None, help="Dir where Whisper model.bin and other files are saved. This option overrides --model and --model_cache_dir parameter.")
-    parser.add_argument('--lan', '--language', type=str, default='en', help="Language code for transcription, e.g. en,de,cs.")
-    parser.add_argument('--task', type=str, default='transcribe', choices=["transcribe","translate"],help="Transcribe or translate.")
-    parser.add_argument('--start_at', type=float, default=0.0, help='Start processing audio at this time.')
-    parser.add_argument('--backend', type=str, default="faster-whisper", choices=["faster-whisper", "whisper_timestamped"],help='Load only this backend for Whisper processing.')
+    add_shared_args(parser)
     parser.add_argument('--offline', action="store_true", default=False, help='Offline mode.')
     parser.add_argument('--comp_unaware', action="store_true", default=False, help='Computationally unaware simulation.')
-    parser.add_argument('--vad', action="store_true", default=False, help='Use VAD = voice activity detection, with the default parameters.')
-    parser.add_argument('--buffer_trimming', type=str, default="sentence", choices=["sentence", "segment"],help='Buffer trimming strategy')
-    parser.add_argument('--buffer_trimming_sec', type=float, default=15, help='Buffer trimming lenght threshold in seconds. If buffer length longer, trimming sentence/segment is triggered.')
+
+    
     args = parser.parse_args()
 
     # reset to store stderr to different file stream, e.g. open(os.devnull,"w")

 
     
     min_chunk = args.min_chunk_size
-    online = OnlineASRProcessor(asr,create_tokenizer(tgt_language),logfile=logfile,buffer_trimming=(args.buffer_trimming, args.buffer_trimming_sec))
+    if args.buffer_trimming == "sentence":
+        tokenizer = create_tokenizer(tgt_language)
+    else:
+        tokenizer = None
+    online = OnlineASRProcessor(asr,tokenizer,logfile=logfile,buffer_trimming=(args.buffer_trimming, args.buffer_trimming_sec))
 
 
     # load the audio into the LRU cache before we start the timer
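
The `buffer_trimming=(option, seconds)` pair documented above says only that the buffer is trimmed once it is longer than the threshold. The toy function below illustrates one way a "segment" cut point could be chosen. The rule it uses (keep the newest segment, cut at the end of the one before it) and the names `trim_point` and `segment_ends` are assumptions for illustration, not the logic of `OnlineASRProcessor`, which this diff does not show.

```python
def trim_point(buffer_seconds, segment_ends, threshold=15):
    """Return the time (seconds from buffer start) at which to cut, or None.

    Toy rule, assumed for illustration: once the buffer is longer than the
    threshold, cut at the end of the second-to-last segment, so the newest
    (possibly unfinished) segment stays in the buffer.
    """
    if buffer_seconds <= threshold:
        return None  # buffer is still short enough, nothing to trim
    if len(segment_ends) < 2:
        return None  # no completed segment before the newest one
    return segment_ends[-2]


# A 10 s buffer is under the 15 s threshold and is left alone.
print(trim_point(10.0, [4.0, 9.0]))
# A 20 s buffer is over the threshold and is cut at the previous segment end.
print(trim_point(20.0, [6.0, 13.5, 19.0]))
```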

whisper_online_server.py

--- whisper_online_server.py

+++ whisper_online_server.py


 
 
 # options from whisper_online
-# TODO: code repetition
-
-parser.add_argument('--min-chunk-size', type=float, default=1.0, help='Minimum audio chunk size in seconds. It waits up to this time to do processing. If the processing takes shorter time, it waits, otherwise it processes the whole segment that was received by this time.')
-parser.add_argument('--model', type=str, default='large-v2', choices="tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large".split(","),help="Name size of the Whisper model to use (default: large-v2). The model is automatically downloaded from the model hub if not present in model cache dir.")
-parser.add_argument('--model_cache_dir', type=str, default=None, help="Overriding the default model cache dir where models downloaded from the hub are saved")
-parser.add_argument('--model_dir', type=str, default=None, help="Dir where Whisper model.bin and other files are saved. This option overrides --model and --model_cache_dir parameter.")
-parser.add_argument('--lan', '--language', type=str, default='en', help="Language code for transcription, e.g. en,de,cs.")
-parser.add_argument('--task', type=str, default='transcribe', choices=["transcribe","translate"],help="Transcribe or translate.")
-parser.add_argument('--backend', type=str, default="faster-whisper", choices=["faster-whisper", "whisper_timestamped"],help='Load only this backend for Whisper processing.')
-parser.add_argument('--vad', action="store_true", default=False, help='Use VAD = voice activity detection, with the default parameters.')
+add_shared_args(parser)
 args = parser.parse_args()
 
 

 
 
 min_chunk = args.min_chunk_size
-online = OnlineASRProcessor(asr,create_tokenizer(tgt_language))
+
+if args.buffer_trimming == "sentence":
+    tokenizer = create_tokenizer(tgt_language)
+else:
+    tokenizer = None
+online = OnlineASRProcessor(asr,tokenizer,buffer_trimming=(args.buffer_trimming, args.buffer_trimming_sec))
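
The same tokenizer selection now appears in both `whisper_online.py` and `whisper_online_server.py`. A minimal sketch of that branch as a single function follows. The name `pick_tokenizer` and the stand-in factory are hypothetical and not part of the commit; the factory is passed in as a parameter so the sketch runs without any sentence segmenter installed.

```python
def pick_tokenizer(buffer_trimming, tgt_language, create_tokenizer):
    """Build a sentence tokenizer only when the "sentence" option needs it."""
    if buffer_trimming == "sentence":
        return create_tokenizer(tgt_language)
    return None  # "segment" trimming does not use a tokenizer


# Stand-in for create_tokenizer that records which languages were requested.
requested = []


def fake_create_tokenizer(lan):
    requested.append(lan)
    return f"tokenizer-for-{lan}"


segment_tok = pick_tokenizer("segment", "en", fake_create_tokenizer)
sentence_tok = pick_tokenizer("sentence", "cs", fake_create_tokenizer)

print(segment_tok, sentence_tok, requested)
```

With the default "segment" option the factory is never called, which is why the segmenter packages no longer have to be installed.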
 
 
 
