Tijs Zwinkels 2024-01-25
Update documentation to include openai-api backend
@1949e5f018e452c6595e61a6c5b08dea3a117b0b
README.md
--- README.md
+++ README.md
@@ -31,13 +31,25 @@
 
 ## Installation
 
-1) ``pip install librosa`` -- audio processing library
+1) ``pip install librosa soundfile`` -- audio processing libraries
 
 2) Whisper backend.
 
-Two alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
+Several alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
 
 Alternative, less restrictive, but slower backend is [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped): `pip install git+https://github.com/linto-ai/whisper-timestamped`
+
+Thirdly, it's also possible to run this software against the [OpenAI Whisper API](https://platform.openai.com/docs/api-reference/audio/createTranscription). This backend is fast and requires no GPU, just a small VM, but you will need to pay OpenAI for API access. Also note that, since each audio fragment is processed multiple times, the [price](https://openai.com/pricing) will be higher than the pricing page suggests, so keep an eye on costs while using it. Setting a higher chunk-size reduces costs significantly.
+Install with: `pip install openai`
+
+For the openai-api backend, make sure that your [OpenAI API key](https://platform.openai.com/api-keys) is set in the `OPENAI_API_KEY` environment variable. For example, before running, do `export OPENAI_API_KEY=sk-xxx`, with *sk-xxx* replaced by your actual API key.
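+
+As a sketch, a complete run with the openai-api backend might then look like the following (the file name *audio.wav* is only a placeholder; a larger `--min-chunk-size` is chosen here to keep API costs down, as noted above):
+
+```
+export OPENAI_API_KEY=sk-xxx   # replace sk-xxx with your actual API key
+python3 whisper_online.py --backend openai-api --min-chunk-size 5 audio.wav
+```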
 
-The backend is loaded only when chosen. The unused one does not have to be installed.
+The backend is loaded only when chosen. The unused ones do not have to be installed.
 
@@ -69,7 +74,7 @@
 
 ```
 usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
-                         [--backend {faster-whisper,whisper_timestamped}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
+                         [--backend {faster-whisper,whisper_timestamped,openai-api}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
                          audio_path
 
 positional arguments:
@@ -89,7 +94,7 @@
                         Source language code, e.g. en,de,cs, or 'auto' for language detection.
   --task {transcribe,translate}
                         Transcribe or translate.
-  --backend {faster-whisper,whisper_timestamped}
+  --backend {faster-whisper,whisper_timestamped,openai-api}
                         Load only this backend for Whisper processing.
   --vad                 Use VAD = voice activity detection, with the default parameters.
   --buffer_trimming {sentence,segment}