

Update documentation to include openai-api backend
@1949e5f018e452c6595e61a6c5b08dea3a117b0b
--- README.md
+++ README.md
@@ -31,13 +31,18 @@
 
 ## Installation
 
-1) ``pip install librosa`` -- audio processing library
+1) ``pip install librosa soundfile`` -- audio processing library
 
 2) Whisper backend.
 
-Two alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
+Several alternative backends are integrated. The most recommended one is [faster-whisper](https://github.com/guillaumekln/faster-whisper) with GPU support. Follow their instructions for NVIDIA libraries -- we succeeded with CUDNN 8.5.0 and CUDA 11.7. Install with `pip install faster-whisper`.
 
 Alternative, less restrictive, but slower backend is [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped): `pip install git+https://github.com/linto-ai/whisper-timestamped`
+
+Thirdly, it's also possible to run this software with the [OpenAI Whisper API](https://platform.openai.com/docs/api-reference/audio/createTranscription). This solution is fast and requires no GPU -- a small VM will suffice -- but you will need to pay OpenAI for API access. Also note that, since each audio fragment is processed multiple times, the [price](https://openai.com/pricing) will be higher than the pricing page suggests, so keep an eye on costs while using it. Setting a higher chunk size will reduce costs significantly.
+Install with: `pip install openai`
+
+For running with the openai-api backend, make sure that your [OpenAI API key](https://platform.openai.com/api-keys) is set in the `OPENAI_API_KEY` environment variable. For example, before running, do: `export OPENAI_API_KEY=sk-xxx`, with *sk-xxx* replaced by your API key.
 
 The backend is loaded only when chosen. The unused one does not have to be installed.
 
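For orientation, here is a minimal sketch of a direct request to the transcription endpoint that the new openai-api backend is documented to use (per OpenAI's API reference linked in the text above); `sample.wav` is a placeholder file name:

```
# Sketch of a direct request to the OpenAI transcription endpoint.
# Assumes OPENAI_API_KEY is set as described above; sample.wav is a placeholder.
curl https://api.openai.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@sample.wav" \
  -F model="whisper-1"
```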
@@ -69,7 +74,7 @@
 
 ```
 usage: whisper_online.py [-h] [--min-chunk-size MIN_CHUNK_SIZE] [--model {tiny.en,tiny,base.en,base,small.en,small,medium.en,medium,large-v1,large-v2,large-v3,large}] [--model_cache_dir MODEL_CACHE_DIR] [--model_dir MODEL_DIR] [--lan LAN] [--task {transcribe,translate}]
-                         [--backend {faster-whisper,whisper_timestamped}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
+                         [--backend {faster-whisper,whisper_timestamped,openai-api}] [--vad] [--buffer_trimming {sentence,segment}] [--buffer_trimming_sec BUFFER_TRIMMING_SEC] [--start_at START_AT] [--offline] [--comp_unaware]
                          audio_path
 
 positional arguments:
@@ -89,7 +94,7 @@
                         Source language code, e.g. en,de,cs, or 'auto' for language detection.
   --task {transcribe,translate}
                         Transcribe or translate.
-  --backend {faster-whisper,whisper_timestamped}
+  --backend {faster-whisper,whisper_timestamped,openai-api}
                         Load only this backend for Whisper processing.
   --vad                 Use VAD = voice activity detection, with the default parameters.
   --buffer_trimming {sentence,segment}
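To tie the new option to the usage text above, a hypothetical end-to-end invocation; `sample.wav` is a placeholder, and the flags are taken from the help output shown here:

```
# Hypothetical invocation assembled from the usage text above.
export OPENAI_API_KEY=sk-xxx    # replace sk-xxx with your API key
python3 whisper_online.py sample.wav --backend openai-api --lan en --min-chunk-size 1
```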