

docs: usage pages (and more)
@8f4851657b67a4ba51d7e0ea25af381dbc219e43
--- .pre-commit-config.yaml
+++ .pre-commit-config.yaml
@@ -44,4 +44,4 @@
     rev: v1.5.0
     hooks:
       - id: detect-secrets
-        exclude: 'README.md|tests/conftest.py|docs/usage.md'
+        exclude: 'README.md|tests/conftest.py|docs/usage/*'
--- README.md
+++ README.md
@@ -78,15 +78,15 @@
 ### OpenAI API Python SDK
 
 ```python
+from pathlib import Path
+
 from openai import OpenAI
 
-client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
 
-audio_file = open("audio.wav", "rb")
-transcript = client.audio.transcriptions.create(
-    model="Systran/faster-distil-whisper-large-v3", file=audio_file
-)
-print(transcript.text)
+with Path("audio.wav").open("rb") as f:
+    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
 ```
 
 ### cURL
--- docs/introduction.md
+++ docs/introduction.md
@@ -2,18 +2,26 @@
 
 Under development. I don't yet recommend using these docs as reference for now.
 
+TODO: add HuggingFace Space URL
+
 # Faster Whisper Server
 
-`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
-Features:
+`faster-whisper-server` is an OpenAI API-compatible server supporting transcription, translation, and speech generation. For transcription/translation it uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper), and for text-to-speech [piper](https://github.com/rhasspy/piper) is used.
+
+## Features
 
 - GPU and CPU support.
-- Easily deployable using Docker.
-- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
-- OpenAI API compatible.
+- [Deployable via Docker Compose / Docker](./installation.md)
+- [Highly configurable](./configuration.md)
+- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
 - Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
 - Live transcription support (audio is sent via websocket as it's generated).
 - Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/audio)
+    - Generate a spoken audio summary of a body of text (text in, audio out)
+    - Perform sentiment analysis on a recording (audio in, text out)
+    - Async speech-to-speech interactions with a model (audio in, audio out)
+- (Coming soon) Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
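The dynamic model loading mentioned in the feature list means there is no separate load step: whichever model a request names is fetched and loaded on first use, then unloaded after a period of inactivity. Below is a minimal sketch of discovering which models the server knows about, assuming it exposes the standard OpenAI `/v1/models` route (as the compatibility claim suggests):

```python
from openai import OpenAI

# Any non-empty API key is accepted; the server does not validate it.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

# List the available models; requesting any of these by name in a
# transcription call triggers the automatic load described above.
for model in client.models.list():
    print(model.id)
```
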
--- docs/usage.md
+++ /dev/null
@@ -1,86 +0,0 @@
-TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
-TODO: add video demos for all
-TODO: add a note about OPENAI_API_KEY
-
-## Curl
-
-```bash
-curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
-```
-
-## Python
-
-=== "httpx"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-## OpenAI SDKs
-
-=== "Python"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-=== "CLI"
-
-    ```bash
-    export OPENAI_BASE_URL=http://localhost:8000/v1/
-    export OPENAI_API_KEY="cant-be-empty"
-    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
-    ```
-
-=== "Other"
-
-    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
-
-## Open WebUI
-
-### Using the UI
-
-1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
-2. Click on the "Audio" tab
-3. Update settings
-   - Speech-to-Text Engine: OpenAI
-   - API Base URL: http://faster-whisper-server:8000/v1
-   - API Key: does-not-matter-what-you-put-but-should-not-be-empty
-   - Model: Systran/faster-distil-whisper-large-v3
-4. Click "Save"
-
-### Using environment variables (Docker Compose)
-
-!!! warning
-
-    This doesn't seem to work when you've previously used the UI to set the STT engine.
-
-```yaml
-# NOTE: Some parts of the file are omitted for brevity.
-services:
-  open-webui:
-    image: ghcr.io/open-webui/open-webui:main
-    ...
-    environment:
-      ...
-      # Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
-      AUDIO_STT_ENGINE: "openai"
-      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
-      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
-      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
-  faster-whisper-server:
-    image: fedirz/faster-whisper-server:latest-cuda
-    ...
-```
--- /dev/null
+++ docs/usage/live-transcription.md
@@ -0,0 +1,17 @@
+## Live Transcription (using WebSocket)
+
+!!! note
+
+    More content will be added here soon.
+
+TODO: fix link
+From the [live-audio](./examples/live-audio) example
+
+https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
+
+[websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) installation is required.
+Live transcription of audio data from a microphone.
+
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
+```
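For those who would rather not install `websocat`, here is a rough Python equivalent of the pipeline above. This is a sketch, not documented API: it assumes the `sounddevice` and `websockets` packages (`pip install sounddevice websockets`) and that the endpoint accepts the same raw PCM stream (16 kHz, mono, s16le) that the ffmpeg command produces.

```python
import asyncio

import sounddevice as sd
import websockets

SAMPLE_RATE = 16000  # matches `-ar 16000` in the ffmpeg command above
CHUNK_SECONDS = 1


async def main() -> None:
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:

        async def send_mic_audio() -> None:
            while True:
                # Record one chunk of 16-bit mono PCM from the default microphone.
                chunk = await asyncio.to_thread(
                    sd.rec,
                    int(SAMPLE_RATE * CHUNK_SECONDS),
                    samplerate=SAMPLE_RATE,
                    channels=1,
                    dtype="int16",
                    blocking=True,
                )
                await ws.send(chunk.tobytes())

        async def print_transcripts() -> None:
            # The server pushes transcript messages back over the same socket.
            async for message in ws:
                print(message)

        await asyncio.gather(send_mic_audio(), print_transcripts())


asyncio.run(main())
```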
--- /dev/null
+++ docs/usage/open-webui-integration.md
@@ -0,0 +1,36 @@
+## Open WebUI
+
+### Using the UI
+
+1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
+2. Click on the "Audio" tab
+3. Update settings
+   - Speech-to-Text Engine: OpenAI
+   - API Base URL: http://faster-whisper-server:8000/v1
+   - API Key: does-not-matter-what-you-put-but-should-not-be-empty
+   - Model: Systran/faster-distil-whisper-large-v3
+4. Click "Save"
+
+### Using environment variables (Docker Compose)
+
+!!! warning
+
+    This doesn't seem to work when you've previously used the UI to set the STT engine.
+
+```yaml
+# NOTE: Some parts of the file are omitted for brevity.
+services:
+  open-webui:
+    image: ghcr.io/open-webui/open-webui:main
+    ...
+    environment:
+      ...
+      # Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
+      AUDIO_STT_ENGINE: "openai"
+      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
+      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
+      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
+  faster-whisper-server:
+    image: fedirz/faster-whisper-server:latest-cuda
+    ...
+```
--- /dev/null
+++ docs/usage/speech-to-text.md
@@ -0,0 +1,54 @@
+https://platform.openai.com/docs/api-reference/audio/createTranscription
+https://platform.openai.com/docs/guides/speech-to-text
+
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: mention streaming
+TODO: add a demo
+TODO: talk about audio format
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from openai import OpenAI
+
+    client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+    with open("audio.wav", "rb") as f:
+        transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+        print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries).
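On the streaming TODO above: the introduction notes that partial transcripts are sent via SSE as the audio is processed. Below is a hedged sketch of consuming that stream with `httpx`; the `stream` form field and the `data:` line format are assumptions here, so check the API reference for the exact contract.

```python
import httpx

with open("audio.wav", "rb") as f:
    files = {"file": ("audio.wav", f)}
    # NOTE: the `stream` form field is an assumption, not a documented parameter.
    with httpx.stream(
        "POST",
        "http://localhost:8000/v1/audio/transcriptions",
        files=files,
        data={"stream": "true"},
        timeout=None,
    ) as response:
        for line in response.iter_lines():
            # SSE events arrive as `data: <payload>` lines.
            if line.startswith("data: "):
                print(line.removeprefix("data: "))
```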
--- /dev/null
+++ docs/usage/text-to-speech.md
@@ -0,0 +1,98 @@
+!!! warning
+
+    This feature is only supported on x86_64, not on ARM devices, as I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize)) for ARM.
+
+https://platform.openai.com/docs/api-reference/audio/createSpeech
+https://platform.openai.com/docs/guides/text-to-speech
+http://localhost:8001/faster-whisper-server/api/
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: add a demo
+
+## Prerequisite
+
+Download the piper voices from the [HuggingFace model repository](https://huggingface.co/rhasspy/piper-voices)
+
+```bash
+# Download all voices (~15 minutes / 7.7 GB)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices
+# Download all English voices (~4.5 minutes)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
+# Download all qualities of a specific voice (~4 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
+# Download a specific quality of a specific voice (~2 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
+```
+
+!!! note
+
+    You can find audio samples of all the available voices [here](https://rhasspy.github.io/piper-samples/)
+
+## Curl
+
+```bash
+# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
+# Specifying the output format
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
+# Specifying the audio speed
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
+
+# List the available (downloaded) voices
+curl http://localhost:8000/v1/audio/speech/voices
+# List just the voice names
+curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
+# List just the voices in your language
+curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
+
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    from pathlib import Path
+
+    import httpx
+
+    client = httpx.Client(base_url="http://localhost:8000/")
+    res = client.post(
+        "v1/audio/speech",
+        json={
+            "model": "piper",
+            "voice": "en_US-amy-medium",
+            "input": "Hello, world!",
+            "response_format": "mp3",
+            "speed": 1,
+        },
+    ).raise_for_status()
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.read())
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+    res = openai.audio.speech.create(
+        model="piper",
+        voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
+        input="Hello, world!",
+        response_format="mp3",
+        speed=1,
+    )
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.response.read())
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries)
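A possible refinement of the SDK example above: recent `openai-python` releases also provide a streaming-response wrapper, which writes the audio to disk as it arrives instead of buffering the whole response in memory. A sketch under the same server and voice assumptions as above:

```python
from openai import OpenAI

openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")

# Stream the synthesized audio straight to a file as it arrives.
with openai.audio.speech.with_streaming_response.create(
    model="piper",
    voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
    input="Hello, world!",
    response_format="mp3",
) as res:
    res.stream_to_file("output.mp3")
```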
--- mkdocs.yml
+++ mkdocs.yml
@@ -1,6 +1,10 @@
 # yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
+# https://www.mkdocs.org/user-guide/configuration/#configuration
 site_name: Faster Whisper Server Documentation
-repo_url: https://github.com/fedirz/faster-whisper-server
+site_url: https://fedirz.github.io/faster-whisper-server/
+repo_url: https://github.com/fedirz/faster-whisper-server/
+edit_uri: edit/master/docs/
+docs_dir: docs
 theme:
   language: en
   name: material
@@ -9,13 +13,15 @@
     primary: deep orange
     accent: indigo
   features:
-    - content.tabs.link
-    - content.code.copy
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/
     - navigation.instant
     - navigation.instant.progress
     - navigation.instant.prefetch
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/
     - search.highlight
     - search.share
+    - content.tabs.link
+    - content.code.copy
 plugins:
   # https://github.com/bharel/mkdocs-render-swagger-plugin
   - render_swagger
@@ -23,9 +29,13 @@
     default_handler: python
 nav:
   - Introduction: introduction.md
+  - Capabilities / Usage:
+      - Speech-to-Text: usage/speech-to-text.md
+      - Text-to-Speech: usage/text-to-speech.md
+      - Live Transcription (using WebSockets): usage/live-transcription.md
+      - Open WebUI Integration: usage/open-webui-integration.md
   - Installation: installation.md
   - Configuration: configuration.md
-  - Usage: usage.md
   - API: api.md
 markdown_extensions:
   - admonition
@@ -34,3 +44,4 @@
     alternate_style: true
   # https://github.com/mkdocs/mkdocs/issues/545
   - mdx_truly_sane_lists
+# TODO: https://github.com/oprypin/markdown-callouts