Fedir Zadniprovskyi 01-12
docs: usage pages (and more)
@8f4851657b67a4ba51d7e0ea25af381dbc219e43
.pre-commit-config.yaml
--- .pre-commit-config.yaml
+++ .pre-commit-config.yaml
@@ -44,4 +44,4 @@
     rev: v1.5.0
     hooks:
       - id: detect-secrets
-        exclude: 'README.md|tests/conftest.py|docs/usage.md'
+        exclude: 'README.md|tests/conftest.py|docs/usage/*'
README.md
--- README.md
+++ README.md
@@ -78,15 +78,15 @@
 ### OpenAI API Python SDK
 
 ```python
+from pathlib import Path
+
 from openai import OpenAI
 
-client = OpenAI(api_key="cant-be-empty", base_url="http://localhost:8000/v1/")
+client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
 
-audio_file = open("audio.wav", "rb")
-transcript = client.audio.transcriptions.create(
-    model="Systran/faster-distil-whisper-large-v3", file=audio_file
-)
-print(transcript.text)
+with Path("audio.wav").open("rb") as f:
+    transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
 ```
 
 ### cURL
docs/introduction.md
--- docs/introduction.md
+++ docs/introduction.md
@@ -2,18 +2,26 @@
 
     Under development. I don't recommend using these docs as a reference for now.
 
+TODO: add HuggingFace Space URL
+
 # Faster Whisper Server
 
-`faster-whisper-server` is an OpenAI API-compatible transcription server which uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) as its backend.
-Features:
+`faster-whisper-server` is an OpenAI API-compatible server that supports transcription, translation, and speech generation. It uses [faster-whisper](https://github.com/SYSTRAN/faster-whisper) for transcription/translation and [piper](https://github.com/rhasspy/piper) for text-to-speech.
+
+## Features
 
 - GPU and CPU support.
-- Easily deployable using Docker.
-- **Configurable through environment variables (see [config.py](./src/faster_whisper_server/config.py))**.
-- OpenAI API compatible.
+- [Deployable via Docker Compose / Docker](./installation.md)
+- [Highly configurable](./configuration.md)
+- OpenAI API compatible. All tools and SDKs that work with OpenAI's API should work with `faster-whisper-server`.
 - Streaming support (transcription is sent via [SSE](https://en.wikipedia.org/wiki/Server-sent_events) as the audio is transcribed. You don't need to wait for the audio to fully be transcribed before receiving it).
 - Live transcription support (audio is sent via websocket as it's generated).
 - Dynamic model loading / offloading. Just specify which model you want to use in the request and it will be loaded automatically. It will then be unloaded after a period of inactivity.
+- (Coming soon) Audio generation (chat completions endpoint) | [OpenAI Documentation](https://platform.openai.com/docs/guides/audio)
+  - Generate a spoken audio summary of a body of text (text in, audio out)
+  - Perform sentiment analysis on a recording (audio in, text out)
+  - Asynchronous speech-to-speech interactions with a model (audio in, audio out)
+- (Coming soon) Realtime API | [OpenAI Documentation](https://platform.openai.com/docs/guides/realtime)
 
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 
 
docs/usage.md (deleted)
--- docs/usage.md
@@ -1,86 +0,0 @@
-TODO: break this down into: transcription/translation, streaming transcription/translation, live transcription, audio generation, model listing
-TODO: add video demos for all
-TODO: add a note about OPENAI_API_KEY
-
-## Curl
-
-```bash
-curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
-```
-
-## Python
-
-=== "httpx"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-## OpenAI SDKs
-
-=== "Python"
-
-    ```python
-    import httpx
-
-    with open('audio.wav', 'rb') as f:
-        files = {'file': ('audio.wav', f)}
-        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
-
-    print(response.text)
-    ```
-
-=== "CLI"
-
-    ```bash
-    export OPENAI_BASE_URL=http://localhost:8000/v1/
-    export OPENAI_API_KEY="cant-be-empty"
-    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
-    ```
-
-=== "Other"
-
-    See [OpenAI libraries](https://platform.openai.com/docs/libraries) and [OpenAI speech-to-text usage](https://platform.openai.com/docs/guides/speech-to-text).
-
-## Open WebUI
-
-### Using the UI
-
-1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
-2. Click on the "Audio" tab
-3. Update settings
-   - Speech-to-Text Engine: OpenAI
-   - API Base URL: http://faster-whisper-server:8000/v1
-   - API Key: does-not-matter-what-you-put-but-should-not-be-empty
-   - Model: Systran/faster-distil-whisper-large-v3
-4. Click "Save"
-
-### Using environment variables (Docker Compose)
-
-!!! warning
-
-    This doesn't seem to work when you've previously used the UI to set the STT engine.
-
-```yaml
-# NOTE: Some parts of the file are omitted for brevity.
-services:
-  open-webui:
-    image: ghcr.io/open-webui/open-webui:main
-    ...
-    environment:
-      ...
-      # Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
-      AUDIO_STT_ENGINE: "openai"
-      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
-      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
-      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
-  faster-whisper-server:
-    image: fedirz/faster-whisper-server:latest-cuda
-    ...
-```
 
docs/usage/live-transcription.md (added)
+++ docs/usage/live-transcription.md
@@ -0,0 +1,17 @@
+## Live Transcription (using WebSocket)
+
+!!! note
+
+    More content will be added here soon.
+
+TODO: fix link
+From the [live-audio](./examples/live-audio) example
+
+https://github.com/fedirz/faster-whisper-server/assets/76551385/e334c124-af61-41d4-839c-874be150598f
+
+The command below performs live transcription of audio captured from a microphone. [websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) must be installed.
+
+```bash
+ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://localhost:8000/v1/audio/transcriptions
+```
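If you'd rather stream audio from Python, the sketch below sends raw PCM over the same websocket. It assumes the endpoint accepts 16 kHz mono s16le audio (matching the ffmpeg flags above) and uses the third-party `websockets` package, which is an assumption and not part of this project.

```python
import asyncio
from pathlib import Path

CHUNK_SAMPLES = 4000  # 0.25 s of audio at 16 kHz
BYTES_PER_SAMPLE = 2  # s16le -> 2 bytes per sample


def pcm_chunks(data: bytes, chunk_samples: int = CHUNK_SAMPLES) -> list[bytes]:
    """Split raw s16le PCM into fixed-size chunks for sending."""
    step = chunk_samples * BYTES_PER_SAMPLE
    return [data[i : i + step] for i in range(0, len(data), step)]


async def stream_pcm(path: str) -> None:
    # Imported lazily so the chunking helper works without the package installed.
    import websockets  # third-party, assumed: `pip install websockets`

    data = Path(path).read_bytes()
    async with websockets.connect("ws://localhost:8000/v1/audio/transcriptions") as ws:
        for chunk in pcm_chunks(data):
            await ws.send(chunk)
            await asyncio.sleep(0.25)  # pace the stream roughly in real time
        print(await ws.recv())  # print whatever the server sends back


# Usage (against a raw s16le file, e.g. produced with the ffmpeg flags above):
# asyncio.run(stream_pcm("audio.pcm"))
```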
 
docs/usage/open-webui-integration.md (added)
+++ docs/usage/open-webui-integration.md
@@ -0,0 +1,36 @@
+## Open WebUI
+
+### Using the UI
+
+1. Go to the [Admin Settings](http://localhost:8080/admin/settings) page
+2. Click on the "Audio" tab
+3. Update settings
+   - Speech-to-Text Engine: OpenAI
+   - API Base URL: http://faster-whisper-server:8000/v1
+   - API Key: does-not-matter-what-you-put-but-should-not-be-empty
+   - Model: Systran/faster-distil-whisper-large-v3
+4. Click "Save"
+
+### Using environment variables (Docker Compose)
+
+!!! warning
+
+    This doesn't seem to work when you've previously used the UI to set the STT engine.
+
+```yaml
+# NOTE: Some parts of the file are omitted for brevity.
+services:
+  open-webui:
+    image: ghcr.io/open-webui/open-webui:main
+    ...
+    environment:
+      ...
+      # Environment variables are documented here https://docs.openwebui.com/getting-started/env-configuration#speech-to-text
+      AUDIO_STT_ENGINE: "openai"
+      AUDIO_STT_OPENAI_API_BASE_URL: "http://faster-whisper-server:8000/v1"
+      AUDIO_STT_OPENAI_API_KEY: "does-not-matter-what-you-put-but-should-not-be-empty"
+      AUDIO_STT_MODEL: "Systran/faster-distil-whisper-large-v3"
+  faster-whisper-server:
+    image: fedirz/faster-whisper-server:latest-cuda
+    ...
+```
 
docs/usage/speech-to-text.md (added)
+++ docs/usage/speech-to-text.md
@@ -0,0 +1,54 @@
+https://platform.openai.com/docs/api-reference/audio/createTranscription
+https://platform.openai.com/docs/guides/speech-to-text
+
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: mention streaming
+TODO: add a demo
+TODO: talk about audio format
+
+## Curl
+
+```bash
+curl http://localhost:8000/v1/audio/transcriptions -F "file=@audio.wav"
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    import httpx
+
+    with open('audio.wav', 'rb') as f:
+        files = {'file': ('audio.wav', f)}
+        response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files)
+
+    print(response.text)
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    client = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+
+    with Path("audio.wav").open("rb") as f:
+        transcript = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-large-v3", file=f)
+    print(transcript.text)
+    ```
+
+=== "CLI"
+
+    ```bash
+    export OPENAI_BASE_URL=http://localhost:8000/v1/
+    export OPENAI_API_KEY="cant-be-empty"
+    openai api audio.transcriptions.create -m Systran/faster-whisper-small -f audio.wav --response-format text
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries).
 
docs/usage/text-to-speech.md (added)
+++ docs/usage/text-to-speech.md
@@ -0,0 +1,98 @@
+!!! warning
+
+    This feature is not supported on ARM devices, only on x86_64, because I was unable to build [piper-phonemize](https://github.com/rhasspy/piper-phonemize) (my [fork](https://github.com/fedirz/piper-phonemize)) for ARM.
+
+https://platform.openai.com/docs/api-reference/audio/createSpeech
+https://platform.openai.com/docs/guides/text-to-speech
+http://localhost:8001/faster-whisper-server/api/
+TODO: add a note about automatic downloads
+TODO: add a note about api-key
+TODO: add a demo
+
+## Prerequisite
+
+Download the piper voices from [HuggingFace model repository](https://huggingface.co/rhasspy/piper-voices)
+
+```bash
+# Download all voices (~15 minutes / 7.7 GB)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices
+# Download all English voices (~4.5 minutes)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/**/*' 'voices.json'
+# Download all qualities of a specific voice (~4 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/**/*' 'voices.json'
+# Download specific quality of a specific voice (~2 seconds)
+docker exec -it faster-whisper-server huggingface-cli download rhasspy/piper-voices --include 'en/en_US/amy/medium/*' 'voices.json'
+```
+
+!!! note
+
+    You can find audio samples of all the available voices [here](https://rhasspy.github.io/piper-samples/)
+
+## Curl
+
+```bash
+# Generate speech from text using the default values (response_format="mp3", speed=1.0, voice="en_US-amy-medium", etc.)
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!"}' --output audio.mp3
+# Specifying the output format
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "response_format": "wav"}' --output audio.wav
+# Specifying the audio speed
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "speed": 2.0}' --output audio.mp3
+
+# List available (downloaded) voices
+curl http://localhost:8000/v1/audio/speech/voices
+# List just the voice names
+curl http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | .voice'
+# List just the voices in your language
+curl --silent http://localhost:8000/v1/audio/speech/voices | jq --raw-output '.[] | select(.voice | startswith("en")) | .voice'
+
+curl http://localhost:8000/v1/audio/speech --header "Content-Type: application/json" --data '{"input": "Hello World!", "voice": "en_US-ryan-high"}' --output audio.mp3
+```
+
+## Python
+
+=== "httpx"
+
+    ```python
+    from pathlib import Path
+
+    import httpx
+
+    client = httpx.Client(base_url="http://localhost:8000/")
+    res = client.post(
+        "v1/audio/speech",
+        json={
+            "model": "piper",
+            "voice": "en_US-amy-medium",
+            "input": "Hello, world!",
+            "response_format": "mp3",
+            "speed": 1,
+        },
+    ).raise_for_status()
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.read())
+    ```
+
+## OpenAI SDKs
+
+=== "Python"
+
+    ```python
+    from pathlib import Path
+
+    from openai import OpenAI
+
+    openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
+    res = openai.audio.speech.create(
+        model="piper",
+        voice="en_US-amy-medium",  # pyright: ignore[reportArgumentType]
+        input="Hello, world!",
+        response_format="mp3",
+        speed=1,
+    )
+    with Path("output.mp3").open("wb") as f:
+        f.write(res.response.read())
+    ```
+
+=== "Other"
+
+    See [OpenAI libraries](https://platform.openai.com/docs/libraries)
mkdocs.yml
--- mkdocs.yml
+++ mkdocs.yml
@@ -1,6 +1,10 @@
 # yaml-language-server: $schema=https://squidfunk.github.io/mkdocs-material/schema.json
+# https://www.mkdocs.org/user-guide/configuration/#configuration
 site_name: Faster Whisper Server Documentation
-repo_url: https://github.com/fedirz/faster-whisper-server
+site_url: https://fedirz.github.io/faster-whisper-server/
+repo_url: https://github.com/fedirz/faster-whisper-server/
+edit_uri: edit/master/docs/
+docs_dir: docs
 theme:
   language: en
   name: material
@@ -9,13 +13,15 @@
     primary: deep orange
     accent: indigo
   features:
-    - content.tabs.link
-    - content.code.copy
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-navigation/
     - navigation.instant
     - navigation.instant.progress
     - navigation.instant.prefetch
+    # https://squidfunk.github.io/mkdocs-material/setup/setting-up-site-search/
     - search.highlight
     - search.share
+    - content.tabs.link
+    - content.code.copy
 plugins:
   # https://github.com/bharel/mkdocs-render-swagger-plugin
   - render_swagger
@@ -23,9 +29,13 @@
       default_handler: python
 nav:
   - Introduction: introduction.md
+  - Capabilities / Usage:
+      - Speech-to-Text: usage/speech-to-text.md
+      - Text-to-Speech: usage/text-to-speech.md
+      - Live Transcription (using WebSockets): usage/live-transcription.md
+      - Open WebUI Integration: usage/open-webui-integration.md
   - Installation: installation.md
   - Configuration: configuration.md
-  - Usage: usage.md
   - API: api.md
 markdown_extensions:
   - admonition
@@ -34,3 +44,4 @@
       alternate_style: true
   # https://github.com/mkdocs/mkdocs/issues/545
   - mdx_truly_sane_lists
+# TODO: https://github.com/oprypin/markdown-callouts