

chore: rename to 'faster-whisper-server'
@39ee11644eedfd9dac27d3cf4d795597505e3622
--- .github/workflows/docker-build-and-push.yaml
+++ .github/workflows/docker-build-and-push.yaml
@@ -28,7 +28,7 @@
       uses: docker/metadata-action@v5
       with:
         images: |
-          fedirz/speaches
+          fedirz/faster-whisper-server
        # https://github.com/docker/metadata-action?tab=readme-ov-file#flavor-input
        flavor: |
          latest=false
@@ -47,5 +47,5 @@
        # platforms: linux/amd64,linux/arm64
        tags: ${{ steps.meta.outputs.tags }}
        # TODO: cache
-       # cache-from: type=registry,ref=fedirz/speaches:buildcache
-       # cache-to: type=registry,ref=fedirz/speaches:buildcache,mode=max
+       # cache-from: type=registry,ref=fedirz/faster-whisper-server:buildcache
+       # cache-to: type=registry,ref=fedirz/faster-whisper-server:buildcache,mode=max
--- Dockerfile.cpu
+++ Dockerfile.cpu
@@ -9,12 +9,12 @@
     rm -rf /var/lib/apt/lists/* && \
     curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
 RUN pip install --no-cache-dir poetry==1.8.2
-WORKDIR /root/speaches
+WORKDIR /root/faster-whisper-server
 COPY pyproject.toml poetry.lock ./
 RUN poetry install --only main
-COPY ./speaches ./speaches
+COPY ./faster_whisper_server ./faster_whisper_server
 ENTRYPOINT ["poetry", "run"]
-CMD ["uvicorn", "speaches.main:app"]
+CMD ["uvicorn", "faster_whisper_server.main:app"]
 ENV WHISPER_MODEL=distil-medium.en
 ENV WHISPER_INFERENCE_DEVICE=cpu
 ENV WHISPER_COMPUTE_TYPE=int8
--- Dockerfile.cuda
+++ Dockerfile.cuda
@@ -9,12 +9,12 @@
     rm -rf /var/lib/apt/lists/* && \
     curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
 RUN pip install --no-cache-dir poetry==1.8.2
-WORKDIR /root/speaches
+WORKDIR /root/faster-whisper-server
 COPY pyproject.toml poetry.lock ./
 RUN poetry install --only main
-COPY ./speaches ./speaches
+COPY ./faster_whisper_server ./faster_whisper_server
 ENTRYPOINT ["poetry", "run"]
-CMD ["uvicorn", "speaches.main:app"]
+CMD ["uvicorn", "faster_whisper_server.main:app"]
 ENV WHISPER_MODEL=distil-large-v3
 ENV WHISPER_INFERENCE_DEVICE=cuda
 ENV UVICORN_HOST=0.0.0.0
--- README.md
+++ README.md
@@ -1,20 +1,27 @@
-# WARN: WIP (code is ugly, bad documentation, may have bugs, test files aren't included, CPU inference was barely tested, etc.)
-# Intro
-:peach:`speaches` is a web server that supports real-time transcription using WebSockets.
+## Faster Whisper Server
+`faster-whisper-server` is a web server that supports real-time transcription using WebSockets.
 - [faster-whisper](https://github.com/SYSTRAN/faster-whisper) is used as the backend. Both GPU and CPU inference are supported.
 - LocalAgreement2 ([paper](https://aclanthology.org/2023.ijcnlp-demo.3.pdf) | [original implementation](https://github.com/ufal/whisper_streaming)) algorithm is used for real-time transcription.
 - Can be deployed using Docker (Compose configuration can be found in [compose.yaml](./compose.yaml)).
-- All configuration is done through environment variables. See [config.py](./speaches/config.py).
+- All configuration is done through environment variables. See [config.py](./faster_whisper_server/config.py).
 - NOTE: only transcription of single channel, 16000 sample rate, raw, 16-bit little-endian audio is supported.
 - NOTE: this isn't really meant to be used as a standalone tool but rather to add transcription features to other applications.
 Please create an issue if you find a bug, have a question, or a feature suggestion.
 # Quick Start
-Spinning up a `speaches` web server
+Using Docker
 ```bash
-docker run --gpus=all --publish 8000:8000 --mount type=bind,source=$HOME/.cache/huggingface,target=/root/.cache/huggingface fedirz/speaches:cuda
+docker run --gpus=all --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:cuda
 # or
-docker run --publish 8000:8000 --mount type=bind,source=$HOME/.cache/huggingface,target=/root/.cache/huggingface fedirz/speaches:cpu
+docker run --publish 8000:8000 --volume ~/.cache/huggingface:/root/.cache/huggingface fedirz/faster-whisper-server:cpu
 ```
+Using Docker Compose
+```bash
+curl -sO https://raw.githubusercontent.com/fedirz/faster-whisper-server/master/compose.yaml
+docker compose up --detach faster-whisper-server-cuda
+# or
+docker compose up --detach faster-whisper-server-cpu
+```
+## Usage
 Streaming audio data from a microphone. [websocat](https://github.com/vi/websocat?tab=readme-ov-file#installation) installation is required.
 ```bash
 ffmpeg -loglevel quiet -f alsa -i default -ac 1 -ar 16000 -f s16le - | websocat --binary ws://0.0.0.0:8000/v1/audio/transcriptions
@@ -38,7 +45,7 @@
 curl -X POST -F "file=@output.raw" http://0.0.0.0:8000/v1/audio/transcriptions
 # Output: "{\"text\":\"One, two, three, four, five.\"}"
 ```
-# Roadmap
+## Roadmap
 - [ ] Support file transcription (non-streaming) of multiple formats.
 - [ ] CLI client.
 - [ ] Separate the web server related code from the "core", and publish "core" as a package.
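The README above states that only single-channel, 16000 Hz, raw, 16-bit little-endian audio is accepted. As an illustration (not part of the repo), the following sketch generates one second of PCM in exactly that format, producing an `output.raw` suitable for the `curl` upload example:

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz; mono, signed 16-bit little-endian, per the README

def sine_pcm_s16le(freq_hz: float, seconds: float) -> bytes:
    """Generate raw s16le PCM for a half-amplitude sine tone."""
    n = int(SAMPLE_RATE * seconds)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE))
        for t in range(n)
    )
    # "<h" = little-endian signed 16-bit, one sample per frame (mono)
    return b"".join(struct.pack("<h", s) for s in samples)

pcm = sine_pcm_s16le(440.0, 1.0)
with open("output.raw", "wb") as f:
    f.write(pcm)
# 1 second = 16000 samples * 2 bytes/sample = 32000 bytes
```

In practice you would record real speech with the `ffmpeg` pipeline shown earlier; this only demonstrates the expected byte layout.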
--- Taskfile.yaml
+++ Taskfile.yaml
@@ -1,6 +1,6 @@
 version: "3"
 tasks:
-  speaches: poetry run uvicorn --host 0.0.0.0 speaches.main:app {{.CLI_ARGS}}
+  server: poetry run uvicorn --host 0.0.0.0 faster_whisper_server.main:app {{.CLI_ARGS}}
   test:
     cmds:
       - poetry run pytest -o log_cli=true -o log_cli_level=DEBUG {{.CLI_ARGS}}
@@ -11,15 +11,15 @@
       - docker compose build
     sources:
       - Dockerfile.*
-      - speaches/*.py
+      - faster_whisper_server/*.py
   create-multi-arch-builder: docker buildx create --name main --driver=docker-container
   build-and-push:
     cmds:
       - docker compose build --builder main --push
     sources:
       - Dockerfile.*
-      - speaches/*.py
-  sync: lsyncd -nodaemon -delay 0 -rsyncssh . gpu-box speaches
+      - faster_whisper_server/*.py
+  sync: lsyncd -nodaemon -delay 0 -rsyncssh . gpu-box faster-whisper-server
   # Python's urllib3 takes forever when ipv6 is enabled
   # https://support.nordvpn.com/hc/en-us/articles/20164669224337-How-to-disable-IPv6-on-Linux
   disable-ipv6: sudo sysctl -w net.ipv6.conf.all.disable_ipv6=1 && sudo sysctl -w net.ipv6.conf.default.disable_ipv6=1
--- compose.yaml
+++ compose.yaml
@@ -1,7 +1,7 @@
 # NOTE: arm images haven't been tested
 services:
-  speaches-cuda:
-    image: fedirz/speaches:cuda
+  faster-whisper-server-cuda:
+    image: fedirz/faster-whisper-server:cuda
     build:
       dockerfile: Dockerfile.cuda
       context: .
@@ -9,7 +9,7 @@
         - linux/amd64
         - linux/arm64
       tags:
-        - fedirz/speaches:cuda
+        - fedirz/faster-whisper-server:cuda
     volumes:
       - ~/.cache/huggingface:/root/.cache/huggingface
     restart: unless-stopped
@@ -20,8 +20,8 @@
       reservations:
         devices:
           - capabilities: ["gpu"]
-  speaches-cpu:
-    image: fedirz/speaches:cpu
+  faster-whisper-server-cpu:
+    image: fedirz/faster-whisper-server:cpu
     build:
       dockerfile: Dockerfile.cpu
       context: .
@@ -29,7 +29,7 @@
         - linux/amd64
        - linux/arm64
       tags:
-        - fedirz/speaches:cpu
+        - fedirz/faster-whisper-server:cpu
     volumes:
       - ~/.cache/huggingface:/root/.cache/huggingface
     restart: unless-stopped
--- speaches/__init__.py
+++ faster_whisper_server/__init__.py
No changes
--- speaches/asr.py
+++ faster_whisper_server/asr.py
@@ -4,9 +4,9 @@

 from faster_whisper import transcribe

-from speaches.audio import Audio
-from speaches.core import Transcription, Word
-from speaches.logger import logger
+from faster_whisper_server.audio import Audio
+from faster_whisper_server.core import Transcription, Word
+from faster_whisper_server.logger import logger


 class FasterWhisperASR:
--- speaches/audio.py
+++ faster_whisper_server/audio.py
@@ -7,8 +7,8 @@
 import soundfile as sf
 from numpy.typing import NDArray

-from speaches.config import SAMPLES_PER_SECOND
-from speaches.logger import logger
+from faster_whisper_server.config import SAMPLES_PER_SECOND
+from faster_whisper_server.logger import logger


 def audio_samples_from_file(file: BinaryIO) -> NDArray[np.float32]:
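The `audio_samples_from_file` signature above returns `float32` samples from a binary stream; the repo does this with `soundfile`/`numpy`. A dependency-free sketch of the same decoding step (s16le PCM bytes to floats in [-1.0, 1.0], names hypothetical) looks like:

```python
import io
import struct
from typing import BinaryIO, List

def pcm_s16le_to_float(file: BinaryIO) -> List[float]:
    """Decode raw signed 16-bit little-endian PCM into floats in [-1.0, 1.0)."""
    raw = file.read()
    count = len(raw) // 2  # 2 bytes per sample
    ints = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in ints]

# Three samples: 0, 32767 (max positive), -32768 (min negative)
samples = pcm_s16le_to_float(io.BytesIO(b"\x00\x00\xff\x7f\x00\x80"))
```

Dividing by 32768 maps the full signed 16-bit range onto [-1.0, 1.0), which is the convention faster-whisper's float input expects.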
--- speaches/config.py
+++ faster_whisper_server/config.py
No changes
--- speaches/core.py
+++ faster_whisper_server/core.py
@@ -4,7 +4,7 @@
 import re
 from dataclasses import dataclass

-from speaches.config import config
+from faster_whisper_server.config import config


 # TODO: use the `Segment` from `faster-whisper.transcribe` instead
--- speaches/logger.py
+++ faster_whisper_server/logger.py
@@ -1,8 +1,8 @@
 import logging

-from speaches.config import config
+from faster_whisper_server.config import config

-# Disables all but `speaches` logger
+# Disables all but `faster_whisper_server` logger

 root_logger = logging.getLogger()
 root_logger.setLevel(logging.CRITICAL)
--- speaches/main.py
+++ faster_whisper_server/main.py
@@ -20,16 +20,22 @@
 from faster_whisper import WhisperModel
 from faster_whisper.vad import VadOptions, get_speech_timestamps

-from speaches import utils
-from speaches.asr import FasterWhisperASR
-from speaches.audio import AudioStream, audio_samples_from_file
-from speaches.config import SAMPLES_PER_SECOND, Language, Model, ResponseFormat, config
-from speaches.logger import logger
-from speaches.server_models import (
+from faster_whisper_server import utils
+from faster_whisper_server.asr import FasterWhisperASR
+from faster_whisper_server.audio import AudioStream, audio_samples_from_file
+from faster_whisper_server.config import (
+    SAMPLES_PER_SECOND,
+    Language,
+    Model,
+    ResponseFormat,
+    config,
+)
+from faster_whisper_server.logger import logger
+from faster_whisper_server.server_models import (
     TranscriptionJsonResponse,
     TranscriptionVerboseJsonResponse,
 )
-from speaches.transcriber import audio_transcriber
+from faster_whisper_server.transcriber import audio_transcriber

 models: OrderedDict[Model, WhisperModel] = OrderedDict()

@@ -72,7 +78,7 @@

 @app.get("/health")
 def health() -> Response:
-    return Response(status_code=200, content="Everything is peachy!")
+    return Response(status_code=200, content="OK")


 @app.post("/v1/audio/translations")
--- speaches/server_models.py
+++ faster_whisper_server/server_models.py
@@ -3,8 +3,8 @@
 from faster_whisper.transcribe import Segment, TranscriptionInfo, Word
 from pydantic import BaseModel

-from speaches import utils
-from speaches.core import Transcription
+from faster_whisper_server import utils
+from faster_whisper_server.core import Transcription


 # https://platform.openai.com/docs/api-reference/audio/json-object
--- speaches/transcriber.py
+++ faster_whisper_server/transcriber.py
@@ -2,11 +2,16 @@

 from typing import AsyncGenerator

-from speaches.asr import FasterWhisperASR
-from speaches.audio import Audio, AudioStream
-from speaches.config import config
-from speaches.core import Transcription, Word, common_prefix, to_full_sentences
-from speaches.logger import logger
+from faster_whisper_server.asr import FasterWhisperASR
+from faster_whisper_server.audio import Audio, AudioStream
+from faster_whisper_server.config import config
+from faster_whisper_server.core import (
+    Transcription,
+    Word,
+    common_prefix,
+    to_full_sentences,
+)
+from faster_whisper_server.logger import logger


 class LocalAgreement:
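The `LocalAgreement` class whose imports are renamed above implements the LocalAgreement2 idea from the README: a word is only committed once two consecutive hypotheses over the growing audio agree on it. The repo's real implementation lives in `faster_whisper_server/core.py` and `transcriber.py`; this standalone toy version only illustrates the agreement step:

```python
from typing import List

def common_prefix(a: List[str], b: List[str]) -> List[str]:
    """Longest shared prefix of two word sequences."""
    prefix: List[str] = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix

# Two consecutive hypotheses over growing audio: only the agreed
# prefix is committed; the unstable tail is held back until the
# next hypothesis confirms it.
prev = ["one", "two", "three", "funny"]
curr = ["one", "two", "three", "four", "five"]
committed = common_prefix(prev, curr)  # ["one", "two", "three"]
```

This is what makes streaming output stable: the tail of each hypothesis keeps changing as more audio arrives, but the agreed prefix only ever grows.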
--- speaches/utils.py
+++ faster_whisper_server/utils.py
No changes
--- tests/__init__.py
+++ /dev/null
@@ -1,0 +0,0 @@
--- tests/app_test.py
+++ tests/app_test.py
@@ -10,9 +10,9 @@
 from fastapi.testclient import TestClient
 from starlette.testclient import WebSocketTestSession

-from speaches.config import BYTES_PER_SECOND
-from speaches.main import app
-from speaches.server_models import TranscriptionVerboseJsonResponse
+from faster_whisper_server.config import BYTES_PER_SECOND
+from faster_whisper_server.main import app
+from faster_whisper_server.server_models import TranscriptionVerboseJsonResponse

 SIMILARITY_THRESHOLD = 0.97
 AUDIO_FILES_LIMIT = 5