Commit 1b414fb

SpeechToText Provider API (NC29+) (#196)
Ref: nextcloud/app_api#184. As there is currently no OCS API for queuing work or requesting that something be done, only the Provider Registration API was added, with basic tests. --------- Signed-off-by: Alexander Piskun <[email protected]>
1 parent ba3af9f commit 1b414fb

21 files changed

+444
-23
lines changed

Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@
+<component name="ProjectRunConfigurationManager">
+  <configuration default="false" name="Speech2TxtProvider (last)" type="PythonConfigurationType" factoryName="Python">
+    <module name="nc_py_api" />
+    <option name="ENV_FILES" value="" />
+    <option name="INTERPRETER_OPTIONS" value="" />
+    <option name="PARENT_ENVS" value="true" />
+    <envs>
+      <env name="APP_HOST" value="0.0.0.0" />
+      <env name="APP_ID" value="speech2text_example" />
+      <env name="APP_PORT" value="9036" />
+      <env name="APP_SECRET" value="12345" />
+      <env name="APP_VERSION" value="1.0.0" />
+      <env name="NEXTCLOUD_URL" value="http://nextcloud.local" />
+      <env name="PYTHONUNBUFFERED" value="1" />
+    </envs>
+    <option name="SDK_HOME" value="" />
+    <option name="WORKING_DIRECTORY" value="$PROJECT_DIR$/examples/as_app/speech2text/lib" />
+    <option name="IS_MODULE_SDK" value="true" />
+    <option name="ADD_CONTENT_ROOTS" value="true" />
+    <option name="ADD_SOURCE_ROOTS" value="true" />
+    <EXTENSION ID="PythonCoverageRunConfigurationExtension" runner="coverage.py" />
+    <option name="SCRIPT_NAME" value="$PROJECT_DIR$/examples/as_app/speech2text/lib/main.py" />
+    <option name="PARAMETERS" value="" />
+    <option name="SHOW_COMMAND_LINE" value="false" />
+    <option name="EMULATE_TERMINAL" value="false" />
+    <option name="MODULE_MODE" value="false" />
+    <option name="REDIRECT_INPUT" value="false" />
+    <option name="INPUT_FILE" value="" />
+    <method v="2" />
+  </configuration>
+</component>

CHANGELOG.md

Lines changed: 9 additions & 3 deletions
@@ -2,20 +2,26 @@
 
 All notable changes to this project will be documented in this file.
 
-## [0.7.2 - 2022-12-28]
+## [0.8.0 - 2024-01-xx]
+
+### Added
+
+- API for registering Speech to Text provider (*available from Nextcloud 29*). #196
+
+## [0.7.2 - 2023-12-28]
 
 ### Fixed
 
 - files: proper url encoding of special chars in `mkdir` and `delete` methods. #191 Thanks to @tobenary
 - files: proper url encoding of special chars in all other `DAV` methods. #194
 
-## [0.7.1 - 2022-12-21]
+## [0.7.1 - 2023-12-21]
 
 ### Added
 
 - The `ocs` method is now public, making it easy to use Nextcloud OCS that has not yet been described. #187
 
-## [0.7.0 - 2022-12-17]
+## [0.7.0 - 2023-12-17]
 
 ### Added
 

README.md

Lines changed: 15 additions & 15 deletions
@@ -24,21 +24,21 @@ Python library that provides a robust and well-documented API that allows develo
 * **Sync + Async**: Provides both sync and async APIs.
 
 ### Capabilities
-| **_Capability_** | Nextcloud 26 | Nextcloud 27 | Nextcloud 28 |
-|-----------------------|:------------:|:------------:|:------------:|
-| Calendar ||||
-| File System & Tags ||||
-| Nextcloud Talk ||||
-| Notifications ||||
-| Shares ||||
-| Users & Groups ||||
-| User & Weather status ||||
-| Other APIs*** ||||
-| Talk Bot API* | N/A |||
-| Text Processing* | N/A | ||
-| SpeechToText* | N/A | | |
-
-&ast;_available only for NextcloudApp_<br>
+| **_Capability_** | Nextcloud 26 | Nextcloud 27 | Nextcloud 28 | Nextcloud 29 |
+|-----------------------|:------------:|:------------:|:------------:|:------------:|
+| Calendar |||||
+| File System & Tags |||||
+| Nextcloud Talk |||||
+| Notifications |||||
+| Shares |||||
+| Users & Groups |||||
+| User & Weather status |||||
+| Other APIs*** |||||
+| Talk Bot API* | N/A ||||
+| TextProcessing* | N/A | N/A | N/A ||
+| SpeechToText* | N/A | N/A | N/A | |
+
+&ast;_available only for **NextcloudApp**_<br>
 &ast;&ast;&ast;_Activity, Notes_
 
 ### Differences between the Nextcloud and NextcloudApp classes

docs/reference/ExApp.rst

Lines changed: 9 additions & 0 deletions
@@ -56,3 +56,12 @@ UI methods should be accessed with the help of :class:`~nc_py_api.nextcloud.Next
 
 .. autoclass:: nc_py_api.ex_app.ui.resources.UiStyle
     :members:
+
+.. autoclass:: nc_py_api.ex_app.providers.providers.ProvidersApi
+    :members:
+
+.. autoclass:: nc_py_api.ex_app.providers.speech_to_text.SpeechToTextProvider
+    :members:
+
+.. autoclass:: nc_py_api.ex_app.providers.speech_to_text._SpeechToTextProviderAPI
+    :members:
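
The example's `enabled_handler` in this commit calls `nc.providers.speech_to_text.register(...)` and `unregister(...)`. Below is a minimal sketch of that enable/disable toggle, run against a hypothetical in-memory stand-in so it works without a Nextcloud instance. `FakeSpeechToTextAPI`, `FakeProviders`, `FakeNextcloudApp` and the parameter names `name`, `display_name`, `callback_url` are illustrative assumptions, not part of nc_py_api; only the positional call shape is taken from the example.

```python
from dataclasses import dataclass, field


@dataclass
class FakeSpeechToTextAPI:
    """Hypothetical stand-in that only records registrations in memory."""

    registered: dict = field(default_factory=dict)

    def register(self, name: str, display_name: str, callback_url: str) -> None:
        self.registered[name] = (display_name, callback_url)

    def unregister(self, name: str) -> None:
        self.registered.pop(name, None)


@dataclass
class FakeProviders:
    speech_to_text: FakeSpeechToTextAPI = field(default_factory=FakeSpeechToTextAPI)


@dataclass
class FakeNextcloudApp:
    providers: FakeProviders = field(default_factory=FakeProviders)


def enabled_handler(enabled: bool, nc) -> str:
    """Same shape as the example handler: register on enable, unregister on disable."""
    if enabled:
        nc.providers.speech_to_text.register(
            "distil_whisper_small", "DistilWhisperSmall", "/distil_whisper_small"
        )
    else:
        nc.providers.speech_to_text.unregister("distil_whisper_small")
    return ""  # the example handler returns an empty string


nc = FakeNextcloudApp()
enabled_handler(True, nc)
print(nc.providers.speech_to_text.registered)
enabled_handler(False, nc)
print(nc.providers.speech_to_text.registered)
```

Keeping the handler free of direct network calls, as here, makes it possible to check the registration logic with a stub before pointing it at a real server.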
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+FROM python:3.11-slim-bookworm
+
+COPY requirements.txt /
+
+ADD cs[s] /app/css
+ADD im[g] /app/img
+ADD j[s] /app/js
+ADD l10[n] /app/l10n
+ADD li[b] /app/lib
+
+RUN \
+  python3 -m pip install -r requirements.txt && rm -rf ~/.cache && rm requirements.txt
+
+WORKDIR /app/lib
+ENTRYPOINT ["python3", "main.py"]

examples/as_app/speech2text/Makefile

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+.DEFAULT_GOAL := help
+
+.PHONY: help
+help:
+	@echo "Welcome to Speech2TextProvider example. Please use \`make <target>\` where <target> is one of"
+	@echo " "
+	@echo " Next commands are only for dev environment with nextcloud-docker-dev!"
+	@echo " They should run from the host you are developing on (with activated venv) and not in the container with Nextcloud!"
+	@echo " "
+	@echo " build-push build image and upload to ghcr.io"
+	@echo " "
+	@echo " deploy deploy Speech2TextProvider to registered 'docker_dev' for Nextcloud Last"
+	@echo " "
+	@echo " run install Speech2TextProvider for Nextcloud Last"
+	@echo " "
+	@echo " For development of this example use PyCharm run configurations. Development is always set for last Nextcloud."
+	@echo " First run 'Speech2TextProvider' and then 'make register', after that you can use/debug/develop it and easily test it."
+	@echo " "
+	@echo " register perform registration of running Speech2TextProvider into the 'manual_install' deploy daemon."
+
+.PHONY: build-push
+build-push:
+	docker login ghcr.io
+	docker buildx build --push --platform linux/arm64/v8,linux/amd64 --tag ghcr.io/cloud-py-api/speech_to_text_example:latest .
+
+.PHONY: deploy
+deploy:
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:unregister speech2text_example --silent || true
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:deploy speech2text_example docker_dev \
+		--info-xml https://raw.githubusercontent.com/cloud-py-api/nc_py_api/main/examples/as_app/speech2text_example/appinfo/info.xml
+
+.PHONY: run
+run:
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:unregister speech2text_example --silent || true
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:register speech2text_example docker_dev --force-scopes \
+		--info-xml https://raw.githubusercontent.com/cloud-py-api/nc_py_api/main/examples/as_app/speech2text_example/appinfo/info.xml
+
+.PHONY: register
+register:
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:unregister speech2text_example --silent || true
+	docker exec master-nextcloud-1 sudo -u www-data php occ app_api:app:register speech2text_example manual_install --json-info \
+  "{\"appid\":\"speech2text_example\",\"name\":\"SpeechToText Provider\",\"daemon_config_name\":\"manual_install\",\"version\":\"1.0.0\",\"secret\":\"12345\",\"host\":\"host.docker.internal\",\"port\":9036,\"scopes\":{\"required\":[\"AI_PROVIDERS\"],\"optional\":[]},\"protocol\":\"http\",\"system_app\":0}" \
+  --force-scopes --wait-finish
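
The hand-escaped `--json-info` argument in the `register` target is easy to get wrong when a field changes. As a sketch only, the same payload could be generated with Python's standard `json` and `shlex` modules; the field values are copied from the Makefile above, while the variable names `app_info` and `json_info` are illustrative.

```python
import json
import shlex

# Field values copied from the `register` target; nothing here talks to Nextcloud.
app_info = {
    "appid": "speech2text_example",
    "name": "SpeechToText Provider",
    "daemon_config_name": "manual_install",
    "version": "1.0.0",
    "secret": "12345",
    "host": "host.docker.internal",
    "port": 9036,
    "scopes": {"required": ["AI_PROVIDERS"], "optional": []},
    "protocol": "http",
    "system_app": 0,
}

# Compact separators keep the argument short; shlex.quote makes it safe for a POSIX shell.
json_info = json.dumps(app_info, separators=(",", ":"))
print("--json-info", shlex.quote(json_info))
```

The printed argument can be pasted after `occ app_api:app:register speech2text_example manual_install` in place of the escaped string.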
Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+<?xml version="1.0"?>
+<info>
+    <id>speech2text_example</id>
+    <name>SpeechToText Provider</name>
+    <summary>Example of SpeechToText Provider</summary>
+    <description>
+    <![CDATA[Simplest Speech to Text Provider example written in python]]>
+    </description>
+    <version>1.0.0</version>
+    <licence>MIT</licence>
+    <author mail="[email protected]" homepage="https://github.com/andrey18106">Andrey Borysenko</author>
+    <author mail="[email protected]" homepage="https://github.com/bigcat88">Alexander Piskun</author>
+    <namespace>PyAppV2_Speech2TextProvider</namespace>
+    <category>tools</category>
+    <website>https://github.com/cloud-py-api/nc_py_api</website>
+    <bugs>https://github.com/cloud-py-api/nc_py_api/issues</bugs>
+    <repository type="git">https://github.com/cloud-py-api/nc_py_api</repository>
+    <dependencies>
+        <nextcloud min-version="29" max-version="30"/>
+    </dependencies>
+    <external-app>
+        <docker-install>
+            <registry>ghcr.io</registry>
+            <image>cloud-py-api/speech2text_example</image>
+            <image-tag>latest</image-tag>
+        </docker-install>
+        <scopes>
+            <required>
+                <value>AI_PROVIDERS</value>
+            </required>
+            <optional>
+            </optional>
+        </scopes>
+        <protocol>http</protocol>
+        <system>false</system>
+    </external-app>
+</info>
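
To check what this `info.xml` declares before deploying, the relevant fields can be read with the standard library. The sketch below parses a trimmed copy of the file above (several elements are omitted for brevity); the `summary` dictionary and its keys are illustrative, not a format that AppAPI defines.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the info.xml from this commit, embedded so the sketch is self-contained.
INFO_XML = """<?xml version="1.0"?>
<info>
    <id>speech2text_example</id>
    <version>1.0.0</version>
    <dependencies>
        <nextcloud min-version="29" max-version="30"/>
    </dependencies>
    <external-app>
        <scopes>
            <required>
                <value>AI_PROVIDERS</value>
            </required>
            <optional>
            </optional>
        </scopes>
        <protocol>http</protocol>
    </external-app>
</info>
"""

root = ET.fromstring(INFO_XML)
nextcloud = root.find("dependencies/nextcloud")

summary = {
    "id": root.findtext("id"),
    "min_version": int(nextcloud.get("min-version")),
    "max_version": int(nextcloud.get("max-version")),
    "required_scopes": [v.text for v in root.findall("external-app/scopes/required/value")],
    "optional_scopes": [v.text for v in root.findall("external-app/scopes/optional/value")],
}
print(summary)
```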
Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
+"""Use the simplest model to just test speech recognition.
+
+This example is not production-ready: a production app would probably run requests in subprocesses with a timeout
+or run multiple workers to process requests simultaneously.
+"""
+
+import os
+import tempfile
+import typing
+from contextlib import asynccontextmanager
+
+import torch
+from fastapi import Depends, FastAPI, UploadFile, responses
+from huggingface_hub import snapshot_download
+from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+
+from nc_py_api import NextcloudApp
+from nc_py_api.ex_app import nc_app, persistent_storage, run_app, set_handlers
+
+MODEL_NAME = "distil-whisper/distil-small.en"
+
+
+@asynccontextmanager
+async def lifespan(_app: FastAPI):
+    set_handlers(APP, enabled_handler, models_to_fetch={MODEL_NAME: {"ignore_patterns": ["*.bin", "*onnx*"]}})
+    yield
+
+
+APP = FastAPI(lifespan=lifespan)
+
+
+@APP.post("/distil_whisper_small")
+async def distil_whisper_small(
+    _nc: typing.Annotated[NextcloudApp, Depends(nc_app)],
+    data: UploadFile,
+    max_execution_time: float = 0,
+):
+    print(max_execution_time)
+    model = AutoModelForSpeechSeq2Seq.from_pretrained(
+        snapshot_download(
+            MODEL_NAME,
+            local_files_only=True,
+            cache_dir=persistent_storage(),
+        ),
+        torch_dtype=torch.float32,
+        low_cpu_mem_usage=True,
+        use_safetensors=True,
+    ).to("cpu")
+
+    processor = AutoProcessor.from_pretrained(MODEL_NAME)
+    pipe = pipeline(
+        "automatic-speech-recognition",
+        model=model,
+        tokenizer=processor.tokenizer,
+        feature_extractor=processor.feature_extractor,
+        max_new_tokens=128,
+        torch_dtype=torch.float32,
+        device="cpu",
+    )
+    _, file_extension = os.path.splitext(data.filename)
+    with tempfile.NamedTemporaryFile(mode="w+b", suffix=f"{file_extension}") as tmp:
+        tmp.write(await data.read())
+        result = pipe(tmp.name)
+    return responses.Response(content=result["text"])
+
+
+# async
+def enabled_handler(enabled: bool, nc: NextcloudApp) -> str:
+    print(f"enabled={enabled}")
+    if enabled is True:
+        nc.providers.speech_to_text.register("distil_whisper_small", "DistilWhisperSmall", "/distil_whisper_small")
+    else:
+        nc.providers.speech_to_text.unregister("distil_whisper_small")
+    return ""
+
+
+if __name__ == "__main__":
+    run_app("main:APP", log_level="trace")
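
The module docstring notes that a production app would probably run requests in subprocesses with a timeout, and the endpoint receives `max_execution_time` but only prints it. One way such a limit could be enforced is sketched below with the standard `subprocess` module. `run_with_timeout` is a hypothetical helper, not part of nc_py_api, and the one-line `code` strings stand in for a real transcription worker script.

```python
import subprocess
import sys


def run_with_timeout(code: str, timeout: float) -> str:
    """Run `code` in a fresh Python subprocess; raise TimeoutError if it overruns."""
    try:
        done = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,
        )
    except subprocess.TimeoutExpired as exc:
        # subprocess.run kills the child before re-raising, so no worker is left behind.
        raise TimeoutError(f"job exceeded {timeout} s") from exc
    return done.stdout.strip()


# A job that finishes in time returns its output.
print(run_with_timeout("print('transcribed text')", timeout=10))

# A job that overruns is terminated and reported as a timeout.
try:
    run_with_timeout("import time; time.sleep(5)", timeout=0.5)
except TimeoutError as err:
    print(err)
```

Running the model in a separate process also keeps a stuck or crashed inference call from taking down the FastAPI worker, at the cost of loading the model once per subprocess unless a long-lived worker pool is used instead.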
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+nc_py_api[app]>=0.8.0

nc_py_api/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 """Version of nc_py_api."""
 
-__version__ = "0.7.2"
+__version__ = "0.8.0.dev0"
