
Commit 330e531

Steve committed

feat: Add record.py audio recording and transcription service from PR #42

- Audio recording from VDO.Ninja rooms
- Automatic transcription using Whisper AI
- FastAPI REST endpoints for start/stop recording
- Process monitoring with 1-hour timeout
- Systemd service configuration
- HTML templates for web interface
- Transcriptions saved to stt/ directory

Co-authored-by: astroport contributor

1 parent: 4886063


8 files changed: +539 -0 lines


.gitignore

Lines changed: 5 additions & 0 deletions
```diff
@@ -177,3 +177,8 @@ Thumbs.db
 .Trash-*
 .nfs*
 ndi/gst-plugin-ndi/
+
+# record.py: temporary audio recordings and Whisper transcription output
+*_audio.ts
+stt/*.txt
+stt/*.json
```

README.md

Lines changed: 62 additions & 0 deletions
```diff
@@ -696,6 +696,68 @@ Please note, the raspberry_ninja publish.py script can both send and recieve MID
 
 midi demo video: https://youtu.be/Gry9UFtOTmQ
 
+## `record.py` - Audio Recording and Transcription Service
+
+The `record.py` microservice records audio from VDO.Ninja rooms and transcribes it automatically with OpenAI's Whisper model.
+
+### Features
+
+- **Audio Recording**: Record audio streams from VDO.Ninja rooms
+- **Automatic Transcription**: Transcribe recordings using Whisper AI
+- **REST API**: Start/stop recordings (`/rec`, `/stop`) and fetch transcriptions (`/stt`) over HTTP
+- **Process Monitoring**: Recordings still running after one hour are killed automatically
+- **Command Line Interface**: Record directly from the command line
+
+### Prerequisites
+
+```bash
+# Install Whisper and the web-server dependencies (Whisper also requires ffmpeg)
+pip3 install openai-whisper fastapi uvicorn python-multipart jinja2
+```
+
+### Usage
+
+#### Start the FastAPI Server (default port: 9000)
+
+```bash
+python3 record.py --host 0.0.0.0 --port 8000
+```
+
+#### Start Recording (API)
+
+```bash
+curl -X POST -F "room=myRoom" -F "record=myRecord" http://localhost:8000/rec
+```
+
+#### Stop Recording (API)
+
+```bash
+curl -X POST -F "record=myRecord" -F "process_pid=<PID>" -F "language=en" http://localhost:8000/stop
+```
+
+#### Command Line Recording
+
+```bash
+# Start recording (prints the PID needed to stop it)
+python3 record.py --room myRoom --record myRecord
+
+# Stop recording and transcribe
+python3 record.py --stop --pid <PID> --record myRecord --language en
+```
+
+### Systemd Service
+
+To run as a system service on port 9000, run the setup script as a regular user with sudo rights (it refuses to run as root and calls `sudo` itself):
+
+```bash
+./setup.ninja_record.systemd.sh
+```
+
+### Output
+
+- Audio files: recorded as `<record_id>_*_audio.ts` and deleted once transcribed
+- Transcriptions: saved as `stt/<record_id>_speech.txt`; `GET /stt?id=<record_id>` returns the text and adds the file to IPFS (requires the `ipfs` CLI)
+
 ### Note:
 
 - Installation from source is pretty slow and problematic on a rpi; using system images makes using this so much easier.
```
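The curl calls in the README can also be issued from Python's standard library. A minimal sketch, assuming the server listens on `localhost:8000` as in those examples; the helper names (`form_post`, `start_request`, `stop_request`, `transcript_request`) are hypothetical. It sends `application/x-www-form-urlencoded` bodies instead of curl's multipart `-F`; FastAPI `Form` parameters accept both (both need `python-multipart`).

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: server started with --port 8000


def form_post(path: str, fields: dict, base_url: str = BASE_URL) -> Request:
    """Build a POST request carrying an urlencoded form body."""
    body = urlencode(fields).encode("ascii")
    return Request(
        base_url + path,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def start_request(room: str, record: str) -> Request:
    # Same fields as: curl -X POST -F "room=..." -F "record=..." .../rec
    return form_post("/rec", {"room": room, "record": record})


def stop_request(record: str, pid: int, language: str = "en") -> Request:
    # Same fields as the /stop curl example; pid is the publish.py process PID
    return form_post("/stop", {"record": record, "process_pid": pid, "language": language})


def transcript_request(record: str, base_url: str = BASE_URL) -> Request:
    # GET /stt?id=<record> returns {"transcription": ..., "cid": ...}
    return Request(f"{base_url}/stt?{urlencode({'id': record})}")
```

Each request is sent with `urllib.request.urlopen(req, timeout=...)`. Since `/stop` only answers after Whisper has finished, a generous timeout (minutes rather than seconds) is advisable for long recordings.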

record.py

Lines changed: 207 additions & 0 deletions
```python
#!/usr/bin/env python3
from fastapi import FastAPI, Request, Form, HTTPException
from fastapi.responses import HTMLResponse, JSONResponse
from fastapi.templating import Jinja2Templates
import whisper
import subprocess
import signal
import os
import logging
import glob
import argparse
import threading
import time
import uvicorn

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

app = FastAPI()
templates = Jinja2Templates(directory="templates")
model = whisper.load_model("medium")

# Recording processes, tracked as (Popen, start_time) tuples
processes = []

# Watchdog: kill recordings that run for more than one hour
def monitor_processes():
    while True:
        current_time = time.time()
        for process_info in list(processes):  # iterate over a copy, since the list is modified
            process, start_time = process_info
            if process.poll() is not None:  # exited (or killed): reap it and stop tracking it
                processes.remove(process_info)
            elif current_time - start_time > 3600:  # 3600 seconds = 1 hour
                logger.info("Killing process with PID: %d due to timeout", process.pid)
                process.kill()  # reaped and untracked by poll() on the next pass
        time.sleep(60)  # check once a minute

# Start the watchdog thread
monitor_thread = threading.Thread(target=monitor_processes, daemon=True)
monitor_thread.start()

def stop_process(pid, timeout=30):
    # Send SIGTERM (the `kill` command's default); if this server launched the process, wait for it to exit
    try:
        os.kill(pid, signal.SIGTERM)
    except ProcessLookupError:
        logger.warning("No process with PID: %d", pid)
    for process, _ in list(processes):
        if process.pid == pid:
            try:
                process.wait(timeout=timeout)  # let publish.py finish writing the audio file
            except subprocess.TimeoutExpired:
                process.kill()

@app.get("/", response_class=HTMLResponse)
async def index(request: Request, room: str = "", record: str = ""):
    logger.info("Serving index page with room: %s (%s)", room, record)
    return templates.TemplateResponse("index.html", {"request": request, "room": room, "record": record})

@app.api_route("/rec", methods=["GET", "POST"])
async def start_recording(request: Request, room: str = Form(None), record: str = Form(None)):
    room = room or request.query_params.get("room")
    record = record or request.query_params.get("record")

    if not room or not record:
        raise HTTPException(status_code=400, detail="Room and record parameters must not be empty")

    logger.info("Starting recording for room: %s with record ID: %s", room, record)

    # Launch the audio-only recording in the background; output is discarded (an unread os.pipe() can fill up and block publish.py)
    process = subprocess.Popen(["python3", "publish.py", "--room", room, "--record", record, "--novideo"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    logger.info("Started publish.py process with PID: %d", process.pid)

    # Track the process with its start time
    processes.append((process, time.time()))

    # Show a button that opens the new videoconference page
    return templates.TemplateResponse("recording.html", {"request": request, "room": room, "record": record, "process_pid": process.pid})

@app.post("/stop")  # plain `def` below: FastAPI runs it in a threadpool, so Whisper does not block the event loop
def stop_recording(record: str = Form(...), process_pid: int = Form(...), language: str = Form(...)):
    logger.info("Stopping recording for record ID: %s with process PID: %d", record, process_pid)

    # Stop the recording process
    stop_process(process_pid)
    logger.info("Stopped publish.py process with PID: %d", process_pid)

    # Find the matching audio file (the most recent one if several match)
    audio_files = glob.glob(f"{record}_*_audio.ts")
    if not audio_files:
        logger.error("No audio file found for record ID: %s", record)
        return {"error": f"No audio file found for record ID: {record}"}

    audio_file = max(audio_files, key=os.path.getmtime)
    logger.info("Transcribing audio file: %s", audio_file)

    try:
        speech = model.transcribe(audio_file, language=language)['text']
        logger.info("Transcription completed for record ID: %s", record)
    except Exception as e:
        logger.error("Failed to transcribe audio file: %s", str(e))
        return {"error": f"Failed to transcribe audio file: {str(e)}"}

    # Write the transcription to a text file
    transcript_file = f"stt/{record}_speech.txt"
    with open(transcript_file, "w", encoding="utf-8") as f:
        f.write(speech)
    logger.info("Transcription saved to: %s", transcript_file)

    # Delete the audio file
    os.remove(audio_file)
    logger.info("Audio file %s removed.", audio_file)

    return {"transcription": speech}

@app.get("/stt")
async def get_transcription(id: str):
    transcript_file = f"stt/{id}_speech.txt"
    if not os.path.exists(transcript_file):
        logger.error("No transcription file found for record ID: %s", id)
        return JSONResponse(status_code=404, content={"error": f"No transcription file found for record ID: {id}"})

    with open(transcript_file, "r", encoding="utf-8") as f:
        transcription = f.read()

    # Add the file to IPFS
    try:
        logger.info("Adding file to IPFS: %s", transcript_file)
        result = subprocess.run(["ipfs", "add", transcript_file], capture_output=True, text=True, check=True)
        cid = result.stdout.split()[1]  # output format: "added <CID> <file>"
        logger.info("Added file to IPFS: %s with CID: %s", transcript_file, cid)
    except Exception as e:
        logger.error("Failed to add file to IPFS: %s", str(e))
        return JSONResponse(status_code=500, content={"error": f"Failed to add file to IPFS: {str(e)}"})

    logger.info("Returning transcription and CID for record ID: %s", id)
    return {"transcription": transcription, "cid": cid}

def start_recording_cli(room, record):
    logger.info("Starting recording for room: %s with record ID: %s", room, record)

    # Launch the audio-only recording in the background, discarding its output (see start_recording)
    process = subprocess.Popen(["python3", "publish.py", "--room", room, "--record", record, "--novideo"], stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    logger.info("Started publish.py process with PID: %d", process.pid)

    # Track the process with its start time
    processes.append((process, time.time()))

    return process.pid

def stop_recording_cli(record, process_pid, language):
    logger.info("Stopping recording for record ID: %s with process PID: %d", record, process_pid)

    # Stop the recording process
    stop_process(process_pid)
    logger.info("Stopped publish.py process with PID: %d", process_pid)

    # Find the matching audio file (the most recent one if several match)
    audio_files = glob.glob(f"{record}_*_audio.ts")
    if not audio_files:
        logger.error("No audio file found for record ID: %s", record)
        return {"error": f"No audio file found for record ID: {record}"}

    audio_file = max(audio_files, key=os.path.getmtime)
    logger.info("Transcribing audio file: %s", audio_file)

    try:
        speech = model.transcribe(audio_file, language=language)['text']
        logger.info("Transcription completed for record ID: %s", record)
    except Exception as e:
        logger.error("Failed to transcribe audio file: %s", str(e))
        return {"error": f"Failed to transcribe audio file: {str(e)}"}

    # Write the transcription to a text file
    transcript_file = f"stt/{record}_speech.txt"
    with open(transcript_file, "w", encoding="utf-8") as f:
        f.write(speech)
    logger.info("Transcription saved to: %s", transcript_file)

    # Delete the audio file
    os.remove(audio_file)
    logger.info("Audio file %s removed.", audio_file)

    return {"transcription": speech}

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Start the FastAPI server with custom parameters.")
    parser.add_argument("--host", type=str, default="0.0.0.0", help="Host address for the FastAPI server.")
    parser.add_argument("--port", type=int, default=9000, help="Port for the FastAPI server.")
    parser.add_argument("--room", type=str, help="Room name for the recording session.")
    parser.add_argument("--record", type=str, help="Record ID for the session.")
    parser.add_argument("--stop", action="store_true", help="Stop the recording.")
    parser.add_argument("--pid", type=int, help="Process PID to stop.")
    parser.add_argument("--language", type=str, default="en", help="Language for transcription.")
    args = parser.parse_args()

    if args.room and args.record and not args.stop:
        pid = start_recording_cli(args.room, args.record)
        print(f"Recording started with PID: {pid}")
    elif args.stop and args.pid and args.record:
        result = stop_recording_cli(args.record, args.pid, args.language)
        print(result)
    else:
        logger.info("Starting FastAPI server")
        uvicorn.run(app, host=args.host, port=args.port)
```
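The `.gitignore` entries also cover `stt/*.json`, although `record.py` as committed only writes plain-text transcripts. If timestamps are wanted, the dict returned by Whisper's `model.transcribe()` also holds a `"segments"` list. A minimal sketch, assuming each segment carries `"start"` and `"end"` (in seconds) and `"text"`; the `save_segments_json` helper and the `<record>_speech.json` file name are hypothetical, not part of the commit.

```python
import json
import os


def save_segments_json(result: dict, record: str, out_dir: str = "stt") -> str:
    """Write Whisper segment timestamps to <out_dir>/<record>_speech.json (hypothetical naming).

    Assumes `result` is the dict returned by whisper's model.transcribe(), whose
    "segments" entries include "start" and "end" (seconds) and "text".
    """
    segments = [
        {"start": round(seg["start"], 2), "end": round(seg["end"], 2), "text": seg["text"].strip()}
        for seg in result.get("segments", [])
    ]
    path = os.path.join(out_dir, f"{record}_speech.json")
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps accented characters readable in the file
        json.dump({"text": result.get("text", ""), "segments": segments}, f, ensure_ascii=False, indent=2)
    return path
```

In `stop_recording`, this would mean keeping the whole `result = model.transcribe(...)` dict, taking `speech = result["text"]` from it, and calling `save_segments_json(result, record)` next to the `.txt` write.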

setup.ninja_record.systemd.sh

Lines changed: 11 additions & 0 deletions
```bash
#!/bin/bash
set -euo pipefail
if [ "$(id -u)" -eq 0 ]; then echo "Running as root is forbidden (run as a regular user with sudo rights)." >&2; exit 1; fi
sed -e "s~_USER_~$USER~g" -e "s~_MY_PATH_~$(pwd)~g" templates/record.service.tpl > /tmp/ninja_record.service

cat /tmp/ninja_record.service
sudo cp /tmp/ninja_record.service /etc/systemd/system/ninja_record.service

sudo systemctl daemon-reload
sudo systemctl enable ninja_record
sudo systemctl restart ninja_record
```

stt/.readme

Lines changed: 1 addition & 0 deletions
```text
Transcriptions will be saved here
```
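`record.py` builds file names directly from the `record` form field and the `id` query parameter (`stt/{record}_speech.txt`, `{record}_*_audio.ts`), so a value such as `../x` would point outside `stt/`, and a `*` would widen the glob. A minimal sketch of validating IDs before building paths, assuming record IDs only ever need letters, digits, `_` and `-` (that policy and the `validate_record_id` / `transcript_path` helpers are hypothetical, not part of the commit).

```python
import os
import re

# Hypothetical policy: 1 to 64 characters from [A-Za-z0-9_-]
RECORD_ID_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_record_id(record_id: str) -> str:
    """Return record_id unchanged, or raise ValueError if it could escape stt/ or act as a glob."""
    if not RECORD_ID_RE.fullmatch(record_id):
        raise ValueError(f"invalid record ID: {record_id!r}")
    return record_id


def transcript_path(record_id: str, base_dir: str = "stt") -> str:
    """Path of the transcript for a validated record ID, e.g. stt/myRecord_speech.txt."""
    return os.path.join(base_dir, f"{validate_record_id(record_id)}_speech.txt")
```

Inside an endpoint, the `ValueError` would be turned into `HTTPException(status_code=400, ...)`. Using `fullmatch` rather than `match` ensures the whole string, including any trailing newline, is checked.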

templates/index.html

Lines changed: 61 additions & 0 deletions
```html
<!DOCTYPE html>
<html>
<head>
    <title>VDO.Ninja Audio to AI</title>
    <style>
        body {
            font-family: Arial, sans-serif;
            background-color: #f0f0f0;
            margin: 0;
            padding: 0;
            display: flex;
            justify-content: center;
            align-items: center;
            height: 100vh;
        }
        .container {
            background-color: #fff;
            padding: 20px;
            border-radius: 8px;
            box-shadow: 0 0 10px rgba(0, 0, 0, 0.1);
            text-align: center;
        }
        h1 {
            color: #333;
        }
        p {
            color: #666;
        }
        input[type="text"] {
            padding: 10px;
            margin: 10px 0;
            border: 1px solid #ccc;
            border-radius: 4px;
            width: 100%;
        }
        button {
            padding: 10px 20px;
            background-color: #007bff;
            color: #fff;
            border: none;
            border-radius: 4px;
            cursor: pointer;
        }
        button:hover {
            background-color: #0056b3;
        }
    </style>
</head>
<body>
    <div class="container">
        <h1>VDO.Ninja audio to text</h1>
        <p>Enter the VDO.Ninja room and the push (stream) ID to record</p>
        <form action="/rec" method="post">
            <input type="text" name="room" placeholder="Room Name" value="{{ room }}" required>
            <input type="text" name="record" placeholder="Record ID" value="{{ record }}" required>
            <button type="submit">Start Recording</button>
        </form>
        <p>Code: <a href="https://github.com/papiche/raspberry_ninja/">https://github.com/papiche/raspberry_ninja/</a></p>
    </div>
</body>
</html>
```

templates/record.service.tpl

Lines changed: 13 additions & 0 deletions
```ini
[Unit]
Description=Record Vdo Ninja STT
After=network.target

[Service]
User=_USER_
Group=_USER_
WorkingDirectory=_MY_PATH_
ExecStart=/usr/bin/python3 _MY_PATH_/record.py --host 0.0.0.0 --port 9000
Restart=always

[Install]
WantedBy=multi-user.target
```
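`setup.ninja_record.systemd.sh` fills this template with `sed`: `_USER_` becomes the invoking user (`$USER`) and `_MY_PATH_` the current directory. The same step in Python, as a sketch under stated assumptions: the `render_unit` / `render_unit_file` names are hypothetical, `getpass.getuser()` stands in for `$USER`, and the root check uses the POSIX-only `os.geteuid()` to mirror the script's `id -u` test. Every occurrence of each placeholder is replaced.

```python
import getpass
import os


def render_unit(template: str, user: str, path: str) -> str:
    """Fill the _USER_ and _MY_PATH_ placeholders of record.service.tpl (every occurrence)."""
    return template.replace("_USER_", user).replace("_MY_PATH_", path)


def render_unit_file(tpl_path: str = "templates/record.service.tpl") -> str:
    """Render the template for the current user and working directory, like the setup script."""
    # Mirror the script's `id -u` check: refuse to run as root (POSIX-only)
    if os.geteuid() == 0:
        raise SystemExit("Running as root is forbidden (run as a regular user with sudo rights).")
    with open(tpl_path, encoding="utf-8") as f:
        return render_unit(f.read(), getpass.getuser(), os.getcwd())
```

Keeping `render_unit` free of I/O makes the substitution easy to check on its own, while `render_unit_file` carries the environment-dependent parts.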
