Image Annotation App with Speech Recognition

This is a Flask-based web application for annotating images using speech recognition. The app allows users to annotate images from a specified directory by selecting categories via voice commands or by clicking buttons. Annotations are saved in an Excel file (annotations.xlsx) with a binary representation (1 for selected categories, 0 for unselected).

Features

Speech Recognition: Use voice commands to select categories and navigate between images.
Persistent Microphone: The microphone stays active across images until manually turned off.
Uniform Image Display: Images are displayed at a consistent size to prevent layout shifts.
Annotations Saving: Annotations are saved in an Excel file with a binary representation.
Session Persistence: The app skips already annotated images, allowing you to continue where you left off.

Prerequisites

Python 3.6+
pip (Python package installer)
Web Browser: Google Chrome is recommended for optimal speech recognition support.

Installation

Clone the Repository
Create a Virtual Environment (Optional but Recommended)

python -m venv venv source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install Dependencies

pip install -r requirements.txt

Directory Structure

image-annotation-app/ ├── app.py ├── annotations.xlsx # Created automatically after first annotation ├── requirements.txt ├── README.md ├── templates/ │ ├── index.html │ ├── annotate.html │ └── finished.html └── static/ ├── styles.css └── scripts.js

Usage

Prepare Your Images
- Place all the images you want to annotate in a directory on your local system.
- Supported image formats: .png, .jpg, .jpeg, .gif, .bmp
Run the Application

python app.py

The app will start running on http://localhost:5000.
Access the App
- Open your web browser and navigate to http://localhost:5000.
Start Annotating
- Step 1: Enter the directory path where your images are located.
  - Use an absolute path or ensure the path is correct relative to app.py.
- Step 2: Enter the categories you want to use for annotation, separated by commas (e.g., Dog, Cat, Bird).
- Step 3: Click Submit to start annotating.
Annotation Interface
- Image Display: The app will display images one at a time.
- Category Selection:
  - Voice Commands: Click the microphone button (🎤) to activate speech recognition.
    - Say the category names to toggle their selection (e.g., "dog", "cat").
    - Say "next" or "next image" to save the current annotations and proceed to the next image.
    - The microphone stays active across images until you click the microphone button again to turn it off.
  - Manual Selection: Alternatively, click on the category buttons to select/deselect them.
- Proceeding to Next Image:
  - Click the Next Image button or use the voice command "next image".
Completing Annotation
- After all images have been annotated, a message will display: "All images are annotated. Thank you!"

Annotations File

The annotations are saved in annotations.xlsx in the root directory.
The file contains:
- Column A: Image filenames.
- Columns B onward: Categories with binary values:
  - '1' indicates the category was selected.
  - '0' indicates the category was not selected.

Resuming Annotation

If you close the app or stop the server, your progress is saved in annotations.xlsx.
Restarting the app and entering the same directory and categories will allow you to continue where you left off.
The app skips images that have already been annotated.

Customization

Adjusting Image Size: Modify max-width and max-height in static/styles.css under the #image selector to change the displayed image size.
Changing Speech Recognition Language: In static/scripts.js, change recognition.lang = 'en-US'; to your desired language code.

Troubleshooting

Microphone Not Working:
- Ensure your browser has permission to access the microphone.
- The Web Speech API is best supported in Google Chrome.
Speech Recognition Not Accurate:
- Speak clearly and ensure background noise is minimized.
- Check that your microphone is functioning properly.
Invalid Directory Error:
- Ensure the directory path is correct and accessible.
- Use absolute paths to avoid confusion.

Dependencies

Flask: Web framework for Python.
openpyxl: Library to read/write Excel files.
SpeechRecognition API: Implemented in JavaScript via the Web Speech API (no additional installation needed).

Requirements File

See requirements.txt for the list of Python dependencies.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Inspired by the need for efficient image annotation tools.
Utilizes the Web Speech API for speech recognition functionality.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Image Annotation App with Speech Recognition

Features

Prerequisites

Installation

Directory Structure

Usage

Annotations File

Resuming Annotation

Customization

Troubleshooting

Dependencies

Requirements File

License

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
static		static
templates		templates
LICENSE		LICENSE
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

License

Rid1Whitehead/speech-to-text-image-annotation-app

Folders and files

Latest commit

History

Repository files navigation

Image Annotation App with Speech Recognition

Features

Prerequisites

Installation

Directory Structure

Usage

Annotations File

Resuming Annotation

Customization

Troubleshooting

Dependencies

Requirements File

License

Acknowledgments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages