A Python-based dataset generator and image processor for Khmer character recognition tasks. This tool creates PNG images of Khmer script (characters and digits), applies font and rotation augmentations, and exports the dataset into CSV format for model training.
- 🔡 Random Label Generation: Create diverse Khmer digits and characters from a predefined list.
- 🎨 Random Font Selection: Randomly choose fonts from a `.zip` archive of Khmer `.ttf` fonts.
- 🔄 Rotation Augmentation: Add random rotation (-10° to 10°) to simulate real handwriting variance.
- 🖼️ Unique Image Naming: Combines label, font, and rotation angle to prevent overwriting.
- 📁 Image Output Directory: Saves images to the `labels/` directory.
- 📊 CSV Export: Converts 28×28 images to flattened pixel arrays for model training (`label, pixel1, ..., pixel784`).
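
The bookkeeping behind these features can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `LABELS` and `FONTS` values, the function names `sample_spec`, `image_name`, and `csv_header`, and the exact file-name pattern are all assumptions made for the example. Rendering the glyph itself (with Pillow) is left out.

```python
import random
from pathlib import Path

# Hypothetical values for illustration; the real label list and
# the fonts inside the .zip archive may differ.
LABELS = ["ក", "គ", "ឈ", "ញ", "០", "១"]
FONTS = ["KhmerOS.ttf", "Battambang-Regular.ttf"]


def sample_spec(rng):
    """Pick a random label, a random font, and a rotation angle in [-10, 10] degrees."""
    return {
        "label": rng.choice(LABELS),
        "font": rng.choice(FONTS),
        "angle": round(rng.uniform(-10.0, 10.0), 1),
    }


def image_name(spec):
    """Combine label, font name, and angle so different samples get different file names."""
    font_stem = Path(spec["font"]).stem  # drop the .ttf extension
    return f"{spec['label']}_{font_stem}_{spec['angle']:+.1f}.png"


def csv_header(side=28):
    """Column names for one flattened side x side image: label, pixel1, ..., pixelN."""
    return ["label"] + [f"pixel{i}" for i in range(1, side * side + 1)]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so repeated runs pick the same sample
    spec = sample_spec(rng)
    print(Path("labels") / image_name(spec))  # where the PNG would be saved
    print(len(csv_header()))  # 1 label column + 784 pixel columns
```

Note that under this naming scheme two samples that draw the same label, font, and rounded angle would still share a file name, so a real generator may want an extra counter or a finer angle resolution.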
Here are some sample generated images of Khmer characters:
Character | Image |
---|---|
ញ | ![]() |
ក | ![]() |
គ | ![]() |
ឈ | ![]() |
Install all dependencies using `pip`:
pip install pillow numpy pandas scikit-learn imbalanced-learn