Dataset Translator
CLI tool for translating English text datasets into Burmese using the Gemini API with safe batch processing.
Key Highlights
- Batch translation to reduce Gemini API calls
- Token-aware batching for reliable processing
- Automatic retry on temporary errors
- Safe stop + partial save when quota is exceeded
- Works with CSV datasets (text + label columns)
Overview
Dataset Translator is a CLI tool that translates English text datasets into Burmese using the Gemini API. It is designed for AI/ML dataset preparation, with safe batch translation, automatic error handling, and API limit protection, and is useful for building sentiment analysis, NLP localization, and multilingual training data.
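To make the idea of token-aware batching concrete, here is a minimal sketch. It is not the tool's actual code: the names `estimateTokens` and `buildBatches`, the rough four-characters-per-token estimate, and the 2000-token budget are all assumptions chosen for illustration.

```javascript
// Hypothetical sketch: group rows into batches that stay under a token budget.
// The ~4 characters per token ratio is a rough assumption, not Gemini's tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// rows: array of { text, label }; maxTokens: assumed per-request budget.
function buildBatches(rows, maxTokens = 2000) {
  const batches = [];
  let current = [];
  let used = 0;

  for (const row of rows) {
    const cost = estimateTokens(row.text);
    // Close the current batch when adding this row would exceed the budget.
    if (current.length > 0 && used + cost > maxTokens) {
      batches.push(current);
      current = [];
      used = 0;
    }
    current.push(row);
    used += cost;
  }
  // Keep the final, partially filled batch.
  if (current.length > 0) batches.push(current);
  return batches;
}
```

In this sketch a single row that is larger than the budget still gets a batch of its own, so no row is silently dropped.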
Features
- Batch translation to minimize API requests
- Token-aware batching for reliable processing
- Automatic retry on temporary API errors
- Safe stop when quota is exceeded - no data lost
- Saves translated and untranslated rows separately
- Works with CSV datasets (text + label columns)
- Uses Gemini 2.5 Flash for fast translation
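The retry and safe-stop behaviour listed above could look roughly like the following sketch. It is an illustration under stated assumptions, not the tool's implementation: `translateAll` and the injected `translateBatch` function are hypothetical names, and the sketch assumes errors carry an HTTP-like `status` field where 500 and 503 are temporary and anything else (such as a 429 quota error) should stop the run.

```javascript
// Hypothetical sketch of retry plus safe stop; not the tool's actual code.
// translateBatch is an injected async function (an assumption) that resolves to
// translated rows or throws an error with an HTTP-like `status` field.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function translateAll(batches, translateBatch, options = {}) {
  const { maxRetries = 3, baseDelayMs = 1000 } = options;
  const translated = [];

  for (let i = 0; i < batches.length; i++) {
    for (let attempt = 0; ; attempt++) {
      try {
        translated.push(...(await translateBatch(batches[i])));
        break; // batch done, move on to the next one
      } catch (err) {
        // Assumed classification: 500/503 are temporary, everything else is not.
        const transient = [500, 503].includes(err.status);
        if (!transient || attempt >= maxRetries) {
          // Safe stop: keep what was translated and hand back the rest untouched.
          return { translated, untranslated: batches.slice(i).flat() };
        }
        // Exponential backoff before retrying the same batch.
        await sleep(baseDelayMs * 2 ** attempt);
      }
    }
  }
  return { translated, untranslated: [] };
}
```

Returning both arrays lets the caller write translated and untranslated rows to separate files, in the spirit of the output files shown under Usage.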
Installation & Setup
Install globally
npm install -g dataset-translator
Or run via npx
npx dataset-translator
Run and follow prompts
dataset-translator
# Enter CSV dataset path: example_dataset.csv
# Enter Gemini API Key: **********************
Note: CSV must have 'text' and 'label' columns. Get a Gemini API key at ai.google.dev
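Because the tool expects 'text' and 'label' columns, it can help to check a file's header before spending API quota. The helper below is a hypothetical pre-flight check, not part of the dataset-translator CLI; the name `missingColumns` is made up, and the sketch assumes a plain comma-separated header row with no quoted commas.

```javascript
// Hypothetical pre-flight check; not part of the dataset-translator CLI.
// Assumes a simple comma-separated header row without quoted commas.
function missingColumns(csvText, required = ["text", "label"]) {
  // Drop a leading byte-order mark, then take the first line as the header.
  const headerLine = csvText.replace(/^\uFEFF/, "").split(/\r?\n/)[0];
  const columns = headerLine.split(",").map((name) => name.trim());
  // Report every required column that the header does not contain.
  return required.filter((name) => !columns.includes(name));
}
```

With Node's built-in `fs` module, `missingColumns(fs.readFileSync(path, "utf8"))` returns an empty array when the required columns are present.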
Usage
Expected CSV format and output files:
# Input CSV format:
text,label
I love programming.,2
This is very sad.,0
# Output files (batched):
burmese_dataset_0-457.csv # Successfully translated rows
burmese_dataset_457-855.csv
untranslated_dataset_855-953.csv # Rows that hit quota limits
Related Projects
Burmese Quote Generator
CLI tool to generate emotion-tagged Burmese quotes using the Gemini API in TXT or CSV format.
NEZT CLI
Next & Nuxt EaZy Templates
Scaffold fully configured Next.js and Nuxt.js projects with routing, themes, and pages in minutes.
Elyza Myanmar Chatbot
A rule-based Myanmar language chatbot with emotion detection, served as a Flask REST API.