Fun

Dataset Translator

CLI tool for translating English text datasets into Burmese using the Gemini API with safe batch processing.

CLI · Node.js · Gemini API · npm · NLP · Burmese

Key Highlights

  • Batch translation to reduce Gemini API calls
  • Token-aware batching for reliable processing
  • Automatic retry on temporary errors
  • Safe stop + partial save when quota is exceeded
  • Works with CSV datasets (text + label columns)
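Token-aware batching can be sketched as greedy packing: rows go into the current batch until an estimated token budget would be exceeded, and then a new batch starts. This is a minimal sketch, not the tool's code. The 4-characters-per-token estimate, the `maxTokens` budget, the optional `maxRows` cap, and the helper names are all assumptions, because the page does not say how the tool counts tokens.

```javascript
// Rough token estimate. ASSUMPTION: ~4 characters per token, a common rule of
// thumb for English text; the real tool may count tokens differently.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Greedily pack rows into batches whose estimated token total stays within
// maxTokens and whose row count stays within maxRows (hypothetical limits).
// A single row larger than the budget still gets a batch of its own.
function makeBatches(rows, maxTokens, maxRows = Infinity) {
  const batches = [];
  let current = [];
  let currentTokens = 0;
  for (const row of rows) {
    const cost = estimateTokens(row.text);
    const full = currentTokens + cost > maxTokens || current.length >= maxRows;
    if (current.length > 0 && full) {
      batches.push(current); // close the current batch and start a new one
      current = [];
      currentTokens = 0;
    }
    current.push(row);
    currentTokens += cost;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

For example, `makeBatches(rows, 2000, 50)` would yield batches of at most 50 rows and roughly 2,000 estimated input tokens each. A real budget would presumably also need to leave room for the model's reply.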

Overview

A CLI tool for translating English text datasets into Burmese using the Gemini API. It is designed for AI/ML dataset preparation, with safe batch translation, automatic error handling, and protection against API limits. Typical uses include sentiment analysis datasets, NLP localization, and multilingual training data.
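Batch translation means one request carries many rows. One hypothetical way to do that is to send the rows as a numbered list and parse a numbered reply, rejecting the reply if any line is missing so that translations never end up attached to the wrong row. The prompt wording and the `buildBatchPrompt` / `parseBatchReply` helpers are illustrative assumptions, not the tool's actual prompt.

```javascript
// Build one prompt covering every row in a batch, so N rows cost one API call.
// Whitespace (including newlines) inside a row is collapsed so each row stays
// on a single numbered line. The instruction wording is hypothetical.
function buildBatchPrompt(texts) {
  const numbered = texts.map((t, i) => `${i + 1}. ${t.replace(/\s+/g, " ").trim()}`);
  return [
    "Translate each numbered English line into Burmese.",
    "Reply with the same numbering, one translation per line, and nothing else.",
    "",
    ...numbered,
  ].join("\n");
}

// Parse a reply such as "1. ...\n2. ..." back into an array aligned with the
// input. Returns null if any numbered line is missing or empty, so the caller
// can retry or mark the batch as untranslated instead of saving bad rows.
function parseBatchReply(reply, expectedCount) {
  const out = new Array(expectedCount).fill(null);
  for (const line of reply.split(/\r?\n/)) {
    const m = line.match(/^\s*(\d+)[.)]\s*(.*)$/);
    if (!m) continue; // ignore lines that are not numbered
    const idx = Number(m[1]) - 1;
    if (idx >= 0 && idx < expectedCount && m[2].trim() !== "") {
      out[idx] = m[2].trim();
    }
  }
  return out.every((t) => t !== null) ? out : null;
}
```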

Features

  • Batch translation to minimize API requests
  • Token-aware batching for reliable processing
  • Automatic retry on temporary API errors
  • Safe stop when the quota is exceeded, so no translated data is lost
  • Saves translated and untranslated rows separately
  • Works with CSV datasets (text + label columns)
  • Uses Gemini 2.5 Flash for fast translation
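The retry and safe-stop behavior can be sketched as a loop that retries temporary failures with exponential backoff and, once a batch cannot be completed, stops and hands back both the translated rows and every remaining row. This is a sketch under stated assumptions: errors are assumed to carry an HTTP-style `status` (429 treated as quota exceeded, 500/503 as temporary), the retry count and delays are invented, and `translateBatch` is a hypothetical stand-in for the actual Gemini call.

```javascript
// ASSUMPTION: errors carry a numeric `status`. 429 means "quota exceeded"
// (stop), 500/503 mean a temporary failure (retry). The page does not say
// how the tool classifies Gemini errors.
const isQuotaError = (err) => err?.status === 429;
const isTemporaryError = (err) => err?.status === 500 || err?.status === 503;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call fn, retrying temporary errors with exponential backoff
// (baseDelayMs, then 2x, 4x, ...). Any other error is rethrown at once.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isTemporaryError(err) || attempt >= retries) throw err;
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}

// Translate batches in order. `translateBatch` (hypothetical) must resolve to
// one translated string per row. If a batch still fails after retrying, for
// a quota error or any other reason, stop and return the rows finished so
// far plus every remaining row, so both sets can be written to disk.
async function translateAll(batches, translateBatch, retryOpts) {
  const translated = [];
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i];
    try {
      const texts = await withRetry(() => translateBatch(batch), retryOpts);
      if (!Array.isArray(texts) || texts.length !== batch.length) {
        throw new Error(`Batch ${i}: expected ${batch.length} translations`);
      }
      batch.forEach((row, j) => translated.push({ text: texts[j], label: row.label }));
    } catch (err) {
      return {
        translated,
        untranslated: batches.slice(i).flat(),
        reason: isQuotaError(err) ? "quota" : err.message,
      };
    }
  }
  return { translated, untranslated: [], reason: null };
}
```

If, for example, the quota ran out on the third batch, `translated` would hold the first two batches and `untranslated` the rest, which is the kind of split the `burmese_dataset_*` and `untranslated_dataset_*` files under Usage reflect.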

Installation & Setup

Install globally

npm install -g dataset-translator

Or run via npx

npx dataset-translator

Run it and follow the prompts

dataset-translator
# Enter CSV dataset path: example_dataset.csv
# Enter Gemini API Key: **********************

The CSV must have 'text' and 'label' columns. You can get a Gemini API key at ai.google.dev.
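Checking the input can be sketched as parsing the CSV and confirming that the header contains both required columns before any API call is made. The small parser below handles quoted fields, doubled-quote escapes, and CRLF line endings; it is an illustrative assumption (a real implementation might use a CSV library instead), and `parseCsv` and `loadDataset` are hypothetical names.

```javascript
// Minimal CSV parser: commas, double-quoted fields, "" escapes inside quotes,
// and LF or CRLF line endings. Returns an array of rows (arrays of strings).
function parseCsv(content) {
  const rows = [];
  let row = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < content.length; i++) {
    const ch = content[i];
    if (inQuotes) {
      if (ch === '"') {
        if (content[i + 1] === '"') { field += '"'; i++; } // escaped quote
        else inQuotes = false; // closing quote
      } else field += ch;
    } else if (ch === '"') inQuotes = true;
    else if (ch === ",") { row.push(field); field = ""; }
    else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && content[i + 1] === "\n") i++; // treat CRLF as one break
      row.push(field); field = "";
      rows.push(row); row = [];
    } else field += ch;
  }
  if (field !== "" || row.length > 0) { row.push(field); rows.push(row); }
  return rows;
}

// Require 'text' and 'label' columns and map each record to { text, label }.
function loadDataset(content) {
  const [header, ...records] = parseCsv(content.replace(/^\uFEFF/, "")); // drop a UTF-8 BOM
  const cols = (header || []).map((h) => h.trim());
  const textIdx = cols.indexOf("text");
  const labelIdx = cols.indexOf("label");
  if (textIdx === -1 || labelIdx === -1) {
    throw new Error("CSV must have 'text' and 'label' columns");
  }
  return records
    .filter((r) => r.some((f) => f.trim() !== "")) // skip blank lines
    .map((r) => ({ text: r[textIdx] ?? "", label: r[labelIdx] ?? "" }));
}
```

Called as `loadDataset(fs.readFileSync("example_dataset.csv", "utf8"))`, this would return objects such as `{ text: "I love programming.", label: "2" }` for the sample input below; labels stay strings.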

Usage

Expected CSV format and output files:

# Input CSV format:
text,label
I love programming.,2
This is very sad.,0

# Output files (batched):
burmese_dataset_0-457.csv        # Successfully translated rows
burmese_dataset_457-855.csv
untranslated_dataset_855-953.csv # Rows left untranslated after the quota ran out
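Writing the output files needs CSV quoting, since translated text can contain commas, quotes, or line breaks, plus a file name built from the row range. This sketch assumes ranges are 0-based with an exclusive end, which fits consecutive files sharing a boundary (0-457, 457-855) but is not stated on the page. `csvField`, `toCsv`, and `outputName` are hypothetical helpers.

```javascript
// Quote a field only when it contains a comma, double quote, or line break,
// doubling any embedded quotes.
function csvField(value) {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Serialize { text, label } rows with the same text,label header as the input.
function toCsv(rows) {
  const lines = ["text,label", ...rows.map((r) => `${csvField(r.text)},${csvField(r.label)}`)];
  return lines.join("\n") + "\n";
}

// Build a name like burmese_dataset_0-457.csv.
// ASSUMPTION: start is inclusive, end is exclusive (0-based row offsets).
function outputName(prefix, start, end) {
  return `${prefix}_${start}-${end}.csv`;
}
```

A caller could then write a file with `fs.writeFileSync(outputName("burmese_dataset", 0, 457), toCsv(rows), "utf8")`.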