File Upload

Last updated: 6/29/2025

Nexogen AI's file upload feature lets you upload audio and video files for transcription. It supports multiple formats, batch processing, and secure cloud storage, along with speaker diarization and automatic language detection.

Supported File Formats

Audio Formats

  • MP3 (up to 320 kbps)
  • WAV (PCM, 16-bit, 44.1 kHz)
  • M4A (AAC encoding)
  • FLAC (lossless compression)
  • OGG (Vorbis encoding)
  • WMA (Windows Media Audio)

Video Formats

  • MP4 (H.264 video, AAC audio)
  • AVI (various codecs)
  • MOV (QuickTime format)
  • MKV (Matroska format)
  • WebM (VP8/VP9 video)
  • FLV (Flash Video)
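
As a rough illustration of the format lists above, a client-side pre-check could classify a file by its extension before uploading. This is only a sketch: the constant names and the `classify_media` helper are hypothetical and not part of any Nexogen AI API, and an extension match does not prove the codec inside the container is supported.

```python
from pathlib import Path
from typing import Optional

# Extensions derived from the format lists above. The sets themselves are
# an illustrative assumption, not official Nexogen AI constants.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".wma"}
VIDEO_EXTENSIONS = {".mp4", ".avi", ".mov", ".mkv", ".webm", ".flv"}


def classify_media(filename: str) -> Optional[str]:
    """Return "audio", "video", or None, based on the file extension only."""
    suffix = Path(filename).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None
```

Files for which the helper returns `None` would likely need converting to one of the listed formats first.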

Upload Process

1. Select Files

Drag and drop files or click to browse. You can upload multiple files at once for batch processing.

2. Configure Settings

Choose language, enable speaker diarization, and set processing options before upload.

3. Upload & Process

Files are securely uploaded and processed using our AI transcription engine.

4. Download Results

Download transcriptions in multiple formats including PDF, TXT, SRT, and VTT.

Advanced Features

Speaker Diarization

Automatically identify and label different speakers in your audio files

Language Detection

Automatic language detection for 50+ languages and dialects

Timestamp Generation

Precise timestamps for every word and sentence in the transcription

Cloud Storage

Secure cloud storage for your files and transcriptions with easy access

Multiple Export Formats

Export in PDF, TXT, SRT, VTT, JSON, and other formats

Batch Processing

Upload and process multiple files simultaneously for efficiency

File Size & Duration Limits

File Size

Up to 2GB per file

Larger files supported for enterprise plans

Duration

Up to 4 hours per file

Longer files supported for enterprise plans

Batch Upload

Up to 10 files at once

Higher limits for enterprise plans
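
The standard limits above (2GB per file, 4 hours per file, 10 files per batch) can be checked locally before an upload is attempted. The sketch below is one way to do that; `MediaFile` and `check_batch` are hypothetical names, "2GB" is read as 2 × 1024³ bytes (an assumption), and the file duration is assumed to be known already from some other tool.

```python
from dataclasses import dataclass
from typing import List

# Limits taken from the section above. Treating "2GB" as 2 GiB is an assumption.
MAX_FILE_BYTES = 2 * 1024**3
MAX_DURATION_SECONDS = 4 * 60 * 60
MAX_BATCH_FILES = 10


@dataclass
class MediaFile:
    """Hypothetical description of a local file queued for upload."""
    name: str
    size_bytes: int
    duration_seconds: float


def check_batch(files: List[MediaFile]) -> List[str]:
    """Return human-readable problems; an empty list means the batch fits."""
    problems = []
    if len(files) > MAX_BATCH_FILES:
        problems.append(
            f"batch has {len(files)} files; the limit is {MAX_BATCH_FILES}"
        )
    for f in files:
        if f.size_bytes > MAX_FILE_BYTES:
            problems.append(f"{f.name}: larger than 2GB")
        if f.duration_seconds > MAX_DURATION_SECONDS:
            problems.append(f"{f.name}: longer than 4 hours")
    return problems
```

Enterprise plans have higher limits, so the three constants would need adjusting for those accounts.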

Processing Options

Language Settings

Automatic Detection
  • AI-powered language detection
  • Supports 50+ languages
  • Handles mixed-language content
  • Dialect recognition

Manual Selection
  • Choose specific language
  • Improved accuracy for known languages
  • Faster processing time
  • Better handling of accents

Speaker Settings

Speaker Diarization
  • Automatic speaker identification
  • Up to 10 speakers per file
  • Speaker labels (Speaker 1, Speaker 2, etc.)
  • Custom speaker naming

Single Speaker
  • Optimized for monologue content
  • Faster processing
  • Lower cost per minute
  • Ideal for podcasts and lectures
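
To illustrate how the default labels (Speaker 1, Speaker 2, etc.) relate to custom speaker names, the sketch below renames speakers in a transcript after download. The transcript shape, a list of `(label, text)` pairs, and the `rename_speakers` helper are assumptions made for this example, not the documented export structure.

```python
from typing import Dict, List, Tuple

# Assumed transcript shape: one (speaker_label, text) pair per turn.
Turn = Tuple[str, str]


def rename_speakers(turns: List[Turn], names: Dict[str, str]) -> List[Turn]:
    """Replace default labels with custom names; unmapped labels stay as-is."""
    return [(names.get(label, label), text) for label, text in turns]


turns = [("Speaker 1", "Welcome back."), ("Speaker 2", "Thanks for having me.")]
renamed = rename_speakers(turns, {"Speaker 1": "Host"})
```

Labels without an entry in the mapping are left untouched, so a partially named transcript remains readable.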

Output Formats

Text Formats
  • Plain text (.txt)
  • Rich text with timestamps
  • JSON with metadata
  • CSV for data analysis

Subtitle Formats
  • SRT (SubRip)
  • VTT (WebVTT)
  • ASS/SSA subtitles
  • Custom timestamp formats
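
The two most common subtitle formats listed above differ mainly in their timestamp syntax. The sketch below renders the same timed segments as SRT and as WebVTT; the `(start, end, text)` segment shape and the function names are assumptions for illustration and do not describe the structure of Nexogen AI's JSON export.

```python
from typing import List, Tuple

# Assumed segment shape: (start_seconds, end_seconds, text).
Segment = Tuple[float, float, str]


def format_timestamp(seconds: float, separator: str) -> str:
    """Format seconds as HH:MM:SS<separator>mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{separator}{ms:03d}"


def to_srt(segments: List[Segment]) -> str:
    """SRT: numbered cues, comma before the milliseconds."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{format_timestamp(start, ',')} --> {format_timestamp(end, ',')}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    return "\n".join(blocks)


def to_vtt(segments: List[Segment]) -> str:
    """WebVTT: "WEBVTT" header line, period before the milliseconds."""
    cues = []
    for start, end, text in segments:
        timing = f"{format_timestamp(start, '.')} --> {format_timestamp(end, '.')}"
        cues.append(f"{timing}\n{text}\n")
    return "WEBVTT\n\n" + "\n".join(cues)
```

Calling `to_srt([(0.0, 1.5, "Hello")])` yields a single cue numbered 1 with the timing line `00:00:00,000 --> 00:00:01,500`.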

Security & Privacy

End-to-End Encryption

All files are encrypted during upload, processing, and storage.

Secure Storage

Files are stored in secure, EU-based data centers with redundant backups and disaster recovery.

Automatic Deletion

Files are automatically deleted after processing unless you choose to save them permanently.

Access Control

Only you have access to your files and transcriptions. No third-party access is permitted.

Tips for Best Results

File Preparation

  • Use high-quality audio files (16-bit, 44.1 kHz or higher)
  • Ensure clear audio with minimal background noise
  • Avoid heavily compressed formats for better accuracy
  • Split very long files into smaller segments if needed
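
For the last tip, the cut points for a long recording can be worked out ahead of time. The sketch below divides a recording into equal segments that each stay within a maximum length, defaulting to the 4-hour limit for standard plans. `plan_segments` is a hypothetical helper; it only computes boundaries, and the actual cutting would be done with a separate audio tool.

```python
import math
from typing import List, Tuple


def plan_segments(
    total_seconds: float, max_segment_seconds: float = 4 * 60 * 60
) -> List[Tuple[float, float]]:
    """Return (start, end) pairs of equal length, each within the maximum."""
    if total_seconds <= 0 or max_segment_seconds <= 0:
        raise ValueError("durations must be positive")
    count = math.ceil(total_seconds / max_segment_seconds)
    length = total_seconds / count
    return [
        (i * length, min((i + 1) * length, total_seconds)) for i in range(count)
    ]
```

Equal-length segments are a design choice here; cutting at pauses between sentences instead would avoid splitting a word across two files.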

Processing Optimization

  • Choose the correct language for better accuracy
  • Enable speaker diarization for multi-speaker content
  • Use batch upload for multiple files to save time
  • Check processing status regularly for large files
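
Checking status "regularly" is usually done by polling with an increasing delay rather than at a fixed, rapid interval. The sketch below shows that pattern in general form. Everything in it is an assumption: `get_status` stands in for whatever call returns the job state, and the `"completed"` and `"failed"` status strings are hypothetical, since this page does not document a status API.

```python
import time
from typing import Callable


def wait_until_done(
    get_status: Callable[[], str],
    *,
    initial_delay: float = 2.0,
    max_delay: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status with exponential backoff until a final state is seen."""
    delay = initial_delay
    waited = 0.0
    while True:
        status = get_status()
        # Hypothetical final states; substitute the real ones if they differ.
        if status in ("completed", "failed"):
            return status
        if waited >= timeout:
            raise TimeoutError(f"still '{status}' after {waited:.0f} seconds")
        sleep(delay)
        waited += delay
        delay = min(delay * 2, max_delay)
```

The `sleep` parameter exists so the function can be exercised in tests without real waiting.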

Output Management

  • Download results promptly to avoid expiration
  • Use appropriate export formats for your needs
  • Review and edit transcriptions for accuracy
  • Save important files to cloud storage for backup