If you have existing experiments tracked in Neptune (v2.x or v3.x), you can export and migrate them to Pluto using the neptune-exporter CLI tool. This tool streams your Neptune runs to parquet files and loads them into Pluto while preserving run structure, metrics, parameters, and artifacts.
This is separate from the Neptune compatibility layer, which enables dual-logging. Use this tool to migrate existing historical runs from Neptune to Pluto.
Overview
The neptune-exporter tool works in three stages:
- Export - Download Neptune runs to local parquet files and artifacts
- Inspect - View a summary of exported data
- Load - Upload the exported data to Pluto
Installation
Clone the neptune-exporter repository and install it with uv:
# Clone the repository
git clone https://github.com/Trainy-ai/neptune-exporter
cd neptune-exporter
# Install all dependencies (including Pluto loader)
uv sync --extra pluto
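To confirm the CLI is available in the project environment, you can print its help text (this assumes the tool exposes a standard --help flag; adjust if your version differs):
# Verify the install and list the available subcommands
uv run neptune-exporter --help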
Quick Start
1. Export Neptune Data
First, authenticate with Neptune by setting your API token:
export NEPTUNE_API_TOKEN="your-neptune-api-token"
Then, export your Neptune runs to local storage using this basic command:
uv run neptune-exporter export \
-p "my-workspace/my-project" \
--exporter neptune3 \
--data-path ./exports/data \
--files-path ./exports/files \
-v
Use --exporter neptune2 if you’re using Neptune 2.x, or --exporter neptune3 for Neptune 3.x.
Export Options
The export command supports several filters that control what data gets exported (a combined example follows the table):
| Option | Description |
|---|---|
| -p/--project-ids | Required. Neptune project path (e.g., "workspace/project"). Can specify multiple projects. |
| --exporter | Required. Neptune version: neptune2 or neptune3. |
| --data-path | Directory for parquet files (default: ./exports/data). |
| --files-path | Directory for artifacts (default: ./exports/files). |
| -r/--runs | Filter runs by ID (regex supported). Neptune 3.x uses sys/custom_run_id; Neptune 2.x uses sys/id (e.g., SAN-1). |
| -a/--attributes | Filter specific attributes (regex or exact names). |
| -c/--classes | Include specific data types: parameters, metrics, series, or files. |
| --exclude | Exclude specific data types (same options as --classes). |
| --include-archived-runs | Include archived/trashed runs. |
| --include-metric-previews | Neptune 3.x only. Include Metric Previews in the export (preview completion info is discarded). |
| --api-token | Neptune API token (can also use the NEPTUNE_API_TOKEN env var). |
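For example, filters can be combined in a single export. The project path and run-ID pattern below are placeholders, and the exact multi-value syntax may vary between versions, so check the CLI help before copying this sketch:
# Export only runs whose custom run ID matches a pattern, skipping file artifacts
uv run neptune-exporter export \
-p "my-workspace/my-project" \
--exporter neptune3 \
-r "exp-.*" \
--exclude files \
--data-path ./exports/data \
--files-path ./exports/files \
-v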
2. Inspect Exported Data
Review what was exported before loading to Pluto:
uv run neptune-exporter summary --data-path ./exports/data
This displays:
- Number of projects and runs
- Breakdown of attribute types
- Step statistics (min/max/count)
- Data volume information
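Before loading, it can also be worth sanity-checking the export directory with ordinary shell tools:
# List exported project directories and their parquet parts
ls -R ./exports/data
# Check how much data and artifact content was downloaded
du -sh ./exports/data ./exports/files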
3. Load to Pluto
Upload the exported data to Pluto. You have two authentication options:
Option A: Use stored credentials (recommended for repeated loads)
# First, authenticate once (stores credentials locally)
pluto login <your-api-key>
# Then load without specifying credentials
uv run neptune-exporter load \
--loader pluto \
--data-path ./exports/data \
--files-path ./exports/files \
-v
Option B: Provide API key directly
# First, authenticate by setting the API key
export PLUTO_API_KEY="your-api-key"
# Then load with PLUTO_API_KEY
uv run neptune-exporter load \
--loader pluto \
--pluto-api-key "$PLUTO_API_KEY" \
--data-path ./exports/data \
--files-path ./exports/files \
-v
The loader will:
- Create Ops in Pluto for each Neptune run
- Upload metrics, parameters, and histograms
- Upload artifacts and file series
- Preserve experiment structure and metadata
Configuration
Optional Configuration
Configure the Pluto loader behavior using environment variables:
| Variable | Default | Description |
|---|---|---|
| NEPTUNE_EXPORTER_PLUTO_PROJECT_NAME | Neptune’s project_id | Override destination project name (e.g., "workspace/project"). |
| NEPTUNE_EXPORTER_PLUTO_BASE_DIR | . (current dir) | Base directory for cache files and working data. |
| NEPTUNE_EXPORTER_PLUTO_LOADED_CACHE | .pluto_upload_cache.txt | Explicit path to the loaded-runs cache file. |
| NEPTUNE_EXPORTER_PLUTO_BATCH_ROWS | 10000 | Arrow-to-pandas batch size. Higher = more RAM, faster. Min: 1000. |
| NEPTUNE_EXPORTER_PLUTO_LOG_EVERY | 50 | Downsample metric steps by logging every N-th point. Set to 1 for lossless (slower), 50+ for faster. |
| NEPTUNE_EXPORTER_PLUTO_FLUSH_EVERY | 1000 | Buffered metric-step flush threshold. Higher = more RAM, fewer API calls. Min: 100. |
| NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SIZE | 100 | Number of files per upload batch. Higher = faster, more risk of 502 errors. |
| NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SLEEP | 0.5 | Seconds to sleep between file batches. Lower = faster, more risk of rate limits. |
| NEPTUNE_EXPORTER_PLUTO_MAX_FILES_PER_RUN | 0 | Hard cap on uploaded files per run. 0 = disabled. |
Loading Example
Choose your authentication method, then apply performance tuning:
With stored credentials:
# Authenticate once
pluto login <your-api-key>
# Set performance tuning variables
export NEPTUNE_EXPORTER_PLUTO_PROJECT_NAME="my-workspace/migrated-runs"
export NEPTUNE_EXPORTER_PLUTO_BATCH_ROWS=50000
export NEPTUNE_EXPORTER_PLUTO_LOG_EVERY=1 # Lossless
export NEPTUNE_EXPORTER_PLUTO_FLUSH_EVERY=3000
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SIZE=100
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SLEEP=0.1
# Load without specifying API key
uv run neptune-exporter load \
--loader pluto \
--data-path ./exports/data \
--files-path ./exports/files
With direct API key:
# Set all variables at once
export NEPTUNE_EXPORTER_PLUTO_PROJECT_NAME="my-workspace/migrated-runs"
export PLUTO_API_KEY="your-api-key"
export NEPTUNE_EXPORTER_PLUTO_BATCH_ROWS=50000
export NEPTUNE_EXPORTER_PLUTO_LOG_EVERY=1 # Lossless
export NEPTUNE_EXPORTER_PLUTO_FLUSH_EVERY=3000
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SIZE=100
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SLEEP=0.1
# Load with API key flag
uv run neptune-exporter load \
--loader pluto \
--pluto-api-key "$PLUTO_API_KEY" \
--data-path ./exports/data \
--files-path ./exports/files
Data Mapping
Attribute Types
Neptune attributes are mapped to Pluto as follows:
| Neptune Type | Pluto Mapping | Details |
|---|---|---|
| float, int, string, bool | Config parameters | Logged via op.update_config() |
| datetime | Config parameter (ISO string) | Converted to ISO 8601 format |
| string_set | Config parameter (list) | Converted to a list of strings |
| float_series | Metrics | Logged via op.log(); preserves decimal steps |
| string_series | Text artifacts (Logs) | Printed to console; consolidated into logs/stdout (non-error) and logs/stderr (error paths) as Text artifacts |
| histogram_series | Histograms | Logged as pluto.Histogram by step |
| file, file_series | Artifacts | Uploaded via pluto.Artifact() |
Run Structure
- Project: Target project is set via NEPTUNE_EXPORTER_PLUTO_PROJECT_NAME, or defaults to Neptune’s project_id
- Op Name: Neptune sys/name (the experiment name) becomes the Pluto Op name; if missing, falls back to custom_run_id/run_id
- Tags: Neptune tags are preserved as Pluto tags, and import:neptune and import_project:<project_id> tags are added for traceability
- Fork Relationships: Not natively supported (stored as metadata only)
Data Schema
Exported data uses the following parquet schema (a quick way to verify it against your own files is shown after the table):
| Column | Type | Description |
|---|---|---|
| project_id | string | Neptune project path (e.g., workspace/project) |
| run_id | string | Neptune run identifier |
| attribute_path | string | Full attribute path (e.g., metrics/accuracy) |
| attribute_type | string | One of: float, int, string, bool, datetime, string_set, float_series, string_series, histogram_series, file, file_series |
| step | decimal(18,6) | Decimal step value for series data |
| timestamp | timestamp(ms, UTC) | Timestamp for time-based records |
| int_value / float_value / string_value / bool_value / datetime_value / string_set_value | typed | Value based on attribute_type |
| file_value | struct{path} | Relative path to the downloaded artifact |
| histogram_value | struct{type, edges, values} | Histogram payload |
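One way to confirm this schema is to read a single exported parquet part with pyarrow; this is a sketch that assumes pyarrow is present in the project environment (the exporter's Arrow-based batching suggests it is), and the path is a placeholder for one of your exported files:
# Print the schema of one exported parquet part
uv run python -c "import pyarrow.parquet as pq; print(pq.read_schema('exports/data/<project_dir>/run_1_part_0.parquet'))"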
Storage Layout
The exporter creates the following directory structure:
exports/
├── data/ # Parquet files
│ └── workspace_project_abc123/ # Sanitized project dir
│ ├── run_1_part_0.parquet
│ ├── run_1_part_1.parquet
│ └── run_2_part_0.parquet
└── files/ # Artifacts
└── workspace_project_abc123/
├── run_1/
│ └── artifacts/
└── run_2/
└── artifacts/
- Projects are sanitized for filesystem safety (with digest suffix)
- Each run is split into ~50 MB compressed parquet parts
- Files and artifacts mirror the project structure
Duplicate Prevention
The Pluto loader tracks loaded runs in a local cache file to prevent duplicates:
- Cache file: .pluto_upload_cache.txt (or a custom path via NEPTUNE_EXPORTER_PLUTO_LOADED_CACHE)
- Located in NEPTUNE_EXPORTER_PLUTO_BASE_DIR (default: current directory)
- Stores the project ID and run name to identify already-uploaded runs
- The loader does not check the Pluto backend; it only uses the local cache
To re-upload the same runs, do one of the following (a short sketch follows the list):
- Delete the run from the cache file, or
- Delete the entire cache file, or
- Run from a different directory, or
- Set NEPTUNE_EXPORTER_PLUTO_BASE_DIR to a new location
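For example, assuming the default cache location and that you want every exported run uploaded again on the next load:
# See which runs the loader considers already uploaded
cat .pluto_upload_cache.txt
# Remove the cache so all exported runs are uploaded again
rm .pluto_upload_cache.txt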
Troubleshooting
Large Datasets
For runs with hundreds of thousands of steps:
- Increase batch size - Process more rows at once (uses more RAM):
export NEPTUNE_EXPORTER_PLUTO_BATCH_ROWS=50000
- Downsample metrics - Reduce points uploaded (lossy but faster):
export NEPTUNE_EXPORTER_PLUTO_LOG_EVERY=100 # Keep every 100th point
- Increase flush buffer - Fewer API calls (uses more RAM):
export NEPTUNE_EXPORTER_PLUTO_FLUSH_EVERY=5000
File Upload Errors
If you encounter 502 errors or rate limits during file uploads:
- Reduce chunk size - Upload fewer files per batch:
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SIZE=50
- Increase sleep time - Wait longer between batches:
export NEPTUNE_EXPORTER_PLUTO_FILE_CHUNK_SLEEP=1.0
- Cap total files - Limit files per run:
export NEPTUNE_EXPORTER_PLUTO_MAX_FILES_PER_RUN=1000
Memory Issues
If the loader runs out of memory:
- Decrease batch size:
export NEPTUNE_EXPORTER_PLUTO_BATCH_ROWS=5000
- Decrease flush buffer:
export NEPTUNE_EXPORTER_PLUTO_FLUSH_EVERY=500
- Process runs individually - Export and load one run at a time using the -r filter (see the sketch after this list)
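A sketch of the one-run-at-a-time approach, assuming you have already authenticated with pluto login; the run ID and per-run export directories are placeholders:
# Export a single run into its own directory
uv run neptune-exporter export \
-p "my-workspace/my-project" \
--exporter neptune3 \
-r "exp-001" \
--data-path ./exports/exp-001/data \
--files-path ./exports/exp-001/files
# Load just that run
uv run neptune-exporter load \
--loader pluto \
--data-path ./exports/exp-001/data \
--files-path ./exports/exp-001/files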
Still Having Issues?
For additional help and the latest information, see the GitHub repository: https://github.com/Trainy-ai/neptune-exporter