The World's Leading Managed Robot Data Collection Service
Expert operators. Professional teleoperation hardware. Your dataset format of choice. From 20-episode pilots to 10,000+ episode production campaigns — we collect the training data your robot policies need.
Why Robot Training Data Quality Matters
Most robot learning failures are data failures. Here are the three problems every ML team hits — and how SVRC solves each one.
Hard to collect at scale
Building a data collection station takes weeks. Recruiting and training operators takes longer. Most teams spend 60% of their project timeline on infrastructure instead of research. SVRC operates multi-station facilities with trained operators ready to start collecting within days of project kickoff.
Operator quality varies wildly
An untrained operator produces demonstrations with inconsistent strategies, failed grasps, and jerky trajectories. These demonstrations actively harm policy training. SVRC operators complete a qualification program covering approach consistency, grasp precision, and temporal smoothness before they touch your data.
Format inconsistency wastes months
Different labs store data in incompatible schemas. Timestamp conventions differ. Camera naming is inconsistent. Converting between formats introduces subtle bugs. SVRC delivers datasets in your exact target format — HDF5, RLDS, or LeRobot — validated against your training pipeline before handoff.
How a Data Campaign Actually Works
Six phases from first conversation to training-ready dataset. Every campaign follows this process.
Kickoff Call & Task Design
We work with your team to define the task specification: success criteria, observation space (which cameras, what resolution, what frame rate), action space (joint positions vs. end-effector velocity vs. delta actions), and scene diversity requirements (object variations, lighting, initial positions). You receive a detailed collection protocol document for review before any data is collected. Typical duration: 1-3 days.
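As an illustrative sketch only, a collection protocol document covering these items might be structured like the following (field names and values here are hypothetical examples, not SVRC's actual schema):

```python
import json

# Hypothetical collection protocol covering the items agreed at kickoff:
# success criteria, observation space, action space, and scene diversity.
protocol = {
    "task": "put_mug_on_rack",
    "success_criteria": "mug hanging on rack hook, gripper retracted",
    "observation_space": {
        "cameras": [
            {"name": "wrist", "resolution": [640, 480], "fps": 30},
            {"name": "overhead", "resolution": [1280, 720], "fps": 30},
        ],
        "proprioception": ["joint_positions", "joint_velocities",
                           "gripper_aperture"],
    },
    "action_space": {"type": "delta_end_effector", "frequency_hz": 30},
    "scene_diversity": {
        "object_variants": 4,
        "initial_position_jitter_cm": 5,
        "lighting_conditions": ["bright", "dim"],
    },
}

print(json.dumps(protocol, indent=2))
```

The point of writing the protocol down in a machine-readable form is that the same file can later drive dataset validation.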
Hardware Configuration
We configure the robot arm, teleoperation interface, cameras (Intel RealSense D435/D455, ZED 2i, or your cameras), and workspace fixtures for your task. Camera extrinsics are calibrated, and time synchronization is verified to <5 ms across all streams using hardware-triggered cameras and shared clock sources. A test episode validates the full pipeline end-to-end. Typical duration: 1-2 days.
Operator Training & Qualification
Operators are trained on your specific task before collection begins. Each operator must pass a proficiency test demonstrating consistent approach strategy, precise grasps, smooth trajectories, and correct scene resets. Operators who fail the qualification test are re-trained or replaced. We track per-operator quality metrics throughout the campaign. Typical duration: 0.5-1 day.
Collection Sprints
Qualified operators collect demonstrations in structured sprints. Real-time quality monitoring flags failed episodes, inconsistent strategies, synchronization errors, and frame drops. Scene diversity is managed according to the protocol — object positions, lighting, and distractors are varied systematically. You receive daily progress reports with throughput metrics, quality statistics, and example episodes.
Quality Control & Validation
Every episode passes our 10-point QA checklist (see below). Failed episodes are flagged, excluded from the primary dataset, and made available separately on request. Dataset-level validation checks format consistency, schema compliance, and statistical properties (action distributions, episode length distributions, success rates). A QA report accompanies every delivery.
Delivery & Handoff
Validated data is exported to your target format (HDF5, RLDS, LeRobot, or custom) and delivered via secure transfer, Hugging Face Hub, or directly into your Fearless Platform workspace. We include the collection protocol, QA report, camera calibration files, and a README with dataset schema documentation. Post-delivery support: 2 weeks of format/pipeline troubleshooting included.
Data Specifications
Technical specs for the data streams we capture during collection.
| Data Stream | Frequency | Resolution / Format | Notes |
|---|---|---|---|
| Joint states | 30 Hz (configurable to 50 Hz) | float64 | Position, velocity, effort for each joint |
| RGB cameras | 60 fps (configurable) | 640x480 or 1280x720 | 1-4 views typical (wrist, overhead, side, ego) |
| Depth cameras | 30 fps | 640x480 | Intel RealSense D435/D455, aligned to RGB |
| End-effector pose | 30 Hz | 6-DOF (xyz + rpy) | Forward kinematics from joint states |
| Gripper state | 30 Hz | float (aperture) | Continuous aperture + binary open/close |
| Force/torque | 100 Hz (when available) | 6-axis | Wrist-mounted F/T sensor (UR, Franka) |
| Annotations | Per-episode | Structured JSON | Task phase labels, language instructions, keyframes, success/failure |
All streams are synchronized to <5 ms using hardware-triggered cameras and a shared clock source. Timestamp format: Unix epoch (float64, seconds).
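A minimal sketch of how a <5 ms synchronization check can be verified from the delivered timestamps (the function and the synthetic data below are illustrative, not SVRC's validation code):

```python
# Timestamps are Unix-epoch float64 seconds, as in the spec above.
SYNC_TOLERANCE_S = 0.005  # 5 ms

def max_sync_error(reference_ts, stream_ts):
    """For each reference timestamp, find the nearest timestamp in the
    other stream and return the worst-case offset in seconds."""
    return max(
        min(abs(t - s) for s in stream_ts)
        for t in reference_ts
    )

# Synthetic example: joint states at ~30 Hz vs. camera frames.
joint_ts = [1700000000.000, 1700000000.033, 1700000000.067]
camera_ts = [1700000000.001, 1700000000.034, 1700000000.066]

error = max_sync_error(joint_ts, camera_ts)
print(f"worst-case offset: {error * 1000:.1f} ms, "
      f"{'PASS' if error < SYNC_TOLERANCE_S else 'FAIL'}")
```

Running the same check against every stream pair in a delivery is a quick client-side sanity test before training.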
Data Collection Methods
We select the teleoperation method that matches your task requirements. Here is how they compare.
| Method | How It Works | Precision | Scale (demos/hr) | Cost/Episode | Best For |
|---|---|---|---|---|---|
| Leader-Follower Arms | Operator moves lightweight leader arm; follower replicates joint positions at 3-8 ms latency | Highest | 20-35 | $$ | Contact-rich manipulation, insertion, bimanual tasks |
| VR / Quest 3 | Hand controller positions mapped to end-effector via inverse kinematics | Good | 15-25 | $ | Pick-and-place, sorting, packing, gross manipulation |
| SpaceMouse / Keyboard | 6-DOF joystick controls end-effector velocity; keyboard triggers discrete actions | Moderate | 5-12 | $ | Prototyping, navigation, low-precision tasks |
| Haptic Gloves | Finger joint tracking drives dexterous robot hands with force feedback | High (dexterous) | 8-15 | $$$ | In-hand manipulation, assembly, tool use |
| Kinesthetic Teaching | Operator physically guides the robot arm through the task in gravity-compensation mode | High | 10-18 | $$ | Simple tasks with compliant arms, quick data |
| Scripted Demos | Programmatic waypoint trajectories with randomized perturbations | Exact (deterministic) | 60-200+ | $ | Structured tasks, data augmentation, baseline generation |
Not sure which method fits your task? Talk to our team — we will recommend the right approach based on your task requirements and budget.
Operator Quality Assurance Process
The quality of demonstrations directly determines the quality of trained policies. Here is how we ensure operator quality.
Qualification Testing
Before touching production data, every operator must pass a task-specific proficiency test. We evaluate approach consistency (does the operator use the same general strategy each time?), grasp precision (does the gripper close at the correct position and angle?), trajectory smoothness (are motions fluid or jerky?), and scene reset accuracy (are objects returned to valid initial positions?).
Real-Time Monitoring
During collection, automated monitoring flags potential issues: episodes where joint velocities exceed normal bounds, where the gripper state does not match visual evidence, where frame drops exceed 2%, or where episode duration falls outside the expected range. Flagged episodes are reviewed by a senior operator before inclusion in the dataset.
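As an illustrative sketch (thresholds and field names here are assumptions, not SVRC internals), the automated flagging rules described above might look like:

```python
# Flag episodes that trip any monitoring rule: joint velocity bounds,
# frame-drop rate, or episode duration outside the expected range.
def flag_episode(ep, max_joint_vel=3.0, max_frame_drop=0.02,
                 duration_range=(5.0, 60.0)):
    flags = []
    if max(abs(v) for v in ep["joint_velocities"]) > max_joint_vel:
        flags.append("joint_velocity_bound")
    if ep["dropped_frames"] / ep["total_frames"] > max_frame_drop:
        flags.append("frame_drop")
    lo, hi = duration_range
    if not (lo <= ep["duration_s"] <= hi):
        flags.append("duration_out_of_range")
    return flags

episode = {"joint_velocities": [0.4, -1.2, 0.8],
           "dropped_frames": 9, "total_frames": 300,
           "duration_s": 14.2}
print(flag_episode(episode))  # → ['frame_drop']  (9/300 = 3% > 2%)
```

Flagged episodes then go to a senior operator for review rather than being silently dropped.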
Per-Operator Metrics
We track success rate, average episode duration, trajectory smoothness score, and QA pass rate for each operator. Operators whose metrics drift below thresholds are re-trained or reassigned. Campaign-level quality reports break down these metrics so you can see exactly who collected which data and at what quality level.
Output Formats
We deliver datasets in the format your training pipeline needs — no conversion headaches on your end.
HDF5
The gold standard for robot data. Native to ACT, ALOHA, and Diffusion Policy. Hierarchical episode structure with efficient random access and mature Python tooling via h5py.
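To make the hierarchical structure concrete, here is a sketch of a typical per-episode layout, shown as path-to-(shape, dtype) pairs. This loosely mirrors ACT/ALOHA conventions; actual group and dataset names vary by pipeline and are agreed per project:

```python
# Hypothetical HDF5 episode layout (not an exact schema).
T = 450  # timesteps in one example episode

layout = {
    "/observations/images/wrist":    ((T, 480, 640, 3), "uint8"),
    "/observations/images/overhead": ((T, 480, 640, 3), "uint8"),
    "/observations/qpos":            ((T, 7), "float64"),
    "/observations/qvel":            ((T, 7), "float64"),
    "/action":                       ((T, 7), "float64"),
}

for path, (shape, dtype) in layout.items():
    print(f"{path:35s} {shape} {dtype}")
```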
RLDS / TFRecord
The format behind Open X-Embodiment and Octo. TensorFlow Datasets schema for cross-embodiment training. Streamable from cloud storage with efficient tf.data pipelines.
LeRobot / Parquet
Hugging Face ecosystem native. One-command upload to HF Hub with built-in visualization. Compact MP4 video storage. Growing community with 300+ public datasets.
Custom Formats
Need ROS bag, CSV, JSON-lines, or a proprietary schema? We write custom export adapters. You provide the target schema; we handle conversion and validation.
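A toy sketch of what an export adapter does, using JSON-lines as the target (the internal episode schema here is hypothetical; real adapters convert to the client's exact schema and validate against it):

```python
import json

# Convert an internal episode record to JSON-lines: one timestep per line.
def to_jsonl(episode):
    lines = []
    for t, (obs, act) in enumerate(zip(episode["observations"],
                                       episode["actions"])):
        lines.append(json.dumps({"step": t, "obs": obs, "action": act}))
    return "\n".join(lines)

episode = {
    "observations": [{"gripper": 0.8}, {"gripper": 0.1}],
    "actions": [[0.0, 0.0, -0.01], [0.0, 0.0, 0.0]],
}
print(to_jsonl(episode))
```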
Read our detailed HDF5 vs RLDS vs LeRobot format comparison guide for technical deep dives on each format.
How Many Demonstrations Do You Need?
Dataset size depends on task complexity, policy architecture, and target success rate. Here are benchmarks from real campaigns.
| Task Complexity | Example Tasks | Demos for 80%+ | Demos for 90%+ | Notes |
|---|---|---|---|---|
| Simple single-arm | Pick-and-place, push, open drawer | 20-50 | 50-100 | ACT/Diffusion Policy, fixed objects |
| Moderate single-arm | Insertion, stacking, tool use | 50-150 | 150-300 | Contact-rich, position-sensitive |
| Bimanual | Folding, handover, coordinated assembly | 100-300 | 300-600 | Two-arm coordination required |
| High diversity | Multi-object sorting, variable geometry | 200-500 | 500-1,000 | Many object/scene variations |
| VLA / generalist | Language-conditioned multi-task | 500-2,000 | 2,000-10,000+ | Large-scale, multi-embodiment |
These are guidelines based on published research and SVRC campaign data. Actual requirements depend on your specific policy architecture, training regime, and generalization targets. We help scope the right dataset size during the kickoff call.
Pricing
Transparent pricing aligned to your project stage. Every tier includes task design, hardware setup, expert collection, QA, and delivery.
Pilot
20 demonstrations
- Task design & protocol document
- Single collection station
- 1 qualified operator
- 10-point QA on every episode
- Delivery in 1 format (HDF5, RLDS, or LeRobot)
- 1-2 week turnaround
- 2 weeks post-delivery support
Campaign
100 demonstrations
- Everything in Pilot, plus:
- Multi-station parallel collection
- 2-4 dedicated operators
- Weekly batch deliveries
- Scene diversity management
- Delivery in up to 2 formats
- 2-6 week turnaround
Enterprise
Custom scale / ongoing
- Everything in Campaign, plus:
- Dedicated collection infrastructure
- On-site or co-located deployment
- SLA with uptime guarantees
- Custom robot integration
- All formats + Fearless Platform access
- Ongoing support & iteration
Compatible Hardware
We operate and integrate with a wide range of robot arms. If your platform is ROS2-compatible, we can collect data on it.
- Open-source, SVRC-designed
- Dual-arm kit with leader-follower
- Research-grade torque control
- Industrial collaborative arms
- Humanoid full-body
- Cost-effective 6/7-DOF
- Lightweight research arm
- Ship us your robot
See our full hardware catalog for specifications and availability. Leasing rates available for all platforms.
10-Point Data Quality Checklist
Every episode we deliver passes this checklist. No exceptions.
- Synchronized timestamps — All sensor streams (cameras, joints, actions) aligned to <5 ms tolerance using hardware-triggered cameras and shared clock sources.
- Consistent episode structure — Every episode follows the same observation/action schema with identical array dimensions, data types, and key names.
- Operator qualification — Operators pass a proficiency test on the specific task before their episodes enter the production dataset.
- Task success verification — Each episode is reviewed for full task completion. Failed episodes are flagged and excluded from the primary dataset (available separately on request).
- Scene reset consistency — Object positions, lighting, and workspace state are reset to defined initial conditions between episodes. Randomization ranges are documented.
- Frame drop monitoring — Camera streams are checked for dropped frames. Episodes with >2% frame loss are re-collected.
- Gripper state consistency — Gripper open/close signals are validated against camera evidence. Phantom gripper events are corrected or flagged.
- Joint limit compliance — No episode contains joint positions outside the robot's safe operating range or near-singularity configurations.
- Metadata completeness — Every episode includes task name, operator ID, timestamp, robot serial, camera config, and success label as structured metadata.
- Annotation standards — Language instructions, task phase labels, and keyframe annotations (when requested) follow the agreed annotation schema.
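As one concrete example, the metadata-completeness check (point 9) can be sketched as follows; the required-key set mirrors the list above, while the function name and key names are illustrative:

```python
# Every episode must carry these metadata fields (per checklist point 9).
REQUIRED_METADATA = {"task_name", "operator_id", "timestamp",
                     "robot_serial", "camera_config", "success"}

def missing_metadata(episode_meta):
    """Return the sorted list of required keys absent from an episode."""
    return sorted(REQUIRED_METADATA - episode_meta.keys())

meta = {"task_name": "kitting", "operator_id": "op_07",
        "timestamp": 1700000000.0, "robot_serial": "UR5E-0042",
        "camera_config": {"views": ["wrist", "overhead"]}}
print(missing_metadata(meta))  # → ['success']
```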
Campaign Examples
Anonymized examples from real data collection campaigns.
2,400 bimanual demos in 6 weeks
A CMU robotics lab needed bimanual manipulation data for a Diffusion Policy paper. Two OpenArm 101 leader-follower stations, 3 camera views each, 2 trained operators. HDF5 delivery validated against their ACT training pipeline. Paper published at a top venue 3 months later.
First policy in 4 weeks
Series A manipulation startup. Campaign tier: 100 demos of a kitting task on UR5e. LeRobot format delivery. Their ML team trained an ACT policy that achieved 87% success rate on first deployment. Total data cost: $8,000.
Ongoing pipeline, 72% to 94% success
Logistics company with mobile manipulators across 3 warehouses. Monthly campaigns of 500+ demos covering new SKU types and failure edge cases. Data flows into Fearless Platform for failure mining and retraining. Policy success rate improved from 72% to 94% over 6 months.
5,000-episode benchmark dataset
Academic group creating a standardized manipulation benchmark. 10 task categories, 500 demos each, 4 operators. Delivered in all three formats (HDF5, RLDS, LeRobot). Dataset published on Hugging Face Hub with 200+ downloads in first month.
Who Uses SVRC Data Services
Research Labs
University and corporate research groups who need high-quality demonstration data for policy learning papers. We handle the tedious collection work so your researchers can focus on algorithms and experiments.
Startup Policy Training
Early-stage robotics companies building their first manipulation policies. Get from zero to a working policy in weeks instead of months by outsourcing data collection to operators who already know how to produce training-grade demonstrations.
Enterprise Deployment
Companies deploying robots in production who need ongoing data collection to improve policy performance, handle edge cases, and expand to new task variants. Our campaign and enterprise tiers support continuous data pipelines.
Academic Benchmarks
Research groups creating standardized benchmark datasets for the community. We provide the collection infrastructure and operator consistency needed for reproducible, high-quality benchmark datasets that other labs can build on.
Trusted by Leading Research Institutions
SVRC works with researchers and engineering teams at top universities and robotics companies to collect the demonstration data that powers state-of-the-art manipulation policies.
Frequently Asked Questions
What teleoperation hardware do you use?
We operate leader-follower arms (ALOHA-style WidowX/ViperX and OpenArm setups), Meta Quest 3 VR systems, 6-DOF SpaceMouse interfaces, and SenseGlove Nova 2 haptic gloves. We select the interface that best matches your task requirements for precision, throughput, and data quality. For bimanual tasks, we run dual leader-follower or dual VR configurations. See our bimanual teleoperation guide for details.
What formats do you deliver?
We deliver datasets in HDF5 (ACT/ALOHA compatible), RLDS/TFRecord (for Open X-Embodiment and Octo), LeRobot Parquet (Hugging Face Hub ready), or custom formats. You specify the format in your project brief, and we handle all conversion and validation. Read our format comparison guide for details on each.
How long does a data collection campaign take?
A pilot program (20 demos) typically takes 1-2 weeks from kickoff to delivery, including task design and hardware setup. A standard campaign (100 demos) takes 2-6 weeks depending on task complexity and scene diversity requirements. Enterprise-scale projects are scoped individually. Rush delivery is available for pilots at additional cost.
Can you collect data on my robot?
Yes. We work with OpenArm, DK1, Franka FR3, UR3e, UR5e, xArm, Kinova Gen3, Unitree G1, and most ROS2-compatible robot arms. If you ship us your robot or we can procure one, we integrate it into our collection infrastructure. Custom integrations typically take 3-5 business days. We also support mobile manipulators and bimanual configurations.
What is a typical cost per episode?
Cost per episode ranges from $8-$35 depending on task complexity, number of camera views, teleoperation method, and QA requirements. Simple tabletop pick-and-place tasks are at the lower end; contact-rich bimanual tasks with dexterous hands are at the higher end. Volume discounts apply for campaigns over 500 episodes. Contact us for a detailed quote based on your specific requirements.
Do you sign NDAs?
Yes. We sign mutual NDAs before any project discussion that involves proprietary tasks, robot configurations, or research goals. All data collected under contract is owned by the client. We do not retain copies, use client data for any other purpose, or include client data in public datasets. We also support custom data governance and security requirements for enterprise clients.
Can collected data go directly into the Fearless Platform?
Yes. Enterprise data collection campaigns include Fearless Platform access. Data collected by SVRC operators flows directly into your Fearless workspace with full metadata, QA reports, and lineage information. This creates a seamless path from collection to replay, annotation, evaluation, and retraining.
What annotation types are available?
We support timestamped annotations (task phase labels at specific time points), segmented annotations (start/end boundaries for subtask phases), language instructions (natural language descriptions for VLA training), keyframe annotations (critical manipulation moments), and success/failure labels. Custom annotation schemas are supported for enterprise campaigns.
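Put together, a per-episode annotation record combining these types might look like the sketch below (field names are illustrative; the actual schema is agreed per campaign):

```python
import json

# Example structured annotation: success label, language instruction,
# segmented task phases, and keyframes, all in one JSON record.
annotation = {
    "episode_id": "ep_000123",
    "success": True,
    "language_instruction": "pick up the red mug and place it on the rack",
    "phases": [
        {"label": "approach", "start_s": 0.0, "end_s": 2.1},
        {"label": "grasp",    "start_s": 2.1, "end_s": 3.4},
        {"label": "place",    "start_s": 3.4, "end_s": 6.8},
    ],
    "keyframes": [{"t_s": 2.3, "event": "grasp_closed"}],
}

print(json.dumps(annotation, indent=2))
```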
Related Resources
Robot Data Collection Guide
Comprehensive guide covering teleoperation methods, station setup, operator training, and quality assurance.
HDF5 vs RLDS vs LeRobot Formats
Technical comparison with conversion examples and pipeline compatibility tables.
Bimanual Teleoperation Setup
Hardware configuration and calibration for dual-arm leader-follower and VR-based collection.
Fearless Data Platform
Manage collected data, replay episodes, mine failures, and close the loop to retraining.
Hardware Catalog
OpenArm 101, DK1, Orca Hand, UR series, and more. Purchase or lease.
Enterprise Programs
Managed programs, fleet deployment, and SLA-backed operations for enterprise teams.