Classical Robotics: The Perception-Planning-Control Pipeline

Classical robotics follows an explicit, hand-engineered pipeline. Perception algorithms (often structured point cloud processing, CAD-model matching, or calibrated stereo vision) produce a geometric scene representation. A planning layer (RRT*, CHOMP, trajectory optimization, or model predictive control) computes a collision-free path to the goal. A control layer (PID, impedance control, or computed torque control) tracks the planned trajectory with tight real-time guarantees. Each stage has clear inputs, outputs, and failure modes that engineers can inspect, debug, and formally verify.

The strengths of this approach are not subtle. Classical controllers operate at 1 kHz control rates with deterministic latency. They provide formal stability guarantees through Lyapunov analysis, safety constraints through control barrier functions, and trajectory optimality through well-understood cost functions. They require no training data. And when they fail, the failure mode is typically interpretable: a perception error, an infeasible plan, or a tracking overshoot. For anyone deploying robots in environments where a regulator will ask "why did the robot do that," classical control provides answers that learned policies cannot.

The limitations are equally clear. Every hand-engineered pipeline is brittle to conditions its designer did not anticipate. A perception module tuned for brushed-metal parts fails on transparent objects. A motion planner optimized for a structured workstation fails in clutter. An impedance controller tuned for rigid grasping fails on deformable objects. Extending classical systems to novel situations requires more engineering, not more data, and the cost scales with the number of edge cases you need to handle.

Classical Tooling Deep Dive: IK, Motion Planning, Force Control

Understanding the specific tools in the classical stack is essential for teams evaluating which components to replace with learning and which to keep.

Inverse Kinematics (IK). Given a desired end-effector pose, IK computes the joint angles that achieve it. Analytical IK solutions exist for 6-DOF robots with specific geometries (most industrial arms) and run in microseconds; IKFast generates these closed-form solvers offline from the robot's kinematic description. For 7-DOF redundant arms (Franka, OpenArm, Kinova Gen3), numerical IK solvers like KDL or TRAC-IK compute solutions in 1-10 ms. IK is reliable, fast, and well-understood -- there is almost never a reason to replace it with a learned component. Even fully learned policies typically output end-effector targets that are converted to joint commands via IK.
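
For intuition, numerical IK amounts to iterating a damped-least-squares update on the pose error. A minimal sketch for a planar 2-link arm (link lengths, damping, and tolerances are illustrative, not taken from any real robot):

```python
# Damped-least-squares IK sketch for a planar 2-link arm.
import numpy as np

L1, L2 = 0.3, 0.25  # link lengths in meters (hypothetical)

def fk(q):
    """Forward kinematics: joint angles -> end-effector (x, y)."""
    x = L1 * np.cos(q[0]) + L2 * np.cos(q[0] + q[1])
    y = L1 * np.sin(q[0]) + L2 * np.sin(q[0] + q[1])
    return np.array([x, y])

def jacobian(q):
    s1, s12 = np.sin(q[0]), np.sin(q[0] + q[1])
    c1, c12 = np.cos(q[0]), np.cos(q[0] + q[1])
    return np.array([[-L1 * s1 - L2 * s12, -L2 * s12],
                     [ L1 * c1 + L2 * c12,  L2 * c12]])

def ik(target, q0, damping=1e-3, iters=200):
    """Iterate q += J^T (J J^T + lambda*I)^-1 * error until converged."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < 1e-6:
            break
        J = jacobian(q)
        q += J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
    return q

q = ik(np.array([0.35, 0.2]), q0=[0.1, 0.5])
```

The damping term keeps the update well-conditioned near kinematic singularities, which is the main practical difference from a plain Jacobian pseudoinverse.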

Motion Planning. RRT* (Rapidly-exploring Random Tree, optimal variant), other sampling-based planners (BIT*), and trajectory-optimization methods (CHOMP, STOMP) compute collision-free trajectories in joint space or task space. Planning time ranges from 50 ms for simple environments to 2-5 seconds for cluttered scenes. The key limitation: the planner needs an accurate collision model of the scene, which requires either known object geometry (from a CAD model) or real-time perception. In cluttered, unknown environments, the perception bottleneck makes classical planning brittle. MoveIt2 is the standard open-source motion planning framework, with support for most research arms.

Force Control. Impedance and admittance control regulate the mechanical interaction between the robot and its environment. Impedance control makes the robot behave like a mass-spring-damper system: when external forces act on the end-effector, the robot yields according to the specified stiffness and damping. Admittance control inverts this: the robot reads force-torque sensor data and generates position corrections proportional to the measured force error. These controllers are mathematically elegant, tunable, and provably stable -- but they require accurate knowledge of the robot's dynamics and contact model parameters. For a deeper dive on force sensing integration, see our F/T sensing guide.
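
The mass-spring-damper behavior of impedance control can be seen in a few lines. A minimal 1-DOF simulation (gains are illustrative, not tuned for any real arm): the end-effector settles at the target when unloaded, and a constant external force displaces it by f/k, which is exactly the programmed compliance.

```python
# 1-DOF impedance control sketch: the end-effector behaves like a
# mass-spring-damper around a target position.
def simulate_impedance(x_target, f_ext, m=1.0, k=200.0, d=30.0,
                       dt=0.001, steps=2000):
    """Integrate m*x'' = k*(x_target - x) - d*x' + f_ext; return final x."""
    x, v = 0.0, 0.0
    for _ in range(steps):
        a = (k * (x_target - x) - d * v + f_ext) / m
        v += a * dt
        x += v * dt
    return x

# With no external force the arm settles at the target...
free = simulate_impedance(x_target=0.1, f_ext=0.0)
# ...and a constant 4 N push displaces it by f/k = 4/200 = 0.02 m.
pushed = simulate_impedance(x_target=0.1, f_ext=4.0)
```

Tuning k and d trades off stiffness against gentleness on contact; admittance control achieves the same target behavior but closes the loop on measured force rather than commanded torque.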

The Classical Perception Pipeline. A typical structured perception stack: RGB-D camera captures a point cloud. Plane segmentation removes the table. Euclidean clustering isolates individual objects. Each object cluster is matched against a known CAD model library (using ICP or PPF matching) to estimate 6-DOF pose. This pipeline is fast (10-30ms per frame), accurate for known objects (sub-millimeter pose estimation), and completely fails for objects not in the model library. Every new object requires a new CAD model or a manual calibration step -- the engineering cost that drives teams toward learned perception.
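
The stages above can be sketched end-to-end on a synthetic point cloud. This toy NumPy version (scene and thresholds are made up; a real stack would use PCL or Open3D on RGB-D data) shows plane removal followed by Euclidean clustering:

```python
# Toy classical perception pipeline: plane removal, then clustering.
import numpy as np

def remove_plane(points, z_tol=0.005):
    """Drop points within z_tol of the z=0 table plane."""
    return points[points[:, 2] > z_tol]

def euclidean_cluster(points, radius=0.02):
    """Greedy single-linkage clustering: neighbors within `radius` merge."""
    labels = -np.ones(len(points), dtype=int)
    current = 0
    for i in range(len(points)):
        if labels[i] >= 0:
            continue
        frontier = [i]
        labels[i] = current
        while frontier:
            j = frontier.pop()
            near = np.linalg.norm(points - points[j], axis=1) < radius
            for k in np.nonzero(near & (labels < 0))[0]:
                labels[k] = current
                frontier.append(k)
        current += 1
    return labels

# Synthetic scene: a flat table plus two small object clusters.
rng = np.random.default_rng(0)
table = np.column_stack([rng.uniform(0, 1, (200, 2)), np.zeros(200)])
cup   = rng.normal([0.2, 0.2, 0.05], 0.003, (50, 3))
box   = rng.normal([0.7, 0.7, 0.05], 0.003, (50, 3))
scene = np.vstack([table, cup, box])

objects = remove_plane(scene)        # 100 object points remain
labels = euclidean_cluster(objects)  # two clusters found
```

The missing step, matching each cluster to a CAD model with ICP, is exactly where the pipeline fails on unknown objects.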

Robot Learning: End-to-End Learned Policies

Robot learning replaces hand-engineered pipelines with data-driven models. In imitation learning, a neural network observes human demonstrations (camera images plus robot joint states) and learns a direct mapping from observation to action. In reinforcement learning, the agent learns through trial and error in simulation or the real world. In the emerging vision-language-action (VLA) paradigm, large pre-trained models take natural language task instructions and visual observations as input and produce motor commands as output.

The defining advantage of learned policies is that they handle perceptual complexity and environmental variation implicitly. A policy trained on 500 demonstrations of "pick up the cup" across 30 different cups, 5 lighting conditions, and varied table positions learns a representation that generalizes to the 31st cup without any explicit feature engineering. The policy does not have a perception module, a planning module, and a control module. It has a single model that maps pixels and proprioception to joint commands, and the relevant abstractions are learned from data rather than designed by engineers.

This capability comes at a cost. Learned policies are opaque: when they fail, diagnosing whether the failure is perceptual, planning-related, or a control execution error is difficult. They require substantial training data, typically hundreds to thousands of demonstrations for imitation learning, or millions of simulation steps for reinforcement learning. They offer no formal safety guarantees. And their behavior can change in unpredictable ways when the deployment environment shifts even slightly from the training distribution.

When Classical Robotics Wins

Precision assembly and machining. Tasks requiring sub-millimeter repeatability in known geometry. CNC machining, semiconductor wafer handling, PCB component insertion, and precision welding all demand tolerances that classical controllers achieve routinely and learned policies cannot guarantee. When the environment is fully specified and the physics well-modeled, classical control is both faster to deploy and more reliable in operation.

Known, structured environments. Automotive assembly lines, pharmaceutical packaging, and logistics sortation systems with controlled lighting, fixed object positions, and predictable physics are the natural domain of classical robotics. The engineering investment to cover all cases is finite and manageable. There is no reason to collect training data when you can write a deterministic controller that handles every situation your robot will encounter.

Safety-critical applications. Surgical robotics, collaborative robots operating near humans, and any deployment requiring regulatory certification benefit from the formal verification tools available to classical control. Control barrier functions, reachability analysis, and worst-case trajectory bounds give classical systems a safety assurance level that learned policies have not yet achieved. The FDA, for example, currently has no pathway for certifying an end-to-end learned surgical control policy.

Low-latency requirements. Applications requiring sub-millisecond control response, such as high-speed pick-and-place, balancing, or contact-sensitive assembly, need the deterministic timing that classical control loops provide. Neural network inference, even on optimized hardware, introduces variable latency that is problematic at control rates above 500 Hz.

When Robot Learning Wins

Unstructured environments. Sorting mixed items in warehouse bins, navigating cluttered homes, operating in kitchens or restaurants where object layouts change constantly. Writing a classical controller for bin picking across thousands of SKU geometries is a never-ending engineering project. Training a learned policy on diverse demonstrations is a data collection project with diminishing but continuous returns.

Dexterous manipulation. Tasks requiring finger-level coordination, deformable object handling, or contact-rich interaction. Folding laundry, tying knots, inserting flexible cables, and food preparation all involve physics that are prohibitively expensive to model analytically. Learned policies that observe the physical outcome of their actions and adapt implicitly through training data handle these tasks far more naturally than any engineered controller.

Generalization across object instances. When your robot needs to pick up any mug, not a specific mug. When your mobile robot needs to navigate any office, not a specific floor plan. When your cooking robot needs to handle any brand of pasta box. The moment your deployment requires handling novel instances within a category, learned representations from diverse training data become essential. Classical perception would need re-engineering for every new object variant.

Tasks that are hard to specify programmatically. "Wipe the table until it looks clean." "Pack the items so nothing shifts during shipping." "Arrange the flowers attractively." These tasks have success criteria that humans evaluate easily but that are difficult to express as mathematical cost functions. Imitation learning sidesteps the specification problem entirely by learning the task implicitly from demonstrations of the desired behavior.

Approach Comparison Table

| Dimension | Classical Robotics | Learned Policies | Hybrid |
| --- | --- | --- | --- |
| Control frequency | 1 kHz deterministic | 10-50 Hz variable | 100-1000 Hz (classical inner loop) |
| Novel objects | Requires new models | Generalizes from data | Learned perception + classical plan |
| Safety guarantees | Formal verification available | No formal guarantees | Classical safety envelope |
| Setup time | Weeks-months (engineering) | Days-weeks (data collection) | Weeks (both) |
| Debugging | Inspect each module | Black-box, need ablation | Learned modules harder |
| Deformable objects | Very difficult to model | Learns from demonstrations | Learned contact + classical motion |
| Scaling cost | O(edge cases) engineering | O(data diversity) | Both, but reduced |

Tooling Comparison for Existing Teams

| Tool Category | Classical Stack | Learning Stack |
| --- | --- | --- |
| Middleware | ROS2 Humble/Iron | LeRobot, RoboCasa, robomimic |
| Motion planning | MoveIt2, OMPL, Drake | N/A (end-to-end) |
| Perception | PCL, Open3D, FoundationPose | DINOv2, SigLIP (learned backbone) |
| Simulation | Gazebo, Drake | Isaac Sim, MuJoCo, Genesis |
| Control | ros2_control, impedance/admittance | ACT, Diffusion Policy, VLA inference |
| Languages | C++ (real-time), Python (scripts) | Python (PyTorch), C++ (deployment) |

The Hybrid Approach: Learned Perception + Classical Planning + Learned Control

The most capable deployed robot systems in 2026 are hybrids, and the specific hybrid architecture that has emerged as dominant is worth understanding in detail.

Learned perception layer. A neural network (often a pre-trained vision foundation model like DINOv2 or CLIP, fine-tuned on task-specific data) processes camera images and produces a structured scene representation: object poses, semantic labels, surface normals, grasp candidates. This replaces the brittle hand-engineered perception of classical systems with learned representations that generalize across lighting, textures, and object instances. The perception layer runs at 10-30 Hz and outputs structured data, not raw actions.

Classical planning layer. A model predictive controller (MPC) or sampling-based planner takes the perceived scene state and computes a collision-free, dynamically feasible trajectory to achieve the task goal. This layer operates on the clean geometric representation from the perception module and applies all the safety constraints, joint limits, and optimality criteria that classical planning excels at. Planning runs at 10-50 Hz.

Learned low-level control. For contact-rich tasks, a learned residual policy adjusts the classical controller's commands in real time based on force-torque sensor feedback and visual observations of the contact. This handles the deformable-object and contact-dynamics cases where classical control models break down, while the classical controller provides the overall trajectory structure and safety envelope. The residual policy runs at 100-500 Hz, adding corrections to the classical control output.

This architecture captures the strengths of both paradigms. The learned perception handles visual complexity. The classical planner provides safety guarantees and interpretable behavior. The learned residual controller handles contact dynamics that cannot be modeled analytically. Google DeepMind's manipulation systems, several production-deployed Amazon warehouse robotics cells, and multiple surgical robotics platforms use variants of this architecture in 2026.
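
The rate separation between the three layers can be sketched as a single high-frequency control loop with per-layer tick dividers. The layer callables below are stand-ins for illustration, not a real robot API:

```python
# Sketch of the hybrid stack's rate separation: a 500 Hz inner loop that
# refreshes learned perception every 25 ticks (20 Hz) and replans with the
# classical planner every 10 ticks (50 Hz).
def run_hybrid(perceive, plan, classical_step, residual_step, ticks=500):
    scene, trajectory = None, None
    log = []
    for t in range(ticks):            # 500 Hz inner loop
        if t % 25 == 0:               # 20 Hz learned perception
            scene = perceive()
        if t % 10 == 0:               # 50 Hz classical replanning
            trajectory = plan(scene)
        a = classical_step(trajectory, t)   # classical tracking command
        a += residual_step(a, t)            # bounded learned correction
        log.append(a)
    return log

# Dummy layers to show the wiring; real implementations are networks,
# planners, and controllers.
log = run_hybrid(
    perceive=lambda: "scene",
    plan=lambda s: "traj",
    classical_step=lambda traj, t: 1.0,
    residual_step=lambda a, t: 0.001,
)
```

The key property is that a slow or dropped perception update never stalls the inner loop; the controller keeps tracking the last valid trajectory.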

Transition Path for Existing Classical Robotics Teams

If your team has a working classical robotics stack and wants to add learning capabilities, here is the recommended incremental path that minimizes risk:

  1. Phase 1: Replace perception only. Swap your hand-engineered object detection and pose estimation with a learned model (FoundationPose, Grounding DINO, or a fine-tuned DINOv2 detector). Keep your classical planner and controller unchanged. This is the lowest-risk learning introduction and typically provides the largest immediate improvement (handles novel objects without CAD models). Timeline: 2-4 weeks of integration work.
  2. Phase 2: Add learned grasp planning. Replace your analytical grasp planner (if any) with a learned grasp quality predictor (GraspNet, Contact-GraspNet, or AnyGrasp). The learned model proposes grasp candidates scored by predicted success, and your classical planner generates a trajectory to the selected grasp. Timeline: 2-6 weeks.
  3. Phase 3: Add learned residual control. For contact-rich tasks where your classical impedance controller struggles, train a residual policy that adds corrections to the classical output. Collect 100-200 demonstrations of the contact phase only (not the full task). The residual policy handles the "last centimeter" that classical control cannot model. Timeline: 4-8 weeks including data collection.
  4. Phase 4: Evaluate end-to-end. Once you have experience with learned components, evaluate whether an end-to-end learned policy (ACT or Diffusion Policy) outperforms your hybrid stack on your specific tasks. For some tasks, the answer will be yes -- particularly tasks with high visual complexity and moderate precision requirements. For precision tasks, the hybrid approach typically continues to win.

SVRC can provide data collection for any of these phases through our data services, and our engineering team advises on hybrid architecture design. The SVRC platform supports both ROS2-based classical workflows and PyTorch-based learning workflows.

A Practical Decision Framework: Five Questions

When starting a new robot application, answer these five questions to determine your approach.

1. Is the environment fully specified and stable? If yes (factory line, clean room, structured warehouse cell), start with classical control. You will deploy faster and with higher reliability than any learned approach. If no (homes, restaurants, unstructured warehouses), you need learning at least in the perception layer.

2. Do you need to handle novel object instances? If the robot will encounter objects it has never seen before, you need a learned perception and possibly a learned policy. Classical perception requires explicit models of every object. If the object set is fixed and known, classical perception is faster to implement and more reliable.

3. Is the task contact-rich or involving deformable objects? If yes, you need learning in the control layer. Classical contact models are inadequate for deformable manipulation, food handling, or textile tasks. A learned residual controller or a fully learned policy trained on contact-rich demonstrations is the practical path.

4. Do you need formal safety guarantees or regulatory certification? If yes, your system architecture must include a classical safety layer, even if other components are learned. Control barrier functions, emergency stop logic, and workspace boundary enforcement should be classical and formally verified. Learned components operate within the safety envelope defined by the classical layer.

5. What is your data budget? Learned policies require demonstrations (hundreds for imitation learning) or simulation environments (for RL). If you have the budget to collect 200-500 high-quality demonstrations of your specific task, imitation learning is practical. If not, classical control or a fine-tuned foundation model with minimal task-specific data is your path. SVRC's data collection services ($2,500 pilot / $8,000 campaign) can help you build the dataset efficiently if learning is the right approach.
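
As a concrete illustration of the classical safety layer from question 4, a minimal envelope might clamp every learned command to workspace bounds and a per-tick velocity limit before execution. The bounds and limits below are placeholder values, not recommendations:

```python
# Minimal classical safety envelope: every position command from a learned
# policy is rate-limited and clamped to the verified workspace before
# execution. Limits are illustrative placeholders.
import numpy as np

WORKSPACE_MIN = np.array([0.2, -0.4, 0.02])   # meters
WORKSPACE_MAX = np.array([0.8,  0.4, 0.60])
MAX_STEP = 0.01                               # meters per control tick

def enforce_envelope(current_pos, commanded_pos):
    """Clamp a learned position command inside the verified safe set."""
    step = commanded_pos - current_pos
    norm = np.linalg.norm(step)
    if norm > MAX_STEP:                       # velocity limit
        step = step * (MAX_STEP / norm)
    return np.clip(current_pos + step, WORKSPACE_MIN, WORKSPACE_MAX)

# A wild upward command is reduced to a single bounded step.
safe = enforce_envelope(np.array([0.5, 0.0, 0.3]),
                        np.array([0.5, 0.0, 0.9]))
```

Because this layer is a few dozen lines of deterministic arithmetic, it can be reviewed and formally verified independently of any learned component it wraps.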

Learning Approach Taxonomy: Choosing an Algorithm

Within the learning paradigm, the choice of algorithm has dramatic implications for data requirements, compute costs, and deployment characteristics. This taxonomy maps the landscape as of 2026.

| Algorithm | Data Source | Sample Efficiency | Reward Required? | Best Use Case |
| --- | --- | --- | --- | --- |
| Behavioral Cloning (BC) | 50-500 demos | High | No | Short-horizon tasks with consistent strategy |
| ACT (Action Chunking) | 50-200 demos | High | No | Bimanual tasks, long-horizon with action chunks |
| Diffusion Policy | 200-1000 demos | Medium | No | Multimodal tasks with multiple valid strategies |
| VLA Fine-Tune (Octo/OpenVLA) | 20-200 demos | Very High | No | Novel object generalization, language-conditioned tasks |
| PPO (on-policy RL) | 10M-100M sim steps | Low | Yes (dense preferred) | Locomotion, continuous control with clear reward |
| SAC (off-policy RL) | 1M-50M sim steps | Medium | Yes | Dexterous manipulation in sim, sample-efficient RL |
| GAIL / IRL | 10-50 demos + sim | Medium | Learned from demos | Few demonstrations + good simulator available |
| Model-Based RL (Dreamer, MBPO) | 100K-1M steps | High (for RL) | Yes | Data-limited RL where world model can be learned |

The practical decision for most manipulation teams in 2026: start with ACT or Diffusion Policy (imitation learning), move to VLA fine-tuning if you need generalization across objects or language conditioning, and reserve RL for locomotion or cases where you have an accurate simulator and a clear reward function. GAIL and model-based RL occupy niche roles for now.

Classical Pipeline Failure Modes by Stage

Understanding exactly how classical pipelines fail helps teams identify which stages to replace with learning and which to keep. Each stage in the perception-planning-control pipeline has characteristic failure modes tied to specific environmental conditions.

| Pipeline Stage | Failure Mode | Trigger Condition | Impact |
| --- | --- | --- | --- |
| Perception | Object not detected | Novel object geometry, transparent/reflective material | Complete task failure (no target) |
| Perception | Pose estimate off by >5mm | Symmetrical objects, partial occlusion, glare | Grasp misalignment, placement error |
| Planning | No feasible path found | Dense clutter, narrow passages, conflicting constraints | Task abort or timeout |
| Planning | Stale scene model | Dynamic environment, objects moved between perception and execution | Collision with moved objects |
| Control | Tracking overshoot | Aggressive trajectories, under-damped PID gains | Impact damage, position error at target |
| Control | Inadequate contact model | Deformable objects, unknown friction, compliant surfaces | Crush damage, slip, grasp failure |
| Integration | Timing desync between modules | High CPU load, ROS2 DDS congestion, GC pauses | Stale data used for planning, jerky execution |

The pattern is clear: perception failures dominate in unstructured environments, planning failures dominate in cluttered scenes, and control failures dominate in contact-rich tasks. Teams should replace the stage that causes the most failures in their specific deployment with learning, and keep the stages that are working reliably classical.

The Residual Policy Pattern: Adding Learning to Classical Control

The residual policy pattern is the safest way to introduce learning into an existing classical system. Instead of replacing the classical controller, a learned residual policy adds corrections on top of the classical output. The total commanded action is: a_total = a_classical + a_residual, where a_residual is constrained to a small range (typically +/- 5mm position, +/- 2 degrees orientation per timestep).

# residual_policy.py -- Classical + learned residual controller
import numpy as np
import torch

class ResidualPolicyController:
    """Adds learned corrections to classical impedance controller."""

    def __init__(self, classical_controller, residual_model, max_residual=0.005):
        self.classical = classical_controller
        self.residual = residual_model  # Trained policy network
        self.max_residual = max_residual  # 5mm max correction

    def compute_action(self, obs, ft_reading, target_pose):
        # Classical controller: impedance control toward target
        a_classical = self.classical.compute(obs["joint_pos"], target_pose, ft_reading)

        # Learned residual: correct for contact dynamics
        with torch.no_grad():
            residual_input = torch.cat([
                torch.as_tensor(obs["joint_pos"], dtype=torch.float32),
                torch.as_tensor(ft_reading, dtype=torch.float32),  # Force-torque sensor
                torch.as_tensor(obs["wrist_image"], dtype=torch.float32).flatten()
            ])
            a_residual = self.residual(residual_input).numpy()

        # Safety clamp: residual cannot exceed max_residual per joint
        a_residual = np.clip(a_residual, -self.max_residual, self.max_residual)

        return a_classical + a_residual

This pattern has been deployed successfully in insertion tasks (peg-in-hole, connector mating), where the classical controller handles the approach trajectory and the residual policy handles the contact-phase corrections that require sensitivity to force feedback. The residual is trained on 100-200 demonstrations of the contact phase only, keeping data requirements low. At SVRC, we use this pattern with the OpenArm 101 for precision assembly tasks where classical control alone achieves 85% success and the residual policy pushes it to 96%.

Computational Requirements Comparison

The infrastructure cost of each approach differs dramatically. Teams must understand these requirements before committing to an architecture.

| Resource | Classical Pipeline | IL (ACT / DP) | VLA Fine-Tune | RL (Sim) |
| --- | --- | --- | --- | --- |
| Training GPU | None | 1x RTX 3090/4090 | 1-4x A100/H100 | 1-8x A100 (Isaac Sim) |
| Training time | N/A (hand-tuned) | 2-8 hrs | 12-48 hrs | 24-120 hrs |
| Inference GPU | None (CPU only) | 1x RTX 3060+ | 1x A100 or H100 | Same as IL at deploy |
| Inference latency | < 1 ms | 20-100 ms | 200-500 ms | Same as IL at deploy |
| Disk/storage | < 100 MB (URDF, configs) | 50-200 GB (dataset) | 200 GB-2 TB | 50-500 GB (replay buf) |
| Engineering labor | High (weeks-months) | Medium (data + train) | Low-medium (fine-tune) | High (sim engineering) |
| Cloud cost estimate | $0/month | $50-200/train run | $500-3,000/train run | $1,000-10,000/run |

These costs are for a single-task training cycle. Multi-task policies, hyperparameter sweeps, and iterative data collection multiply the numbers accordingly. Teams with tight budgets should consider SVRC's data collection service, which amortizes hardware and operator costs across multiple projects.

Failure Mode Analysis: Diagnosing Classical vs. Learned Systems

When a robot system fails, diagnosing the root cause follows fundamentally different pathways depending on the paradigm. Understanding these diagnostic frameworks saves significant debugging time.

Classical pipeline failure modes:

  • Perception failure. The point cloud is noisy, the object is not detected, or the pose estimate is off by more than the controller's tolerance. Diagnostic: visualize the point cloud and detection output at the failure timestep. Fix: tune segmentation parameters, add a camera viewpoint, or improve lighting. Time to diagnose: minutes to hours.
  • Planning failure. The planner returns no solution (infeasible), times out, or produces a collision. Diagnostic: visualize the planning scene and collision objects in RViz2. Fix: increase planning time, add clearance margins, or simplify the collision model. Time to diagnose: minutes.
  • Control failure. The robot overshoots the target, oscillates, or fails to maintain contact. Diagnostic: plot joint position tracking error, velocity profiles, and force-torque signals. Fix: retune PID gains, adjust impedance parameters, or reduce trajectory speed. Time to diagnose: hours.
  • Integration failure. Timing issues between modules -- the planner uses a stale perception output, or the controller receives a trajectory update mid-execution. Diagnostic: check message timestamps in ROS2 logs. Fix: add synchronization barriers or switch to a reactive replanning architecture. Time to diagnose: hours to days.

Learned policy failure modes:

  • Distribution shift. The object is in a position, orientation, or lighting condition not sufficiently covered by training data. Diagnostic: compare the failure observation to the training distribution (e.g., by computing embedding distances using the policy's vision encoder). Fix: collect more diverse demonstrations covering the failure case. Time to diagnose: hours to days.
  • Mode averaging. The policy outputs the average of two valid strategies, producing a trajectory that matches neither. Diagnostic: rollout visualization shows the robot hesitating between two approaches. Fix: switch from MSE loss to a multimodal architecture (Diffusion Policy, CVAE). Time to diagnose: hours.
  • Compounding error. The policy drifts off-trajectory after 20-30 steps and cannot recover. Diagnostic: track per-step action error over time and observe accelerating divergence. Fix: increase action chunk length, add temporal ensembling, or collect DAgger data. Time to diagnose: hours.
  • Calibration mismatch. Camera extrinsics shifted between data collection and deployment, causing consistent spatial offset in policy actions. Diagnostic: measure camera pose against the calibration used during data collection. Fix: recalibrate cameras or add camera pose to the observation space. Time to diagnose: minutes once suspected, days if not.
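
The embedding-distance diagnostic for distribution shift can be sketched as a nearest-neighbor test against the training set. The embeddings below are simulated stand-ins for the policy's vision-encoder outputs, and calibrating the threshold at the 95th percentile of leave-one-out training distances is an assumption, not a standard:

```python
# Nearest-neighbor OOD check: is a failure observation's embedding far from
# everything seen during training?
import numpy as np

def ood_score(train_embeddings, query_embedding):
    """Distance from the query to its nearest training embedding."""
    d = np.linalg.norm(train_embeddings - query_embedding, axis=1)
    return d.min()

def is_out_of_distribution(train_embeddings, query_embedding, pct=95):
    # Calibrate the threshold from leave-one-out training self-distances.
    self_dists = [
        ood_score(np.delete(train_embeddings, i, axis=0), train_embeddings[i])
        for i in range(len(train_embeddings))
    ]
    threshold = np.percentile(self_dists, pct)
    return ood_score(train_embeddings, query_embedding) > threshold

rng = np.random.default_rng(1)
train = rng.normal(0.0, 1.0, (200, 8))   # simulated training embeddings
in_dist = rng.normal(0.0, 1.0, 8)        # looks like training data
far_out = np.full(8, 10.0)               # clearly shifted observation
```

In practice the same test, run on the policy's actual encoder features, tells you whether to collect more demonstrations (shift confirmed) or to look for a bug elsewhere (observation is in-distribution).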

The key asymmetry: classical failures are generally faster to diagnose because each module has inspectable inputs and outputs. Learned policy failures require inference about the training data distribution, which is inherently more difficult. Hybrid architectures partially address this by isolating learned components so that classical diagnostic tools apply to most of the pipeline.

Real-World Case Studies

These examples illustrate how the choice between classical, learned, and hybrid approaches plays out in practice.

Case 1: Electronics connector insertion (classical wins). A contract manufacturer needed a robot to insert USB-C connectors into PCB sockets. Tolerance: +/- 0.15mm. The connector geometry is known, the PCB is fixtured, and the insertion trajectory is a straight line with controlled force. A classical impedance controller with spiral search at the insertion point achieved 99.2% success in 10,000 trials. No training data was needed. An IL approach was prototyped and achieved 94% success after 500 demonstrations -- worse performance at higher cost.
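
The spiral search used in Case 1 is a standard classical insertion primitive: sweep an Archimedean spiral of end-effector offsets around the nominal hole position until the peg drops in. A sketch of the offset generator, with illustrative pitch and step values:

```python
# Archimedean spiral-search offsets for insertion, spiraling outward from
# the nominal hole position. Pitch/step values are illustrative.
import numpy as np

def spiral_search_offsets(pitch=0.0003, step=0.0002, max_radius=0.003):
    """Return (dx, dy) offsets with roughly constant spacing along the spiral."""
    offsets, theta = [], 0.0
    while True:
        r = pitch * theta / (2 * np.pi)   # radius grows by `pitch` per turn
        if r > max_radius:
            break
        offsets.append((r * np.cos(theta), r * np.sin(theta)))
        theta += step / max(r, step)      # roughly constant arc length per point
    return offsets

offsets = spiral_search_offsets()  # starts at (0, 0), spirals out to 3 mm
```

Executed under impedance control, each offset is attempted with a gentle downward force; the search terminates on a force-signature drop indicating successful insertion.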

Case 2: Warehouse bin picking (learning wins). An e-commerce fulfillment center needed a robot to pick arbitrary items from bins containing 50+ SKU categories. Items ranged from soft pouches to rigid boxes to oddly shaped electronics. A classical pose estimation + grasp planning pipeline achieved 78% pick success, limited by perception failures on novel and reflective objects. A learned grasp planner (Contact-GraspNet) with a DINOv2 backbone achieved 93% pick success across all categories, including items never seen during training. The learned system took 3 weeks of data collection (4,000 pick demonstrations) versus 4 months of engineering for the classical system.

Case 3: Food plating (hybrid wins). A food preparation startup needed a robot to plate salad ingredients in an aesthetically pleasing arrangement. Classical control handled the precise placement of individual items (known portion sizes, calibrated dispensers). A learned perception model identified ingredient types and current plate state from overhead camera images. A learned high-level planner generated the composition layout based on training images of plated meals. The hybrid system achieved 87% acceptance rate from human quality evaluators, compared to 62% for a fully classical rule-based system and 79% for a fully learned end-to-end policy.

MoveIt2 + Learned Perception: A Minimal Hybrid Example

For teams looking to build their first hybrid system, here is the minimal integration pattern using MoveIt2 for motion planning with a learned object detector replacing classical perception.

# hybrid_pick.py -- Minimal hybrid: learned perception + classical planning
# NOTE: `moveit2`, `groundingdino`, and the `camera` object stand in for the
# real MoveIt2 and Grounding DINO APIs; treat them as illustrative wrappers.
import rclpy
from moveit2 import MoveIt2
from groundingdino import GroundingDINO

def hybrid_pick(node, moveit, detector, camera, prompt="the red mug"):
    # --- Learned perception layer ---
    rgb, depth = camera.capture()
    detections = detector.predict(rgb, prompt)  # GroundingDINO
    best = max(detections, key=lambda d: d.confidence)
    # Back-project 2D detection center to 3D using depth
    cx, cy = best.center
    z = depth[int(cy), int(cx)] / 1000.0  # mm to meters
    x = (cx - camera.cx) * z / camera.fx
    y = (cy - camera.cy) * z / camera.fy
    target_pose = [x, y, z, 0, 0, 0, 1]  # position + identity quaternion (camera frame; base-frame transform omitted)

    # --- Classical planning layer (MoveIt2) ---
    # Pre-grasp: approach from above
    pre_grasp = target_pose.copy()
    pre_grasp[2] += 0.10  # 10cm above
    moveit.move_to_pose(pre_grasp)
    # Grasp: descend slowly to the target pose
    moveit.move_to_pose(target_pose, velocity_scaling=0.3)
    moveit.close_gripper(force=20.0)  # 20N grip
    # Lift
    lift_pose = target_pose.copy()
    lift_pose[2] += 0.15
    moveit.move_to_pose(lift_pose)

This short example captures the essence of the hybrid pattern: a learned model (GroundingDINO) handles the perceptual complexity of finding arbitrary objects from language descriptions, while MoveIt2 handles collision-free trajectory planning with proper joint limits and velocity constraints. At SVRC, our OpenArm 101 ships with a MoveIt2 configuration package and RealSense camera driver integration, making this type of hybrid system deployable in an afternoon.

Data Requirements Compared

Classical control requires system identification data: joint position, velocity, torque, and force-torque sensor readings from carefully designed calibration experiments. A few hours of structured experiments typically suffice. The data is low-volume but must be high-precision. No neural network training is involved.

Imitation learning typically requires 200-1,000 demonstration episodes per task, each containing synchronized camera images and robot state at 30-50 Hz. Collection time ranges from 2 hours (200 demos of a simple task) to 2 weeks (1,000 demos of a complex task with diverse objects). Data quality dominates quantity: 200 clean demonstrations outperform 1,000 noisy ones. For details on collection cost, see our cost per demonstration analysis.

Foundation model fine-tuning (starting from Octo, OpenVLA, or RT-2) requires far fewer task-specific demonstrations, typically 50-200, because the pre-trained model provides a strong visual and behavioral prior. This is the most practical approach for teams with limited data budgets who need learned behavior. Pre-trained models are available through the Open X-Embodiment ecosystem.

Reinforcement learning requires a simulation environment that accurately models the task physics. Building that simulation is itself a significant engineering effort, but once available, RL can generate millions of training episodes at near-zero marginal cost. The challenge is sim-to-real transfer: policies trained in simulation often fail on real hardware due to physics modeling inaccuracies.
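A common mitigation for sim-to-real gaps is domain randomization: train across many perturbed versions of the simulated physics so the policy cannot overfit one inaccurate model. A minimal sketch, with made-up parameter names and ranges:

```python
# domain_rand.py -- sample perturbed physics parameters for each training episode
import random

NOMINAL = {"friction": 0.6, "object_mass_kg": 0.25, "actuation_latency_ms": 20.0}
HALF_WIDTH = {"friction": 0.3, "object_mass_kg": 0.10, "actuation_latency_ms": 15.0}

def sample_physics(rng):
    """Draw one randomized physics configuration around the nominal values."""
    return {k: NOMINAL[k] + rng.uniform(-HALF_WIDTH[k], HALF_WIDTH[k])
            for k in NOMINAL}

rng = random.Random(0)
episode_configs = [sample_physics(rng) for _ in range(1000)]
```

Each training episode then runs under a different sampled configuration, so the policy that emerges is robust to the range of physics the real robot might exhibit, rather than calibrated to one simulator setting.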

Residual Policy Learning: The Best of Both Worlds

Residual policy learning is a specific hybrid architecture that layers a learned correction on top of a classical controller. The classical controller provides a baseline behavior (e.g., move to target pose, apply insertion force), and the learned residual network outputs small corrections (typically +/- 5mm in position, +/- 5 degrees in orientation) that adapt the behavior to handle variability the classical controller cannot.

This architecture has several concrete advantages over end-to-end learning:

  • Safety by construction. The residual is bounded -- the learned network can only deviate by a limited amount from the classical trajectory. Even if the learned component fails completely (outputs zeros), the classical controller still executes a reasonable behavior.
  • Sample efficiency. The residual network only needs to learn the correction, not the entire manipulation trajectory. This reduces the required demonstrations from 200-500 (end-to-end) to 50-100 (residual learning) for many tasks.
  • Interpretable failure modes. When the system fails, you can inspect whether the classical controller or the residual correction is at fault by running the classical controller alone and comparing.
  • Incremental deployment. Start with the classical controller in production, then add the residual correction when it is validated. No big-bang replacement of the control stack.
In code, the pattern looks like this:

# residual_policy.py -- Classical + learned residual
import numpy as np

class ResidualPolicy:
    """Wrap a classical controller with a learned residual correction."""

    def __init__(self, classical_controller, residual_network,
                 max_pos_residual=0.005, max_rot_residual=0.087):
        self.classical = classical_controller
        self.residual_net = residual_network
        self.max_pos = max_pos_residual  # 5mm
        self.max_rot = max_rot_residual  # 5 degrees in radians

    def predict_action(self, observation):
        # Classical controller produces baseline action
        base_action = self.classical.compute_action(observation)

        # Learned residual predicts correction from visual observation
        raw_residual = self.residual_net(observation['image'])

        # Clip residual to safety bounds
        pos_residual = np.clip(
            raw_residual[:3], -self.max_pos, self.max_pos)
        rot_residual = np.clip(
            raw_residual[3:6], -self.max_rot, self.max_rot)

        # Apply correction to classical action
        corrected_action = base_action.copy()
        corrected_action[:3] += pos_residual
        corrected_action[3:6] += rot_residual
        return corrected_action

In SVRC evaluations, residual policies consistently achieve 90-95% of end-to-end performance with 25-50% of the training data. They excel on tasks where the coarse behavior is known (move to object, insert, place) but fine adjustments are needed for object variability. The OpenArm 101 ships with a MoveIt2-based classical controller that serves as an ideal base for residual learning experiments.

Migration Path: Moving from Classical to Hybrid to Learned

For teams with existing classical robot systems, the migration to learned components should follow a phased approach that manages risk while capturing the benefits of learning.

  1. Phase 1: Learned perception, classical everything else (2-4 weeks). Replace your fixed vision pipeline (segmentation, pose estimation) with a learned detector (GroundingDINO, SAM2) while keeping MoveIt2 planning and impedance control. This alone typically improves success rate by 10-25% on tasks involving novel objects, with no change to the safety-critical control stack. Risk: low.
  2. Phase 2: Add residual corrections (4-8 weeks). Collect 50-100 demonstrations where the classical system succeeds but could be more precise or handle more object variants. Train a residual policy that corrects the classical trajectory based on visual observations. This improves success rate by another 5-15% on tasks with high object variability. Risk: low (bounded residuals).
  3. Phase 3: End-to-end learned policy with classical safety layer (8-16 weeks). For tasks where the performance ceiling of hybrid approaches is insufficient, train a full imitation learning policy on 200-500 demonstrations. Wrap it with a classical safety layer that enforces joint limits, velocity constraints, and force limits. This provides maximum flexibility but requires more data and more careful validation. Risk: moderate.
  4. Phase 4: Foundation model fine-tuning (ongoing). As foundation models mature, fine-tune Octo or OpenVLA on your task-specific data for maximum generalization with minimal new data collection. Use the classical safety layer from Phase 3. Risk: moderate, but decreasing as foundation models improve.
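The classical safety layer referenced in Phases 3-4 can be as simple as a filter between the learned policy and the robot: clamp each commanded joint target to the joint limits and a per-tick velocity bound. A sketch under assumed limits (a real deployment would also enforce force limits via the force-torque sensor):

```python
# safety_layer.py -- clamp learned policy commands to classical limits (sketch)
import numpy as np

class SafetyLayer:
    def __init__(self, joint_low, joint_high, max_step_rad):
        self.low = np.asarray(joint_low)
        self.high = np.asarray(joint_high)
        self.max_step = max_step_rad  # max joint motion per control tick

    def filter(self, current_q, commanded_q):
        """Return the nearest command respecting velocity and joint limits."""
        step = np.clip(np.asarray(commanded_q) - np.asarray(current_q),
                       -self.max_step, self.max_step)
        return np.clip(np.asarray(current_q) + step, self.low, self.high)

# Illustrative 2-joint arm: the learned policy requests an unsafe jump
layer = SafetyLayer(joint_low=[-1.5, -1.5], joint_high=[1.5, 1.5],
                    max_step_rad=0.05)
safe_q = layer.filter(current_q=[0.0, 1.48], commanded_q=[2.0, 1.6])
```

Here the requested jump on joint 0 is rate-limited to 0.05 rad, and joint 1 is held at its 1.5 rad limit. Because the filter is stateless and classical, it can be validated once and reused unchanged as the learned policy above it evolves.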

Most SVRC clients are in Phase 1-2. The key principle: never replace a working classical component with a learned one unless the learned version demonstrably outperforms it on your specific deployment distribution. Learned components add capability but also add failure modes -- the tradeoff must be positive.

The Trend: Learning Is Expanding, Classical Is Not Disappearing

The trajectory of the field is clear. Robot learning is expanding into domains previously dominated by classical control: factory assembly, logistics, quality inspection. Foundation models are reducing the data requirements that previously made learning impractical for many applications. And the hybrid architecture pattern is making it possible to add learned capabilities incrementally to classical systems without replacing the classical safety and control infrastructure.

But classical robotics is not disappearing. It is becoming the safety and precision substrate on which learned capabilities are layered. Every production robot system in 2026 that handles real-world variability through learning also contains a classical controller ensuring that the learned policy does not drive the robot into a table, exceed joint limits, or apply dangerous forces. The debate between learning and classical is resolving into a question of architecture: which components are learned, which are engineered, and how do they interface.

For teams starting new projects, SVRC supports both paradigms. Our data services provide the demonstrations needed for imitation learning. Our hardware catalog includes arms and sensors compatible with both classical and learned control stacks. And our engineering team can advise on architecture decisions for hybrid systems that combine the best of both approaches.

Related Reading

Imitation Learning Guide · Force-Torque Sensing Guide · Sim-to-Real Transfer Guide · Deployment Checklist · Arm Comparison · Data Services · SVRC Platform