DART: Learning-Enhanced Model Predictive Control for Dual-Arm Non-Prehensile Manipulation

Abstract

What appears effortless to a human waiter remains a major challenge for robots. Non-prehensile manipulation of objects on a tray is inherently difficult, and the complexity is amplified in dual-arm settings. Such tasks are highly relevant to service robotics in domains such as hotels and hospitality, where robots must transport and reposition diverse objects with precision. We present DART, a novel dual-arm framework that integrates nonlinear Model Predictive Control (MPC) with an optimization-based impedance controller to achieve accurate object motion relative to a dynamically controlled tray. The framework systematically evaluates three complementary strategies for modeling tray–object dynamics as the state transition function within our MPC formulation: (i) a physics-based analytical model, (ii) an online regression-based identification model that adapts in real time, and (iii) a reinforcement learning–based dynamics model that generalizes across object properties. Our pipeline is validated in simulation with objects of varying mass, geometry, and friction coefficients. Extensive evaluations highlight the trade-offs among the three modeling strategies in terms of settling time, steady-state error, control effort, and generalization across objects. To the best of our knowledge, DART constitutes the first framework for non-prehensile dual-arm manipulation of objects on a tray.

Architecture

DART Framework: Our proposed framework takes as inputs the current object state \(\mathbf{X}\) and the desired target state \(\mathbf{X}^{\text{ref}}\); we set \(\boldsymbol{\nu}^{\text{ref}} = 0^{6 \times 1}\). These inputs are fed into a nonlinear MPC, which computes the optimal tray-tilt commands \(\mathbf{u}\). The commands are then passed to an optimization-based impedance controller, which computes the torques required to realize the tilts. Feedback from the simulator updates the object state, closing the loop for the next MPC step. The object–tray dynamics enter the MPC as a state transition constraint. We propose three models for this constraint: (a) PMPC, an analytical physics-based dynamics model; (b) RMPC, a regression-based model that learns unmodeled dynamics; and (c) LMPC, in which a PPO agent is used to estimate the object dynamics.
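The closed loop described above can be illustrated with a toy sketch. This is not the DART implementation. It assumes a single tray axis, an object state of (position, velocity), viscous damping in place of a real friction model, and a tilt command that is realized exactly, so the impedance controller is omitted. A grid search over constant tilts stands in for the nonlinear MPC solver. All names (`physics_step`, `mpc_tilt`, `run_closed_loop`) and all numeric values are hypothetical. The point it shows is that the state transition model is a swappable function, which is how an analytical, regression-based, or learned model could each be slotted into the same loop.

```python
import math

G = 9.81   # gravitational acceleration, m/s^2
DT = 0.05  # control period in seconds (assumed value)


def physics_step(state, tilt, damping=2.0):
    """Hypothetical 1-D analytical model of an object sliding on a tilted tray.

    state is (position, velocity); tilt is the tray angle in radians.
    Viscous damping is an assumption made to keep the sketch short.
    """
    pos, vel = state
    acc = G * math.sin(tilt) - damping * vel
    vel_next = vel + DT * acc
    pos_next = pos + DT * vel_next
    return (pos_next, vel_next)


def rollout_cost(step_fn, state, tilt, target, horizon):
    """Predicted cost of holding one tilt for the whole horizon."""
    cost = 0.0
    for _ in range(horizon):
        state = step_fn(state, tilt)
        pos, vel = state
        # Position error plus a velocity term (the reference velocity is zero).
        cost += (pos - target) ** 2 + 0.1 * vel ** 2
    # Small penalty on control effort.
    return cost + 0.01 * horizon * tilt ** 2


def mpc_tilt(step_fn, state, target, horizon=20, max_tilt=0.15, n_candidates=31):
    """Choose the lowest-cost constant tilt from an evenly spaced grid."""
    candidates = [
        -max_tilt + 2.0 * max_tilt * i / (n_candidates - 1)
        for i in range(n_candidates)
    ]
    return min(
        candidates,
        key=lambda u: rollout_cost(step_fn, state, u, target, horizon),
    )


def run_closed_loop(step_fn, plant_fn, state, target, n_steps=200):
    """Receding-horizon loop: plan with step_fn, apply one step to plant_fn."""
    for _ in range(n_steps):
        tilt = mpc_tilt(step_fn, state, target)
        state = plant_fn(state, tilt)  # stands in for simulator feedback
    return state


if __name__ == "__main__":
    final_pos, final_vel = run_closed_loop(
        physics_step, physics_step, (0.0, 0.0), target=0.2
    )
    print(f"final position: {final_pos:.3f} m, final velocity: {final_vel:.3f} m/s")
```

Passing a different callable as `step_fn` (for example, a fitted regressor or a learned network wrapped to take `(state, tilt)`) changes the prediction model without touching the loop. Passing a different `plant_fn` is a simple way to probe model mismatch.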

Additional Results

Result panels: (a) Knife, (b) Teapot, (c) Water bottle, (d) Wineglass, (e) Pan.