Accompanying Video
Figure👇: Overall architecture of the EasyUUV framework, which comprises three parts: (a) the RL and adaptive S-Surface-based composite controller module; (b) the parallelized reinforcement learning environment developed on NVIDIA Isaac Lab; and (c) the multimodal LLM adjustment module for real-world adaptation.
Although notable progress has been made in Unmanned Underwater Vehicle (UUV) attitude control, most existing approaches still face challenges in achieving broad generalizability, maintaining robustness under real-world disturbances, and improving deployment efficiency.
To address the above challenges, this paper presents EasyUUV, an LLM-enhanced, universal, and lightweight Sim-to-Real reinforcement learning (RL) framework for robust attitude control of UUVs. Our main contributions are summarized as follows:
Before training (~20ep)
After training (~400ep)
Image: Thrust (N) vs. normalized PWM value (corresponding to 1100–1900 μs) for the Blue Robotics T200 thruster at 16 V.
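To turn controller force commands into PWM signals, the thruster's nonlinear thrust curve has to be represented and inverted. Below is a minimal sketch using piecewise-linear interpolation; the sample points are illustrative placeholders, not the measured T200 data shown in the figure:

```python
# Hypothetical thrust curve for a T200-like thruster: (normalized PWM,
# thrust in N) pairs, with -1 -> 1100 us, 0 -> 1500 us (neutral),
# +1 -> 1900 us. Sample values are illustrative, not measured data.
SAMPLES = [(-1.0, -40.0), (-0.5, -15.0), (0.0, 0.0), (0.5, 20.0), (1.0, 50.0)]

def pwm_to_thrust(pwm: float) -> float:
    """Piecewise-linear interpolation of thrust (N) from normalized PWM."""
    pwm = max(-1.0, min(1.0, pwm))  # clamp to the valid command range
    for (x0, y0), (x1, y1) in zip(SAMPLES, SAMPLES[1:]):
        if pwm <= x1:
            t = (pwm - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return SAMPLES[-1][1]

def pwm_to_microseconds(pwm: float) -> float:
    """Map normalized PWM in [-1, 1] to the 1100-1900 us pulse width."""
    return 1500.0 + 400.0 * max(-1.0, min(1.0, pwm))
```

In practice the inverse lookup (thrust command to PWM) is done the same way with the axes swapped, which works because the curve is monotonic.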
EasyUUV testbed 3D Model
Experimental testbed for real-world validation of EasyUUV. (Tank Experiment).
Simulation videos (left: A-S-Surface; right: A-S-Surface+RL). The results underscore the advantage of integrating RL with adaptive control, especially for multi-axis attitude tracking robustness.
Comparison of MSE across two tasks for different controllers, under both RL and non-RL settings (simulation). Left: w/o RL, Right: w/ RL.
Comparison of tracking response curves and compound error for different control strategies. A-S-Surface converges the fastest and achieves the highest final reward, indicating superior learning efficiency. S-Surface shows slower convergence and a lower reward, while PID performs worst, with minimal improvement.
We next examine the effect of domain randomization (DR) on RL training. DR notably reduces performance loss, with SDR consistently achieving better out-of-domain performance and stability.
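As a rough illustration of what per-episode domain randomization looks like in such a pipeline (the parameter names and ranges below are assumptions for illustration, not EasyUUV's actual DR configuration):

```python
import random

# Hypothetical nominal UUV dynamics parameters; names and ranges are
# illustrative assumptions, not the paper's actual DR configuration.
NOMINAL = {"mass": 11.0, "thrust_gain": 1.0, "drag_coeff": 1.0}

def randomize_episode(scale: float, rng: random.Random) -> dict:
    """Sample one episode's dynamics by perturbing each nominal
    parameter multiplicatively by up to +/-scale (0.2 => +/-20%)."""
    params = {k: v * rng.uniform(1.0 - scale, 1.0 + scale)
              for k, v in NOMINAL.items()}
    # Additive offsets (e.g. residual buoyancy) are drawn separately.
    params["buoyancy_offset"] = rng.uniform(-scale, scale)
    return params
```

Resampling these parameters at every episode start forces the policy to cover a neighborhood of dynamics around the nominal model, which is what narrows the sim-to-real gap.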
The MSE results under varying DR levels.
EasyUUV achieves strong disturbance rejection, recovers quickly from transient disturbances, and also demonstrates a degree of position-holding capability.
Perturbation Only.
Perturbation + Manual Disturbance.
Comparison of tracking response and compound error curves for different control tasks with and without the RL policy (real-world experiment). (a) Yaw+Roll. (b) Yaw+Pitch.
Tracking response curves of the tank experiment (with manual disturbance).
Tracking response curves and LLM-generated controller adjustments under the turbulence scenario. The LLM generates adjustment recommendations from visual logs and textual data within 5 s, after which performance increases significantly.
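A schematic of how such an LLM-in-the-loop adjustment step might be wired: summarize recent tracking logs into a prompt, then parse the model's reply into a gain update. The prompt format, the `gain_scale` field, and the helper names are assumptions for illustration (the actual module also attaches plotted response curves as images):

```python
import json

def build_prompt(error_log: list, current_gains: dict) -> str:
    """Condense recent attitude-tracking errors and controller gains
    into a textual prompt for the LLM (hypothetical format)."""
    summary = {
        "mean_abs_error_deg": sum(abs(e) for e in error_log) / len(error_log),
        "max_abs_error_deg": max(abs(e) for e in error_log),
        "current_gains": current_gains,
    }
    return ("Attitude tracking log: " + json.dumps(summary) +
            '. Reply with JSON {"gain_scale": <float>} to adjust the '
            "S-Surface controller.")

def apply_recommendation(reply_json: str, current_gains: dict) -> dict:
    """Parse the LLM's JSON reply and scale the controller gains."""
    scale = json.loads(reply_json)["gain_scale"]
    return {k: v * scale for k, v in current_gains.items()}
```

The reply is constrained to a small JSON schema so a malformed recommendation fails fast at parse time instead of silently corrupting the controller.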
We conducted an outdoor experiment in an estuary at Shenzhen Bay Park, where the turbulent current poses a significant challenge to attitude control. EasyUUV still achieved reasonable attitude control performance in this environment with strong uncertainty.
@inproceedings{easyuuv,
title={{EasyUUV}: An LLM-Enhanced Universal and Lightweight Sim-to-Real Reinforcement Learning Framework for UUV Attitude Control},
author={Xie, Guanwen and Xu, Jingzehua and Tang, Jiwei and Huang, Yubo and Zhang, Shuai and Li, Xiaofan},
booktitle={IEEE International Conference on Robotics \& Automation (ICRA, in submission)},
year={2026}}