
The adaptive control and maneuvering capabilities of Autonomous Underwater Vehicles (AUVs) have drawn significant attention, but achieving precise maneuvering control of AUVs is challenging due to their highly nonlinear dynamics, time-varying hydrodynamics, strong six-degree-of-freedom coupling, and environmental uncertainties. The contributions of this paper mainly include three parts:

We develop a novel AUV controller that employs RL to train an expert-level control strategy for high-level task execution and control command generation, while the S-surface controller produces control signals, ensuring cancellation of nonlinear effects and external disturbances under extreme sea conditions.
We use LLMs to jointly optimize the RL reward function and the controller parameters, drawing on multimodal task execution logs and contextual information such as environmental descriptions to improve final task performance and adaptability.
The proposed controller demonstrates superior robustness and flexibility compared to conventional PID and SMC controllers in challenging marine conditions characterized by waves, currents, and complex terrain. It also performs well in advanced 3D tasks, including underwater target tracking and data collection.

Methodology

The following figure illustrates the overall design of our controller. To fully leverage the advantages of the LLM-enhanced RL-based S-Surface controller, and to simulate and perceive extreme marine conditions so that disturbance rejection performance can be evaluated, we decompose the proposed framework into three core modules.

The RL-based S-Surface Controller Module employs RL policies for high-level task decision-making, while the S-Surface controller is used to achieve precise 6-DoF control.
The LLM-enhanced Iterative Joint Optimization Module performs joint optimization of the RL reward function and controller parameters guided by domain-specific guidelines. It systematically analyzes environmental summaries, numerical computations, and multi-modal task feedback to enhance adaptation to dynamic marine environments.
The Simulation and Environment-Aware Module executes physical ocean modeling with 6-DoF control dynamics for extreme scenario simulation, and fuses multisource sensor data for active disturbance mitigation.
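To make the low-level layer concrete, here is a minimal sketch of the classic S-surface (sigmoid-surface) control law, applied to a yaw channel. It is an illustration under stated assumptions, not the implementation used in this work: the gains `k1` and `k2`, the `max_rudder` limit, and the `yaw_command` helper are hypothetical, and the reference is assumed to change slowly enough that the error rate is approximately the negative yaw rate.

```python
import math


def s_surface(error: float, error_rate: float, k1: float, k2: float) -> float:
    """Normalized S-surface output in (-1, 1).

    Classic form: u = 2 / (1 + exp(-k1*e - k2*de)) - 1.
    This equals tanh((k1*e + k2*de) / 2), which is used here because it
    cannot overflow for large arguments.
    """
    return math.tanh(0.5 * (k1 * error + k2 * error_rate))


def yaw_command(yaw_ref: float, yaw: float, yaw_rate: float,
                k1: float = 3.0, k2: float = 1.0,
                max_rudder: float = 0.35) -> float:
    """Hypothetical rudder command (rad) from the S-surface law.

    All default values are illustrative assumptions.
    """
    # Wrap the heading error to [-pi, pi) so the vehicle takes the short way round.
    err = (yaw_ref - yaw + math.pi) % (2.0 * math.pi) - math.pi
    # With a slowly varying reference, d(err)/dt is approximately -yaw_rate.
    return max_rudder * s_surface(err, -yaw_rate, k1, k2)


if __name__ == "__main__":
    # Reference heading in rad (e.g. supplied by a high-level policy), current state.
    print(yaw_command(yaw_ref=0.5, yaw=0.1, yaw_rate=0.02))
```

The two gains shape how steeply the sigmoid saturates in error and error rate, which is why they are natural targets for an automated tuning loop.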

Figure👇: The overall framework of our proposed controller, which comprises three modules: (A) RL-based S-Surface Controller Module. (B) LLM-Enhanced Iterative Joint Optimization Module. (C) Environment-Aware & Simulation Module.


Video👇: 2D and 3D visualization of the extreme sea condition.

We validate the effectiveness of our proposed controller utilizing a REMUS 100 AUV, and we introduce two high-level tasks:

3D data collection task: Using the proposed controller, one or more AUVs cooperate to search for and collect data from randomly scattered sensor nodes (SNs).
3D target tracking task: One or more AUVs follow a dynamic underwater target whose position is unpredictable.

LLM Optimization Performance

Based on bottleneck analysis, the LLM flexibly adjusts both the reward weights and the controller parameters during joint optimization, and it terminates the optimization once the system reaches its control limits.
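The propose, evaluate, and stop pattern described here can be sketched as a simple loop. This is a hedged illustration only: `Candidate`, `joint_optimize`, `evaluate`, `propose`, and the plateau threshold `min_gain` are hypothetical names, the LLM call is abstracted as a plain callable, and stopping when the score stops improving is an assumed stand-in for detecting the system's control limits.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Candidate:
    """One joint setting: RL reward weights plus controller gains."""
    reward_weights: Dict[str, float]
    controller_gains: Dict[str, float]


History = List[Tuple[Candidate, float]]


def joint_optimize(initial: Candidate,
                   evaluate: Callable[[Candidate], float],
                   propose: Callable[[History], Candidate],
                   max_rounds: int = 10,
                   min_gain: float = 1e-3) -> Tuple[Candidate, float, History]:
    """Iteratively refine reward weights and gains together.

    evaluate: runs training/simulation and returns a scalar score (higher is better).
    propose:  stands in for the LLM; it sees the full history and returns a new candidate.
    The loop stops when a proposal improves the best score by less than min_gain.
    """
    best, best_score = initial, evaluate(initial)
    history: History = [(best, best_score)]
    for _ in range(max_rounds):
        candidate = propose(history)
        score = evaluate(candidate)
        history.append((candidate, score))
        if score - best_score < min_gain:
            break  # plateau: treated here as having reached the achievable limit
        best, best_score = candidate, score
    return best, best_score, history
```

In practice `propose` would format the history, task logs, and environment description into a prompt and parse the model's reply into a `Candidate`; that step is deliberately left out because its details are not given here.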

Video👇: Parameters for yaw tracking controller and reward weights, along with 2D projections of AUV trajectories from the 3D data collection tasks during the LLM optimization phase.


Figure👇: Performance results of the S-surface controller in tracking reference signals taken from a target tracking task, during the LLM optimization phase.

LLM-enhanced S-Surface Performance

Figure👇: Comparative results of three controllers tracking reference signals taken from a target tracking task under extreme sea (ES) and very extreme sea (VES) conditions. When transitioning to VES conditions, the controllers exhibit progressive performance deterioration, with the PID controller showing worse flexibility and stability compared to the S-Surface controller. Additionally, the PVS suffers a complete loss of depth regulation capability.


Table👇: Performance metrics of different control methods evaluated during the data collection task under ES and VES conditions. Under ES conditions, the S-Surface controller achieves performance close to the ideal setting; under VES conditions, it degrades significantly less than PID.

3D Visualization

The following videos visualize the 3D data collection and target tracking tasks performed by two AUVs. In the former task, each AUV must judiciously control its heading to serve sensor nodes efficiently because of its restricted turning capability, while the latter task places greater demands on real-time control.

Video👇: 3D visualizations of two AUVs performing the data collection task. The AUVs plan near-optimal routes, achieving performance close to that under ideal control conditions.


Video👇: 3D visualizations of two AUVs performing the target tracking task. The AUVs also demonstrate high maneuverability in response to target turns.

Supplementary files

⚠ UNDER CONSTRUCTION! COMING SOON

BibTeX

@article{xie2025AUVRScontrol,
      title={Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions},
      author={Xie, Guanwen and Xu, Jingzehua and Ding, Yimian and Zhang, Zhi and Zhang, Shuai and Li, Yi},
      journal={arXiv preprint arXiv:2503.00527},
      year={2025}
    }

Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions

Introduction