Army Researchers Developed More Advanced Learning Model For Swarms of Drones

The Army's future operating concept envisions multi-drone operations in which swarms of autonomous aerial and ground vehicles operate alongside warfighters, accomplishing missions optimally while reducing the unpredictability of current reinforcement learning policies.

The autonomous agents will be able to adapt to changing battlefield conditions through these newly developed learning components, said Army researcher Dr. Alec Koppel of the U.S. Army Combat Capabilities Development Command (DEVCOM) Army Research Laboratory.

Making reinforcement learning-based policies efficient to obtain is critical to making multi-drone operations a reality.

Swarming Requires Heterogeneous Mobile Platforms to Overmatch Enemies

According to an article by TechXplore, drone swarming is a method of operation in which multiple autonomous systems act as a cohesive unit by actively coordinating their actions.

According to the Army researchers, future multi-drone operations will require swarms of dynamically coordinated, heterogeneous mobile platforms to overmatch adversaries of the U.S. armed forces.

Moreover, Dr. Jemin George of DEVCOM said that the Army hopes to use swarming technology to carry out time-consuming and dangerous tasks.

"Finding optimal guidance policies for these swarming vehicles in real-time is a key requirement for enhancing warfighters' tactical situational awareness, allowing the U.S. Army to dominate in a contested environment," Dr. George said.

Using Reinforcement Learning in the Multi-Drone Operations

Reinforcement learning has been used to solve previously intractable tasks, such as the strategy games Go and chess, as well as video games. Koppel believes that applying reinforcement learning to multi-drone operations will address the complex relationship between goals and dynamics that arises when designing autonomous behaviors, Science Daily reported.

In other words, reinforcement learning provides a way to optimally control agents whose behavior is uncertain, making multi-drone operations achievable even when a precise model of the agent is unavailable.
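To make the idea of learning without a model concrete, here is a minimal sketch of tabular Q-learning, a standard textbook reinforcement learning method and not the algorithm the Army team developed. The corridor environment, the `SLIP` probability, and all function names are hypothetical and chosen only for illustration. The agent never reads the environment's dynamics; it improves its policy purely from observed outcomes.

```python
import random

N_STATES = 6          # positions 0..5; position 5 is the goal (hypothetical setup)
ACTIONS = (-1, +1)    # move left or move right
SLIP = 0.1            # chance a move is reversed; the learner never sees this value


def step(state, action, rng):
    """Hypothetical environment: returns only the observed outcome of a move."""
    move = -action if rng.random() < SLIP else action
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01   # small cost per step, bonus at the goal
    return next_state, reward, done


def train(episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Learn action values from experience alone (model-free Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(200):  # cap the episode length
            # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a], rng)
            # Move the estimate toward the observed reward plus discounted future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Best-known move for each non-goal position."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Each trial in this toy setup is cheap to simulate, which is exactly what cannot be assumed for physical robots, where every sample costs time and hardware wear.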

Together with his research team, Koppel developed new policy search schemes for general utilities and observed that they reduce the volatility of reward accumulation, yield efficient exploration of unknown domains, and provide a mechanism for incorporating prior experience.

But Koppel also noted that in the context of ground robots, data could be costly to acquire.

"Reducing the volatility of reward accumulation, ensuring one explores an unknown domain in an efficient manner, or incorporating prior experience, all contribute towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning by alleviating the amount of random sampling one requires in order to complete policy optimization," Koppel said.
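One simple way to picture "reducing the volatility of reward accumulation" is to score a policy not only by its average return but also by how much that return varies. The sketch below uses a mean-minus-variance score, which is a common risk-sensitive utility assumed here for illustration; the article does not specify the utilities the researchers used. The function name, the `risk_weight` parameter, and the sample returns are all hypothetical.

```python
from statistics import mean, pvariance


def risk_adjusted_utility(returns, risk_weight=0.1):
    """Average return minus a penalty on its variance (illustrative utility only)."""
    return mean(returns) - risk_weight * pvariance(returns)


# Made-up episode returns for two hypothetical policies.
steady = [9.0, 10.0, 11.0, 10.0]     # slightly lower average, low spread
volatile = [0.0, 25.0, 2.0, 17.0]    # slightly higher average, high spread

if __name__ == "__main__":
    for name, returns in (("steady", steady), ("volatile", volatile)):
        print(name, mean(returns), risk_adjusted_utility(returns))
```

Judged by average return alone, the volatile policy looks better here. Once variance is penalized, the steady policy scores higher, which is the kind of trade-off a volatility-aware objective is meant to capture.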

Nonetheless, Koppel is optimistic about the future of this technology and has dedicated himself to making his findings applicable to Army innovations that soldiers may use on the battlefield.
