Altis is the world’s first AI personal trainer.
Altis sees you, understands you, and personally instructs you in real time from a sleek, soundbar-sized console that connects to any screen in your home. The future of fitness is finally here.


The Opportunity
In the new post-pandemic, work-from-anywhere world, home fitness is booming. Peloton is a famous growth leader, with paid digital subscribers up 176% year over year in its most recent quarter. Apple recently announced a subscription fitness service that provides guided classes via its Apple TV product and incorporates input from Apple Watch sensors.
Altis aims to go one step further by using AI vision technology with multiple camera sensors to accurately track and analyze movement, providing guided instruction and form coaching for fitness and sports applications.
Beyond pose estimation and tracking, the proposed AI system needed to identify specific exercises, detect deviations from ideal form in those exercises, support multiple cameras, and run in real time.
To control costs and maximize asset utilization, Altis needed to implement pipelines and MLOps on their existing on-premises GPU servers while retaining the ability to scale across geographies on the cloud of their choice.
The Challenge
The specifics of Altis’ proposed use cases meant they needed a system that could accurately detect a large number of exercises, including those performed with popular gym equipment such as free weights and machines.
Altis needed a solution suitable for both on-device and cloud-based inference, one that could scale globally without being tied to a single cloud provider. They also wanted to use their existing on-premises GPU resources for development, experimentation, training, monitoring, and future iterations.
The Altis console provides personalized workouts and interactive coaching, using advanced AI and computer vision to help you reach your fitness goals, understand how your body moves during exercise, and improve your form and movement performance.
The Solution
Neu.ro researchers developed a custom data pipeline that performs volumetric triangulation from one or more RGB cameras: intermediate 2D feature maps from a per-view 2D backbone are unprojected and aggregated into a shared 3D volume, which is then refined via 3D convolutions into per-joint 3D heatmaps that yield the final pose.
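To make the architecture concrete, here is a minimal PyTorch sketch of that volumetric pipeline, in the spirit of learnable-triangulation approaches. Every name, shape, and design choice (mean fusion across views, soft-argmax readout) is an illustrative assumption, not Altis’ production code.

```python
# Illustrative sketch of volumetric triangulation for 3D pose estimation.
# All names, shapes, and hyperparameters are assumptions for exposition.
import torch
import torch.nn as nn
import torch.nn.functional as F

def unproject_features(feats, proj_mats, grid):
    """Sample each view's 2D feature map at the projection of every voxel.

    feats:     (V, C, H, W)  per-view 2D backbone feature maps
    proj_mats: (V, 3, 4)     camera projection matrices (world -> pixels)
    grid:      (D, D, D, 3)  voxel centers in world coordinates
    returns:   (V, C, D, D, D) per-view volumetric features
    """
    V, C, H, W = feats.shape
    D = grid.shape[0]
    pts = torch.cat([grid.reshape(-1, 3),
                     torch.ones(D ** 3, 1)], dim=1)      # (D^3, 4) homogeneous
    volumes = []
    for v in range(V):
        uvw = pts @ proj_mats[v].T                       # project voxels to pixels
        uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)
        # Normalize pixel coordinates to [-1, 1] for grid_sample.
        uv = uv / torch.tensor([(W - 1) / 2, (H - 1) / 2]) - 1
        sampled = F.grid_sample(
            feats[v:v + 1],                              # (1, C, H, W)
            uv.view(1, D ** 3, 1, 2),                    # (1, D^3, 1, 2)
            align_corners=True)                          # (1, C, D^3, 1)
        volumes.append(sampled.view(C, D, D, D))
    return torch.stack(volumes)                          # (V, C, D, D, D)

class VolumetricPoseNet(nn.Module):
    """Fuse per-view volumes, refine with 3D convs, read joints via soft-argmax."""
    def __init__(self, channels=32, joints=17):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv3d(channels, joints, 1))              # per-joint 3D heatmaps

    def forward(self, view_volumes, grid):
        vol = view_volumes.mean(dim=0, keepdim=True)     # (1, C, D, D, D) fuse views
        heat = self.refine(vol)                          # (1, J, D, D, D)
        J = heat.shape[1]
        w = F.softmax(heat.view(1, J, -1), dim=-1)       # soft-argmax weights
        return w @ grid.reshape(-1, 3)                   # (1, J, 3) world coords
```

Mean fusion is the simplest choice here; a production system might instead weight each view by a learned confidence so that occluded cameras contribute less to the volume.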
The system runs in real time on NVIDIA RTX 3090 hardware, achieving 75 FPS during pose tracking; until a person is detected, it operates in a 30 FPS startup mode.
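One plausible way to structure such a dual-rate runtime is sketched below: a lightweight person detector gates the full pose-tracking pipeline. The camera, detector, and tracker objects and both frame-rate constants are hypothetical stand-ins, not Altis’ actual scheduler.

```python
# Hypothetical sketch of the dual-rate runtime described above.
import time

DETECT_FPS, TRACK_FPS = 30, 75

def run_console(camera, detector, pose_tracker):
    tracking = False
    while True:
        start = time.monotonic()
        frame = camera.read()
        if not tracking:
            tracking = detector.person_present(frame)  # cheap startup mode
        else:
            pose = pose_tracker.update(frame)          # full volumetric pipeline
            if pose is None:                           # subject left the frame
                tracking = False
        # Pace the loop to the active mode's target frame rate.
        budget = 1.0 / (TRACK_FPS if tracking else DETECT_FPS)
        time.sleep(max(0.0, budget - (time.monotonic() - start)))
```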
This volumetric model can estimate 3D human pose from any number of cameras, including just one. In a single-view setup, results are comparable with the current state of the art, while multiple sensors deliver faster processing and fewer errors from potential occlusions.
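Continuing the hypothetical sketch above, the same code path serves one or many cameras, because views are fused by a simple reduction over the view axis; the dimensions and random calibration matrices below are placeholders for illustration only.

```python
# The same aggregation handles single-view and multi-view input unchanged.
import torch

D, C, H, W = 32, 32, 96, 96
grid = torch.stack(torch.meshgrid(
    *[torch.linspace(-1.0, 1.0, D)] * 3, indexing="ij"), dim=-1)  # (D, D, D, 3)
net = VolumetricPoseNet(channels=C, joints=17)

for views in (1, 4):                       # one camera, then four
    feats = torch.randn(views, C, H, W)    # stand-in backbone features
    projs = torch.randn(views, 3, 4)       # stand-in calibration matrices
    vols = unproject_features(feats, projs, grid)
    print(views, net(vols, grid).shape)    # torch.Size([1, 17, 3]) either way
```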