Mimir + Coactive: multimodal AI search for your entire archive

Mimir’s integration with Coactive brings agentic, multimodal video intelligence directly into the Mimir interface. Editors and journalists can search, discover, and assemble video moments using natural language across their entire archive without relying on manually tagged metadata.

News organizations and broadcasters often hold decades of content, but traditional asset management relies on manual metadata tagging, which is expensive to maintain and never complete. Mimir customers are already used to automated, highly granular metadata tagging. With Coactive, they also gain an AI-powered search layer that understands intent, not just keywords.

How it works

Coactive's agentic search uses an LLM to interpret what a user is actually looking for and dynamically orchestrates multiple search systems in parallel. A single natural-language query can simultaneously search across visual scenes, spoken dialogue (both semantic transcript matching and exact phrase search), environmental audio and sounds, on-screen text, recognized faces, and structured metadata like timestamps, locations, and events. The system automatically routes each query to the right combination of search modalities, so users never have to specify how to search. They just describe what they want to find.
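To make the routing idea concrete, here is a minimal sketch of the pattern: pick the relevant search modalities for a query, then run them in parallel. It is an illustration only, not Coactive's implementation or API. The real system uses an LLM for routing, while this sketch swaps in simple keyword rules as a stand-in, and the names `route`, `search_modality`, and `agentic_search` and the modality labels are all hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Keyword rules standing in for the LLM router (an assumption for illustration).
SPEECH_WORDS = ("says", "said", "talks about")
SOUND_WORDS = ("cheering", "applause", "siren", "sound")


def route(query: str) -> list[str]:
    """Choose which (hypothetical) search modalities a query should hit."""
    q = query.lower()
    chosen = ["visual"]  # assume visual search is always a useful default
    if any(word in q for word in SPEECH_WORDS):
        chosen.append("transcript_semantic")
    # A quoted phrase suggests exact phrase search. This crude pattern can
    # also fire on stray apostrophes; an LLM router would not have that issue.
    if re.search(r"['\"][^'\"]+['\"]", query):
        chosen.append("transcript_exact")
    if any(word in q for word in SOUND_WORDS):
        chosen.append("audio")
    return chosen


def search_modality(modality: str, query: str) -> dict:
    """Placeholder backend: a real one would return timestamped hits."""
    return {"modality": modality, "query": query, "hits": []}


def agentic_search(query: str) -> list[dict]:
    """Route the query, then query every chosen modality in parallel."""
    modalities = route(query)
    with ThreadPoolExecutor(max_workers=len(modalities)) as pool:
        # pool.map returns results in the same order as `modalities`.
        return list(pool.map(lambda m: search_modality(m, query), modalities))


if __name__ == "__main__":
    for result in agentic_search(
        "Moments where the crowd is cheering but no one is speaking."
    ):
        print(result["modality"])
```

The point of the design is that the caller only supplies the query text. Which systems run, and how many, is decided per query.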

Inside Mimir, that looks like typing:

"Find me where the spokesperson says 'we're responding to the situation' and include reporter questions."

"Find the story about 1,000+ people dressed as Marilyn Monroe in Palm Springs. I need the best wide shots and crowd reactions."

"Moments where the crowd is cheering but no one is speaking."

"Self-driving Tesla car on the road, autopilot footage."

Each of these queries triggers a different combination of search systems behind the scenes. The crowd-cheering example uses Coactive's audio sound search, powered by a 527-class audio recognition model, to find specific sounds within video independently of what is visible on screen. The spokesperson query combines transcript search with visual scene retrieval. Celebrity queries can optionally leverage Coactive's face recognition to surface enrolled individuals across the archive.

Results surface as specific moments within assets, with timestamps, so editors can review and select the right section of footage rather than scrubbing through full files. Users can also refine results with negative search (for example, "cars but not white cars"), and the system preserves those constraints across follow-up queries and drill-downs.
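One way to picture timestamped moments and persistent negative constraints is the sketch below. It assumes each result carries an asset ID, a time range, and a set of labels, and that a session object remembers exclusions between queries. The `Moment` and `SearchSession` types and their fields are hypothetical and do not reflect Coactive's or Mimir's actual data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Moment:
    """A hypothetical search hit: a time range inside one asset."""
    asset_id: str
    start_s: float
    end_s: float
    labels: frozenset


@dataclass
class SearchSession:
    """Keeps negative constraints alive across follow-up queries."""
    excluded: set = field(default_factory=set)

    def exclude(self, label: str) -> None:
        self.excluded.add(label)

    def apply(self, moments: list) -> list:
        # Drop any moment that carries at least one excluded label.
        return [m for m in moments if not (m.labels & self.excluded)]


if __name__ == "__main__":
    session = SearchSession()
    session.exclude("white car")  # "cars but not white cars"

    first = [
        Moment("a1", 12.0, 18.5, frozenset({"car", "white car"})),
        Moment("a2", 40.0, 44.0, frozenset({"car", "red car"})),
    ]
    print([m.asset_id for m in session.apply(first)])

    # A follow-up query reuses the same session, so the exclusion still holds.
    follow_up = [Moment("a3", 3.0, 9.0, frozenset({"car", "white car", "highway"}))]
    print(session.apply(follow_up))
```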

From discovery to assembly

The integration also supports production workflows. Users can go from discovery to assembly by selecting moments and generating a Mimir editing timeline. Selected clips can be turned into a Mimir Cutter sequence, refined in Mimir, or opened in professional editing tools such as Adobe Premiere Pro and DaVinci Resolve.
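As a rough illustration of the step from selected moments to a timeline, the sketch below lays clips back to back and computes where each one lands in the sequence. This is generic cut-list arithmetic under assumed field names (`src_in_s`, `rec_in_s`, and so on). It is not the format Mimir uses when it generates a timeline or a Cutter sequence.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """A selected moment: source in/out points in seconds (hypothetical shape)."""
    asset_id: str
    src_in_s: float
    src_out_s: float


def build_cut_list(clips: list) -> list:
    """Place clips back to back and record their position on the timeline."""
    timeline = []
    cursor = 0.0  # current end of the timeline, in seconds
    for clip in clips:
        duration = clip.src_out_s - clip.src_in_s
        if duration <= 0:
            raise ValueError(f"clip from {clip.asset_id} has no duration")
        timeline.append({
            "asset_id": clip.asset_id,
            "src_in_s": clip.src_in_s,
            "src_out_s": clip.src_out_s,
            "rec_in_s": cursor,
            "rec_out_s": cursor + duration,
        })
        cursor += duration
    return timeline


if __name__ == "__main__":
    selected = [Clip("press_conf", 10.0, 15.0), Clip("crowd_wide", 2.0, 4.5)]
    for event in build_cut_list(selected):
        print(event)
```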

This is particularly valuable for fast-turnaround work: archive pulls, storm coverage recaps, highlight reels, social cutdowns, and any production where finding the right moment is the bottleneck.

Looks interesting? Read more about Coactive here: https://coactive.ai