Interview with Lee Jaesung, Founder of Twelve Labs
Development of Video Search and Summarization Models 'Marengo' and 'Pegasus'
Over 90% of Revenue from North America
Expanding Customer Base Through AWS Bedrock
"Preparing to Enter the Euro

"Our goal is to index more than 80% of the world’s video data with our model, 'Marengo', within the next four to five years. It will become the essential 'video intelligence layer' that every AI agent must pass through when utilizing video."


Lee Jaesung, CEO of Twelve Labs, introduced the company's vision during an interview at the Amazon Web Services (AWS) re:Invent event held in Las Vegas, USA, on December 1 (local time). He expressed his ambition to create a model that is indispensable for handling video with artificial intelligence (AI). Twelve Labs plans to launch its AI agent service for video in the first quarter of next year.


Lee Jaesung, CEO of Twelve Labs, poses during an interview held on the 1st (local time) at The Venetian Hotel in Las Vegas, USA. Photo by Lee Myunghwan


Founded by Lee in 2021, Twelve Labs is developing a Video Foundation Model (VFM) based on multimodal AI technology. The company currently offers two AI models: 'Marengo', which analyzes (indexes) text, image, and audio information within videos to help users search for specific scenes, and 'Pegasus', which generates summaries of video content and answers questions about it. Since its founding, the company has raised approximately 107 million dollars (about 150 billion won) in cumulative investment. Its investors include global big tech companies such as Nvidia and Intel, as well as leading Korean companies such as Naver and SK Telecom.


Twelve Labs is preparing to launch a 'video agent' in the first quarter of next year that will go beyond simple search and indexing and let AI analyze and edit videos autonomously. For example, if a user asks the agent to "create a highlight reel of Son Heungmin's key plays this season," the agent will search for the main goal scenes and handle the video editing and generation on its own.


What distinguishes Twelve Labs' model from existing big tech video understanding models is that it comprehends video as video. Conventional models break a video into still images, one per frame (the smallest unit of video), and then learn from text, so they cannot grasp the context of the video. For example, given a video of a sunset, a conventional model cannot tell whether it shows a sunrise or a sunset. Lee emphasized, "The traditional approach loses the context of time and space, but Marengo is a native model that learns the video as a whole," adding, "It precisely understands the flow of time and causal relationships."


On the same day, Twelve Labs officially launched its latest model, 'Marengo 3.0', which the company describes as the most powerful version it has released so far. Beyond simply viewing videos, it can jointly understand contextual elements such as text, speech, and movement within scenes. The model takes in information such as the situation in the video, the movements of people, and the weather, and organizes it in advance so that it is easy to search.


Marengo 3.0 doubles video indexing speed relative to previous models and halves video storage costs. Lee noted, "Compared to previous models, Marengo 3.0 is particularly enhanced for sports," adding, "It can understand specific movements of athletes and sports terminology, making it highly useful for creating highlight videos."


Marengo Grows Rapidly through AWS’s Generative AI Platform 'Amazon Bedrock'

Marengo has grown rapidly through 'Amazon Bedrock', AWS’s generative AI platform. Bedrock is a service for building generative AI services and agents that lets users access various AI models in one place. Lee explained, "It has been five months since we launched Marengo on Bedrock at the end of July, and we have secured about 30,000 client companies," adding, "Most of these clients are AWS enterprise customers utilizing Marengo through Bedrock."


The range of client industries is also broad. According to Twelve Labs, more than 90% of its total revenue currently comes from the North American market. The company has secured clients in industries such as sports and entertainment, which handle video content, as well as in government and security sectors. In Korea, Sejong City is using Twelve Labs’ model for CCTV analysis.


The company is also accelerating its global expansion. In addition to its main markets, the United States and Korea, it is targeting the European market. With major sports leagues such as the English Premier League (EPL) and Formula One (F1) based in Europe, the company sees ample demand. It has already established a UK branch and is hiring local staff.


Lee’s ambition is to capture the demand of enterprise clients who handle massive amounts of video through an AI model capable of understanding video content itself. He stressed, "Ninety percent of the world’s data is in video format, but it is nearly impossible to search," adding, "Our goal is to help companies rapidly index and search their vast video assets to create new value."



Lee Jaesung, CEO of Twelve Labs, is interviewed by The Asia Business Daily on the 1st (local time) at The Venetian Hotel in Las Vegas, USA. Photo by Lee Myunghwan



This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
