Syntekabio's '3bm GPT' Protein Binding Analysis LLM Technology Published in International SCI Journal

AI drug discovery company Syntekabio (CEO Jung Jongseon) announced on November 24 that its research on the large language model (LLM)-based protein-small molecule binding analysis model, "3bm GPT," has been published in the 30th anniversary special issue (Recent Advances in Computational and Theoretical Chemistry) of the international SCI journal "MDPI Molecules," based in Switzerland.


The paper has attracted attention in academia for presenting a novel approach: it converts complex three-dimensional protein-ligand binding information into a language-like form and uses that data to train a large language model (LLM) of the kind behind ChatGPT.


LLMs are AI systems that learn from vast amounts of text data to capture patterns and meaning in language. Although services such as ChatGPT have recently made them widely known, applications to life science data, especially three-dimensional molecular binding information, remain rare. Notably, the technology is slated to be built into Syntekabio's AI drug discovery platform "DeepMatcher" as a cognitive AI module. The publication is therefore significant as scientific validation of a technology intended for real-world use.


The research team expressed the interactions between proteins and small molecules as "interaction words" and, using an algorithm based on Coulomb's law, generated about 60 million binding interaction sentences that encode the electronic and spatial characteristics of each bond. These sentences were then used to train an LLM based on the GPT-2 architecture, producing the 3bmGPT model. The work is a new attempt to combine the language comprehension capabilities of LLMs with life science data, in a way that differs fundamentally from conventional structure-based analyses and physics-based simulations.
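The article does not spell out how an "interaction word" is built, so the following is only a rough sketch of the general idea, not Syntekabio's actual encoding. It assumes that each protein-ligand atom pair within a cutoff distance is scored with Coulomb's law and then turned into a token recording the atom names, whether the interaction is attractive or repulsive, and a distance bin. The `Atom` class, the token format, the 6 Å cutoff, and the 1 Å bin width are all hypothetical choices made for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

# Approximate Coulomb constant in kcal*Angstrom/(mol*e^2), as commonly used
# in molecular mechanics. The exact value varies slightly between packages.
COULOMB_K = 332.06


@dataclass(frozen=True)
class Atom:
    """Hypothetical minimal atom record: name, partial charge (e), position (Angstrom)."""
    name: str
    charge: float
    xyz: Tuple[float, float, float]


def coulomb_energy(a: Atom, b: Atom) -> float:
    """Pairwise Coulomb energy E = k * q1 * q2 / r between two point charges."""
    r = math.dist(a.xyz, b.xyz)
    if r == 0.0:
        raise ValueError("atoms overlap; distance is zero")
    return COULOMB_K * a.charge * b.charge / r


def interaction_word(p: Atom, l: Atom, bin_width: float = 1.0) -> str:
    """Encode one protein-ligand atom pair as a token (illustrative format only)."""
    energy = coulomb_energy(p, l)
    if energy < 0:
        kind = "att"   # opposite charges attract
    elif energy > 0:
        kind = "rep"   # like charges repel
    else:
        kind = "neu"   # at least one atom carries no charge
    dist_bin = int(math.dist(p.xyz, l.xyz) // bin_width)
    return f"{p.name}-{l.name}_{kind}_d{dist_bin}"


def binding_sentence(protein: List[Atom], ligand: List[Atom], cutoff: float = 6.0) -> str:
    """Join the words for all atom pairs within the cutoff into one 'sentence'."""
    words = [
        interaction_word(p, l)
        for p in protein
        for l in ligand
        if math.dist(p.xyz, l.xyz) <= cutoff
    ]
    return " ".join(words)


# Toy example with made-up coordinates and charges.
pocket_oxygen = Atom("O", -0.5, (0.0, 0.0, 0.0))
ligand_nitrogen = Atom("N", 0.3, (3.0, 0.0, 0.0))
distant_carbon = Atom("C", 0.1, (10.0, 0.0, 0.0))
sentence = binding_sentence([pocket_oxygen], [ligand_nitrogen, distant_carbon])
```

In this toy case only the oxygen-nitrogen pair falls inside the cutoff, so the sentence contains a single word. Text of this kind, generated across many complexes, is the sort of corpus a GPT-2-style model could be trained on with a standard next-token objective.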


Syntekabio's model demonstrated the ability to capture the properties of actual protein-ligand binding using only language-based data. The research team analyzed the logit vectors generated by the GPT model to determine how well the binding characteristics reflected in the language matched actual structural features. For key protein families such as EGFR and ABL1, the model was able to independently distinguish the binding patterns shared within each family. This is regarded as experimental evidence that "even one-dimensional language-encoded information can capture the biological meaning of three-dimensional binding."
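The article does not describe how the logit vectors were compared, so the snippet below is merely one plausible way to probe whether such vectors separate protein families. It assumes each complex is summarized by a single logit vector, averages the vectors per family into a centroid, and assigns a new vector to the family with the highest cosine similarity. The three-dimensional vectors are invented toy numbers, and cosine similarity is an assumption of this sketch rather than the method reported in the paper.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_vector(vectors: List[List[float]]) -> List[float]:
    """Element-wise mean of a list of vectors (the family centroid)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def nearest_family(logits: List[float], centroids: Dict[str, List[float]]) -> str:
    """Return the family whose centroid is most similar to the given logit vector."""
    return max(centroids, key=lambda family: cosine(logits, centroids[family]))


# Invented toy logit vectors, two complexes per family.
toy_logits = {
    "EGFR": [[2.0, 0.1, 0.0], [1.8, 0.3, 0.1]],
    "ABL1": [[0.1, 1.9, 0.2], [0.0, 2.1, 0.1]],
}
centroids = {family: mean_vector(vecs) for family, vecs in toy_logits.items()}

egfr_like = nearest_family([1.9, 0.2, 0.0], centroids)
abl1_like = nearest_family([0.1, 2.0, 0.1], centroids)
```

If logit vectors from the same family cluster around a shared centroid, as in this toy data, the model's outputs carry family-specific binding information, which is the kind of correspondence the researchers report for EGFR and ABL1.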


In particular, the finding that the logits generated by the LLM correspond to a significant extent with the structural features of actual protein binding is a result that previous AI drug discovery research has rarely addressed. It suggests that large language models could become a new analytical axis in drug development.


Jung Jongseon, CEO of Syntekabio, stated, "This research is significant in that it officially validates, through an international journal, a new technology that converts protein-ligand binding into language and trains large language models (LLMs) with it," adding, "The key point is that we have demonstrated superior performance in direct comparison with existing structure-based tools."


He continued, "Through logit analysis, we have experimentally proven that language-based models can quantitatively reproduce 3D binding characteristics," and said, "Based on this academic validation, we will apply it to practical drug discovery projects."


It is noteworthy that the core technology of 3bm GPT is already scheduled to be implemented in Syntekabio's AI drug discovery platform, and this publication in an international journal serves as official scientific validation of its practical utility. In addition, the company obtained a domestic patent for the related technology last year and has also filed a patent application in the United States, which is currently under review.



This research was jointly conducted by Syntekabio's research team and Dr. Bin Claire Zhang of AI computing company Cerebras Systems. For transparency, Syntekabio has released the pretrained model and code through Zenodo and GitHub.


This content was produced with the assistance of AI translation services.

© The Asia Business Daily(www.asiae.co.kr). All rights reserved.
