
Collecting All User Information Without 'Tokenization'
In particular, DeepSeek drew controversy at launch for collecting keyboard input patterns, though these were removed from the list of collected items as of the 14th. Keyboard input patterns are classified as sensitive information because they can identify individual users and can also be used to infer important information that users type, such as passwords.
This differs from the data collection practices of other AI companies. OpenAI's ChatGPT also collects user-entered text, uploaded files, and device identifiers, but it puts them through a 'tokenization' process so that the source of the information cannot be identified. 'Tokenization' is a method that lets only the necessary information be used safely, much as a convenience store receipt shows only the last four digits of a card number instead of the whole number. Through this process, user-identifying information is removed, and only the entered data remains to be used for service improvement.
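As a rough illustration of the idea described above, the following Python sketch masks a card number down to its last four digits and replaces a device identifier with an opaque token before a record is kept. It is not ChatGPT's or any company's actual pipeline; the field names (`device_id`, `text`), the `SECRET_KEY` value, and the function names are all hypothetical and exist only for this example.

```python
import hashlib
import hmac

# Hypothetical secret used only for this illustration; a real system
# would keep such a key in a protected store, not in source code.
SECRET_KEY = b"example-only-key"


def mask_card_number(card_number: str) -> str:
    """Keep only the last four digits, as on a store receipt."""
    digits = [c for c in card_number if c.isdigit()]
    if len(digits) < 4:
        raise ValueError("card number must contain at least four digits")
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value: str) -> str:
    """Replace an identifier with a keyed hash that does not reveal it."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record with the device ID replaced by a token.

    Only the entered text and the opaque token are kept.
    """
    return {
        "user_token": pseudonymize(record["device_id"]),
        "text": record["text"],
    }


if __name__ == "__main__":
    print(mask_card_number("1234-5678-9012-3456"))
    print(strip_identifiers({"device_id": "device-abc", "text": "hello"}))
```

In this sketch the entered text survives unchanged while the raw device identifier never appears in the stored record, which mirrors the receipt analogy: enough remains to be useful, but not enough to point back to the person.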
An AI industry official, referring to DeepSeek's privacy policy, said, "You can assume that all digital information you enter is transferred," and explained, "In developed countries such as the United States, privacy protection laws clearly regulate the use of specific individuals' data, making such collection and use impossible."

No Specified Data Retention Period, No Right to Refuse
The period for which the collected data is stored is also unclear. DeepSeek's privacy policy only states that information will be retained 'for the period necessary to provide the service.' The deletion deadline for collected data is also ambiguous, described only as 'when it is no longer needed.' In contrast, ChatGPT and CLOVA X set the retention period for user-inputted data to a maximum of 30 days. Gemini allows users to set the data retention period themselves, and CLOVA X provides a function for users to directly delete their inputted data.

1.21 Million Users' Personal Information Exposed... Stored on Servers in China
The Personal Information Protection Commission has already confirmed that DeepSeek user information has been transferred to ByteDance, the parent company of the Chinese social networking service TikTok. Jang Dongin, a lead professor at the Korea Advanced Institute of Science and Technology (KAIST) AI Graduate School, also commented, "DeepSeek has notified users that it will use the collected personal information at its discretion, so caution is needed."
Although the Personal Information Protection Commission has suspended new downloads of the DeepSeek app, the effectiveness of this measure is expected to be disputed. Users who have already downloaded the app can continue to use it, and the DeepSeek web page is not covered by the measure. According to the app analytics service WiseApp Retail, the DeepSeek app had 1.21 million users in the fourth week of last month, second among generative AI apps only to ChatGPT (4.93 million) in the same period. A representative from the Personal Information Protection Commission stated, "For users who have already downloaded and are using the DeepSeek app, there are no special measures that the service provider can take, so each user must be cautious when entering personal information," adding, "Due to the nature of the internet, it is not easy to block access."