Conversation with Li Kaifu: With one million yuan from Zero One, it’s enough to do pre-training. I won’t lose money in the B2B business.

Blog 2024-10-18 15:48

Li Kaifu, who has just returned from the United States, has figured out many things, and his large-model company, 01.AI, has also begun to change.

At a rare media briefing on October 16th, Li Kaifu "responded to everything." He said he had recently realized that products can only achieve PMF (Product-Market Fit) when they are built on a good base model, and that activating a healthy generative AI ecosystem and improving the profitability of AI products requires world-class models at a low cost. At its online launch event the same day, 01.AI released a new model called Yi-Lightning, the model that, in Li Kaifu's view, best fits this standard.

Currently, 01.AI's model series is divided into two categories: open source and closed source. Open source models such as Yi-9B, Yi-1.5, and Yi-Coder mainly serve to build brand awareness. Among the closed source models, Yi-Large, with billions of parameters, has been 01.AI's most important flagship model, but Yi-Lightning will now take its place.

On LMSYS, the global large-model benchmark run by the University of California, Berkeley, Yi-Lightning ranks sixth overall, surpassing well-known Silicon Valley models such as OpenAI GPT-4o and Anthropic Claude 3.5 Sonnet. It is the top-ranking model from China, and this is the first time a Chinese large model has outperformed OpenAI GPT-4o on the public LMSYS platform.

Based on the evaluation results, Yi-Lightning performs strongly and has made breakthroughs in inference speed and inference cost. Compared with the previous flagship model Yi-Large, its peak generation speed has increased by nearly 40%. Li Kaifu mentioned that this model, priced at 0.99 yuan per million tokens, will still bring 01.AI "tens of percent profit." "We won't lose money on sales, but we won't make too much either," he said. "To enable app usage, we must offer high-performance models at bargain prices." He revealed that Yi-Lightning has already received orders from several large customers.

In addition, 01.AI announced that it is shifting from a consumer-facing (C-end) focus to a business-facing (B-end) focus in the domestic market. On the same day, it released a set of digital human solutions for retail and e-commerce scenarios, applying its flagship model, Yi-Lightning, to specific industries.

Five months ago, 01.AI's focus was still on Wanzhi, its C-end AI office assistant. At that time, Li Kaifu emphasized that he would not do loss-making B-end business.

According to 01.AI, its internal strategy for the second half of the year is to focus on ToB domestically and on ToC internationally. It has already incubated a new ToB product matrix internally, which will soon be showcased together at an offline event.

The domestic B2B market mainly charges by API call volume and through customized software and hardware delivery projects, while the C-end market mainly charges through subscriptions. Li Kaifu emphasized at the press conference that 01.AI's move into B2B still does not mean doing loss-making B2B.

"The B2B approach of Zero One Myriad is to create profitable solutions, not just selling models or working on projects," said Li Kaifu. He further stated that due to genetic factors, it is very challenging for a large modeling company to work on both B-side and C-side, requiring a diversified management approach and differentiated measurement methods. "Zero One Myriad chooses to focus its B2B operations in China, seeking breakthroughs, such as using digital technology for retail and catering, in order to provide a complete solution and work with local suppliers."

As for toC, 01.AI is not promoting it in China for now and is still operating overseas. "We choose to focus on overseas markets for toC products. If we rely on buying traffic in China, the cost will be too high," said Li Kaifu. The company will keep Wanzhi running but will put more effort into overseas markets. "We aim to acquire high-quality users at a lower price, or sell the app directly so that foreign users subscribe and pay."

Li Kaifu also responded to rumors that some top large-model startups have paused pre-training of their base models. He believes that pre-training a model costs around $3 million, which should not be a problem for top startups. However, "creating a good pre-trained model is a challenging task that requires the collaboration of many talented individuals." If a company is lucky enough to have this talent and to work across disciplines, China will have the opportunity to create a pre-trained model that ranks in the top ten in the world.

In an environment where large-model startups face slowing growth in model capability and concerns about monetization, 01.AI persists in launching base models while targeting the B-end market, which is considered a classic strategy. Based on Li Kaifu's statements, harder financing and scarcer resources have not affected the company, and the six-month gap between its models and OpenAI's does not seem to cause anxiety. 01.AI appears to be sticking to its strategy of seeking new growth in a fiercely competitive market through innovative use of foundational technology, a path that diverges from that of other large-model startups.

"Li Kaifu said, 'If you expect a breakthrough, you may need an unprecedented algorithm to have a chance.'"

The following is the Q&A session of the press conference, with some modifications for brevity.

Question: It was previously reported that some of the "Six Tigers" of AI had abandoned pre-training, and Li Kaifu clarified the rumors at the time. From an industry perspective, though, do you think gradually abandoning pre-training will become a trend across the whole industry?

Li Kaifu: Building a good pre-trained model is a technical task that requires many talented people working together. It is slow, meticulous work, and it takes people who understand chips, inference, infrastructure, and modeling, along with strong algorithmic knowledge, to do it successfully.

If a company is fortunate enough to have that many outstanding people who can cooperate across domains, then I believe China is definitely capable of creating a pre-trained general-purpose model that ranks among the top ten in the world. However, not every company can do this, as the cost is relatively high. It is possible that in the future fewer large-model companies will do pre-training. As far as I know, these six companies have sufficient financing. A single production pre-training run costs around three to four million US dollars, which top companies can afford. I think that as long as China's six major large-model companies have enough talented people and the determination to do pre-training, financing and chip availability will not be issues.

Question: After the release of OpenAI's o1, many people believe that it will bring a new paradigm from a technological standpoint. What impact will this have on startup companies?

Li Kaifu: I have just returned from the United States, where I talked with people from OpenAI. They indicated that the company has good things in-house, but it is in no hurry to reveal them because it is far ahead of the industry. It will release them at a commercially opportune moment. That is something they can do and others cannot. Although OpenAI o1 hides all of its intermediate thinking, many people online have started guessing how it works, and we think some of the speculation is fairly reliable. When you launch a new technology, many intelligent people will certainly speculate about it. I believe that within five months, quite a few capabilities similar to the o1 model will emerge from various large-model companies, including 01.AI.

The idea behind o1 is to extend the scaling trend, previously seen only in pre-training, to the inference stage, which is the biggest change in thinking for the industry. In the past, people thought that doing pre-training well was enough, but gradually everyone realized that supervised fine-tuning (SFT) after pre-training and reinforcement learning are both very important. Accordingly, 01.AI's team mainly focused on pre-training at the beginning, and later many talented people joined us to help build our post-training (fine-tuning) capabilities as well. Now it seems that inference is also very important. A year and a half ago, everyone thought that the most powerful aspect of large models was pre-training, but a year later they discovered that fine-tuning was equally important. We thank OpenAI for awakening us to this point. I believe that many companies in China and the United States are now rushing in the direction of o1.

Question: Li Kaifu previously said he would "not do loss-making to B," but this is the first time a to B product matrix has been released. Does this mean 01.AI will push further in the to B direction?

Li Kaifu: We particularly value delivering value to each user, so we will not say, "We have a model; use it for whatever you want; I'll sell it to you, and you pay first," because that will not satisfy the user. Another common practice is to go to an enterprise that says, "I want to do customer service. Sell me the model, but I don't know how to build customer service, so you build it for me." This turns into the system-integration style of AI from the AI 1.0 era: you sell the model and then have to build the customer service application for the client first. In that situation it is difficult to make a profit. As I have said before, if every B2B deal loses money, then even if the deal is worth ten million, we would rather not do it. This principle has not changed.

We have just released the AI 2.0 digital human solution, which is not a business that loses money on every deal, because it targets users' significant pain points and profit points. For instance, a store manager or Key Opinion Leader (KOL) doing a live broadcast is spending their most important resource, their time. Say a one-hour live stream earns them 1,000 yuan. A digital human is not limited to one hour. It could stream for a thousand hours, and even if each hour generates only half the income, that still results in five hundred times the earnings overall. This makes it a very lucrative opportunity. If digital humans can cover the process end to end, so that a company inputs its internal information, selects an image and a voice, and simply presses a button to run hundreds or thousands of live streams, it is like selling the company a money-printing machine. Renting out such a machine could be a feasible business model. Apart from live streaming, our AI 2.0 digital human solution has been successfully implemented in various other business scenarios, such as AI companions, IP characters, and office meetings. We will continue to execute our strategy of integrating models and applications, combining Yi-Lightning's capabilities with digital human solutions, constantly iterating on products, and unlocking more business scenarios in the future.
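The livestream arithmetic in this answer can be checked with a short sketch. The figures are the illustrative ones Li Kaifu gives (1,000 yuan for one human-hosted hour, a thousand digital-human hours at half the hourly income); the function name is hypothetical and the calculation ignores any cost of running the digital human.

```python
def livestream_revenue(hours: float, yuan_per_hour: float) -> float:
    """Total revenue in yuan for a given number of streamed hours."""
    return hours * yuan_per_hour


# Human host: one hour at 1,000 yuan per hour (figure from the interview).
human = livestream_revenue(hours=1, yuan_per_hour=1000)

# Digital human: a thousand hours at half the hourly income (also from the interview).
digital = livestream_revenue(hours=1000, yuan_per_hour=1000 / 2)

# How many times more the digital human earns overall.
ratio = digital / human
print(ratio)
```

Under these assumptions the digital human brings in 500,000 yuan against 1,000 yuan, which is the "five hundred times" figure quoted above.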

As for SaaS, it is difficult to succeed with SaaS in China at the moment. While the pricing and business models have been successful in the United States, China still faces significant challenges. However, there are some industries where SaaS can work well. SaaS can be charged by usage or by subscription, with monthly fees or revenue sharing. These can collectively be regarded as better business models because they bring ongoing revenue, unlike one-off sales, where a company completes a project for you and receives payment but may have no further revenue. Whether revenue sharing or subscription-based, these SaaS models are sustainable businesses. Currently, we do not see a universally accepted SaaS model in existence. Therefore, in China, the strategy for large-model B2B companies is different from that of the AI 1.0 era. The primary task is to find ways to charge by usage rather than by custom project, and then to focus on winning orders with higher profit margins.

Overall, 01.AI's B2B solutions will follow a "horizontal and vertical" strategy. Compared with Yi-Large, Yi-Lightning has significantly improved model performance. As an internationally SOTA foundation model, it has excellent generalization capabilities. In addition, 01.AI has strong capabilities in SFT (supervised fine-tuning). These technical capabilities enable our team to explore individual industries in depth first and then, based on our technical capabilities and industry experience, distill standardized B2B solutions. This will help enterprises in various industries reduce costs and improve efficiency, making effective use of the world's top-tier large models and bringing them real business growth and competitiveness.

Question: In May, Yi-Large shortened the time gap between cutting-edge models in China and the United States to six months. This time, the release directly beat GPT-4o, shortening the gap even further to five months. For small startup companies in China starting from scratch, what unique features should pre-trained models have in order to keep catching up and reducing the gap?

Li Kaifu: It is very difficult to shorten the time gap, and I do not predict that we can shorten it. After all, others used 100,000 GPUs for training, while we used only 2,000. The reason we can keep the gap this small is that our model, AI infrastructure, and other teams are enthusiastic and smart about using and understanding what others have produced. In addition, each of our research and development areas has its own strengths, such as data processing and training optimization. This methodology is now mature in the world of AI. We are confident that by combining our own innovations and strengths with close attention to the new technologies released by OpenAI and other companies, we can bring those capabilities into our products. I think holding the gap at around six months with this methodology would already be a very good result.

If we are hoping for a breakthrough, it may take an unprecedented algorithm to have a chance. We must not think it is shameful to be six months behind, or feel that we have to catch up, because many American friends believe that China will lag far behind. Some American friends even think that, with their hundreds of thousands of GPUs, we might be left behind by three, five, or even ten years. Now 01.AI has demonstrated that we will not fall that far behind. In this LMSYS ranking, two other Chinese companies also perform well; we are not the only one doing it. Therefore, for companies in China with hardworking, diligent, intelligent, and diverse teams, it is possible to adopt an approach similar to 01.AI's and get close to the top American companies without falling more than six months behind. I think it is possible, and not just for us, but it is very difficult. Reducing the gap further is very challenging unless there is a real breakthrough in invention and technology.

Question: 01.AI has launched consumer (to C) products in overseas markets while gradually introducing business (to B) products in the domestic market. Against this background, how should we view the boundary between to B and to C products? 01.AI has been trying to commercialize its technology since last year. What are its advantages, especially in to B and to C? Are there any scenarios or capabilities that can be reused between the two?


Li Kaifu: It is quite challenging for a large-model company to do both to B and to C at the same time. It requires a diversified management approach because the two teams have different DNA, different ways of working, and different ways of measuring KPIs. I have experience in both areas and am trying to manage both, but it is not feasible to do everything. Therefore, in B2B, we focus on the domestic market because we have identified some breakthrough opportunities, such as using digital technology in retail and dining, where we can provide a comprehensive solution. We are also exploring two or three other fields, but it is not convenient to disclose them at the moment.

We believe that B2B done this way can only be done in China, because we are unlikely to reach customers in the United States or other foreign countries. Globally, most B2B suppliers are local. Even if you want to buy SAP products in China, it is SAP China that sells to you. Establishing branches internationally to do B2B is therefore definitely not something that we or other startups can do. So we are giving up foreign B2B and focusing on domestic B2B, with profitable solutions rather than just selling models or doing project-based work. This is our approach to B2B.

As for to C, our main focus is overseas, for several reasons. The first is that when we started 01.AI, there were no suitable Chinese models available domestically, so we had to try overseas first. After trying for a period of time, we gained insights and iterated through one, two, three products. Some of these products are performing well, while others are not. We are constantly adjusting and refining them.

We are also watching to see when, and with what kind of products, it would make sense to launch in China. The major issue for to C products in China is that the cost of traffic keeps rising. We have seen some competitors' per-user acquisition cost rise from around ten RMB to over thirty RMB, and their user churn has also been considerable recently. In such a challenging environment, we will be very cautious and will not launch new to C applications in China for now. Our existing products will continue to be maintained, but more effort will go into acquiring high-quality users at a lower cost in foreign markets, or even selling the app directly and letting users pay by subscription, since the subscription habit is relatively well established abroad. Currently, the biggest reason for focusing on overseas to C products is that there we can balance our ability to monetize against the cost of user growth. We will consider opportunities to introduce products in the domestic market in the future.

Are there any similarities between the two? Quite a few. First, both sides need high-quality, fast models, which we have. In addition, both sides use the same pre-training and post-training techniques. A to C application and a digital human application require similar techniques, which we have accumulated. When we finally build products, the functions needed in a to B product and those needed in a to C product can be shared. For example, AI reading, AI writing, AI PPT, and AI search are modules that many to B and to C applications need. Traditional software also shares a great deal of underlying technology, like the platform APIs in Windows, which are used by both to B and to C applications. We are accumulating these shared components as well.

Question: Your newly released flagship model is priced at a rock-bottom level. Will that cause you to lose money?

Li Kaifu: 01.AI is not pricing Yi-Lightning at a loss.

From the first day of its establishment, 01.AI launched three major teams in parallel: model training, AI Infra, and AI applications. Once these three teams matured, they were integrated. 01.AI's approach can be summarized as two major strategies: co-building the model and its infrastructure, and integrating the model with applications. AI Infra supports both model training and inference, enabling us to train leading-performance models at lower cost and to support application-level exploration at lower inference cost. Outstanding model performance and low inference costs not only support 01.AI in exploring strong to B application scenarios but also make the large-model to B solutions it launches more cost-effective.

When I responded to the industry price war earlier, my answer was that 01.AI does not participate. I also said at the time that we should consider not only the price of a model but also whether its performance is good enough.

At that time, many models with poor performance had their prices reduced to very low, even free. I believe that enterprises and individuals who chose to integrate such model APIs at that time did not achieve the expected results.

When integrating a model API, good model performance is very important; otherwise the product cannot achieve PMF. The other important point is to bring the price of high-performance models down to a very low level. A price of 0.99 RMB per million tokens is very cheap, but if each user of an application makes dozens of requests per day, the cumulative cost over a year cannot be ignored. 01.AI is also building apps itself, and cost control is necessary for app development. We will not sell models at a loss, but we will not make a lot of profit either. Instead, we add a small profit margin on top of the cost, which gives the current price of 0.99 RMB per million tokens.
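A rough sketch of that cumulative-cost calculation follows. Only the 0.99 RMB price comes from the interview. The request rate (30 per day, standing in for "dozens"), the 2,000 tokens per request, and the 100,000-user app are hypothetical values chosen for illustration, and the function name is invented.

```python
# Price quoted in the interview: 0.99 RMB per one million tokens.
PRICE_RMB_PER_MILLION_TOKENS = 0.99


def annual_api_cost_rmb(users: int,
                        requests_per_user_per_day: int,
                        tokens_per_request: int,
                        days: int = 365) -> float:
    """Estimated yearly API spend in RMB under a flat per-token price."""
    total_tokens = users * requests_per_user_per_day * tokens_per_request * days
    return total_tokens / 1_000_000 * PRICE_RMB_PER_MILLION_TOKENS


# Hypothetical workload: 30 requests per user per day, 2,000 tokens per request.
per_user_rmb = annual_api_cost_rmb(users=1,
                                   requests_per_user_per_day=30,
                                   tokens_per_request=2000)

# Hypothetical app with 100,000 users and the same usage pattern.
fleet_rmb = annual_api_cost_rmb(users=100_000,
                                requests_per_user_per_day=30,
                                tokens_per_request=2000)

print(f"per user per year: {per_user_rmb:.2f} RMB")
print(f"100,000 users per year: {fleet_rmb:,.0f} RMB")
```

With these assumed numbers, one user costs roughly 21.7 RMB a year, and a 100,000-user app roughly 2.17 million RMB a year, which illustrates why even a low per-token price adds up at application scale.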

The most important point in selecting a model API is that the model's performance must be excellent. On that premise, choose the most affordable option, work out what your real user call volume would be, and see whether the cost adds up. Considering its overall quality and price, I believe Yi-Lightning is likely to be the most well-regarded and cost-effective model for many developers.

Question: Is the TO B solution matrix released this time complete? Will there be any other TO B solutions announced in the near future?

Li Kaifu: In addition to the AI 2.0 digital human solution and the APIs we have already released, 01.AI currently has AI infra solutions, customized models for private deployment, and other B2B services. We will officially release these in the near future. Stay tuned.
