Baidu recently made another big move, releasing two new models at once: Wenxin 4.5 and the reasoning model X1.
What’s even more surprising is that the two models were originally scheduled to become free to the public on April 1st, but Baidu has opened them up ahead of schedule, so everyone can already try them for free.

Wenxin 4.5 is a multimodal model that can interpret text, pictures, video, audio, and other content, with capabilities that Baidu claims surpass OpenAI’s GPT-4o.
Like DeepSeek-R1, X1 can think deeply about a question, with the ability to understand, plan, reflect, and revise as it goes; it also supports multimodal input.
What’s even more impressive is that X1 is the first to autonomously call a whole set of specialized tools, such as advanced search, document Q&A, AI drawing, a code interpreter, web link reading, and Baidu academic search.
Thoughts
I have to say that Wenxin Yiyan’s development has been truly dramatic.
Two years ago it burst onto the scene as China’s first general-purpose large model. Everyone was excited at the time, as if we were seeing a Chinese version of ChatGPT.
But the good times did not last long. As more and more competitors entered the field, Wenxin Yiyan began to show some problems.
First, the product positioning was a bit shaky: one moment they wanted to enhance search, the next they rushed into creative writing, and the core competitiveness became increasingly blurred. To be fair, this is a common pain point for most model companies.
Second, the charging strategy came too hastily. That move left many users feeling the value for money was low, so they chose to wait and see, or simply switched to other platforms.
It was not until DeepSeek took off that people realized the model itself could be the core product that attracts users, with no need to rush into other gimmicks.
That said, in my experience the X1 and 4.5 updates do show significant progress, and they look like an effort to close the gap that has opened up against other large models over the past two years.
Looking at the domestic market as a whole, with the continuous iteration of products such as DeepSeek, Doubao, and Kimi, plus Wenxin Yiyan’s push this time, a healthy competitive landscape is taking shape in China’s large-model field, which is quite interesting to watch.
AI+
Today I’ll take you through an evaluation of Baidu’s two models.
I put the URL here: https://yiyan.baidu.com/
Wenxin-4.5
Let’s take a look at Wenxin 4.5 first. I sent it a picture to see if it can recognize it.

Sure enough, it recognized it as a Tang Dynasty animal-head agate cup, a cultural relic with rich heritage.
Then I switched to a meme to see if it could understand the joke.

I didn’t expect it to basically get the meaning; its image understanding is genuinely good.
Then I raised the difficulty and sent a picture with no text in it to test Wenxin 4.5.

As a result, Wenxin 4.5 answered directly that it was “The Shawshank Redemption”, which was completely correct.
It’s fair to say the combination of Baidu’s search index data and a multimodal model is really powerful. And it is genuinely multimodal: not limited to pictures, it can also recognize audio and video.
What surprised me even more was that Wenxin 4.5 can also generate continuous, multi-scene pictures.
For example, I took a photo of my friend and turned him into Iron Man.

The effect is really cool.
Now let’s talk about writing. Wenxin 4.5 performs quite well on relatively rigid, templated content.
But when asked to write a story, its craft still needs work. The stories it produces sometimes feel a bit stiff and unpolished.
Reasoning model X1
Next, let’s continue with Baidu’s other core model: Wenxin X1.
Let X1 rewrite the story:
What Wenxin X1 wrote has a style of its own, no longer as stiff as before. It’s quite fun to read and feels a bit like a novel.
Let me try something else. I asked it to comment, in a sarcastic tone, on the refurbished sanitary napkin scandal exposed at this year’s CCTV 3·15 Gala; to keep up with real-time news, I enabled web search.

Let’s first look at Wenxin X1’s thought process. Judging from its reasoning path, the logic is very clear; this is genuine “thinking”.

Finally, look at its answer:

Its tongue is every bit as sharp as DeepSeek-R1’s. It seems Baidu has put in a lot of effort this time.
Since it is a reasoning model, its logical reasoning ability must be tested carefully.
The test question is still the classic bouncing-ball coding question, which tests not only the model’s grasp of physics but also its math and programming ability.
I have already put Grok3, DeepSeek, and ChatGPT through this task in previous articles, with varying results. Interested readers can take a look: 👉Who is the strongest AI?! Testing Grok3, deepseek, and ChatGPT, the results of the four dimensions are unexpected
Prompt: Write a piece of HTML code. There is a regular hexagon in the middle of the webpage, with a particle inside it that has an initial velocity. The particle bounces back when it touches the boundary of the hexagon, and each time it touches the boundary, the boundary changes to a random color.
Let’s take a look at the performance of Wenxin X1 this time:
First of all, the thinking stage took 3 minutes, which is a bit slow. Second, the result didn’t run well: the ball only ever bounced between the same two sides.

This suggests that Wenxin X1 may indeed have some shortcomings in logical reasoning. At least in this test, it is still some distance from the industry’s top reasoning models.
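For context on why the model struggled, the physics in this prompt boils down to reflecting a velocity vector off each hexagon edge (v' = v − 2(v·n)n). Here is a minimal hand-written sketch of that math in Python; this is my own illustration, not X1’s output:

```python
import math

def hexagon_vertices(r=1.0):
    """Vertices of a regular hexagon of circumradius r, centered at the
    origin and wound counter-clockwise."""
    return [(r * math.cos(math.pi / 3 * i), r * math.sin(math.pi / 3 * i))
            for i in range(6)]

def edge_normal(p1, p2):
    """Inward unit normal of the edge p1 -> p2 (for a CCW-wound polygon,
    rotating the edge direction 90 degrees left points toward the center)."""
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    length = math.hypot(ex, ey)
    return (-ey / length, ex / length)

def reflect(v, n):
    """Reflect velocity v off a wall with unit normal n: v' = v - 2(v.n)n."""
    dot = v[0] * n[0] + v[1] * n[1]
    return (v[0] - 2 * dot * n[0], v[1] - 2 * dot * n[1])
```

In the full animation loop, each frame advances the particle’s position, and whenever the dot product of (p − p1) with an edge’s inward normal goes negative, the code reflects the velocity and recolors that edge; a model that gets the normals wrong will only ever bounce off some of the sides.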
But I think the tool calling capabilities of Wenxin X1 are really eye-catching.
Let me give an example that shocked me. I asked it to help polish a novel, and it generated a doc file with the revised content and delivered it to me.
Look at its workflow: it first calls the document Q&A tool, then the code interpreter, making three tool calls in total.

After more than a minute of revision, it handed me a neat and tidy doc document.
This may be the industry’s first deep-thinking model that supports autonomous tool calling. Reasoning ability, plus web access, plus powerful tool calling: that combination is really impressive.
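For readers curious what “autonomous tool calling” looks like under the hood, here is a toy dispatch loop in Python. The tool names and the scripted stand-in for the model are purely hypothetical; Baidu has not published X1’s internal protocol:

```python
# Toy sketch of an autonomous tool-calling loop: the model decides which
# tool to invoke at each step, the runtime executes it and feeds the
# result back, until the model returns a final answer.

def doc_qa(args):
    # Hypothetical tool: pretend to extract the document's text.
    return f"extracted text of {args['file']}"

def code_interpreter(args):
    # Hypothetical tool: pretend to write the polished result to a file.
    return f"wrote polished text to {args['out']}"

TOOLS = {"doc_qa": doc_qa, "code_interpreter": code_interpreter}

def fake_model(history):
    """Scripted stand-in for X1: returns the next tool call, or an answer."""
    plan = [
        ("doc_qa", {"file": "novel.doc"}),
        ("code_interpreter", {"out": "novel_polished.doc"}),
    ]
    step = len(history)
    if step < len(plan):
        return {"tool": plan[step][0], "args": plan[step][1]}
    return {"answer": "Done: delivered novel_polished.doc"}

def run(model, max_steps=5):
    """Drive the loop: execute tool calls until the model answers."""
    history = []
    for _ in range(max_steps):
        decision = model(history)
        if "answer" in decision:
            return decision["answer"], history
        result = TOOLS[decision["tool"]](decision["args"])
        history.append((decision["tool"], result))
    return None, history
```

The point of the design is that the model, not the user, chooses the tool sequence; the runtime only executes whatever the model requests and appends each result to the conversation history.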
Moreover, its API is very cheap: for both input and output, it is priced at half of DeepSeek R1’s rates.
Overall, Wenxin X1’s performance this time really impressed me.
Three sentences.
That’s all for today. Finally, let me summarize in three sentences:
1. Wenxin 4.5 is a multimodal model that can understand text, pictures, video, audio, and more, with solid interpretation ability.
2. Wenxin X1 still has some shortcomings in logical reasoning and there is still a gap between it and the industry’s top reasoning models.
3. The tool calling capability of Wenxin X1 is really eye-catching.