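Baidu has made another big move, releasing two new models at once: Wenxin 4.5 and the reasoning model X1.
Even more surprising, the two models were originally scheduled to become free on April 1, but they are already open for everyone to try at no cost.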

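Wenxin 4.5 is a multimodal model that can interpret text, images, video and audio, and it is billed as far surpassing OpenAI's GPT-4o.
Like DeepSeek-R1, X1 can think deeply: it understands, plans around, reflects on and develops the questions it is given, and it also supports multimodal input.
What's more, X1 is the first to automatically call a range of tools, including advanced search, document Q&A, AI image generation, a code interpreter, web link reading and Baidu Academic search.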
Some thoughts
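I have to say, the story of Wenxin Yiyan has been truly dramatic.
Two years ago it burst onto the scene as China's first general-purpose large model. Everyone was excited, as if they were looking at a Chinese version of ChatGPT.
But the good times did not last. As more and more rivals joined the field, Wenxin Yiyan began to show some problems.
First, the product positioning was unsteady. One moment it was about enhancing search, the next it was about creative writing, and its core competitiveness grew increasingly blurred. To be fair, this is a pain point most model companies share.
Second, the paid tier was rolled out too hastily. Many users felt it was poor value for money and chose to wait and see, or simply moved to other platforms.
It was only after DeepSeek took off that people realized the model itself can be the core product that attracts users, and there is no need to rush out other gimmicks.
That said, in my experience the X1 and 4.5 updates to Wenxin Yiyan do show clear progress, and they look like an effort to close the gap that opened up with other large models over the past two years.
Looking at the domestic market as a whole, with DeepSeek, Doubao, Kimi and other products iterating continuously, and Wenxin Yiyan now making its push, China's large-model field is settling into healthy competition that is well worth watching.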
Today I'll walk you through a hands-on test of Baidu's two models.
Here is the URL: https://yiyan.baidu.com/
Wenxin-4.5
Let’s take a look at Wenxin 4.5 first. I sent it a picture to see whether it could recognize the object.

It identified the object as a Tang Dynasty animal-head agate cup, a relic with a rich cultural heritage.
Then I switched to a meme image to see whether it could get the joke.

I didn’t expect that it could basically get the meaning. Its ability to understand pictures is really good.
Then I raised the difficulty and sent Wenxin 4.5 a picture with no text in it.

Wenxin 4.5 answered straight away that it was from “The Shawshank Redemption”, which was completely correct.
The combination of Baidu’s indexed data and a multimodal model is really powerful. And because it is multimodal, it is not limited to pictures: it can also recognize audio and video.
What surprised me even more was that Wenxin 4.5 can also generate continuous, multi-scene pictures.
For example, I took a photo of my friend and had it turn him into Iron Man.

The effect is really cool.
Now let’s talk about writing skills. Wenxin 4.5 performs quite well on relatively rigid, templated content.
But when asked to write a story, its writing still needs improvement. The stories it produces sometimes feel a bit “stiff” and not refined enough.
Reasoning model X1
Next, let’s test Baidu’s other core model, Wenxin X1.
I had X1 rewrite the story:
What Wenxin X1 writes seems to have its own style and is not as stiff as before. It is quite interesting to read and feels a bit like a novel.
Let me try something else. I asked it to comment, in a sarcastic tone, on the refurbished sanitary napkin incident exposed at this year’s CCTV 315 Gala. To keep up with current events, I enabled the web search option.

Let’s first look at Wenxin X1’s thought process. Its reasoning path is clear and logical; this is real “thinking”.

Finally, look at its answer:

Its tongue is every bit as sharp as DeepSeek-R1’s. It seems Baidu has put in a lot of effort this time.
Since it is a reasoning model, its logical reasoning ability deserves a careful test.
The test question is the classic bouncing-ball coding problem, which tests not only the model’s understanding of physics but also its mathematical and programming abilities.
I have already run this task on Grok3, DeepSeek and ChatGPT in a previous article, with varying results. If you are interested, take a look: 👉 Who is the strongest AI?! Testing Grok3, deepseek, and ChatGPT, the results of the four dimensions are unexpected
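Prompt: Write a piece of HTML code. There is a regular hexagon in the middle of the page. Inside the hexagon is a particle whose initial velocity is […]. When it touches the boundary of the hexagon, it bounces back. Each time it touches the boundary, the boundary randomly changes color.
Let’s see how Wenxin X1 did this time:
First, the thinking phase kept me waiting for 3 minutes, which is a bit slow. Second, the result did not run well: the ball only ever bounced between the same two sides.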
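For a sense of what the prompt actually demands, here is a minimal Python sketch of the underlying physics. It is not the HTML page itself and not any model's output. It assumes plain mirror reflection, v' = v - 2(v·n)n, against each edge's inward unit normal. All names (`hexagon_vertices`, `inward_edges`, `step`, `simulate`) are hypothetical, and since the initial velocity in the prompt is missing, the default `(2.0, 3.0)` is an arbitrary stand-in.

```python
import math
import random


def hexagon_vertices(radius):
    """Vertices of a regular hexagon centered at the origin, counter-clockwise."""
    return [(radius * math.cos(math.pi / 3 * k),
             radius * math.sin(math.pi / 3 * k)) for k in range(6)]


def inward_edges(vertices):
    """For each edge, return (a point on the edge, its inward unit normal)."""
    edges = []
    n = len(vertices)
    for i in range(n):
        (x1, y1), (x2, y2) = vertices[i], vertices[(i + 1) % n]
        ex, ey = x2 - x1, y2 - y1
        length = math.hypot(ex, ey)
        # For counter-clockwise vertices the interior lies to the left of the edge.
        edges.append(((x1, y1), (-ey / length, ex / length)))
    return edges


def step(pos, vel, edges, dt):
    """Advance one time step; return new position, new velocity, and edges hit."""
    x, y = pos[0] + vel[0] * dt, pos[1] + vel[1] * dt
    vx, vy = vel
    hit = []
    for i, ((px, py), (nx, ny)) in enumerate(edges):
        dist = (x - px) * nx + (y - py) * ny  # signed distance, positive inside
        if dist < 0:
            # Mirror the overshoot back across the edge.
            x -= 2 * dist * nx
            y -= 2 * dist * ny
            vn = vx * nx + vy * ny
            if vn < 0:
                # Reflect the velocity: v' = v - 2 (v . n) n
                vx -= 2 * vn * nx
                vy -= 2 * vn * ny
                hit.append(i)
    return (x, y), (vx, vy), hit


def simulate(steps, radius=100.0, vel=(2.0, 3.0), dt=1.0, seed=0):
    """Run the simulation; recolor an edge at random whenever it is hit."""
    rng = random.Random(seed)
    edges = inward_edges(hexagon_vertices(radius))
    colors = ["#ffffff"] * 6
    pos = (0.0, 0.0)
    hits = []
    for _ in range(steps):
        pos, vel, hit = step(pos, vel, edges, dt)
        for i in hit:
            colors[i] = "#%06x" % rng.randrange(0x1000000)
            hits.append(i)
    return pos, vel, hits, colors
```

With these assumed defaults, `simulate(500)` records hits on at least three different edges and keeps the speed constant. A ball that only ever shuttles between the same two sides may therefore point to a flaw in the reflection logic, or to a starting velocity that happens to be exactly perpendicular to a pair of opposite sides.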

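So in terms of logical reasoning, Wenxin X1 does seem to have some shortcomings. At least in this test, it is still some way behind the industry's top reasoning models.
But I think Wenxin X1's tool-calling ability is genuinely eye-catching.
Here is an example that stunned me. I asked it to help polish a novel, and it generated a doc file with the revised content and sent it to me.
Here is how it worked: it first called the document Q&A tool, then the code interpreter tool, for three tool calls in total.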

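After a little over a minute of revision, it handed me a clean, tidy doc file.
This may be the industry's first deep-thinking model that supports autonomous tool calling. Reasoning plus web access plus strong tool calling is truly impressive.
Its API is also very cheap: for both input and output, the price is only half that of DeepSeek's R1.
Overall, Wenxin X1's performance this time really impressed me.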
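In three sentences
That's all I have to share today. To wrap up, here is my summary in three sentences:
1. Wenxin 4.5 is a multimodal model that can understand text, images, video and audio, and it interprets them well.
2. Wenxin X1 still has some shortcomings in logical reasoning and remains behind the industry's top reasoning models.
3. Wenxin X1's tool-calling ability is genuinely eye-catching.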