Upgraded Again! Google Unveils an Enhanced Gemini 3 Deep Think Model for Scientific Challenges — Shares Rise Against the Market Trend

Google has launched a product that could redefine the rules of the AI race — a major upgrade to the Gemini 3 “Deep Think” reasoning mode.

Recently, Wall Street’s attitude toward the AI narrative has undergone a fundamental shift. Investors no longer applaud ambitious roadmaps, nor are they willing to blindly fund hundred-billion-dollar capital expenditure plans. What the market now demands is proof — evidence that the money being burned is turning into tools capable of solving real-world problems. Google’s newly released Gemini 3 Deep Think upgrade arrives precisely at this inflection point in market sentiment.

Benchmark Scores Highlight the Weight of the Upgrade

On ARC-AGI-2 — a benchmark designed to test the core reasoning capabilities of artificial general intelligence and deliberately resistant to “training data memorization” — Gemini 3 Deep Think achieved an accuracy rate of 84.6%, verified by the ARC Prize Foundation.

For comparison: Claude Opus 4.6 (Thinking Max) scored 68.8%, GPT-5.2 (Thinking xhigh) achieved 52.9%, and three months ago Gemini 3 Pro Preview stood at just 31.1%.
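To make the size of these gaps concrete, the short sketch below computes Deep Think's lead over each of the other models, both in percentage points and as a ratio. It uses only the four ARC-AGI-2 figures quoted above; the function name `lead_over` is a hypothetical helper introduced for illustration, not part of any benchmark tooling.

```python
# ARC-AGI-2 accuracies (percent) exactly as quoted in the article.
scores = {
    "Gemini 3 Deep Think": 84.6,
    "Claude Opus 4.6 (Thinking Max)": 68.8,
    "GPT-5.2 (Thinking xhigh)": 52.9,
    "Gemini 3 Pro Preview": 31.1,
}


def lead_over(leader, scores):
    """Hypothetical helper: map each other model to (point_gap, ratio) versus the leader."""
    top = scores[leader]
    return {
        name: (round(top - value, 1), round(top / value, 2))
        for name, value in scores.items()
        if name != leader
    }


gaps = lead_over("Gemini 3 Deep Think", scores)
for name, (points, ratio) in gaps.items():
    print(f"{name}: +{points} points, {ratio}x")
```

On these figures, the lead over Claude Opus 4.6 is 15.8 percentage points, and the score is roughly 2.7 times that of Gemini 3 Pro Preview.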

On “Humanity’s Last Exam” — an extreme test compiling PhD-level interdisciplinary knowledge — the model scored 48.4% without external tools, significantly outperforming GPT-5.2’s 34.5%. More important than the absolute number is the context: an independent study released the previous week showed that the average failure rate of the seven most advanced frontier models on this benchmark was as high as 85.2%.

On the competitive programming platform Codeforces, the model's Elo rating reached 3455. To put that in perspective, among human competitors a rating of 3000 or above earns the platform's highest title, Legendary Grandmaster. A rating of 3455 implies consistent gold-medal competitiveness in most timed algorithm competitions. The model also achieved gold-medal-level performance at the 2025 International Mathematical Olympiad.
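For a rough sense of what a 3455 rating means, the sketch below applies the standard Elo expected-score formula to a head-to-head comparison. It is an illustration under stated assumptions: the 3000-rated opponent is a hypothetical example, and Codeforces's actual rating calculation involves contest-level details not modeled here.

```python
def elo_expected_score(rating_a, rating_b):
    """Standard Elo expected score of player A against player B (between 0 and 1)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


# The model's reported rating versus a hypothetical 3000-rated human competitor.
deep_think = 3455
hypothetical_opponent = 3000  # assumption for illustration only

expected = elo_expected_score(deep_think, hypothetical_opponent)
print(f"Expected score vs. a {hypothetical_opponent}-rated player: {expected:.3f}")
```

Under the plain Elo model, a 455-point gap translates to an expected score of roughly 0.93, meaning the higher-rated side would be expected to come out ahead in about nine of ten pairwise comparisons.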

Concrete Demonstrations of Capability

Google highlighted a particularly tangible application: converting hand-drawn sketches into 3D-printable model files. Users can draw a rough diagram, and Deep Think analyzes the shapes, constructs complex geometric models, and generates files suitable for additive manufacturing. This is no longer a “concept demo” of potential usefulness — it directly enters the multi-billion-dollar computer-aided design software market.

Another compelling validation comes from academia. Rutgers University mathematician Lisa Carbone used Deep Think to review a technical mathematics paper, and the model identified a subtle logical flaw that had not been caught during the human peer-review process. This is no longer merely an assistive tool; it is becoming a parallel verifier of intellectual labor. In a world where millions of scientific papers are published annually and qualified reviewers are scarce, the commercial and societal value of this capability may be significantly underestimated.

Google emphasized that this upgrade was developed in close collaboration with scientists and researchers. That statement deserves careful reading. Over the past two years, large-model development has largely been driven by architectural engineering: larger parameter counts, longer context windows, more efficient attention mechanisms. But scientific research operates differently: problems often lack clear boundaries, data is incomplete, and answers may be multiple or evolving. This differs fundamentally from standardized tasks like code generation, document summarization, or customer service automation. The Deep Think upgrade shows verifiable performance gains in chemistry, physics (including theoretical physics), and other scientific fields.

It May Even Reshape AI Competition

Viewed within a longer industrial cycle, this release marks a critical turning point. The dimension of competition among AI giants is shifting from “who has the smartest model” to “who can provide higher-density productivity tools for professional intellectual work.”

OpenAI holds a first-mover advantage with GPT-5.2. Microsoft leverages deep integration between Azure and OpenAI to dominate enterprise access. Anthropic has built a moat around safety alignment. Google’s newly revealed card is this: in the hardest-to-automate and most intellectually demanding domain — scientific research — it is currently leading.

This is not an isolated model release. It is a signal that Google is integrating DeepMind’s foundational research capabilities, Google Cloud’s compute infrastructure, and Gemini’s productization engine into a vertically integrated solution for high-intellectual-density industries. Its competitors are not only OpenAI but also long-standing professional software firms that have survived through knowledge asymmetry and tool complexity.

This kind of integration is a genuinely scarce asset in the current AI investment narrative. Compute power can be replicated, and parameters can be scaled. But deeply embedding frontier models into professional workflows, so that end users tangibly feel a leap in efficiency, requires deep understanding of vertical domains, long-term collaboration with research communities, and design excellence that reduces product complexity to the point of requiring no manual.

That is the real battleground of the next phase of AI.