Anthropic Announces Claude 3 AI Models; Beats GPT-4 and Gemini 1.0 Ultra

Another week, another AI model surpassed GPT-4, at least on benchmarks. This time, it’s Anthropic, the company formed by ex-OpenAI members Daniela and Dario Amodei, who are siblings. The company haslauncheda family of Claude 3 models featuring Opus (largest and most capable), Sonnet (mid-size), and Haiku (smallest) models. Anthropic says the Claude 3 Opus model beatsGPT-4 and Gemini 1.0 Ultraon all popular benchmarks.

Claude 3 Benchmarks

Claude 3 Benchmarks

Anthropic has tested all three models on popular benchmarks like MMLU, GPQA, GSM8K, MATH, HumanEval, HellaSwag, and more. On MMLU, Claude 3 Opusscored 86.8%whereas GPT-4 has a reported score of 86.4%. Gemini 1.0 Ultra got 83.7% on the same 5-shot prompting technique.

On the HumanEval benchmark that tests coding ability, the largest Opus modelscored 84.9%, much higher than GPT-4’s 67% and Gemini 1.0 Ultra’s 74.4% score. The Clade 3 Opus model even defeated GPT-4 in the HellaSwag test but with a slight margin. It scored 95.4% whereas GPT-4 got 95.3% and Gemini 1.0 Ultra achieved 87.8%.

Claude 3 Capabilities

Claude 3 Capabilities

Overall, the largest Claude 3 Opus model looks very promising and we will definitely test it againstGPT-4, Gemini 1.5 Pro, andMistral Largeso stay tuned with us. Apart from that, Anthropic says that all three models have great capabilities in analysis and forecasting, nuanced content creation, code generation, and fluency in international languages likeSpanish, Japanese, and French.

Claude 3 models also have vision capability, however, Anthropic is not marketing them as multimodal models. Anthropic says the vision capability in Claude 3 can help enterprise customers process charts, graphs, and technical diagrams. On benchmarks, itdoes better than GPT-4Vbut slightly lags behind Gemini 1.0 Ultra.

200K Context Length

In terms of context length, Anthropic says that all three models will initially offer a context window of 200K tokens, which is quite large, I must say. In addition, the company says that Claude 3 family models canprocess more than 1 million tokens, however, this capability will be available to select customers only.

On the Needle In A Haystack (NIAH) test with over 200K tokens, the Opus model performed exceptionally well withover 99% accurate retrieval, just like Gemini 1.5 Pro. Claude has been one of the best AI models for long context retrieval, and the performance has significantly improved with Claude 3.

Performance and Pricing

Coming to performance, Anthropic states that Claude 3 models are quite fast and the largest Opus model offers the same performance as Claude 2 and 2.1, but with better intelligence. The mid-size Sonnet model is almost2x faster than Claude 2and 2.1. On top of that, Anthropic mentions that Claude 3 models are significantly less likely to refuse to answer, which was an issue in earlier models.

You can start using the flagship Opus model by subscribing toClaude Prowhichcosts $23.60after taxes. And the mid-size Claude 3 Sonnet is already deployed on the free version of claude.ai (visit). Finally, developers can immediately access APIs for Opus and Sonnet models.

As for the API pricing, Claude 3 Opus with a 200K context window costs $15 per one million tokens (input) and$75 per one million tokens (output). In comparison to GPT-4 Turbo ($10 input / $30 output with 128K context), the pricing seems quite expensive.

Nevertheless, what do you think about the new family of models released by Anthropic, especially the Opus model? Let us know in the comment section below.

Arjun Sha

Passionate about Windows, ChromeOS, Android, security and privacy issues. Have a penchant to solve everyday computing problems.

Add new comment

Name

Email ID

Δ

01

02

03

04

05