Introduction

Our platform provides access to a diverse range of Large Language Models (LLMs) from leading AI research companies such as OpenAI, Anthropic, Meta, and Mistral AI. These models power the AI agents within our ecosystem, and each offers distinct capabilities and performance characteristics suited to different user needs.

Below is an overview of the LLMs currently available on our platform, including their Elo ratings, context window sizes, compute point (CP) consumption, and tool use support.

Available Models

Model Name     | Provider   | Elo Rating 🏆 | Context Window Size | CPs / 1K Tokens | Tool Use
---------------|------------|---------------|---------------------|-----------------|---------
GPT-4o         | OpenAI     | 1287          | 128,000 tokens      | 1.16667         | Yes
Claude 3.5     | Anthropic  | 1271          | 200,000 tokens      | 1.16667         | No
GPT-4 Turbo    | OpenAI     | 1257          | 128,000 tokens      | 1.66667         | Yes
Claude 3       | Anthropic  | 1249          | 200,000 tokens      | 1.16667         | No
Llama3-70B     | Meta       | 1246          | 8,192 tokens        | 1.11667         | No
GPT-4o mini    | OpenAI     | 1238          | 128,000 tokens      | 0.08333         | Yes
Mistral Large  | Mistral AI | 1156          | 32,000 tokens       | 1.06668         | Yes
Claude 2       | Anthropic  | 1119          | 200,000 tokens      | 4.16667         | No
Mistral Small  | Mistral AI | 1114          | 32,000 tokens       | 0.26667         | Yes
Claude Instant | Anthropic  | 1109          | 100,000 tokens      | 0.41667         | No

Elo Rating / Intelligence Points

We use the Elo rating system to express each LLM's "intelligence points", a score reflecting its performance on natural language understanding and generation tasks. The ratings are sourced from the LMSYS Chatbot Arena Leaderboard, a crowdsourced open platform that evaluates LLMs through human preference votes. For more details, please visit: LMSYS Chatbot Arena Leaderboard.
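To give a sense of what a gap between two ratings means, the textbook Elo formula converts a rating difference into an expected win rate. The sketch below applies that standard formula to two ratings from the table. It is an illustration only: the function name is hypothetical, and the leaderboard's own statistical method may differ in detail.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B, a value between 0 and 1."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# Ratings taken from the Available Models table.
gpt_4o = 1287
claude_instant = 1109

# Share of head-to-head votes GPT-4o would be expected to win under the Elo model.
p = expected_score(gpt_4o, claude_instant)
print(f"Expected preference rate for GPT-4o over Claude Instant: {p:.2f}")
```

Two models with equal ratings get an expected score of 0.5, so small rating gaps between neighboring rows in the table correspond to near coin-flip preferences.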

Tool Use

Certain LLMs on our platform support tool use capabilities. This feature allows AI agents to perform a variety of tasks beyond text generation, enabling them to interact dynamically with external systems and datasets. As the field of AI continues to advance, we anticipate that tool use capabilities will become standard across all LLMs.

Speed Metrics

Speed is determined by two factors:

  • Time to first token generation: The latency from when a prompt is sent to when the first piece of output is received.
  • Tokens per second generation: The rate at which the model generates tokens after the initial response has begun.

These metrics are crucial for applications that require real-time interaction or high throughput.
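As a rough way to combine the two factors, total response time can be approximated as the time to first token plus the output length divided by the generation rate. The sketch below assumes a constant generation rate, and the numbers in it are hypothetical placeholders, not measurements of any model on the platform.

```python
def estimated_response_seconds(
    time_to_first_token_s: float,
    tokens_per_second: float,
    output_tokens: int,
) -> float:
    """Approximate end-to-end time: initial latency plus steady-state generation time."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return time_to_first_token_s + output_tokens / tokens_per_second


# Hypothetical figures for two unnamed models producing a 500-token answer.
fast = estimated_response_seconds(0.3, 100.0, 500)
slow = estimated_response_seconds(1.0, 25.0, 500)
print(f"fast model: {fast:.1f} s, slow model: {slow:.1f} s")
```

For short replies the time to first token dominates, while for long replies the tokens-per-second rate matters more.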

Compute Points

As explained previously, compute points (CPs) measure the resources consumed when using an LLM. They correlate with the size and complexity of the model: lighter models consume fewer points per 1K tokens than larger, more capable models such as GPT-4o or Claude 3.5.
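The per-1K-token rates in the table can be turned into a quick consumption estimate. The sketch below assumes that consumption scales linearly with the token count and that input and output tokens are charged at the same rate; neither assumption is confirmed here, and the function and dictionary names are hypothetical.

```python
# CPs per 1K tokens, copied from the Available Models table (subset).
CPS_PER_1K_TOKENS = {
    "GPT-4o": 1.16667,
    "GPT-4o mini": 0.08333,
    "Claude 2": 4.16667,
}


def estimate_compute_points(model: str, tokens: int) -> float:
    """Estimate CPs for a token count, assuming simple linear scaling."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return CPS_PER_1K_TOKENS[model] * tokens / 1000


# Compare the estimated cost of processing 12,000 tokens on each model.
for name in CPS_PER_1K_TOKENS:
    print(f"{name}: {estimate_compute_points(name, 12_000):.2f} CPs")
```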

Selecting the Right LLM for Your Needs

Choosing an appropriate LLM depends on several factors, including:

  1. How closely the model adheres to your system message, and the output quality your use case requires.
  2. Distinct writing styles preferred for your application.
  3. Model responsiveness and speed for real-time interactions.
  4. Language proficiency if operating in languages other than English.

Our platform’s flexibility allows users to tailor AI agents for specialized tasks by selecting from these varied LLMs.
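The measurable criteria from the table (tool use, context window, CP rate, Elo rating) lend themselves to a simple shortlist filter. The sketch below is one possible approach, not a platform feature: the `Model` class and `shortlist` function are hypothetical names, and qualitative factors such as writing style or language proficiency still need hands-on evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    provider: str
    elo: int
    context_tokens: int
    cps_per_1k: float
    tool_use: bool


# Data copied from the Available Models table.
MODELS = [
    Model("GPT-4o", "OpenAI", 1287, 128_000, 1.16667, True),
    Model("Claude 3.5", "Anthropic", 1271, 200_000, 1.16667, False),
    Model("GPT-4 Turbo", "OpenAI", 1257, 128_000, 1.66667, True),
    Model("Claude 3", "Anthropic", 1249, 200_000, 1.16667, False),
    Model("Llama3-70B", "Meta", 1246, 8_192, 1.11667, False),
    Model("GPT-4o mini", "OpenAI", 1238, 128_000, 0.08333, True),
    Model("Mistral Large", "Mistral AI", 1156, 32_000, 1.06668, True),
    Model("Claude 2", "Anthropic", 1119, 200_000, 4.16667, False),
    Model("Mistral Small", "Mistral AI", 1114, 32_000, 0.26667, True),
    Model("Claude Instant", "Anthropic", 1109, 100_000, 0.41667, False),
]


def shortlist(models, *, needs_tools=False, min_context=0, max_cps_per_1k=float("inf")):
    """Keep models that meet the hard requirements, highest Elo rating first."""
    candidates = [
        m for m in models
        if (m.tool_use or not needs_tools)
        and m.context_tokens >= min_context
        and m.cps_per_1k <= max_cps_per_1k
    ]
    return sorted(candidates, key=lambda m: m.elo, reverse=True)


# Example: an agent that calls tools, needs a 100K+ context, and has a CP budget.
picks = shortlist(MODELS, needs_tools=True, min_context=100_000, max_cps_per_1k=1.5)
for m in picks:
    print(f"{m.name} ({m.provider}): Elo {m.elo}, {m.cps_per_1k} CPs / 1K tokens")
```

Treating tool use, context size, and budget as hard filters and ranking the survivors by Elo keeps the trade-off explicit; a different weighting may suit other applications.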

Conclusion

Understanding the unique attributes and capabilities of each available LLM on our platform will enable you to make informed decisions about which model best suits your needs. Whether you prioritize eloquence, speed, multilingual support, or cost-effectiveness in terms of compute points, our selection aims to meet a comprehensive range of requirements.