What's New
New updates and improvements to Shiro
OpenAI o1-preview Model Added
New
OpenAI's latest o1-preview model, trained with reinforcement learning, is now active on Shiro for free and premium users.
The o1 series of large language models is trained with reinforcement learning to perform complex reasoning. o1 models think before they answer, producing a long internal chain of thought before responding to the user.
Learn about the capabilities and limitations of o1 models in the OpenAI reasoning guide.
Language model name:
o1-preview
Mistral's New NeMo Model Added
New
We've just added Mistral's new NeMo model to Shiro. Premium users can now create, test, and deploy prompts using the Mistral NeMo model.
NeMo is Mistral's best new small model: a state-of-the-art 12B model with a 128k context length, built in collaboration with NVIDIA and released under the Apache 2.0 license.
Language model name:
open-mistral-nemo
OpenAI GPT-4o and GPT-4o-mini Models Added
New
OpenAI's new flagship model GPT-4o ("o" for "omni") is now active on Shiro as well as the lighter GPT-4o-mini model. Both models are available for free and premium users.
GPT-4o matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API.
Language model name:
gpt-4o
GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences on the LMSYS leaderboard. It is priced at 15 cents per million input tokens and 60 cents per million output tokens, an order of magnitude more affordable than previous frontier models and more than 60% cheaper than GPT-3.5 Turbo.
Language model name:
gpt-4o-mini
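At the quoted rates, per-request cost is simple arithmetic. A minimal sketch, using only the prices stated above (this is an illustration, not an official pricing calculator):

```python
# GPT-4o mini prices quoted above: $0.15 per 1M input tokens, $0.60 per 1M output tokens.
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.60

def gpt4o_mini_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted rates."""
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# A 2,000-token prompt with a 500-token response:
cost = gpt4o_mini_cost(2_000, 500)  # 0.0003 + 0.0003 = $0.0006
```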
Gemini Pro 1.5 Model Added
New
We've just added Google's new Gemini 1.5 Pro model to Shiro. Premium users can now create, test, and deploy prompts using the Gemini 1.5 Pro model.
Gemini 1.5 delivers dramatically enhanced performance. It represents a step change in Google's approach, building on research and engineering innovations that make Gemini 1.5 more efficient to train and serve. It's a mid-size multimodal model, optimized for scaling across a wide range of tasks, and performs at a similar level to 1.0 Ultra, Google's largest model to date. It also introduces a breakthrough experimental feature in long-context understanding.
Language model name:
gemini-1.5-pro
Claude 3.5 Sonnet Now Available
New
Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus and is now available on Shiro for premium users. This performance boost, combined with cost-effective pricing, makes Claude 3.5 Sonnet ideal for complex tasks such as context-sensitive customer support and orchestrating multi-step workflows.
Language model name:
claude-3-5-sonnet-20240620
Claude 3 Haiku, Anthropic's Fastest Model, Now Available
Update
Claude 3 Haiku, Anthropic's fastest and most affordable model in its intelligence class, is now available on Shiro. With state-of-the-art vision capabilities and strong performance on industry benchmarks, Haiku is a versatile solution for a wide range of enterprise applications. The model is now available alongside Sonnet and Opus for Shiro subscribers.
Speed is essential for enterprise users who need to quickly analyze large datasets and generate timely output for tasks like customer support. According to Anthropic, Claude 3 Haiku is three times faster than its peers for the vast majority of workloads, processing 21K tokens (~30 pages) per second for prompts under 32K tokens.
Language model name:
claude-3-haiku-20240307
Claude 3 Models from Anthropic Now Available
Update
Yesterday Anthropic announced the release of the new Claude 3 models. These are now available in Shiro as language model options when creating prompts.
Claude 3 Opus
Anthropic's most intelligent model, with best-in-market performance on highly complex tasks. It can navigate open-ended prompts and sight-unseen scenarios with remarkable fluency and human-like understanding. Opus shows us the outer limits of what’s possible with generative AI.
Language model name:
claude-3-opus-20240229
Claude 3 Sonnet
Strikes the ideal balance between intelligence and speed, particularly for enterprise workloads. It delivers strong performance at a lower cost compared to its peers, and is engineered for high endurance in large-scale AI deployments.
Language model name:
claude-3-sonnet-20240229
New Mistral Large Model Now Available
Update
Au Large, Mistral's new cutting-edge text generation model, was released today, February 26, 2024. It reaches top-tier reasoning capabilities and can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation.
Mistral Large achieves strong results on commonly used benchmarks, making it the world's second-ranked model generally available through an API (next to GPT-4).
Mistral Large is now available on Shiro as a language model option when creating prompts.
Language model name:
mistral-large-latest
Mistral Large comes with new capabilities and strengths:
- It is natively fluent in English, French, Spanish, German, and Italian, with a nuanced understanding of grammar and cultural context.
- Its 32K tokens context window allows precise information recall from large documents.
- Its precise instruction-following enables developers to design their own moderation policies; Mistral used it to set up the system-level moderation of le Chat.
- It is natively capable of function calling. This, along with constrained output mode, implemented on la Plateforme, enables application development and tech stack modernization at scale.
Mistral Models Added as Language Model Options
New
We've added all generative Mistral.ai models that are accessible through the Mistral API. This includes mistral-tiny, mistral-small, and mistral-medium. These are now available in Shiro as language model options when creating prompts.
Tiny
This generative endpoint is best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
Currently powered by Mistral-7B-v0.2, an improved fine-tune of the initial Mistral-7B release, inspired by the fantastic work of the community.
Language model name:
open-mistral-7b
Small
This endpoint offers higher reasoning capabilities and a broader set of features.
The endpoint supports English, French, German, Italian, and Spanish and can produce and reason about code.
Currently powered by Mixtral-8x7B-v0.1, a sparse mixture-of-experts model with 12B active parameters.
Language model name:
mistral-small-latest
Medium
This endpoint currently relies on an internal prototype model.
Language model name:
mistral-medium-latest
Gemini Pro Model Added
New
We've just added Google's new Gemini Pro model to Shiro. Users can now create, test, and deploy prompts using the Gemini Pro model.
API Endpoint added for deployments: GenerateCompletion
New
We've just released an update to our API. Shiro users can now use their prompt deployments through the API via the
/api/v1/generate_completion
endpoint. Check out the GenerateCompletion endpoint documentation for more information.
Evaluation Metrics for Quantitative Analysis of Prompts
New
We've just added a new feature to test cases called Evaluation Metrics (evals) to provide the ability to perform quantitative analysis of AI-generated responses to your prompts.
Related Doc: Using Evaluation Metrics for Quantitative Analysis of Prompts
Related Article: Quantitative and Qualitative Analysis of AI Generated Prompt Responses
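Shiro's built-in eval implementation isn't shown here; conceptually, an evaluation metric scores a generated response against a test case's expectations. As a hedged sketch, a hypothetical keyword-presence metric might look like this:

```python
def keyword_eval(response: str, required_keywords: list[str]) -> float:
    """Score a generated response by the fraction of required keywords it contains.
    A hypothetical metric for illustration -- not Shiro's built-in eval."""
    if not required_keywords:
        return 1.0
    hits = sum(1 for kw in required_keywords if kw.lower() in response.lower())
    return hits / len(required_keywords)

# A test case expecting the response to mention both "refund" and "days":
score = keyword_eval("The refund was processed within 5 days.", ["refund", "days"])
print(score)  # 1.0
```

Running a metric like this across a suite of test cases yields the kind of quantitative signal evals are designed to provide.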
Shiro API V1 now available
New
We've released V1 of the Shiro API. Please note that this API is under active development and in Beta.
Related Article: Using the Shiro API
Related Doc: Shiro API Documentation
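As a hedged sketch of calling the v1 API, here is how a request to the /api/v1/generate_completion deployment endpoint described above might be constructed in Python. The base URL, auth header, and payload field names are assumptions for illustration, not documented values; see the Shiro API documentation for the real ones.

```python
import json
import urllib.request

# All of the following values are illustrative assumptions -- the real base URL,
# auth scheme, and payload fields are in the Shiro API documentation.
API_KEY = "YOUR_API_KEY"
payload = {
    "deployment": "my-deployment",          # assumed: deployments addressed by ID
    "variables": {"customer_name": "Ada"},  # assumed: prompt template variables
}

# Build (but don't send) the request against the generate_completion endpoint.
req = urllib.request.Request(
    "https://example.shiro.app/api/v1/generate_completion",  # assumed base URL
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",  # assumed auth header
        "Content-Type": "application/json",
    },
    method="POST",
)
# Send with: response = urllib.request.urlopen(req)
```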
Use your own API keys for OpenAI
Improvement
We've added the ability to add both development and production API keys for the OpenAI API. To learn more please see the documentation.
GPT-4 32K now available through OpenAI
Update
We've added the GPT-4 32K model, which is now available through OpenAI. The GPT-4 32K model is available on all of our plans.
Microsoft Azure OpenAI Service is now supported
New
We've just added support for Microsoft Azure OpenAI Service APIs. Shiro enterprise teams can now use hosted versions of OpenAI's cutting edge models with additional layers of trust and security beyond the standard OpenAI models.
Microsoft's integrated safety system provides protection from undesirable inputs and outputs and monitors for misuse. Microsoft Azure OpenAI Service models use built-in responsible AI and enterprise-grade Azure security; read more about Microsoft data privacy.
Your prompts (inputs) and completions (outputs), your embeddings, and your training data:
- are NOT available to other customers.
- are NOT available to OpenAI.
- are NOT used to improve OpenAI models.
- are NOT used to improve any Microsoft or 3rd party products or services.
- are NOT used for automatically improving Azure OpenAI models for your use in your resource (The models are stateless unless you explicitly fine-tune models with your training data).
- Your fine-tuned Azure OpenAI models are available exclusively for your use.
The Azure OpenAI Service is fully controlled by Microsoft; Microsoft hosts the OpenAI models in Microsoft's Azure environment, and the Service does NOT interact with any services operated by OpenAI (e.g., ChatGPT or the OpenAI API).
OpenAI GPT-3.5 Turbo now available
Update
We've updated our base models with both OpenAI and Microsoft Azure to GPT-3.5 Turbo!
4x faster data embeddings and enhanced security with pgvector
Improvement
We've upgraded to store our vector embeddings in our own PostgreSQL database using pgvector. Previously we had been using Pinecone, a third-party hosted vector database.
This enhancement makes it 4 times faster to process embeddings for similarity search. This also improves security since we can now store this data in our own database with our own managed database backups. We no longer have to trust Pinecone to keep this data secure and properly backed up.
Overall the reasons we chose to bring this functionality in-house were:
- Postgres provides a wide range of vector-relevant features, including database backups, row-level security, client library support, and support for Object-Relational Mapping (ORM) in 18 programming languages. It also boasts complete ACID compliance and offers efficient bulk updates and deletes, with metadata updates taking just seconds.
- Consolidating all data within a single Postgres instance reduces the number of roundtrips in a production environment and allows for the convenient running of the entire developer setup on a local machine.
- Reducing the number of external databases reduces operational complexity and lowers the learning curve for managing data.
- Postgres is a battle-tested and robust database system, whereas many specialized vector databases have yet to establish a track record of reliability.
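Conceptually, the similarity search described above orders stored embeddings by distance to a query vector; in pgvector that is a query like SELECT id FROM items ORDER BY embedding <-> $1 LIMIT k, where <-> is the Euclidean-distance operator. A pure-Python illustration of what that query computes (table name and data are made up for the example; pgvector does this inside Postgres, with index support):

```python
import math

def l2(a, b):
    """Euclidean distance -- what pgvector's <-> operator computes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query, k=2):
    """Pure-Python equivalent of:
       SELECT id FROM items ORDER BY embedding <-> :query LIMIT :k
    where rows is a list of (id, embedding) pairs."""
    return [rid for rid, emb in sorted(rows, key=lambda r: l2(r[1], query))[:k]]

rows = [(1, [0.0, 0.0]), (2, [1.0, 1.0]), (3, [0.9, 1.1])]
print(nearest(rows, [1.0, 1.0], k=2))  # [2, 3]
```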