Perplexity Labs has recently introduced a new, fast, and efficient API for open-source Large Language Models (LLMs) known as pplx-api. This innovative tool is designed to provide quick access to various open-source LLMs, including Mistral 7B, Llama2 13B, Code Llama 34B, and Llama2 70B. The introduction of pplx-api marks a significant milestone in the field of AI, offering a one-stop-shop for open-source LLMs.
One of the key features of pplx-api is its ease of use for developers. The API is user-friendly, allowing developers to integrate these models into their projects with ease using a familiar REST API. This ease of use eliminates the need for deep knowledge of C++/CUDA or access to GPUs, making it accessible to a wider range of developers.
Perplexity Lab pplx-api
The pplx-api also boasts a fast inference system. The efficiency of the inference system is remarkable, offering up to 2.9x lower latency than Replicate and 3.1x lower latency than Anyscale. In tests, pplx-api achieved up to 2.03x faster overall latency compared to Text Generation Inference (TGI), and up to 2.62x faster initial response latency. The API is also capable of processing tokens up to 2x faster compared to TGI. This speed and efficiency make pplx-api a powerful tool for developers working with LLMs.
Benefits of the pplx-api
Ease of use: developers can use state-of-the-art open-source models off-the-shelf and get started within minutes with a familiar REST API.
Blazing fast inference: thoughtfully designed inference system is efficient and achieves up to 2.9x lower latency than Replicate and 3.1x lower latency than Anyscale.
Battle tested infrastructure: pplx-api is proven to be reliable, serving production-level traffic in both Perplexity answer engine and Labs playground.
One-stop shop for open-source LLMs: Perplexity Labs is dedicated to adding new open-source models as they arrive. For example, we added Llama and Mistral m
The infrastructure of pplx-api is reliable and battle-tested. It has been proven reliable in serving production-level traffic in both Perplexity’s answer engine and Labs playground. The infrastructure combines state-of-the-art software and hardware, including AWS p4d instances powered by NVIDIA A100 GPUs and NVIDIA’s TensorRT-LLM. This robust infrastructure makes pplx-api one of the fastest Llama and Mistral APIs commercially available.
API for open-source LLMs
The pplx-api is currently in public beta and is free for users with a Perplexity Pro subscription. This availability allows a wider range of users to test and provide feedback on the API, helping Perplexity Labs to continually improve and refine the tool. The API is also cost-efficient for LLM deployment and inference. It has already resulted in significant cost savings for Perplexity, reducing costs by approximately $0.62M/year for a single feature. This cost efficiency makes pplx-api a valuable tool for both casual and commercial use.
The team at Perplexity is committed to adding new open-source models as they become available, ensuring that pplx-api remains a comprehensive resource for open-source LLMs. The API is also used to power Perplexity Labs, a model playground serving various open-source models. The introduction of pplx-api by Perplexity Labs represents a significant advancement in the field of AI. Its ease of use, fast inference system, reliable infrastructure, and cost efficiency make it a powerful tool for developers working with open-source LLMs. As the API continues to evolve and improve, it is expected to become an even more valuable resource for the AI community.
In the near future, pplx-api will support:
Custom Perplexity LLMs and other open-source LLMs.
Custom Perplexity embeddings and open-source embeddings.
Dedicated API pricing structure with general access after public beta is phased out.
Perplexity RAG-LLM API with grounding for facts and citations.
How to access pplx-api
You can access the pplx-api REST API using HTTPS requests. Authenticating into pplx-api involves the following steps:
1. Generate an API key through the Perplexity Account Settings Page. The API key is a long-lived access token that can be used until it is manually refreshed or deleted.
2. Send the API key as a bearer token in the Authorization header with each pplx-api request.
3. It currently support Mistral 7B, Llama 13B, Code Llama 34B, Llama 70B, and the API is conveniently OpenAI client-compatible for easy integration with existing applications.
For more information, visit the official Perplexity Labs API documentation and Quickstart Guide.
Filed Under: Technology News, Top News
Latest TechMehow Deals
Disclosure: Some of our articles include affiliate links. If you buy something through one of these links, TechMehow may earn an affiliate commission. Learn about our Disclosure Policy.