Baseten

Software Development

San Francisco, CA 3,723 followers

Fast, scalable inference in our cloud or yours

About us

At Baseten, we provide all the infrastructure you need to deploy and serve ML models with high performance, scalability, and cost efficiency. Get started in minutes and avoid getting tangled in complex deployment processes. Deploy best-in-class open-source models, or take advantage of optimized serving for your own. Our horizontally scalable services take you from prototype to production, with fast inference on infrastructure that autoscales with your traffic. Best in class doesn't have to mean breaking the bank: run your models on top-tier infrastructure without running up costs by taking advantage of our scale-to-zero feature.

Website
https://www.baseten.co/
Industry
Software Development
Company size
11-50 employees
Headquarters
San Francisco, CA
Type
Privately Held
Specialties
developer tools and software engineering

Products

Locations

Employees at Baseten

Updates

  • We’re thrilled to introduce Chains, a framework for building multi-component AI workflows on Baseten! ⛓️ 🎉

    Chains enables users to build complex workflows as modular services in simple Python code—with optimal scaling for each component. Read our announcement blog to learn more: https://lnkd.in/eHsqG4yV

    After working with AI builders at companies like Patreon, Descript, and many others, we saw the increasing need to expand the capabilities of AI infrastructure and model deployments for multi-component workflows. Our customers found that:
    🫠 They were often writing messy scripts to coordinate inference across many models
    🫠 They were paying too much for hardware by not separating CPU workloads from GPU ones
    🫠 They couldn’t quickly test locally, which drastically slowed down development

    Other solutions either rely on DAGs or use bidirectional API calls to make multi-model inference possible. These approaches are too slow, inefficient, and expensive at scale. They also fail to enable heterogeneous GPU/CPU resourcing across models and code, leading to overprovisioning and unnecessary compute costs.

    We built Chains in response to customer needs to deliver reliable and high-performance inference for workflows using multiple models or processing steps. Using Chains, you can:
    ✅ Assemble distinct computational steps (or models) into a holistic workflow
    ✅ Allocate and scale resources independently for each component
    ✅ View critical performance metrics across your entire Chain

    Chains is a game-changer for anyone using or building compound AI systems. We’ve seen processing times halve and GPU utilization improve 6x. With built-in type checking, blazing-fast deployments, and simplified pipeline orchestration, Chains is our latest step in enhancing the capabilities and efficiency of AI infrastructure! 🚀

    Try Chains today with $30 free credits and tell us what you think! https://lnkd.in/ecjknaZM

    Introducing Baseten Chains

    baseten.co
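The pattern described in the post, distinct components composed by a single entrypoint with each step scaled on its own resources, can be sketched in plain Python. This is a conceptual illustration only, not the actual Chains API; all class and method names here are hypothetical:

```python
# Conceptual sketch of a multi-component workflow: each step is a
# self-contained component that could be deployed and scaled
# independently. Plain Python for illustration, not the Chains API.

class Transcribe:
    """A CPU-light step, e.g. splitting work into chunks."""
    def run(self, text: str) -> list[str]:
        return text.split()

class Embed:
    """A GPU-heavy step in a real deployment; stubbed out here."""
    def run(self, token: str) -> int:
        return len(token)

class Pipeline:
    """The entrypoint that wires the components together."""
    def __init__(self) -> None:
        self.transcribe = Transcribe()
        self.embed = Embed()

    def run(self, text: str) -> list[int]:
        return [self.embed.run(t) for t in self.transcribe.run(text)]

pipeline = Pipeline()
print(pipeline.run("chains compose modular steps"))  # [6, 7, 7, 5]
```

In a framework like Chains, each component would declare its own hardware requirements, so the CPU-bound step and the GPU-bound step never share (and never overprovision) the same machine.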

  • 🏆 After considering technical specifications, customer conversations, and doing our own testing, we put together a list of the best LLMs across 6 categories:
    ✅ The best big LLM
    ✅ The best small LLM
    ✅ The best-aligned chat LLM
    ✅ The best LLM for code generation
    ✅ The best LLM for fine-tuning
    ✅ The best overall open-source LLM

    📝 Check out the list: https://lnkd.in/gFx7kvyM

    Philip Kiely breaks down what we love about these models and what to watch out for, especially as new models are released all the time.

  • 🤹♀️ The NVIDIA A10 GPU is an Ampere-series graphics card popular for common ML inference tasks, from running seven-billion-parameter LLMs to models like Whisper and Stable Diffusion XL.

    🤔 But you won’t find any A10s on AWS. Instead, AWS has a special variant: the A10G, which powers their G5 instances.

    Philip Kiely breaks down the difference between the A10 and A10G for model inference on the Baseten blog: https://lnkd.in/ewKM6URX

  • After seeing AI builders reach the limitations of real-time inference, we're excited to announce asynchronous inference on Baseten! 🔄 🎉

    Anyone can run async inference on any model—whether trained in-house, fine-tuned, or open-source—without making any changes to their code. 😎

    Check out our blog to learn more: https://lnkd.in/epWRE9Xp You’ll learn:
    ⚡ How async inference works
    ⚡ When to use async inference instead of real-time
    ⚡ How we built a robust async inference solution with a delightful developer experience

    Async inference on Baseten protects against different types of inference failures while leveraging idle compute to lower costs. You can reliably schedule thousands of inference requests without worrying about the complexity of queueing, model capacity, or scaling GPUs at inconvenient times. 🚀

    If you’re curious about async inference, don’t miss our live webinar and Q&A on August 15th featuring Samiksha Pal and Helen Yang! https://lnkd.in/e6KaYi5G
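The core queueing idea behind async inference, where callers enqueue work and get control back immediately while a worker drains the queue as capacity frees up, can be sketched in plain Python. This is a conceptual illustration with a stand-in for the model call, not Baseten's actual implementation:

```python
import queue
import threading

# Conceptual sketch: callers enqueue jobs and return immediately;
# a background worker processes the queue when capacity is available.
jobs = queue.Queue()
results = {}

def worker() -> None:
    while True:
        job_id, payload = jobs.get()
        if job_id is None:  # sentinel value: shut the worker down
            jobs.task_done()
            break
        # Stand-in for an actual model inference call.
        results[job_id] = f"processed:{payload}"
        jobs.task_done()

t = threading.Thread(target=worker)
t.start()

# The caller is not blocked waiting for inference to finish.
jobs.put((1, "hello"))
jobs.put((2, "world"))

jobs.join()            # wait until every queued job is processed
jobs.put((None, None)) # stop the worker
t.join()
print(results)
```

A production system adds the pieces this sketch omits: durable queues, retries on failure, and result delivery via webhooks rather than a shared dict.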

  • Join our engineering team! 🚀 We just opened a new role for Site Reliability Engineers! 🔧 https://buff.ly/3zwIrlv

    Interested in designing, implementing, and maintaining the critical systems that support our cutting-edge AI and machine learning products? Reach out!

    Looking for another engineering position? We’re still hiring for 5 other roles as well; check them out: https://buff.ly/3WggFTx

    Site Reliability Engineer

    jobs.ashbyhq.com

  • 💡 It's clear that optimizing an ML model is key to high-performance inference, but the infrastructure used to serve that model can have an even greater impact on its performance in production.

    🌐 Our co-founder Philip Howes broke down how globally distributed model serving infrastructure (both multi-cloud and multi-region) benefits availability, cost, redundancy, latency, and compliance. Check it out: https://lnkd.in/ene3pPVV

    The benefits of globally distributed infrastructure for model serving

    baseten.co

  • Did you know you can launch Stable Video Diffusion from our model library? 📽

    Sidharth Shanker first introduced Stable Video Diffusion on our blog (https://lnkd.in/eB9AkTQp), and it still stands out as a fun way to create stock footage or bring storyboards and personal photos to life. 🏞

    Launch it on an A10G in just a few clicks, and show us what you make! 🎨 https://lnkd.in/eC_J_yA3

    Stable Video Diffusion | Model library

    baseten.co
