Dissecting the developer strategies of 3 leading AI startups

And how we got here to begin with

Mar 16, 2023

Diffusion Models

In the early 2020s, excitement was growing around using diffusion models for generative AI. Promising applications for image manipulation were emerging.

Latent diffusion models (like Stable Diffusion) work by training on removing artificially added noise to an embedding (a more semantically dense representation) of an image.

Because the denoising step happens on the image's embedding and not the image itself, the result is a model that can flexibly operate on different inputs and tasks. These include extrapolating an existing image to its surroundings ("outpainting"), filling in missing details in an image ("inpainting"), and importantly, creating high-fidelity images from a text prompt.

OpenAI's Dall-E (and subsequent release Dall-E 2) stunned the world with their ability to create compelling images and graphics which could revolutionize logo design, stock photography, and more.

Large Language Models

Large Language Models (LLMs) were the next big thing.

ChatGPT felt like the Sputnik moment - suddenly, consumers could ask complex questions and get surprisingly coherent and well-versed responses.

LLMs are NLP models trained on vast amounts of text content, using advanced AI techniques like Transformers and more powerful compute. While ChatGPT was one of the first publicly revealed LLMs, tech giants like Facebook, Google, and Baidu are racing to catch up with their own projects, such as LLaMA, Bard, and Ernie.

This new AI arms race is reminiscent of the Big Data and Cloud Computing waves of the late 2000s and early 2010s. VCs are increasingly convinced that AI is a durable trend that will fundamentally change how companies operate and possibly even the role of individuals in society, sparking discussions of UBI.

Just like the Cloud Computing wave, we're seeing growing enthusiasm from both Startups and Enterprises hungry to get in on the action.

A wave of startups is scrambling to position themselves as the "X for AI" (X being AWS, Plaid, Stripe, etc). Meanwhile, an even broader set of startups are being built as thin wrappers atop OpenAI's APIs.

Meanwhile, the world's largest companies are answering existential questions, looking for ways to limit their own disruption (even Google is confronting how LLMs may affect their Search business) while exploring new opportunities for employing AI.

API Strategy for AI Businesses

In the midst of the AI frenzy, one thing is clear: Web APIs and API Strategy are crucial for AI-based businesses to succeed.

So, let's take a look at three AI startups and how they're leveraging APIs in this new marketplace:

Startup 1: OpenAI

Background

Currently the big dog
Originally founded in 2015 as a non-profit by tech luminaries (Elon Musk, Sam Altman and co)
Pivoted to a for-profit model in 2018
Has maintained a close partnership with Microsoft
- Invested $1B in 2019 and signed an exclusivity agreement for Azure
- Invested $10B in early 2023

Their developer platform

OpenAI offers several APIs, generally starting with limited API access to their latest products (currently GPT-4)
Pre-built models that have a defined domain (eg. GPT-3, ChatGPT)
APIs for tailoring different types of models to specific data sets

Who's using OpenAI

Bing is undergoing a serious revitalization and is going all-in on AI-based search
- Expect for Google to follow suit
So many startups!
- “AI startups that are just a thin wrapper atop OpenAI’s APIs” are becoming a bit of an inside joke on VC Twitter

Big takeaways

The current market leader
- Flying high with two breakthrough models available as APIs
- Continues to ship improvements to their models
Putting serious pressure on larger companies to build more compelling AI offerings (eg. Google with Bard, FB with LLaMA)
- Reminiscent of Tesla pushing the rest of the car industry to lean into EVs
Winning in part through making approachable interfaces (API and user)
- ChatGPT's UI is an unsung hero in enabling consumers to "feel the power of AI" rather than hearing about some abstract breakthrough research paper

Startup 2: Hugging Face aka 🤗

Background

Named after the emoji 🤗
Founded in NYC in 2016 with a focus on making it easier to build products that use Natural Language Processing (NLP)
Offers a centralized “hub” for sharing models, data sets, and ML-based apps
Famous for 🤗’s Transformers library, a holistic set of APIs and pre-trained models for handling a variety of tasks
- Includes pre-trained implementations of BERT, GPT-2 and many more
- Updated as new NLP papers come out, generally tracking alongside the cutting edge
- ASIDE: Transformers are a recent breakthrough in NLP models
  - Transformer-based models are trained on wholesale, variably weighted chunks of input vs. sequential training of recurrent neural networks
  - Seminal paper for Transformers: Attention Is All You Need

Their developer platform

Inference API
- Allows Developers to "run their own API" using a model that lives on Hugging Face's model repository
- Developers can create a custom endpoint, choosing their underlying IaaS provider (Microsoft Azure, Google Compute Engine, or AWS)
- Once hosted, clients (eg. Web apps) can call these endpoints and get inference results
  - eg. I can pass text input to a "text-to-image" model and expect an image binary in the API response

Who's using Hugging Face

Square
Intel
Grammarly

Big takeaways

Hugging Face is winning by making ML concepts collaborative and interactive
- Docs contain many references, samples, and "try it yourself" models
It's clear that their ambitions go far beyond sharing models and ML libraries. Their Inference offering is an opinionated Platform-as-a-service that empowers developers to deploy their models as Web APIs.
Exciting to see Hugging Face move in this direction and support developers throughout their journeys from model discovery and fine-tuning to hosting and deployment

Startup 3: Speechly

Background

Founded in Finland in 2019
Focused on enabling developers to integrate faster speech recognition, transcription, and custom voice commands
Offers APIs for web and mobile apps to use cutting-edge NLP

Their developer platform

Targeting app developers that want voice-enabled apps
Offers APIs for transcription, understanding/entity detection, and the ability to create custom commands

Who's using Speechly

Doss - AI-enabled digital assistant for buying, selling, and renting homes
Musgrave - Top Irish supermarket change, enabling their new voice shopping functionality
Zoan - Metaverse startup with big-name clients (eg. Nike, Ikea)

Big takeaways

Speechly is singularly focused on voice but offers a range of developer APIs to address its ranging customer needs within this niche
- IMO, it’s smart to be taking a vertical slice and owning it end-to-end

They provide tools to adapt models to new data, allowing clients to customize their integration for specific use cases (eg. non-standard words or terminology)
Developers can choose to use Speechly's ML models on-device, on-premise, or via Web API.

Parting thoughts

The tech industry is buzzing with excitement and VCs are pouring in cash. Over the next decade, we're going to see massive companies emerge, collapse, and transform based on these technologies. Some jobs will be made obsolete, and others augmented and made more efficient.

In this rapidly changing landscape, a focused API strategy will be crucial for success.

By examining just a few of the platforms that are competing in this market, what's becoming clear is that winning platforms have a lot in common: they understand their audience, they offer comprehensive solutions and they obsess over the developer experience.

Router by Dmitry Pimenov

Discussion about this post