Diffusion Models
In the early 2020s, excitement was growing around using diffusion models for generative AI. Promising applications for image manipulation were emerging.
Latent diffusion models (like Stable Diffusion) work by training on removing artificially added noise to an embedding (a more semantically dense representation) of an image.
Because the denoising step happens on the image's embedding and not the image itself, the result is a model that can flexibly operate on different inputs and tasks. These include extrapolating an existing image to its surroundings ("outpainting"), filling in missing details in an image ("inpainting"), and importantly, creating high-fidelity images from a text prompt.
OpenAI's Dall-E (and subsequent release Dall-E 2) stunned the world with their ability to create compelling images and graphics which could revolutionize logo design, stock photography, and more.
Large Language Models
Large Language Models (LLMs) were the next big thing.
ChatGPT felt like the Sputnik moment - suddenly, consumers could ask complex questions and get surprisingly coherent and well-versed responses.
LLMs are NLP models trained on vast amounts of text content, using advanced AI techniques like Transformers and more powerful compute. While ChatGPT was one of the first publicly revealed LLMs, tech giants like Facebook, Google, and Baidu are racing to catch up with their own projects, such as LLaMA, Bard, and Ernie.
This new AI arms race is reminiscent of the Big Data and Cloud Computing waves of the late 2000s and early 2010s. VCs are increasingly convinced that AI is a durable trend that will fundamentally change how companies operate and possibly even the role of individuals in society, sparking discussions of UBI.
Just like the Cloud Computing wave, we're seeing growing enthusiasm from both Startups and Enterprises hungry to get in on the action.
A wave of startups is scrambling to position themselves as the "X for AI" (X being AWS, Plaid, Stripe, etc). Meanwhile, an even broader set of startups are being built as thin wrappers atop OpenAI's APIs.
Meanwhile, the world's largest companies are answering existential questions, looking for ways to limit their own disruption (even Google is confronting how LLMs may affect their Search business) while exploring new opportunities for employing AI.
API Strategy for AI Businesses
In the midst of the AI frenzy, one thing is clear: Web APIs and API Strategy are crucial for AI-based businesses to succeed.
So, let's take a look at three AI startups and how they're leveraging APIs in this new marketplace:
Startup 1: OpenAI
Background
Currently the big dog
Originally founded in 2015 as a non-profit by tech luminaries (Elon Musk, Sam Altman and co)
Pivoted to a for-profit model in 2018
Has maintained a close partnership with Microsoft
Invested $1B in 2019 and signed an exclusivity agreement for Azure
Invested $10B in early 2023
Their developer platform
OpenAI offers several APIs, generally starting with limited API access to their latest products (currently GPT-4)
Pre-built models that have a defined domain (eg. GPT-3, ChatGPT)
APIs for tailoring different types of models to specific data sets
Who's using OpenAI
Bing is undergoing a serious revitalization and is going all-in on AI-based search
Expect for Google to follow suit
So many startups!
“AI startups that are just a thin wrapper atop OpenAI’s APIs” are becoming a bit of an inside joke on VC Twitter
Big takeaways
The current market leader
Flying high with two breakthrough models available as APIs
Continues to ship improvements to their models
Putting serious pressure on larger companies to build more compelling AI offerings (eg. Google with Bard, FB with LLaMA)
Reminiscent of Tesla pushing the rest of the car industry to lean into EVs
Winning in part through making approachable interfaces (API and user)
ChatGPT's UI is an unsung hero in enabling consumers to "feel the power of AI" rather than hearing about some abstract breakthrough research paper
Startup 2: Hugging Face aka 🤗
Background
Named after the emoji 🤗
Founded in NYC in 2016 with a focus on making it easier to build products that use Natural Language Processing (NLP)
Offers a centralized “hub” for sharing models, data sets, and ML-based apps
Famous for 🤗’s Transformers library, a holistic set of APIs and pre-trained models for handling a variety of tasks
Includes pre-trained implementations of BERT, GPT-2 and many more
Updated as new NLP papers come out, generally tracking alongside the cutting edge
ASIDE: Transformers are a recent breakthrough in NLP models
Transformer-based models are trained on wholesale, variably weighted chunks of input vs. sequential training of recurrent neural networks
Seminal paper for Transformers: Attention Is All You Need
Their developer platform
Inference API
Allows Developers to "run their own API" using a model that lives on Hugging Face's model repository
Developers can create a custom endpoint, choosing their underlying IaaS provider (Microsoft Azure, Google Compute Engine, or AWS)
Once hosted, clients (eg. Web apps) can call these endpoints and get inference results
eg. I can pass text input to a "text-to-image" model and expect an image binary in the API response
Who's using Hugging Face
Square
Intel
Grammarly
Big takeaways
Hugging Face is winning by making ML concepts collaborative and interactive
Docs contain many references, samples, and "try it yourself" models
It's clear that their ambitions go far beyond sharing models and ML libraries. Their Inference offering is an opinionated Platform-as-a-service that empowers developers to deploy their models as Web APIs.
Exciting to see Hugging Face move in this direction and support developers throughout their journeys from model discovery and fine-tuning to hosting and deployment
Startup 3: Speechly
Background
Founded in Finland in 2019
Focused on enabling developers to integrate faster speech recognition, transcription, and custom voice commands
Offers APIs for web and mobile apps to use cutting-edge NLP
Their developer platform
Targeting app developers that want voice-enabled apps
Offers APIs for transcription, understanding/entity detection, and the ability to create custom commands
Who's using Speechly
Doss - AI-enabled digital assistant for buying, selling, and renting homes
Musgrave - Top Irish supermarket change, enabling their new voice shopping functionality
Zoan - Metaverse startup with big-name clients (eg. Nike, Ikea)
Big takeaways
Speechly is singularly focused on voice but offers a range of developer APIs to address its ranging customer needs within this niche
IMO, it’s smart to be taking a vertical slice and owning it end-to-end
They provide tools to adapt models to new data, allowing clients to customize their integration for specific use cases (eg. non-standard words or terminology)
Developers can choose to use Speechly's ML models on-device, on-premise, or via Web API.
Parting thoughts
The tech industry is buzzing with excitement and VCs are pouring in cash. Over the next decade, we're going to see massive companies emerge, collapse, and transform based on these technologies. Some jobs will be made obsolete, and others augmented and made more efficient.
In this rapidly changing landscape, a focused API strategy will be crucial for success.
By examining just a few of the platforms that are competing in this market, what's becoming clear is that winning platforms have a lot in common: they understand their audience, they offer comprehensive solutions and they obsess over the developer experience.