The State of Generative AI
This is a weekly newsletter about the business of the technology industry. To receive Tanay’s Newsletter in your inbox, subscribe here for free:
Hi friends,
I touched on the explosion of Generative AI in my predictions piece. Given it seems to be *the* thing in tech currently, I figured it’s worth taking stock of where we are today.
The rapid speed of development of Generative AI is transforming the way we think about technology. Not only has it opened up a world of creative possibilities, but it has also made numerous everyday tasks easier and more efficient.[1]
So this week, I’ll discuss where we are in generative AI, some of the use cases we’ve started to see, and what we can expect next.
I’ll break it down by the medium being generated:
Text
Code
Image
Audio
Video
Multi-modal
I’ll cover the first three this week and the remaining three next time around.
Text
We spend much of our time every day searching for information, reading, or writing.
Text generation models, which are also capable of tasks like summarization and question answering, can assist us across all of these, and text is one of the areas where Generative AI is furthest along, both in terms of development and usage by real people.
On the model side, the leading model accessible to developers and others today is OpenAI’s GPT-3, a 175B parameter model.
There are also several open-source models out there that provide alternatives, including:
BLOOM, a 176B parameter model by BigScience
GPT-J, a 6B parameter model by EleutherAI
OPT, a 66B parameter model by Meta (the largest openly downloadable version; a 175B version is available to researchers on request)
OpenAI’s GPT-4, once rumored to be a 100T parameter model, is expected to launch sometime this year. A graphic comparing the relative number of parameters has been floating around and has become a bit of a meme. However, OpenAI has since indicated that GPT-4 will likely be only somewhat larger than GPT-3, not 100T parameters. And in any case, actual performance doesn’t necessarily scale with parameter count.
Still, GPT-4 is widely expected to be significantly better than GPT-3 and to output even more human-like text, which will further advance the use cases below.
Use Cases
A few common use cases of text generation that are seeing customer traction include:
I/ Text generation for marketing and sales:
This includes tasks such as generating copy for websites, blog posts for content marketing, captions for social media/ads, and cold emails for sales.
Some of the leading companies offering this today are:
Jasper, a general-purpose text generation platform, which is reportedly making over $50M in annual recurring revenue (ARR)
Copy.ai, a marketing and sales assistant, which is at over $10M in ARR
Lavender, which focuses on the outbound sales email use case
II/ Reading and Writing Assistants
This is similar to the above category but more focused on helping people read and write emails, articles, etc. better and faster, including with their research. Many of these companies are essentially building new forms of text editors.
Notion, a popular productivity app, which has added AI that can be used across multiple use cases
Lex, an AI-powered editor for writing
Bearly, an AI-powered research assistant
Orchard, an AI-powered text editor
III/ Text generation for specific use cases:
This includes tasks such as drafting contracts, product requirement documents, and other specialized documents that typically require some additional structure or domain knowledge.
While GPT-3 can generate these on its own, it typically needs to be fine-tuned or given carefully written, specific prompts to do them well, and that is where these companies are trying to add value today.
Examples include:
Spellbook from Rally, an AI-assisted plugin that helps lawyers draft contracts
WritemyPRD, an AI-assisted product requirements document (PRD) writer
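To make "writing specific prompts" a bit more concrete, here’s a rough Python sketch of the kind of scaffolding a vertical tool might wrap around a general model like GPT-3 before sending a request. The function name, the prompt wording, and the parties in the example (Acme Corp, Globex Inc) are all hypothetical, and this is not how Spellbook or WritemyPRD actually work; it only shows how structure and domain instructions can be layered on top of a short user request.

```python
def build_contract_prompt(contract_type: str, parties: list[str], clauses: list[str]) -> str:
    """Wrap a short user request in structure a general-purpose model lacks.

    Hypothetical template for illustration only; real products would tune
    wording, add examples, or fine-tune the model instead.
    """
    # One bullet per clause the user wants covered.
    clause_list = "\n".join(f"- {clause}" for clause in clauses)
    return (
        "You are an assistant that drafts legal documents for a lawyer to review.\n"
        f"Draft a {contract_type} between {' and '.join(parties)}.\n"
        "Include the following clauses, each under its own numbered heading:\n"
        f"{clause_list}\n"
        "Mark any term that needs the lawyer's decision with [REVIEW]."
    )


# Hypothetical usage: the resulting string is what would be sent to the model.
prompt = build_contract_prompt(
    "mutual non-disclosure agreement",
    ["Acme Corp", "Globex Inc"],
    ["Definition of confidential information", "Term and termination", "Governing law"],
)
print(prompt)
```

The point is that the user only supplies a few fields, while the product supplies the role, the required structure, and the review conventions that a bare prompt to GPT-3 would be missing.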
IV/ Search, Triage and Synthesis for specific use cases
Related to the above, there are opportunities to build products that summarize and synthesize information and generate the appropriate response or answer to a question, automating a lot of manual work. This applies both to horizontal use cases, such as product development and customer support, and to vertical ones, such as the finance industry.
Examples include:
Cohere, an AI-based customer support tool that allows support reps to be more efficient
Viable, an AI-based product development tool that analyses customer feedback data to inform decision-making
Hebbia, which allows for search and question-answering on a corpus of documents
Code
Code is, of course, a form of text, but it is worth discussing separately.
Most existing LLMs that can generate text, such as GPT-3, can also generate code, but some models are specifically trained or fine-tuned for code generation.
For example, OpenAI Codex, based on GPT-3, is a general-purpose programming model.
Other models trained specifically for code include:
CodeGen, an open-source model by Salesforce
AlphaCode, a model by DeepMind which is not publicly available
Use Cases
Code generation has pretty obvious use cases, but let’s go over the forms it can take.
I/ AI-Assisted Programming
GitHub Copilot, an AI pair programmer that was first announced in 2021 and became publicly available last year, is the flagship product in code generation.
GitHub Copilot is changing the way developers work: for developers who have it installed, it can write as much as ~40% of their code.
As that number continues to go up, these products have that
Other examples of products that help programmers include:
II/ SQL Generation / Data Analysis
While SQL is just a subset of code, one interesting application of SQL generation is that it can go a long way toward automatically answering the questions business users have about data in their organization. In this way, code generation could make data analysis faster and accessible to users who don’t know SQL.
The same idea applies to formulas in spreadsheet apps, and there are several plugins that let you use GPT-3 or similar models in Google Sheets and other spreadsheet apps.
Examples of companies doing this include:
AirOps, which is building an AI data sidekick that understands a company’s data schemas and can convert natural language questions into SQL and the resulting answer
Seek, which is building a conversational engine that lets users ask questions and get back the answer or the SQL that produces it
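As a rough illustration of what "understanding a company’s data schemas" can mean in practice, the Python sketch below pulls the table definitions out of a database, folds them into a text-to-SQL prompt, and then runs a query against the data to get the answer. It uses Python’s built-in sqlite3 module with a made-up orders table. The prompt format is an assumption, and generated_sql is a hand-written stand-in for what a model like GPT-3 might return; it is not real model output, nor how AirOps or Seek actually work.

```python
import sqlite3


def schema_description(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statements so the model can 'see' the schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_sql_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Assemble a schema-aware text-to-SQL prompt (hypothetical format)."""
    return (
        "You are given the following SQLite schema:\n"
        f"{schema_description(conn)}\n\n"
        "Write a single SQL query that answers the question below.\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Toy data standing in for a company's warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 120.0), ("US", 80.0), ("EU", 30.0)],
)

prompt = build_sql_prompt(conn, "What is total revenue by region?")

# Stand-in for the model's reply; a real product would get this from an LLM call.
generated_sql = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
answer = conn.execute(generated_sql).fetchall()
print(answer)  # [('EU', 150.0), ('US', 80.0)]
```

Executing the generated SQL, rather than trusting a model’s free-text answer, means the result is computed from the actual data. In practice you’d likely also want to validate the query (for example, only allowing read-only SELECT statements) before running it.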
III/ App Builders
The holy grail in code generation would be going from a natural language prompt to custom software or an app that does what the user wants: essentially, low-code or no-code on steroids. While this is unlikely to work for very complicated use cases, or to be a realistic way to maintain and build upon a large app in production, it could be feasible for internal apps and for simple apps that stitch together a bunch of existing products, the kind of thing people have used tools like Zapier for in the past.
A few examples of companies tackling this are:
Images
While OpenAI’s Dall-E was announced in January 2021, it was last year that image generation models exploded.
2022 saw the launch of three text-to-image generation models, Dall-E 2, Stable Diffusion, and Midjourney, which also support a few capabilities beyond text-to-image.
Dall-E 2 by OpenAI, which was initially in closed beta but then launched publicly, allowing anyone to sign up with OpenAI and get ~15 free credits of image generation per month
Midjourney, which made itself accessible via a Discord server that today has more than 8.7M members; anybody can join the server and start generating images
Stable Diffusion, an open-source model released through a collaboration between Stability AI, Runway, and LMU Munich
The models continue to evolve (Midjourney, for example, has already reached version 4 within a year), but the comparison below gives you a sense of how their outputs differ.
And here is what the prompt “kitten” produces across different versions of Midjourney as it evolved.
Use Cases
Image generation has a variety of use cases, from fun to useful. While the models can be used directly for many things, companies and products will pop up around specific use cases, either fitting into existing workflows or fine-tuned for that use case.
Let’s discuss some of the things we’ve started to see.
I/ Consumer Social
Given the nature of the medium, images are more appealing to consume than text, so there are several consumer social use cases where these models can take off, and already have.
One area that exploded on social media at the tail end of 2022 was AI avatars, the leading example being Lensa, which at one point was generating more than $2M/day in the app store (though it is now down to ~$200K/day). These apps let users upload 10-15 images of themselves and generated AI avatars of them. While Lensa has since fizzled out, I expect more use cases to emerge, such as a better version of Bitmoji, iMessage apps that help visualize text conversations using avatars, and other similar products.
Another area I expect to see more of is digital influencers. We’ve of course had some for years, such as Lil Miquela by Brud, well before this technology existed, but as AI improves they’re becoming easier to create and more accessible, so I expect to see many more, such as AiLice.
II/ Marketing / Sales
Images are a key medium for telling stories on websites, in presentations, and in advertisements, which makes image generation a great fit for these use cases.
Products have already popped up to serve these use cases, whether it be creating ad creatives for social media or magazines, creating product shoots for eCommerce websites, or even finding the perfect assets for presentations or sales collateral.
They can also eliminate the need for stock images and photos in many cases. Some platforms, such as Adobe Stock and Shutterstock, have embraced AI and allow the selling of AI-generated images.
Canva has a text-to-image feature that can create images to drop into designs such as posters and ads
Microsoft has announced Microsoft Designer, an AI-assisted design tool
Flair takes simple product images and creates high-quality product shots and branded content
Botika seeks to reinvent fashion shoots using AI
III/ Design-Related Tasks
Aside from the graphic design tasks touched on above, image models can generate pretty good UI designs, such as the example below.
It’s easy to see how they can assist designers and non-designers alike with various kinds of design work by providing inspiration or starting points. And this is true not just for graphic and UI design, but also for 3D character design, interior design, and architecture, among others, fields that span many industries such as real estate, gaming, media, and technology.
Some examples include:
Uizard is a UI design tool that can generate components with AI
Hypar is an AI-assisted building system design platform
Mirage can be used to generate video game assets
Thanks for reading! If you liked this post, share it with your friends.
If you have any comments or thoughts, feel free to tweet at me.
I write about things related to technology and business once a week on Mondays.
[1] This paragraph was generated by ChatGPT :)