Raising our third fund has meant an incredible number of LP conversations. Recounting the evolution of our firm to so many new people made us realize that the world has changed quite a bit since we wrote our initial thesis outline in 2018. Despite being right in the middle of a fundraise, we decided it made sense for our team to revisit the Fund's thesis and determine whether it was still relevant in the current market (tl;dr: yes, more than ever). We digested a wide range of feedback from our fundraising meetings, which ultimately led to a subtle evolution, Thesis 2.0, that we hope better articulates the opportunity we see in software today.
Our firm began its investing journey focused on financial services, and we soon realized that Fintech was split into two sets of opportunities: the first created by innovations in distribution (internet, mobile) and the second by innovations in data (machine learning and data science). The distribution-oriented opportunities had massive scale but required incredible amounts of capital to cover high customer acquisition costs. We found ourselves more attracted to data-oriented opportunities that had more capital-efficient paths to growth and greater product defensibility.
We quickly learned that when a company had a valuable dataset, there were parties interested in licensing that data, but the even larger opportunity was to use that data to power novel products that could rapidly increase the enterprise value of a small company. This became our Thesis 1.0: software was becoming a commodity, and proprietary data was the key to unlocking new markets at scale. That thesis informed investments such as Ocrolus, Entera, Windfall, Verikai, and Zentist, all companies with unique datasets they could build on to scale.
As we approached investors for our latest fund, one question kept coming up: why is data valuable? Everyone seemed to recognize its value and agreed with our focus on data, rather than software features, as a differentiator, but people struggled to connect the dots between data and value creation. This led to numerous internal debates at VSV about how, in more concrete terms, data creates value.
We started by asking, "what are our companies doing with their datasets?" Invariably, the answer was that they were using those datasets to train specialized Machine Learning or Machine Vision models. When we then asked, "why are our companies building models?", the most obvious answer was "to predict something," which is literally what AI models do today. The more nuanced answer was "that's how computers understand how something works." Computers don't understand the natural world the way humans do; a current AI model is essentially a highly complex mathematical prediction of what will occur given a set of inputs. As it turns out, that is enough for a computer to approximate how some aspect of the world works, and we can then use software to do useful things with that understanding, sometimes without any human involvement at all. This is the fundamental recipe our portfolio companies are employing: build a unique dataset, use that dataset to train an AI model, and build applications on top of the model to capture more software spend.
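In software terms, the recipe is short enough to sketch. The example below is purely illustrative: the data, features, and recommend() application function are hypothetical, not drawn from any portfolio company. It simply shows the three steps in order: a proprietary dataset, a specialized model trained on it, and an application built on the model's predictions.

```python
# Illustrative sketch of the dataset -> model -> application recipe.
# The data, features, and application function are hypothetical.
from sklearn.ensemble import GradientBoostingRegressor

# 1. A unique, proprietary dataset: observed inputs and outcomes.
X_train = [[0.2, 1.0, 3.5], [0.8, 0.5, 2.1], [0.5, 0.9, 4.0]]  # feature vectors
y_train = [12.0, 7.5, 15.2]                                     # observed outcomes

# 2. Train a specialized model: a learned approximation of "how this works".
model = GradientBoostingRegressor().fit(X_train, y_train)

# 3. Build an application on top of the model that automates a decision.
def recommend(candidate_inputs, threshold=10.0):
    """Return only the candidates the model predicts will clear the bar."""
    predictions = model.predict(candidate_inputs)
    return [c for c, p in zip(candidate_inputs, predictions) if p >= threshold]

print(recommend([[0.3, 1.1, 3.8], [0.9, 0.4, 1.9]]))
```

The competitive moat in this sketch lives in step 1: anyone can run step 2 and step 3, but only the company that owns the dataset can train this particular model.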
Alchemy, one of our 2020 Fund portfolio companies, is a powerful example of this. The company offers an Electronic Lab Notebook solution for chemical and materials companies. These companies spend billions of dollars developing custom products for customers, combining various compounds in the lab to get just the right set of physical properties. As a by-product, Alchemy aggregates a large amount of data on chemical compounds, their interactions, and the resulting physical properties. With that data, Alchemy is developing an AI model for predicting physical properties and chemical formulations, one that helps the company's proprietary software "understand" how compounds interact and which formulations are most likely to deliver the desired physical properties. This can drastically decrease real-world lab expenses and the time to design new products, opening a whole new area of market growth for the Alchemy team.
Historically, Vertical SaaS has been a challenging sector for venture capital. Often the software is so specialized that the number of target companies, and of seat licenses at any given company, is low, leading to small TAMs. But these Vertical SaaS businesses can be treasure troves of data unique to a particular industry. When those datasets are used to train specialized AI models, they can power new products that drive higher levels of automation and productivity. Simply put, combining unique datasets with AI creates a step change in the opportunity set for Vertical SaaS companies. Not only can they expand their products, they can automate or enhance processes that formerly required humans, becoming what we call "Vertical AI".
This increased use of AI and automation is already beginning to change how companies think about pricing and unit economics. When customers are just as likely to consume your product through an API or a software agent, seat-based pricing starts to make less sense. Indeed, transactional pricing models are now more prevalent in our portfolio than seat-based pricing. While this is not new at the cloud infrastructure layer, it is new at the vertical application layer, where seat-based SaaS has long dominated. It introduces new risks in the form of revenue volatility, but it also creates paths to significant growth, because companies can scale with their customers' usage, not just their headcount. In a world of increasing automation, scaling with processes feels like a better bet than scaling with headcount.
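A toy comparison makes the scaling difference concrete. All the figures below are hypothetical and chosen only to show which variable revenue moves with; the helper functions are illustrative, not an actual pricing model from our portfolio.

```python
# Toy comparison of seat-based vs. transactional pricing.
# All figures are hypothetical, chosen only to illustrate scaling behavior.

def seat_based_revenue(seats, price_per_seat=1_200):
    """Annual revenue scales with the customer's headcount."""
    return seats * price_per_seat

def transactional_revenue(transactions, price_per_transaction=0.50):
    """Annual revenue scales with the customer's process volume."""
    return transactions * price_per_transaction

# A customer that automates: headcount stays flat while volume grows 5x.
print(seat_based_revenue(seats=50))                  # 60000, unchanged next year
print(transactional_revenue(transactions=200_000))   # 100000.0 this year
print(transactional_revenue(transactions=1_000_000)) # 500000.0 after automation-driven growth
```

Seat revenue is capped by how many people the customer employs; transactional revenue grows with every additional process the customer runs through the product.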
All of this led us to Thesis 2.0, an updated articulation of why data is important and of the opportunity we see for the decade ahead.
VSV Thesis 2.0
The next era of software opportunity will come from using AI to model how the world works and building scalable applications on top of those models. This "Vertical AI" layer dramatically expands the size of the existing Vertical SaaS market. Specifically, Vertical AI increases the operational leverage, whether by increasing revenues or decreasing costs, that software can provide to a business. Winning companies in this space will build sustainable, proprietary data acquisition platforms to train their own specialized AI models, creating a data feedback loop that increasingly sets them apart.