While the COVID-19 pandemic has stalled many venture deals, Tecton.ai has been able to buck the trend. This week the company announced a $20 million investment from Andreessen Horowitz and Sequoia, following a $5 million angel round last year.
Tecton is a platform that morphs raw data into AI models that can be successfully deployed. And yes, this is far from a trivial process.
“The foundational success of an AI-based technology revolution or even the build of a very simple algorithm ultimately lies in the health of the data,” said Kim Kaluba, who is the Senior Manager for Data Management Solutions at SAS. “However, in survey after survey organizations continue to report problems with accessing, preparing, cleansing and managing data, ultimately stalling the development of trustworthy and transparent analytical models.”
Consider that data wrangling is often the most time-consuming and expensive part of the AI process. “Some data scientists report spending 80% of their time collecting and cleaning data,” said Jen Snell, who is the Vice President of Product Marketing and Intelligent Self Service at Verint. “This problem has become so ubiquitous that it’s now called the ‘80/20 rule’ of data science.”
Regarding Tecton, the technology is the result of the deep experience of its three founders, who helped build the AI platform for Uber (called Michelangelo). “When we got to Uber, everything was breaking because of the extreme growth,” said Mike Del Balso, who is the CEO and co-founder of Tecton. “Data was spread across silos and there were challenges with the deployment of models. With Michelangelo, we made an end-to-end platform that was targeted for the average data science person. We didn’t want to create huge engineering teams. We also built Michelangelo with the focus on production, collaboration, visibility and reusability.”
Within a couple of years, the platform led to the development of thousands of AI models, helping with such capabilities as ETA, safety and fraud scores. The result was more sustainable growth and stronger competitive advantages for Uber.
Why Is Data So Complicated?
Data is actually fairly simple. It’s just a string of numbers, right?
This is true. But data does present many tough challenges for enterprises, even for some of the most advanced technology companies.
“Oftentimes the data that we receive is ‘dirty,’” said Melissa McSherry, who is the SVP Global Head of Credit and Data Products at Visa. “Think about your credit card statement. The merchant names are sometimes unrecognizable—that has to do with the way merchants are set up in the system. When we clean up the data, we can often generate amazing insight. But that is significant work. Oftentimes organizations don’t understand how much work is required and are disappointed in what it takes to actually get results.”
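To make the merchant-name problem concrete, here is a minimal Python sketch of one common cleanup step: normalizing a raw statement descriptor and matching it against a lookup table. The sample descriptors, the lookup table and the function name are all invented for illustration. This is not how Visa or any card network actually processes data.

```python
import re

# Hypothetical lookup table mapping a descriptor fragment to a readable name.
# A real system would need a far larger, curated mapping.
KNOWN_MERCHANTS = {
    "BLUE BOTTLE COFFEE": "Blue Bottle Coffee",
    "AMZN MKTP": "Amazon Marketplace",
    "UBER": "Uber",
}


def clean_merchant(raw: str) -> str:
    """Return a readable merchant name for a raw statement descriptor."""
    # Normalize case and collapse runs of whitespace.
    text = re.sub(r"\s+", " ", raw.upper()).strip()
    # Return the first known merchant whose fragment appears in the text.
    for fragment, name in KNOWN_MERCHANTS.items():
        if fragment in text:
            return name
    # Unknown merchants fall through with only the normalization applied.
    return text


if __name__ == "__main__":
    # Made-up descriptors in the style of a card statement.
    for raw in [
        "SQ *BLUE BOTTLE COFFEE 0423 SAN FRANCISCO CA",
        "AMZN Mktp US*2K4L91XY0",
        "UBER   *TRIP HELP.UBER.COM",
    ]:
        print(f"{raw!r} -> {clean_merchant(raw)}")
```

Even this toy version hints at why the work is significant: every new descriptor format needs another rule or another table entry.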
Another issue with data is organizational. “Enterprises enforce data security and governance policies that weren’t designed to feed data science teams with a steady stream of up-to-date, granular business data,” said Bethann Noble, who is the Senior Director of Product Marketing and Machine Learning at Cloudera. “As data science teams start new projects with different stakeholders, they have to solve for data access once again, which could mean a different journey through a different bureaucratic maze every time. And the necessary data can be anywhere, in any form—residing across different data centers, cloud platforms, or edge devices. It needs to be moved and pre-processed to be ready for machine learning, which can involve complex analytical pipelines across physical and organizational silos.”
Keep in mind that the data problem is only getting more complicated. According to research from IDC, the total amount of global data will reach 175 zettabytes by 2025, up from 33 zettabytes in 2018 (a zettabyte is 10 to the 21st power, or 1 sextillion, bytes).
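For a rough sense of scale, the two IDC figures imply an annual growth rate. The short Python calculation below assumes smooth compound growth between 2018 and 2025. That is a simplifying assumption made here, not something IDC states, so treat the result as a back-of-the-envelope estimate.

```python
# Figures cited above; the constant-growth assumption is ours.
START_ZB, START_YEAR = 33, 2018
END_ZB, END_YEAR = 175, 2025


def implied_annual_growth(start: float, end: float, years: int) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years`."""
    return (end / start) ** (1 / years) - 1


rate = implied_annual_growth(START_ZB, END_ZB, END_YEAR - START_YEAR)
print(f"Implied growth: about {rate:.0%} per year")
print(f"175 zettabytes = {END_ZB * 10**21:.3e} bytes")
```

Under that assumption, the total works out to growing by roughly a quarter or more each year.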
“In this digital age, we are suffering from ‘InfoObesity’—gorging ourselves on an inconsumable amount of data that is not just unwieldy but can become dysfunctional, especially as we increase the amount of data we collect without scaling our ability to support, filter and manage it,” said Michael Ringman, who is the CIO of TELUS International. “While investing in Big Data is easy, efficient and effective use of it has become difficult.”
Oh, and then there are the privacy and security issues. “Given the mass amounts of data used for complex algorithms, data science platforms can be hot targets for data breaches,” said Ross Ackerman, who is the Director of Analytics and Transformation at NetApp. “Often, the most important data for algorithms contain or can be mapped to CII (Customer Identifiable Information) or PII (Personal Identifiable Information).”
For enterprise AI applications, there are really two main approaches. First, there are analytical models, which provide insights like forecasted churn rates. These types of applications do not need real-time data.
Next, there are operational models. These are embedded in a company’s product, such as a mobile app. They need highly sophisticated data systems and scale. “This is where you can create magical experiences,” said Del Balso.
For the most part, Tecton is about operational models, which are the most demanding but can also provide the most benefits. “It’s high stakes,” said Del Balso.
Tecton is built to streamline the data pipeline, which means that data scientists can spend more time on building effective models. An essential part of this is a feature store that allows for a seamless handoff between data scientists and data engineers. Tecton has other features as well, and the funding should help accelerate development (the platform is currently in private beta).
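The feature store idea can be sketched in a few lines of Python. A feature is defined once, and the same definition then feeds both model training and live serving. The class, method and feature names below are hypothetical, chosen only for illustration. They do not represent Tecton's actual API, which the company had not made public at the time of writing.

```python
from typing import Callable, Dict, List

FeatureFn = Callable[[dict], float]


class ToyFeatureStore:
    """In-memory illustration of a feature store (not production code)."""

    def __init__(self) -> None:
        self._definitions: Dict[str, FeatureFn] = {}
        self._online: Dict[str, Dict[str, float]] = {}

    def register(self, name: str, fn: FeatureFn) -> None:
        # A feature is defined once and shared by every team and model.
        self._definitions[name] = fn

    def materialize(self, entity_id: str, raw_record: dict) -> None:
        # Compute every registered feature from the raw record and cache it.
        self._online[entity_id] = {
            name: fn(raw_record) for name, fn in self._definitions.items()
        }

    def get_features(self, entity_id: str, names: List[str]) -> List[float]:
        # Serving reads precomputed values instead of re-deriving raw data.
        row = self._online[entity_id]
        return [row[name] for name in names]


store = ToyFeatureStore()
store.register("trip_count", lambda r: float(len(r["fares"])))
store.register(
    "avg_fare",
    lambda r: sum(r["fares"]) / len(r["fares"]) if r["fares"] else 0.0,
)

# Made-up raw record for a hypothetical rider.
store.materialize("rider_1", {"fares": [10.0, 20.0, 30.0]})
print(store.get_features("rider_1", ["trip_count", "avg_fare"]))  # [3.0, 20.0]
```

The design point is the single shared definition. Because training and serving read from the same registered features, a model in production sees values computed the same way as the ones it was trained on.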
“For decades, companies have worked to develop technology, knowledge, skills and infrastructure to handle and harvest unstructured data in pursuit of unlocking answers to the most difficult questions,” said Michal Siwinski, who is a Corporate VP at Cadence Design Systems. “However, there’s more work to be done. Because the technology is still continuing to evolve, data is a virtually untapped resource with only as high as 4% of today’s data being analyzed.”