Startup Trifacta embracing the data scientist in all of us

Organizations drowning in big and small data will soon have a new way to wrangle, munge or transform it (however you want to describe the process), thanks to software from startup Trifacta that's now in beta tests.

The San Francisco company, whose staff has grown from its three computer scientist founders a year ago to a robust 22 employees, today announced a second round of venture funds totaling $12 million led by Greylock Partners and Accel Partners. That brings overall funding to $16.3 million. The new funds will allow Trifacta to bulk up product design, engineering, sales and marketing, and should pave the way for the company to roll out its data management technology next year.

First-time CEO Joe Hellerstein, who this week will present at the Data Science Summit (http://venturebeat.com/events/databeat2013/) in Redwood City, Calif., says Trifacta software sits "in the lifecycle of data between the time it has landed in infrastructure, typically something like Hadoop, and the time it is consumed in business intelligence or predictive analytics tools." Expect Trifacta to partner with companies whose products bookend its own, such as Cloudera, Pivotal and Tableau.

Trifacta seeks to make cleaning up that raw data faster so that data scientists and business analysts can spend more of their time analyzing it and making use of it. (The company isn't overlooking "small data," either, such as enabling users to transfer contact information from one app to another.) "There's a real user interface challenge," Hellerstein says. (Hear him talk about the startup as well in the embedded audio clip.)

The product will consist of a Web-based user interface and a lightweight back-end service that Hellerstein says can be hosted on relatively modest hardware. That service will hold metadata useful for predicting how users might employ the software the longer they have it. The serious number crunching still takes place on existing big data infrastructure, he says.

While the hullabaloo over big data might elude those running enterprise networks, Hellerstein sees plenty of ways in which such IT pros might benefit from getting a better handle on the streams and reams of information in their organizations.

"Everyone's a data scientist nowadays, everybody is trying to be more data driven," he says. "People in IT obviously have a great interest in looking at machine-generated data in one form or another...such as in data security (not that Trifacta is targeting that vertical market particularly)."

One beta customer example cited by Hellerstein involves talks with the IT department at a large computing equipment manufacturer that's attempting to enable product divisions to manage call-home logs for their devices. Such logs can be what Hellerstein describes as "hilarious" in the way they are mis-formatted, as only a hardware designer could manage, and the challenge is to translate the data so that the product team can analyze it without IT having to serve that function.

Another scenario involves a large financial institution seeking to get its arms around machine failure predictions in IT. This goes beyond just looking at logs and trends; it involves running algorithms to tease out what comments (yes, unstructured data) found in trouble tickets might indicate about impending system failures.

Trifacta gets its name in part from a play on horseracing's trifecta: picking the first three finishers in the correct order. But it also has to do with needing to address the biggest challenges in data management by solving the people, data and computation sides of the problem.

That's where Trifacta's triple threat of a founding team comes in. The company's technology was developed by computer science professors Hellerstein of the University of California, Berkeley (a database and distributed systems expert) and Jeffrey Heer of the University of Washington (who did his Ph.D. at Berkeley while Hellerstein was on the faculty and is a data visualization and human-computer interaction expert), along with Stanford University Ph.D. Sean Kandel. You can get a sense of where the technology comes from as well by checking out this Stanford video of the DataWrangler project that Trifacta's founders built. Hellerstein says Trifacta's team generalized from the ideas in Wrangler to build a "broader and more powerful platform" based on all-new code.

A year ago, when Trifacta emerged from stealth mode, Hellerstein was quoted as saying he didn't see much in the way of competition. A musician who has literally tooted his horn and other instruments alongside the likes of Harry Connick, Jr., the CEO still says that Trifacta is out in front. But he acknowledges that Trifacta now has plenty of company, including from startups (e.g., Paxata, Data-Tamer) and established data warehouse and integration companies (e.g., Informatica). "It's no surprise this area's heating up," Hellerstein says.

Read more about software in Network World's Software section.
