How Machine Learning Revolutionizes Content Management: An Interview with Jeff Kiske
When Jeff Kiske told his father that his deep-learning company, Engine ML, was being acquired by digital-transformation leader Ripcord, he recognized there was a certain amount of irony in the situation. Kiske's father owns a printing company in St. Louis, where Kiske, Ripcord's new director of machine learning systems, grew up.
"I joked with him ahead of the acquisition that it's his job to turn PDFs into paper, and it's our job to turn paper back into PDFs," Kiske recalled in a recent interview.
Such an explanation oversimplifies both Kiske's engineering aptitude and the ingenuity of Engine ML's solutions. Engine ML technology creates the infrastructure to train machine-learning models, which for Ripcord customers will soon mean the ability to pull otherwise hidden, business-boosting insights from mounds of data.
From Self-Driving Cars to Mountains of Paper
Perhaps surprisingly, the technology has roots in driverless cars, the sector where Kiske got his professional start.
Kiske, who completed his undergraduate studies at the University of Pennsylvania, began his "journey in machine learning," as he calls it, in 2013. He was a graduate student at Stanford University, doing research in the school's Artificial Intelligence Lab with Professor Andrew Ng, founder and CEO of Landing AI, founder of deeplearning.ai, and co-founder of Coursera. In 2015, Kiske and several former labmates co-founded the self-driving startup Drive.ai, which Apple Inc. bought last year.
"Over the past seven years or so, I've been moving around different self-driving-vehicle companies, from trucks to cars," Kiske said. "My primary focus has been in the computer-vision side of things. Since around 2012, there's been a big shift in how companies use computer vision, primarily using deep-learning methods to get actual insights from their data."
With Ripcord's purchase of Engine ML, machine-learning technology that Kiske and his colleagues previously used to help identify and avoid patterns in a driverless car's route will now be utilized to the advantage of businesses in insurance, retail, energy, and numerous other sectors.
Using Engine ML's integrated solution, Ripcord will be able to train machine-learning algorithms on behalf of clients so that those clients can better automate their business processes, saving time and cash and freeing up human workers for the kind of higher-level, creative thinking that only people can do.
Using ML to Find What Clients Don't Know They Have
"Some customers have so much content that they don't even know what exists out there, so the first process is discovery," Kiske said. "Machine learning can help them learn what data exists. Then Ripcord can build new machine-learning models to find information within that data."
Kiske gave the example of an accounting department whose workers need to pull various line items from an invoice. A traditional machine-learning model brought in to cut the department's time and effort may have seen 100,000 types of invoices and thus know how to extract data from those documents. But what if this department's invoices don't look like any of those others? Can the model still extract the data?
With Ripcord technology enhanced by Engine ML's machine-learning technology, the answer is a definitive "yes." Using the Engine ML technology, this hypothetical accounting department need only annotate "maybe 100 documents, or maybe even 10 documents." With some of the pre-trained models already on hand, plus a bit of fine-tuning, the department will get the same level of accuracy it would have gotten from a highly trained human being culling the data.
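To make the "annotate a handful of documents" idea concrete, here is a minimal Python sketch. It is not Engine ML's or Ripcord's method. It is a much simpler stand-in for fine-tuning: the feature extractor is frozen, and only a tiny nearest-centroid classifier is fitted on a few annotated invoice lines. The `embed` function (plain word counts), the labels, and the sample lines are all hypothetical.

```python
from collections import Counter, defaultdict


def embed(text):
    """Hypothetical stand-in for a pre-trained feature extractor: word counts."""
    return Counter(text.lower().split())


def fit_centroids(labeled_lines):
    """Sum the features of the few annotated examples for each label."""
    centroids = defaultdict(Counter)
    for text, label in labeled_lines:
        centroids[label].update(embed(text))
    return dict(centroids)


def predict(text, centroids):
    """Return the label whose centroid overlaps most with the text's words."""
    features = embed(text)

    def overlap(label):
        centroid = centroids[label]
        return sum(min(count, centroid[word]) for word, count in features.items())

    # sorted() makes tie-breaking deterministic (alphabetical by label).
    return max(sorted(centroids), key=overlap)


# A handful of hand-annotated invoice lines (invented for illustration).
annotated = [
    ("total amount due 120.00", "total"),
    ("invoice total 89.50", "total"),
    ("payment due by 2020-05-01", "due_date"),
    ("due date 2020-06-15", "due_date"),
]

centroids = fit_centroids(annotated)
first = predict("total due 45.00", centroids)
second = predict("date payment is due", centroids)
print(first, second)
```

In a real system the word counts would presumably be replaced by features from a pre-trained model, which is what lets so few annotations go a long way.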
It may all sound like a long, cumbersome process, but in fact Ripcord and Engine ML save significant time and effort.
"Training these models can take a long time," Kiske said, citing weeks or months as usual timeframes for such work. The solution Engine ML brings to Ripcord "can shrink that down" from weeks to mere days, and from days to just hours, he noted.
That's all fine for structured data, the skeptic might say. But what about unstructured data, the type that makes up video and audio files, for example? Put simply, the technology works there, too.
A model extracts insights from unstructured data by taking a cluster of recognized information and setting it aside, at which point a human being can review it, Kiske explained.
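The cluster-then-review idea can be sketched in a few lines of Python. This is an illustrative toy, not the approach Ripcord or Engine ML actually uses: it groups documents with a greedy "leader" clustering over word-count vectors and cosine similarity, then sets aside one representative per cluster for a human to look at. The sample documents, the `threshold` value, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def vectorize(text, vocab):
    """Turn a document into a word-count vector over a shared vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocab]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_documents(docs, threshold=0.5):
    """Greedy leader clustering: a document joins the first cluster whose
    first member is similar enough, otherwise it starts a new cluster."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    vectors = [vectorize(doc, vocab) for doc in docs]
    clusters = []  # each cluster is a list of document indices
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Invented sample content: three invoice-like and two claim-like documents.
docs = [
    "invoice total amount due net 30",
    "invoice amount due total payable",
    "claim form policy number insured name",
    "policy number claim insured date of loss",
    "invoice number amount due date",
]

clusters = cluster_documents(docs)
# One representative per cluster is set aside for human review.
to_review = [docs[members[0]] for members in clusters]
print(clusters)
print(to_review)
```

The point of the design is that a person reviews one document per cluster rather than every document, and no labels are needed to form the clusters in the first place.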
"When you have a billion documents, having a human look through all of those documents can take a long time," Kiske said. "By using unsupervised machine-learning methods, you can very quickly extract clusters of similar-looking content, and then you can use those clusters to seed the supervised machine-learning tasks."
Of course, the new technology will only take an organization so far; to reach its full potential, a business or other entity must know what it wants from the technology and where it should go from there. The tech can't tell a company how best to capitalize on the data once it has been extracted.
"You can build very specific detectors for parts of that field," Kiske said. But "the machine learning aspect is to get as much raw data out of that content as possible. It's up to business logic tools to take care of the rest."
Still, there's a MacGyver-like resourcefulness and ingenuity to the technology that will soon be available to Ripcord clients. For example, Engine ML's solution uses machine learning to locate staples, paperclips, and other fasteners amid veritable seas of paper documents. And it's now using this same technology to recognize and locate corporate branding symbols such as logos.
"It's exciting how quickly the technology has accelerated in its development," Kiske said, noting that he recently read that machine-learning model size has doubled every eight months since approximately 2013. "When it comes to making sense of the world, we are much better equipped than we ever have been."
Use with Care
While in the COVID-19 era we hardly need reminding of it, technological advancement can be a double-edged sword. That's why setting and maintaining a solid understanding of the ethics of machine learning and its use is so important, according to Kiske.
"Any sort of new technology can be used in an inappropriate fashion," he said. "And I think it's really up to the researchers and the community to make decisions on what is appropriate and what is not appropriate" for machine learning to do. "We should consider machine-learning technologies as a tool versus a replacement for humans," Kiske said.
The integration of Ripcord and Engine ML technologies puts powerful tools in the hands of organizations looking to get the most from the mountains of paperwork they currently have. Not just any document management solution will do. Find out more about what Ripcord can do today!