The Houston Astros became the World Champions of Major League Baseball in 2017 by leveraging Data. Sure, they had the best team, made up of individual players, which won the championship. But they built that team and its performance by leveraging the data they had on players: existing pros in both the major and minor leagues, and draft candidates. In his seminal book Astroball, Ben Reiter documents both the successes the Astros had with decisions made by drawing inferences and predictions from their models and the pitfalls they encountered: both the hits and the misses. Their biggest hit was really not the championship in 2017, but the resulting transformation of baseball into a truly data-driven sport: the transformation of baseball teams (players, coaches, scouts, management) into data-driven businesses that treated Data as a product the team developed and made decisions with. These businesses succeeded in blending the insights from Data with the intangibles of human passion and grit that players bring to the sport, and in making that blend part of their operating model. The broader world, which has been on the much-hyped ‘Big Data’ journey for over a decade, unfortunately still cannot say the same.
…crucially, they [the Astros] also recognized its [data’s] shortcomings, with which the world at large had only begun to wrestle…
– Ben Reiter, Astroball
My first exposure to the class of Machine Learning algorithms called Neural Networks came back in the mid-90s. Neural Networks form the core of what is today referred to as Deep Learning, and Genetic Programming is another long-standing technique for having a program improve through repeated trials. In the early to mid-90s, in the animation research world, which I was a part of back then, the big buzz was around how researchers at Pixar were using Neural Networks and Genetic Programming to teach Luxo to do the limbo. If you think you do not know Luxo, you actually do. Luxo is the table lamp that appears in the Pixar logo at the beginning of all Pixar movies, bouncing on the ‘I’ of Pixar. Back then Pixar was not the mega movie studio of today but a small software company that had developed advanced 3D rendering software that was beginning to be used, mostly by researchers, to develop cool animated short films. The ultimate goal for any graphics and animation researcher was to have their animation selected to be showcased at SIGGRAPH, the annual Computer Graphics conference. The short animated video titled Luxo Jr. from SIGGRAPH 1986 was the gold standard of cutting-edge animation and 3D rendering of its time. By the mid-90s, researchers at Pixar and George Washington University had taken the Luxo lamp and, using Genetic Programming, were putting it through repeated runs of a limbo action, with a learning algorithm that learned every time Luxo knocked over the limbo bar, until eventually it learned how to limbo! This was Artificial Intelligence: a computer program that could learn from its mistakes and improve its accuracy, all by itself.
Today's Neural Networks and other Machine Learning (ML) algorithms are mostly not new inventions. Most of them are based on mathematical and statistical models that are decades old. What is new is the sheer volume of data on which the algorithms can be trained to draw accurate inferences and predictions, and the computing power available to process that data.
Computers have always been ‘Data Processing’ machines first. It is called ‘Information’ Technology after all; it could very well have been called Data Technology. The very early computers just processed data: transactions and records, stored as information in files and databases. Big Data came about from the recognition that the amount of data we generate and store today is several orders of magnitude greater than ever before, and that it is growing exponentially year over year. And we now have the computing power to store and process this data at a cost and speed that make it viable to extract business value from it. Still, nearly a decade after the term became a buzzword, the promise and hype of ‘Big Data’ have not come to fruition.
The real investment in Data Science and in commercializing AI and ML at scale is a recent phenomenon (Watson won Jeopardy a mere eight years ago), but even so, everyone involved expected more from the ‘revolution’, given the investments most large organizations have made to AI-enable their companies. While startups have given us highly targeted advertisements on web pages, chosen by predicting what we are most likely to click on, and devices we can talk to in plain natural language, most large enterprises can point only to chatbots on their websites as the highlight of their foray into Machine Learning. Most ambitious projects with massive investments, from diagnosing diseases to robotic automation of financial advising to fraud detection, have not delivered the returns expected.
…data-related challenges are a top reason IBM clients have halted or canceled artificial-intelligence projects.
– Arvind Krishna, SVP, Cloud and Cognitive Software, IBM, quoted in the Wall Street Journal, May 28, 2019
What are organizations trying to achieve with AI and Machine Learning, and why are they struggling? Exploring these questions is going to be the focus of the next set of blog posts. My thesis on the root causes behind these failures is two-fold:
- Misalignment of Data Strategy and Business Needs: This misalignment can come from both directions: the business having unrealistic expectations of how much business value the exploitation of Data can deliver, and the IT organization not transforming itself to adopt the right Data strategy and technology to properly leverage the value in the Data.
- Lack of a modern Data Culture: Most organizations do not treat Data as a product, or even as an asset with both intrinsic and extrinsic value that can be monetized. They are not ready to share data beyond the closed silos of data owners and managers, usually separated by business unit boundaries. They are not willing to modernize their processes and technology stacks to properly process the data and extract business value from it. And lastly, they are not evolving their organizational structures to allow the free flow of, and access to, data across the organization, for both internal and third-party data.
I will explore both of these areas, and more, in greater detail in the following blog posts. Stay tuned.