When Steve Jobs was in second grade, the future was already clear: George Jetson would push a button to turn his flying car into a briefcase when he got to work, and his wife Jane would talk to her friends … on Skype. It was September 1962, and no one talked about the Hadoop skills gap.
There’s a certain utopian affliction to the kind of bright future that “Big Data” seems to promise. Now that open source has made innovation as frictionless as Teflon, and data scientists are putting the finishing touches on moving the data pipeline from MapReduce to Spark, we’ll be all set.
Really? To listen to discussions of the Big Data skills gap, you’d think that what’s holding back utopia is all those people who don’t get how it’s going to work. Except “those people” is everyone. In this week’s Hadooponomics Podcast, which Arcadia Data underwrites, James Haight of Blue Hill Research hosts Alex Williams and Benjamin Ball of The New Stack, respectively Founder/Editor-in-Chief and Technical Editor. They talk about the chicken-and-egg problem of bringing down the barriers to big data and other disruptive technologies. Says Williams:
“We’ve seen that Big Data’s one thing, and data scales but data scientists do not … there’s a need to find more ways for people to use the data without needing to have the deep skill sets that have traditionally been required.”
The role of APIs, Williams goes on to say, is to let the developer look at data through that higher-level lens while the underlying data system does the work. Big data needs to become an application development platform. Ball thinks we’re moving in the right direction.
“I think that most developers, and really the workforces that we’re addressing, not only have the capabilities to learn these technologies broadly, but they are [doing so] right now.”
Skills gap? Or keyword mismatch in LinkedIn profiles?
What continues to close the skills gap is the steadily rising level of abstraction provided by the underlying platform. Visual Basic opened the floodgates of basic application programming to millions of non-computer-scientists. The continuously improving abstraction of Hadoop as a platform – capabilities expressed as services, rather than constant tuning of the underlying machinery – can do the same for non-data-scientists.
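That shift in abstraction can be sketched in miniature. The snippet below is not Hadoop code – it is a hypothetical word count written twice in plain Python, once spelled out MapReduce-style (map, shuffle, reduce) and once using the higher-level `collections.Counter` – purely to illustrate how a rising level of abstraction lets someone get the same answer with far less machinery.

```python
from collections import defaultdict, Counter

docs = [
    "big data needs big skills",
    "data scales but data scientists do not",
]

# Low abstraction, MapReduce-style: spell out every phase by hand.
mapped = [(word, 1) for doc in docs for word in doc.split()]  # map phase
shuffled = defaultdict(list)
for word, count in mapped:                                    # shuffle phase
    shuffled[word].append(count)
counts_low = {word: sum(vals) for word, vals in shuffled.items()}  # reduce phase

# High abstraction: the library does the mapping, grouping, and summing.
counts_high = dict(Counter(word for doc in docs for word in doc.split()))

assert counts_low == counts_high
print(counts_low["data"])  # -> 3
```

Both versions produce identical counts; the difference is how much of the underlying machinery the programmer has to understand and tune – which is exactly the gap a services-style platform closes.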
The virtuous cycle is that as developers of varying skill levels are drawn in by simpler APIs, more people use the software. More people using software makes it better. In fact, that’s exactly the dynamic behind open source. But, as Haight observes, it’s not the non-confrontational, room-for-everyone, all-are-welcome meritocracy of pure innovation that utopia promises. The emergence of foundations – notably the OpenStack Foundation, the Linux Foundation, and the Open Data Platform Initiative – shows that for all the virtues of democratization, there’s big money at stake. Open source, says Ball, has
“grown into a really political environment. [It] makes sense, there’s a lot at stake in a lot of these emerging technologies, on maybe not who’s gonna be on top, but whose base technology is gonna be on top, whose spec is gonna run it all. It’s the House of Cards of the open source community right now.”
Politics meets economics. Is that really so new? Not in Big Data, or in the small data, much of it still with us, that came before. Gerrymandering – drawing boundaries to lock in economic and political power and entrench disenfranchisement – is as much a factor in the data management stack as it is in the convoluted geography of congressional districts in Alabama.
It doesn’t have to be that way. The more people get access to big data platforms, the more it becomes clear that these boundaries may not be drawn for the benefit of the people who get value from the data. After all, George Jetson didn’t have an iPhone, and still ground it out every day working for The Man at Spacely Sprockets. But nobody said he didn’t have the skills.