When the likes of Yahoo, Google, Facebook, Twitter, Amazon and Netflix said, "Come to our door; we have the online experience you want," they got what they asked for, and then some. Overrun by a torrent of user-generated content and web log data, and by an insatiable demand for new, more sophisticated features, these companies ran into technology show-stoppers: legacy compute, storage and insight tools could not perform at online, web scale. We know what happened next. Much to the chagrin of the big-iron, tier-1 vendors, they (and others) collectively created and donated what is now accepted as the modern-day, open-source data analytics stack.
A partial taxonomy of this new ecosystem includes:
Multiple distributed NoSQL databases for persistence
Hadoop Map/Reduce and Apache Spark for distributed computation
Hadoop HDFS for scalable storage
Apache Storm and Spark Streaming for real-time, distributed complex event processing (CEP)
Apache Solr for textual search & analytics
Visualization tools
APIs & libraries required to stitch them all together
To be precise, this is more like a toolchain, and it requires the would-be adopter to (1) commission a team to evaluate and select from among numerous work-in-progress (read: evolving) components, then (2) acrobatically paste them together toward some meaningful business analytic end, and finally (3) hopefully derive enough business benefit to offset the cost of undertaking (1) and (2).
And therein lies the complication that gives organizations pause.
Raw Platform Development versus Business Value Development
To understand why, consider what an organization bargains for when it purchases, say, an enterprise disk storage array. It buys packaged technology that lets it stay focused on building core line-of-business (LOB) value solutions. A great deal goes into building one of these disk storage units, just as a great deal goes into building and sustaining a big data platform:
Strategically placed memory to cache real-time I/O requests
Compression codecs to efficiently ferry and store records
Algorithms for intelligent transfers between caches and persistence layers
Hierarchies of progressively cheaper storage for information lifecycle management (ILM)
Versioned snapshots
Firmware to make it all work, among other things
Building one of these units takes years of engineering expertise, evolving road maps, blueprint redesigns, industry-grade testing, and even the spectacular failure of designs that worked on paper but not in practice. That is precisely why the BUY also relieves the purchasing organization of the cost, risk and expertise required to do the same.
And yet, designing, building and operating a big data platform suddenly requires exactly that: beyond traditional LOB-focused value development, organizations now face core platform engineering responsibility, using components and paradigms that are evolving faster than they can be understood.
For Netflix and Twitter, this isn’t a problem. Behind the movie streaming service and the messaging platform are R&D technology powerhouses with get-it-done creative license to match. With small armies of data scientists, programmers and architects, they have not only the know-how but also the organizational green light to nimbly combine, augment and even mutate the disparate components of this open source toolchain into cohesive solutions.
A Tale of Different Cultures
But standing in contrast to those ideal settings are long-established firms that, while having data and analytic needs of their own, also have legacies that will challenge their ability to derive similar benefit from these new tools. Banks, for instance, have long used technology to enable their lines of business, yet remain today what they have always been: regulated financial institutions, not tech companies. So while they do have ongoing development, it is LOB-sponsored and LOB-focused business work, not the open source, R&D-centric, extreme software engineering that we see emerge from tech firms. The truth is that most firms that pre-date the open source movement (circa 1993) are BUY+800-SUPPORT centric -vs- BUILD centric, and so are accustomed to integrated, off-the-shelf vendor solutions. However, not only are big data platforms BUILD affairs, they are ITERATIVE BUILD affairs, as anyone who has dealt with data skew, GC pauses in real-time use cases, data corruption, schema evolution and many other in-practice challenges is always ready to explain.
A Cautious Approach is Needed
From a big data technology adoption point of view, the Tech -vs- Bricks & Mortar cultural distinctions are significant because they affect both the willingness to accept, and the ability to absorb, technology adoption risks. To name just a few:
Rapid and, in some cases, forklift technology evolution
Support organizations that are nonexistent, or not mature enough to know what mission-critical support entails
Sunset/EOL risks, driven by the possible acquisition of providers and technologies during the industry consolidation phase that is yet to come
Here again, the tech-savvy firm will have no problem adjusting to these vagaries, but most others will need selection and implementation guidance that weighs risk and use case in order to minimize exposure.
And then we must consider the big data adoption balance-sheet: cost, risk, benefit.
This one is also easy for tech and social media firms to reconcile, because their core business revenue is directly tied to their ability to do big data gymnastics on their prime assets. Those assets, by the way, are (Amazon excepted) nearly 100% digital/virtual rather than physical:
Profiles & preferences of user accounts numbering in the hundreds of millions
User emails, tagged pictures, locations, demographics, cyber connections, etc.
Media assets, which can be recommended via machine learning
Other digital content that needs to be stored, classified and mined
Relationships that can be learned and exploited
In the tech setting, one can see how the benefits of big data technology offset, and even outweigh, its costs and adoption pains. You can practically draw a straight line from one (adoption) to the other (benefit). But for legacy brick & mortar companies considering big data technology as a bolt-on, an ad-hoc means of alleviating downstream business or tactical pressures, the cost/benefit/risk equation is neither so clear nor so direct. Again, this is especially true given how quickly the technology landscape is evolving. On the benefit side of the ledger alone, we should expect, and embrace, a healthy dose of skepticism from companies that still haven’t cashed in on costly ERP, CRM and BI investments made just 10 to 15 years ago: technologies that held a similar business promise.
It may not be popular (certainly not with the big data marketing and evangelist crowds), but a measured approach is prudent. Put your CTO hat on and be mindful that applying this ecosystem beyond the digital sea in which tech-savvy firms almost exclusively operate will likely not yield similar benefit-to-cost multiples in legacy operations.
This article also appears on the author's LinkedIn: https://www.linkedin.com/pulse/challenge-modern-day-analytics-platforms-when-youre-google-vega