In the previous posts on Big Data, we talked about some of the base technology tools that companies all over the world use today to drive their Big Data programs. We covered the Hadoop ecosystem and a few of the projects that have become common in commercial use cases, such as HBase, Hive, Impala, Storm, and Spark.
But we can't limit the Big Data world to Apache Hadoop. Dozens of other tools, frameworks, platforms, and applications have been developed over just the past five years, and they drive real value for organizations. Take a look at the chart below and you can see that there are a ton of players. I will only dig into a few of them: the ones I have seen generate real value for the folks I work with every day.
As I said, a TON of companies are making a play in the Big Data arena. I think there is still plenty of opportunity to build more useful apps, but that is for another post down the road.
Out of the many dozens of companies on this graphic, I want to call out a few:
Elastic:
Elastic, the company formerly known as Elasticsearch, develops Elasticsearch, an open source indexing and search engine. Similar to Apache Solr, Elasticsearch is used by companies to take documents, chunks of data, or even individual log events, index them, make them searchable, and then purge them when space is needed. To me, the beauty of the Elastic platform is not just the search mechanism but also the other tools that have been built to improve the experience of using it. In particular, I call out Kibana, a great tool that sits on top of Elasticsearch and makes it very easy to find the data you are looking for.
NiFi:
When it comes to the world of big data, almost nothing is more important than being able to move data from one place to another easily and quickly. And not just move it, but move it securely, at scale, with the ability to recover if something goes wrong in transit. This is where Apache NiFi shines. Some of the largest companies in the world have picked up NiFi with incredible speed to fill a gap they all share: moving their data around their organizations more effectively.
Neo4j:
As we move to a world based more and more on relationships and networks, graph databases become increasingly important, and that is what Neo4j is all about. Whether they like it or not, every company will need to start connecting the dots about their customers, partners, suppliers, and so on over the next few years. Doing this with traditional databases is very difficult, and besides Spark, there are not many great tools within the common frameworks that make graph analysis practical. So, in my view, we will begin to see real growth in this area, and it is one to keep an eye on for new, more user-friendly solutions.
Ok, this is the end of this part of the series on Big Data, focused on the technology. Again, my goal was not to go toe to toe with all of the architects of the world on what big data technology really is or how it works. My goal was to give business teams just enough detail about this technology to help them make more informed decisions with their internal and external technology partners for their big data programs.