WhamTech has developed a Hadoop connector that enables EIQ Products™ to index and query Hadoop data externally and independently, providing relational SQL access to the data while leaving it stored in Hadoop.

Query processing does not rely on Hadoop, and Hadoop data can be combined with other data sources. Connectors for other, non-Hadoop cloud storage can also be developed. In normal configurations, EIQ Products™ enable cloud-like access to internal and external data sources, exposing only standard data models that are accessed through standard drivers and SQL.

Many cloud solutions store very large volumes of data and ingest it at very high rates. For this scenario, WhamTech is developing the following solution, which enables tens of billions of records per day to be indexed and simultaneously queried, including complex event processing. Apart from the high index rates, the main challenge for such a solution is non-sequential (random), high-cardinality (almost unique) data. This is the worst case for high-rate indexing, because it requires random access to the stored indexes.

Sequentially distributed high-cardinality data, such as timestamps, is relatively simple to handle with high-performance conventional hard disk drives (HDDs). Solid-state drives (SSDs) are, in general, faster than conventional HDDs, but, more importantly, they cope much better with the random access that non-sequential, high-cardinality indexes require. However, because SSDs are at least an order of magnitude more expensive than HDDs per unit of capacity, indexes that have already been built can reside on less expensive high-performance HDDs, initially for frequent access, and can eventually be migrated to even less expensive lower-performance HDDs for longer-term, less-frequent access. HDDs cope well with the more sequential access patterns of business intelligence and analytics queries.
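The SSD-to-HDD index lifecycle described above can be illustrated as a simple placement policy. This is a minimal sketch under stated assumptions, not WhamTech's implementation: the three tiers, the `IndexSegment` record, its `building` flag and the 30-day `ARCHIVE_AFTER_DAYS` threshold are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Hypothetical storage tiers for index segments."""
    SSD = "ssd"            # absorbs random writes while an index is being built
    FAST_HDD = "fast_hdd"  # built indexes that are still queried frequently
    SLOW_HDD = "slow_hdd"  # older indexes kept for less-frequent access


# Hypothetical threshold, not a WhamTech setting.
ARCHIVE_AFTER_DAYS = 30.0


@dataclass
class IndexSegment:
    """Hypothetical record describing one index segment."""
    name: str
    building: bool   # True while the segment is still receiving new entries
    age_days: float  # days since the segment stopped receiving entries


def choose_tier(seg: IndexSegment) -> Tier:
    """Pick the target tier for a segment from its lifecycle stage."""
    if seg.building:
        return Tier.SSD  # random, high-cardinality inserts favor SSDs
    if seg.age_days < ARCHIVE_AFTER_DAYS:
        return Tier.FAST_HDD  # recently built, frequently queried
    return Tier.SLOW_HDD  # longer-term, less-frequent access


def plan_migrations(segments, current):
    """Return (name, from_tier, to_tier) for every segment on the wrong tier."""
    moves = []
    for seg in segments:
        target = choose_tier(seg)
        if current[seg.name] != target:
            moves.append((seg.name, current[seg.name], target))
    return moves


segments = [
    IndexSegment("today", building=True, age_days=0.0),
    IndexSegment("last_week", building=False, age_days=7.0),
    IndexSegment("last_quarter", building=False, age_days=90.0),
]
current = {"today": Tier.SSD, "last_week": Tier.SSD, "last_quarter": Tier.FAST_HDD}

for name, src, dst in plan_migrations(segments, current):
    print(f"{name}: {src.value} -> {dst.value}")
# prints:
# last_week: ssd -> fast_hdd
# last_quarter: fast_hdd -> slow_hdd
```

Age is only a proxy here; a real policy would probably also track how often each index is queried before demoting it to slower storage.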

Counting in-memory cache, disk cache, SSDs, high-performance HDDs and lower-performance HDDs, at least five forms of storage are involved in indexing and simultaneously querying large-volume, high-rate cloud storage data (system cache could be important as well). Some storage vendors, such as WhamTech partner EMC, offer mixed storage with internal, self-optimizing automatic data management. WhamTech can also automatically optimize storage for solutions involving HadoopEIQ™, whether stand-alone or as an add-on to other EIQ Products™.

The following diagram, CEIQ1, shows the process and components involved in the indefinitely scalable HadoopEIQ™, both stand-alone and as an add-on to other EIQ Products™:

CEIQ1: The process and components involved in the indefinitely scalable HadoopEIQ™ stand-alone and add-on EIQ Product™
