The Dirtiest Little Secret about Big Data: Short Supply of Expensive Data Scientists Spending over 50% of their Time Preparing Data

A recent ZDNet article, http://www.zdnet.com/article/the-dirtiest-little-secret-about-big-data-jobs/, names jobs as Big Data's dirtiest little secret, which is good news if you run a union for data scientists, but bad news for companies committed, or committing, to Big Data.  The article discusses the time and cost of data preparation before analytics can be run, and notes that even then, lingering questions remain about the veracity of the data and, therefore, of the analytics results.  There are other dirty secrets about Big Data, such as: (1) copying highly structured, canonically organized and semantically mapped data from enterprise systems and IoT devices into an unstructured, schemaless and unmapped Big Data system; (2) the security risks, latency and multiple copies of data created as it moves from a Big Data lake, through a Big Data refinery, into an analytics/graph database; and (3) the inability to connect the analytics backend with enterprise operations interactively in near real time.

WhamTech offers a unique index-based federated data access system called SmartData Fabric™ that maintains its own clean indexes and seamlessly, automatically addresses the data preparation aspects of data security (masking, tokenization and encryption), governance, quality, aggregation, links/relationships, and master data.  SmartData Fabric™ also complements and leverages existing systems, applications and tools, such as data governance, data transformation in ETL, and master data management.  WhamTech is actively working with companies to (a) provision highly curated data to Big Data/analytics environments, with all data prepared for ready-to-run reporting, BI and analytics, and (b) enable distributed query processing at the edge.  The near real-time, distributed architecture is ideal for almost all applications, including analytics, given the advent of segmented datasets in Big Data/NoSQL databases and the rise of Distributed R, for example.  The architecture also lends itself to near real-time interaction between the reporting, BI and analytics applications and enterprise operations, through event processing, BPM, CRM and operational dashboards.
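The index-based federated pattern described above can be sketched in miniature: each source maintains its own local index over cleansed values, a query is scattered to those indexes, and matching records are merged while the underlying data stays in place.  All names and structures below are illustrative assumptions for the general technique, not WhamTech's actual API.

```python
# Hypothetical sketch of index-based federated query dispatch.
# Names and structures here are illustrative, not a real product API.
from dataclasses import dataclass, field

@dataclass
class IndexedSource:
    """A data source that keeps its own local index of cleansed key values."""
    name: str
    index: dict = field(default_factory=dict)    # cleansed key -> record ids
    records: dict = field(default_factory=dict)  # record id -> record

    def add(self, rec_id, record, key):
        self.records[rec_id] = record
        # Normalize at index time, so queries always hit clean values.
        self.index.setdefault(key.strip().lower(), []).append(rec_id)

    def lookup(self, key):
        # Only the index is consulted; data is fetched on a hit.
        return [self.records[r] for r in self.index.get(key.strip().lower(), [])]

def federated_query(sources, key):
    """Scatter the lookup to every source's local index and merge the
    results, leaving the source data in place (no central copy)."""
    results = []
    for src in sources:
        for rec in src.lookup(key):
            results.append({"source": src.name, **rec})
    return results

# Two sources holding differently formatted copies of the same entity.
crm = IndexedSource("CRM")
crm.add(1, {"customer": "Acme Corp", "region": "US"}, "Acme Corp")
erp = IndexedSource("ERP")
erp.add(7, {"customer": "ACME CORP ", "open_orders": 3}, "ACME CORP ")

# One cleansed query key matches both sources via their local indexes.
print(federated_query([crm, erp], "acme corp"))
```

The point of the sketch is that cleansing happens once, at index-build time, so a single normalized query key resolves records across sources without copying the data into a central store.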

From Gavin Robertson, Chief Technology Officer at WhamTech.
