Solving the Big Data Crisis
SmartData Fabric® (SDF) adds a virtual layer of data governance and management, master data management and support for applications, including reporting, BI and analytics. Applications access SDF through standard JDBC and ODBC drivers, REST APIs and Web/data services, and SQL and SQL dialects, such as those of Oracle, PostgreSQL and SQL Server, that allow EIQ Products, on their own and collectively, to appear as though they are single, highly curated databases. Any application that uses standard access and SQL can work with EIQ Products. WhamTech has technology alliance partnerships with a number of specific application vendors, such as Tableau and Cambridge Intelligence for KeyLines, because it offers these application vendors' customers unique data virtualization capabilities.
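Because access is through standard drivers and SQL, existing client code needs no changes to work against such a virtual layer. A minimal sketch of the standard DB-API access pattern in Python, using an in-memory sqlite3 database as a stand-in for a real ODBC/JDBC connection (the table, columns and data are hypothetical, not part of any SDF schema):

```python
import sqlite3  # stand-in driver; a real client might use pyodbc with a DSN instead

# In production this might be: conn = pyodbc.connect("DSN=...") (hypothetical DSN)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical curated view exposed by the virtual layer
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Acme Corp", "US"), (2, "Globex", "EU")])

# Standard SQL works as if this were a single, highly curated database
cur.execute("SELECT name FROM customers WHERE region = ?", ("US",))
rows = [r[0] for r in cur.fetchall()]
print(rows)  # -> ['Acme Corp']
```

The point of the sketch is only that the client sees one ordinary SQL database, regardless of how many sources sit behind it.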
Most Big Data projects fail, with Gartner reporting that only 15% make it to production. Successful data management comes from understanding these fundamental truths:
- Each data source is designed and created for one or more specific applications and is fit for that purpose; as a result, data sources and, in many cases, applications are generally siloed
- Each data source may have its own version of data governance and standards, but these are unlikely to be organization-wide
- Significant data management and master data management processes are needed to make data readily usable by a business user, whether in a:
  - Data warehouse
  - Big Data store/Data Lake
  - Federated data access system
- Master data is the glue that enables integration of multiple disparate data sources, converting some form of operational data store into some form of data warehouse/mart
- Each data source is its own self-standing ontology/semantic data model
- Master data resolves and unifies entities within and across data sources using data from these sources
- Logical relationships enable the integration of entities and other data, and are based on the physical relationships within and across data sources
- Ultimately, master data enables all data sources to be viewed as a single ontology/semantic data model
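The role master data plays above can be sketched as a simple entity-resolution step: records from two siloed sources are matched on a shared identifying attribute and unified under one master entry. This is only an illustrative sketch; the source schemas, matching rule and data below are invented, not SDF's actual method:

```python
# Two siloed sources, each with its own schema and local IDs
crm = [
    {"crm_id": "C1", "name": "Jane Doe", "email": "jane@example.com"},
    {"crm_id": "C2", "name": "John Roe", "email": "john@example.com"},
]
billing = [
    {"acct": "A9", "contact": "jane@example.com", "balance": 120.0},
]

# Master data: resolve entities across sources on a shared attribute (email)
master = {}
for rec in crm:
    master.setdefault(rec["email"], []).append(("crm", rec["crm_id"]))
for rec in billing:
    master.setdefault(rec["contact"], []).append(("billing", rec["acct"]))

# Each master entry now links the same real-world entity across both sources
print(master["jane@example.com"])  # -> [('crm', 'C1'), ('billing', 'A9')]
```

Real entity resolution would use fuzzier matching than a single exact key, but the unifying structure, one master entry pointing into many sources, is the same.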
Fostering a Healthy Data Reservoir
There are many reasons why data lakes are attractive to businesses, such as increasingly large volumes and numbers of available data types, overcoming poor access to original data sources, high query performance and support for a large number of concurrent users. However, data lakes still face many hurdles, and ignoring these could cause a data lake to turn into a data swamp. These hurdles include:
- No built-in data processing, data management or master data management
- Multiple copies of data are subsequently made from a data lake before curated, usable data finally lands in an analytics environment
- CIO-level mandates, government regulations or similar high-level impositions are needed to obtain all data, especially across organizations and internal organizational boundaries
- Data ownership concerns
- On-soil/on-premise data retention requirements
- Security and privacy concerns
- Replicated data/storage concerns – even though storage is low cost, costs can add up, especially in the Cloud
- Internally perceived competition with the data warehouse, so a data lake is usually sold as something different and not core to the organization’s business, which leads to failure
- Difficult to integrate Big Data and Data Lakes into the organization’s business operations
SDF prevents data lakes from turning into data swamps and relieves overloaded systems, turning them into well-governed data reservoirs. The distributed nature of SDF eliminates a centralized environment by leaving all data where it resides or, as an option, in indexes, retrieving results only when they are needed.
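The index-and-leave-in-place idea can be illustrated with a short sketch: a lightweight index holds only pointers into each source, and full records are fetched from the sources on demand at query time. The sources, index structure and data below are hypothetical, chosen only to show the pattern:

```python
# Sources stay where they reside; nothing is copied into a central store
source_a = {"r1": {"name": "Jane", "city": "Dallas"},
            "r2": {"name": "John", "city": "Austin"}}
source_b = {"x7": {"name": "Jane", "city": "Plano"}}

SOURCES = {"a": source_a, "b": source_b}

# The index maps a search key to (source, record_id) pointers, not data copies
index = {}
for src_name, src in SOURCES.items():
    for rid, rec in src.items():
        index.setdefault(rec["name"], []).append((src_name, rid))

def query(name):
    """Resolve index pointers and fetch full records from the sources on demand."""
    return [SOURCES[s][rid] for s, rid in index.get(name, [])]

print(query("Jane"))
# -> [{'name': 'Jane', 'city': 'Dallas'}, {'name': 'Jane', 'city': 'Plano'}]
```

Because only the small index is centralized, the sources keep ownership of their data and results move only when a query actually asks for them.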
SDF exposes data through virtual views, which can represent data in multiple ways. In addition to Standard Data Model-based virtual views, there are also data object, business object and graph database/ontological/semantic virtual views.
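To make the distinction concrete, the same underlying records can be projected either as flat relational rows or as a graph of nodes and typed edges. A minimal sketch, with an invented record structure standing in for source data:

```python
# One set of underlying records, projected into two different virtual views
orders = [
    {"order_id": 1, "customer": "Acme", "product": "Widget"},
    {"order_id": 2, "customer": "Acme", "product": "Gadget"},
]

# Relational (Standard Data Model-style) view: flat tuples
rows = [(o["order_id"], o["customer"], o["product"]) for o in orders]

# Graph/ontological view: nodes and typed edges over the same data
nodes, edges = set(), []
for o in orders:
    nodes.update([o["customer"], o["product"]])
    edges.append((o["customer"], "ORDERED", o["product"]))

print(rows[0])        # -> (1, 'Acme', 'Widget')
print(sorted(nodes))  # -> ['Acme', 'Gadget', 'Widget']
```

A BI tool would consume the row view over SQL, while a graph visualization tool such as KeyLines would consume the node/edge view; both are projections of the same source data.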