TECHNOLOGIES

WhamTech combines a number of its own unique technologies with third-party open source technologies to provide a unique approach to data virtualization, federation, integration and interoperability solutions.

UNIQUE APPROACH

The following is a summary of the basic steps in the unique approach embodied in EIQ Products™. For more detail, see the various summaries and white papers.

  • External index and query (EIQ) Products are based on index and query processing layers that are created and maintained externally to data sources. The indexes consist of cleansed, transformed and standardized data that is read from data sources and then discarded; data is not copied from data sources and stored elsewhere, as it is with a data warehouse
  • EIQ Product™ layers can reside behind or in front of corporate firewalls, be locally, regionally or centrally located, and absorb the heavy load of maintaining indexes, processing queries, monitoring changed data, etc.
  • At least twelve change data capture (CDC) options, ranging from real-time to batch, are used to read source data to create and maintain indexes
  • Indexes typically have the same structures or schemas as the respective data sources, whether relational, non-relational or unstructured. EIQ Product™ indexes do not normally need structure or schema transformation, apart from some splits and combinations, e.g., splitting FULL_NAME into LAST_NAME, FIRST_NAME and MIDDLE_NAME
  • EIQ Products™ are accessed through standard drivers (ODBC, JDBC, OLE-DB, etc.) and Web Services, and queries are submitted in standard SQL (ANSI SQL). Translations from PL/SQL, other SQL flavors and non-SQL query languages, e.g., NoSQL, OQL and SPARQL, are either being developed or enabled through third-party tools
  • Query processing typically yields a list of pointers back to raw data in data sources, e.g., record numbers, indexed key(s), URLs, RDFs or file positions, but there are exceptions, e.g., derived value index results, indexed view results, COUNT, EXISTS, and when results data is reassembled from inverted indexes
  • Raw results data is retrieved from data sources through standard or proprietary user-level access and is typically cleansed, transformed and standardized (or, optionally, left as raw data) for eventual presentation to calling applications
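As an illustration only, the steps above can be sketched in a few lines of Python. Everything here is hypothetical: the in-memory `SOURCE` dictionary stands in for a data source, and `cleanse`, `build_index`, `query` and `fetch` are illustrative names, not EIQ Product™ interfaces. The sketch shows the shape of the idea: rows are read and cleansed (including a FULL_NAME split), only index entries mapping values to record-number pointers are kept, queries return pointers, a COUNT can be answered from the index alone, and raw rows are fetched from the source only when results are presented.

```python
from collections import defaultdict

# Hypothetical data source: record number -> raw row. Rows stay here.
SOURCE = {
    101: {"FULL_NAME": "smith, john  a", "CITY": "Dallas "},
    102: {"FULL_NAME": "DOE, JANE", "CITY": "dallas"},
    103: {"FULL_NAME": "Smith, Mary B", "CITY": "Austin"},
}


def cleanse(row):
    """Standardize a raw row and split FULL_NAME into its parts."""
    last, _, rest = row["FULL_NAME"].partition(",")
    parts = rest.split()
    return {
        "LAST_NAME": last.strip().upper(),
        "FIRST_NAME": parts[0].upper() if parts else "",
        "MIDDLE_NAME": parts[1].upper() if len(parts) > 1 else "",
        "CITY": row["CITY"].strip().upper(),
    }


def build_index(source):
    """Build field -> value -> sorted list of record numbers (pointers)."""
    index = defaultdict(dict)
    for recno in sorted(source):
        clean = cleanse(source[recno])  # read and cleanse one row...
        for field, value in clean.items():
            index[field].setdefault(value, []).append(recno)
        # ...then discard it: the cleansed row itself is never stored
    return index


def query(index, field, value):
    """Return pointers only; the data source is not touched."""
    return index[field].get(value, [])


def fetch(source, pointers):
    """Retrieve raw rows via their pointers and cleanse them for presentation."""
    return [cleanse(source[p]) for p in pointers]


idx = build_index(SOURCE)
hits = query(idx, "CITY", "DALLAS")
print(hits)                                    # [101, 102]
print(len(query(idx, "LAST_NAME", "SMITH")))   # 2 (a COUNT answered from the index)
print(fetch(SOURCE, hits)[0]["LAST_NAME"])     # SMITH
```

Note that "Dallas " and "dallas" only match the query because the index holds standardized values; the raw source rows are left unchanged.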

OTHER APPROACHES

Data warehouses, conventional federated adapters and (to a lesser extent) enterprise search are all conventional approaches to the same challenge of dealing with data: where it resides; how to access it; how to cope with "dirty data", typos, etc.; and differing types, standards, formats, security, locations, owners, systems, etc.

Each conventional approach has its advantages and disadvantages. WhamTech combined technologies from these three approaches to provide dataless hybrid products, retaining the advantages and overcoming the disadvantages of each approach.

For structured database-type queries, EIQ Product™ query success is the same as, if not better than, that of data warehouses, and considerably better than that of conventional federated adapters.

For unstructured text and metadata search, EIQ Product™ search success is similar to advanced search engines.

For combined structured database-type queries and unstructured text and metadata search, EIQ Product™ query and search success surpasses all conventional approaches.

The reason for the high query success is that EIQ Products™ uniquely have indexes that comprise cleansed, transformed and standardized representations of data that is read from data source logs, or by some other means, and then discarded. This enables queries submitted to the indexes to be highly successful. Further, EIQ Products™ have advanced indexing, including text search, fuzzy match, aggregation, calculation, compound, join, denormalized, embedded value and Link Indexes™.
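The effect of indexing standardized rather than raw values can be shown with a small, hypothetical Python sketch. Three differently formatted phone numbers collapse into one index entry, and a fuzzy lookup over a standardized name index tolerates a typo in the query. The digit-stripping rule, the sample data and the use of the standard library's `difflib.get_close_matches` for fuzzy matching are assumptions for illustration; the source does not say how EIQ Products™ implement fuzzy match.

```python
import difflib
import re

# Hypothetical raw phone values for the same customer in different sources.
RAW_PHONES = {1: "(214) 555-0100", 2: "214.555.0100", 3: "+1 214 555 0199", 4: "2145550100"}


def standardize_phone(value):
    """Keep digits only; drop a leading US country code (an assumed rule)."""
    digits = re.sub(r"\D", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits


# Index on the standardized value: value -> list of record numbers.
phone_index = {}
for recno, raw in RAW_PHONES.items():
    phone_index.setdefault(standardize_phone(raw), []).append(recno)

print(phone_index["2145550100"])  # [1, 2, 4]: three spellings, one index entry

# Hypothetical standardized name index used for fuzzy matching.
name_index = {"JOHNSON": [7, 9], "JOHNSTON": [12], "JONES": [15]}


def fuzzy_lookup(term, index, cutoff=0.8):
    """Return pointers for indexed values similar to `term` (difflib ratio >= cutoff)."""
    matches = difflib.get_close_matches(term.strip().upper(), list(index), n=3, cutoff=cutoff)
    return sorted(p for match in matches for p in index[match])


print(fuzzy_lookup("jonhson", name_index))  # the typo still finds JOHNSON's records
```

A query against the raw values would need to anticipate every spelling; against the standardized index, one lookup suffices.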

EIQ Products™ retain the advantages of data warehouses:

  • Clean data: indexes and results data
  • Multiple indexes and types
  • Query processing: multiple options
  • Security: data and access

…and at the same time, retain the primary advantage of conventional federated adapters:

  • Data remains at source

LEGACY DEVELOPMENT

EIQ Products™ derive their unique index and query processing technologies from previously marketed database and search products.

WhamTech and its predecessors developed a relational very large database (VLDB) technology, called D the Data Language and later Thunderbolt, that was extremely fast compared to other similarly configured database systems. This appealed to a niche, data-processing-intensive market; however, the real technology differentiators were the unique index and query processing technologies now used in EIQ Products™, which index and process queries against almost any data source.

From an operational and structural point of view, the relational index and query management system (RIQMS) embedded in EIQ Products™ is a conventional relational database technology, with virtual tables, indexes, and typical database operation commands and queries. It is NOT a memory-resident system, NOT a read-only system, NOT a fully inverted database system and NOT a retrieval-based storage system. There are, however, significant performance and capability differences that distance the EIQ Products™ RIQMS from other database and related technologies:

UNIQUE METHOD OF QUERY EXECUTION

EIQ Products™ RIQMS has a unique method of isolating, connecting, arranging (sorting), processing (updating), and presenting (displaying) data. This unique technology enables real-time data isolation and access, regardless of database size, number of concurrent users or query complexity.

UNIQUE COMBINATION OF TECHNOLOGIES

The speed and advanced capabilities of EIQ Products™ RIQMS arise from a unique combination, which would be very difficult to improve on, of three binary-level methodologies involving the three "Bs" of computing. These methodologies are normally associated with static data warehousing rather than with live data sources such as those EIQ Products™ work with:

  • Balanced binary trees that scale to billions of records, terabytes of data
  • Virtual bitmap representations of intermediate and final query result-sets - these can be integer lists or actual bitmaps, depending on field data node-level "data density"
  • Boolean operations on bitmaps
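The choice between an integer list and an actual bitmap can be reasoned about from storage cost. Assuming 32-bit record pointers, an integer list costs 32 bits per qualifying record while a bitmap costs 1 bit per record in the source, so the break-even density is 1/32. The Python sketch below uses that threshold; the threshold, the function names and the use of a Python `int` as a bitmap are illustrative assumptions, not WhamTech's actual rule.

```python
# Assumption: 32-bit record pointers. An integer list then costs 32 bits per
# qualifying record, a bitmap 1 bit per source record, so lists win below 1/32.
POINTER_BITS = 32
DENSITY_THRESHOLD = 1 / POINTER_BITS


def to_collection(pointers, universe_size):
    """Return ("list", sorted pointers) when sparse, ("bitmap", int) when dense."""
    pointers = sorted(set(pointers))
    if len(pointers) / universe_size < DENSITY_THRESHOLD:
        return ("list", pointers)
    bitmap = 0
    for p in pointers:
        bitmap |= 1 << p  # bit p set means record p qualifies
    return ("bitmap", bitmap)


def to_list(collection):
    """Expand either representation back into sorted record numbers."""
    kind, data = collection
    if kind == "list":
        return data
    return [i for i in range(data.bit_length()) if data >> i & 1]


sparse = to_collection([5, 900], universe_size=1000)          # density 0.002
dense = to_collection(range(0, 1000, 2), universe_size=1000)  # density 0.5
print(sparse[0], dense[0])  # list bitmap
```

Either form expands to the same record numbers, which is what lets later Boolean steps treat them interchangeably.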

Balanced binary trees are a technology from the 1960s. The attraction then, as now, is that binary searches are considered to be the fastest method of searching ordered lists [1]. However, there are a number of practical problems associated with balanced binary trees, all of which WhamTech has solved (and that is one of the secrets to our success):

  • PROBLEM: Levels tend to get very deep; a binary tree consisting of a billion nodes, for example, needs 30 levels, which translates into traverse time
    WHAMTECH SOLUTION: WhamTech's implementation does not conform to the conventional n = ⌈log2(x + 1)⌉ balanced binary tree rule, where n = number of levels and x = number of nodes. Instead, WhamTech's implementation "leaps" levels to make binary tree traverse time measurable in microseconds rather than milliseconds or seconds
  • PROBLEM: Rebalancing and rotation after an insert or delete can take considerable time, and a very large number of nodes can be affected
    WHAMTECH SOLUTION: The maximum number of nodes that need to be rotated is just over 100; however, typically only 10 to 15 nodes are rotated after an insert or delete
  • PROBLEM: A worst-case scenario is deletion of a top node, which is faced by almost all tree structures
    WHAMTECH SOLUTION: This is one of the easiest operations for EIQ Products™ RIQMS to deal with, as it fits the technology well

With EIQ Products™ RIQMS, when a balanced binary tree is subjected to a query, the result is a set of pointers, held as an integer list, to raw data in the data source. For subsequent query-related operations, this set is either used directly or converted to a Collection; an example is shown in the following diagram T1:

T1: Index binary tree with associated virtual bitmap (integer list) and physical bitmap
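The conventional level arithmetic and the kind of lookup shown in diagram T1 can be sketched with textbook binary search in Python. This is the standard approach, not WhamTech's level-leaping implementation, which the source does not describe; `SortedValueIndex` and `min_levels` are hypothetical names. Each distinct value maps to an integer list of record pointers, and a lookup makes one comparison step per level, so the number of levels visited never exceeds the conventional level count for the number of distinct values.

```python
def min_levels(nodes):
    """Levels in a perfectly balanced binary tree: n = ceil(log2(x + 1)).

    For x >= 0, x.bit_length() equals ceil(log2(x + 1)) exactly, which
    avoids floating-point rounding for large x.
    """
    return nodes.bit_length()


class SortedValueIndex:
    """Stand-in for a value index: sorted distinct values, each mapped to an
    integer list of record pointers (compare diagram T1)."""

    def __init__(self, pairs):
        postings = {}
        for value, recno in pairs:
            postings.setdefault(value, []).append(recno)
        self.keys = sorted(postings)
        self.postings = [sorted(postings[k]) for k in self.keys]

    def lookup(self, value):
        """Binary search; returns (pointers, levels visited)."""
        lo, hi, levels = 0, len(self.keys) - 1, 0
        while lo <= hi:
            levels += 1
            mid = (lo + hi) // 2  # the "node" visited at this level
            if self.keys[mid] == value:
                return self.postings[mid], levels
            if self.keys[mid] < value:
                lo = mid + 1
            else:
                hi = mid - 1
        return [], levels


print(min_levels(1_000_000_000))  # 30
idx = SortedValueIndex([("TX", 3), ("CA", 1), ("TX", 2), ("NY", 5)])
print(idx.lookup("TX"))  # ([2, 3], 2)
```

The first line of output confirms the 30-level figure for a billion nodes quoted in the problem list.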

Complex queries are rendered simple by treating them as combinations of lightning-fast queries on multiple balanced binary tree indexes.  Once single-field Collections are isolated, they are combined using the full range of Boolean arithmetic operations to provide a complex query result set, as shown in the following example diagram T2:

T2: Boolean arithmetic operations on bitmaps to yield a query result-set bitmap

The bits in the final complex query result set represent the final result-set record numbers in the data source or multiple data sources. Remember that Collections need not be bitmaps and are, in most cases, integer lists. Boolean operations can be performed on actual bitmaps, virtual bitmaps (integer lists) or both in combination.
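A minimal Python sketch of such Boolean combinations follows, assuming a hypothetical 16-record source and using Python integers as bitmaps. It evaluates, roughly, `STATE = 'TX' AND (STATUS = 'ACTIVE' OR BALANCE > 1000) AND NOT CLOSED` over Collections held partly as integer lists and partly as a bitmap, and also shows that two sorted integer lists can be ANDed directly by merging, without building a bitmap. Field names, data and helper functions are all illustrative.

```python
def to_bitmap(collection):
    """Accept a bitmap (int) or an integer list; return a bitmap (int)."""
    if isinstance(collection, int):
        return collection
    bitmap = 0
    for recno in collection:
        bitmap |= 1 << recno
    return bitmap


def records(bitmap):
    """Expand a bitmap into the record numbers whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


def intersect_lists(a, b):
    """AND two sorted integer lists by merging them; no bitmap is built."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


ALL_RECORDS = (1 << 16) - 1     # hypothetical source of records 0..15
state_tx = [1, 3, 4, 8, 9, 12]  # sparse Collection: integer list
active = 0b111_0010_1110        # dense Collection: bitmap of records 1, 2, 3, 5, 8, 9, 10
high_balance = [4, 12, 15]
closed = [9]

# STATE = 'TX' AND (STATUS = 'ACTIVE' OR BALANCE > 1000) AND NOT CLOSED
result = (
    to_bitmap(state_tx)
    & (to_bitmap(active) | to_bitmap(high_balance))
    & (ALL_RECORDS & ~to_bitmap(closed))  # mask keeps NOT within the 16 records
)
print(records(result))                          # [1, 3, 4, 8, 12]
print(intersect_lists(state_tx, high_balance))  # [4, 12]
```

The NOT step needs the universe mask because Python's `~` on an unbounded integer would otherwise set every higher bit.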

EIQ Products™ RIQMS allows extremely fast queries and updates involving a minimal number of nodes, regardless of the size of the data source or the cardinality of the data. WhamTech solved problems that have perplexed balanced binary tree researchers for decades and forced most companies towards non-binary index tree structures, such as B+ trees, or other forms of indexes, where nodes have more than two branches. These non-binary tree structures do not allow for a simple 0 or 1 decision, but impose more complex decision-making algorithms on query processing at every node encountered, which increases traverse time.

The core indexing technology code has remained untouched for over 20 years, as the algorithms and code are stable and bug-free. As an example, WhamTech's legacy database product, Thunderbolt, was used for mission-critical 24/7 operations support for over 12 years at an NYSE-listed IT services company with more than $2 billion in annual revenue, generating considerable revenue.

The intellectual property associated with EIQ Products™ RIQMS is indirectly protected through awarded patents - see patents for more information.

REAL-TIME INDEXES

Even though index updates were fast and WhamTech was using state-of-the-art sorting methods, in the new world of billions of records and terabytes of data, index updates were taking too long to execute. This was primarily because the integer lists of pointers to raw data in the data source, illustrated in diagram T1, were taking too long to sort after each batch of updates. WhamTech took advantage of its remarkably fast binary tree balance maintenance method to use binary trees to sort the integer lists, resulting in extremely fast index updates, even with larger indexes. The following diagram T3 illustrates these real-time indexes.

T3: Index binary tree for values with associated binary tree for an integer list that can be further converted to a virtual bitmap (integer list) or physical bitmap

EIQ Products™ real-time indexes achieve rates ranging from hundreds or thousands to tens of thousands of records inserted or updated per second on low-end servers. Real-time indexes establish a new method for dealing with large-scale data and information issues, from active (or real-time) data warehousing to near real-time database performance previously thought possible only with memory-resident databases. Many applications are tending towards real-time, e.g., interactive customer relationship management (iCRM), inventory management, supply chain management (SCM) and decision support systems (DSS).

The largest problem faced by database vendors, in general, is enabling simultaneous queries (simple and complex) and data changes (inserts, deletes, and updates) on data and indexes for VLDBs such that they remain synchronized.  High cardinality data, in particular, presents a challenge for high performance indexing.  WhamTech is developing indefinite scalability options involving memory cache, SSDs, high performance RAIDed HDDs and lower performance RAIDed HDDs for archive.

Most database systems are designed for transactions, a large number of users and simple queries; for such systems, updates are mainly sequential insertions. Most data warehouses are designed to be largely static, with reduced subsets of data, a small number of users and complex queries; for such systems, updates are typically batched at regular intervals. Complex queries on most database systems can cripple performance, particularly on VLDBs, but not on EIQ Products™...

...EIQ Products™' unique RIQMS allows for an entirely new approach to near real-time data and information integration and sharing systems.

COMPLEMENTARY VS. REPLACEMENT TECHNOLOGY

EIQ Products™ are intended to complement, not replace, existing operational database systems, which are usually transaction-oriented, to enable:

  • Growth to accommodate a larger number of users
  • Additional capabilities for existing systems
  • Advanced capabilities not available with existing systems

EIQ Products™ are the ultimate non-intrusive system, as they externally index and process queries against existing data sources, including most databases. EIQ Products™ also allow unstructured text search on databases as well as on unstructured data sources such as files, documents and e-mail, and semi-structured data sources such as spreadsheets and XML. EIQ Products™ have a unique approach to querying semi-structured data sources using structured queries. EIQ Products™ can also execute structured queries on unstructured data if entity extraction tools are used to build indexes.

A GREAT COMBINATION – DATABASE AND SEARCH TECHNOLOGIES

Many professionals in the database industry are predicting the convergence of database and search technologies. Some database vendors already bundle limited search capabilities with their products, and some search engine companies bundle limited database capabilities with theirs. EIQ Products™ are a much more seamless integration of database and search technologies, because the search technology itself is based on the same RIQMS as the database query technology.
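A minimal, hypothetical Python sketch of that idea: the same value-to-pointer-list structure indexes both a structured field and the terms of a free-text field, so a combined database-and-search query reduces to intersecting pointer lists. The records, the simple lowercase tokenizer and the pseudo-SQL in the comment are assumptions for illustration, not EIQ Product™ syntax or behavior.

```python
import re
from collections import defaultdict

# Hypothetical source records: a structured field plus a free-text NOTES field.
RECORDS = {
    1: {"STATE": "TX", "NOTES": "Customer asked about late delivery"},
    2: {"STATE": "CA", "NOTES": "Delivery on time; upsell warranty"},
    3: {"STATE": "TX", "NOTES": "Warranty claim for late delivery"},
    4: {"STATE": "TX", "NOTES": "New account opened"},
}


def build_indexes(records):
    """One value -> sorted pointer-list structure serves fields and text terms alike."""
    index = defaultdict(lambda: defaultdict(list))
    for recno in sorted(records):
        row = records[recno]
        index["STATE"][row["STATE"]].append(recno)
        for term in sorted(set(re.findall(r"[a-z]+", row["NOTES"].lower()))):
            index["NOTES"][term].append(recno)
    return index


def and_all(*lists):
    """Intersect pointer lists (the Boolean AND step of query processing)."""
    result = set(lists[0])
    for lst in lists[1:]:
        result &= set(lst)
    return sorted(result)


idx = build_indexes(RECORDS)
# Illustrative pseudo-SQL: WHERE STATE = 'TX' AND NOTES CONTAINS 'late' AND 'delivery'
print(and_all(idx["STATE"]["TX"], idx["NOTES"]["late"], idx["NOTES"]["delivery"]))  # [1, 3]
```

Because both kinds of predicate yield the same kind of Collection, no separate search engine result has to be reconciled with a database result.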

It is estimated that 85% of all enterprise data exists outside structured databases, and with company mergers and internal reorganizations, the remaining 15% could be in separate databases with different structures and field names. EIQ Products™ potentially offer one-stop access to all corporate data, regardless of where it resides, through structured queries, unstructured text searches or BOTH on the same data sources. EIQ Products™ are truly a great combination of database and search technologies.

Reference: 1. C. William Gear, Applications and Algorithms in Computer Science, Science Research Associates, Inc., 1978, p. A107.

