Technologies

A unique approach to providing data and information integration, sharing and interoperability solutions

Unique Approach

• External index and query (EIQ) Products are based on index and query processing layers built and maintained external to data sources, where the indexes use cleansed, formatted and standardized data that is read from data sources and then discarded -- data remains stored in data sources
• EIQ Product layers reside behind or in front of corporate firewalls, can be locally, regionally or centrally located, and absorb the heavy load of maintaining indexes and processing queries
• At least twelve change data capture (CDC) options, ranging from real-time to batch, are used to read source data to maintain indexes
• Indexes typically have the same structures or schema as the respective data sources, whether relational, non-relational or unstructured -- there is no structure or schema transformation
• EIQ Products are accessed through standard drivers (ODBC, JDBC, OLE-DB, etc.) or Web Services, and queries are submitted in standard Structured Query Language (SQL)
• Query processing yields pointers back to raw data in data sources -- record numbers, indexed key(s), URLs (with or without locations), RDFs, file positions, etc.
• Raw data is retrieved from data sources through standard or proprietary user-level access, and is cleansed, formatted and standardized for eventual provision to calling applications
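
The flow described in the list above can be illustrated with a minimal Python sketch. It is not WhamTech's implementation: the SOURCE list, the name-cleansing rule, and the functions standardize, build_index, query and fetch are all hypothetical, chosen only to show an external index that holds standardized keys and record-number pointers while the raw data stays in the source.

```python
import bisect

# Hypothetical "data source": raw, uncleansed records that stay where they are.
SOURCE = [
    {"name": "  SMITH, john "},
    {"name": "Doe,Jane"},
    {"name": "smith,  JOHN"},
]

def standardize(raw_name):
    """Cleanse one 'last, first' value: trim whitespace and upper-case both parts."""
    last, first = (part.strip() for part in raw_name.split(","))
    return f"{last.upper()}, {first.upper()}"

def build_index(source):
    """Read each record, index its standardized key, and keep only a pointer."""
    index = []  # sorted list of (standardized key, record number)
    for record_number, record in enumerate(source):
        bisect.insort(index, (standardize(record["name"]), record_number))
    return index

def query(index, key):
    """Return the record numbers (pointers) whose standardized key equals key."""
    start = bisect.bisect_left(index, (key, -1))
    pointers = []
    for indexed_key, record_number in index[start:]:
        if indexed_key != key:
            break
        pointers.append(record_number)
    return pointers

def fetch(source, pointers):
    """Retrieve raw data from the source and standardize it for the caller."""
    return [standardize(source[p]["name"]) for p in pointers]

index = build_index(SOURCE)
pointers = query(index, standardize("Smith, John"))
results = fetch(SOURCE, pointers)
```

Because the query runs against standardized keys, the two differently typed "Smith, John" records are both found even though neither raw value matches the search text exactly.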

Other Approaches

Data warehouses, conventional federated adapters and (to a lesser extent) enterprise search systems are all conventional approaches to the same challenge of dealing with data: where it resides; how to access it; how to cope with "dirty data", typos, etc.; and how to handle different types, standards, formats, security, locations, owners and systems.

Each conventional approach has its advantages and disadvantages. WhamTech combined technologies from these three approaches to provide dataless hybrid products, retaining the advantages and overcoming the disadvantages of each approach.

For structured database-type queries, query success is the same as, if not better than, that of data warehouses, and considerably better than that of conventional federated adapters.

For unstructured text search, query success is similar to that of search engines.

The reason for the high query success is that WhamTech products uniquely have indexes that comprise cleansed, formatted and standardized data that is read from data source logs, or by some other means, and then discarded.  This enables queries submitted to the indexes to be highly successful.  Further, WhamTech products have advanced indexing, including text search, fuzzy, aggregation, calculation, compound, join, denormalized, embedded value and link indexes.

WhamTech products retain the advantages of data warehouse systems:

• Clean data -- indexes and results data
• Multiple indexes and types
• Query processing: multiple options
• Security: data and access

…and at the same time, retain the primary advantage of conventional federated adapters:

• Data remains at source

Legacy Development

EIQ Products derive their unique index and query processing technologies from previously marketed database and search products.

WhamTech and its predecessors developed a relational very large database (VLDB) technology, called D the Data Language and later Thunderbolt, that was extremely fast compared to other similarly configured database systems.  This appealed to a niche, data-processing-intensive market; however, the real technology differentiators were the unique index and query processing technologies, which are used in the current EIQ Products to index and process queries against almost any data source.

From an operational and structural point of view, the relational index and query management system (RIQMS) embedded in EIQ Products is a conventional relational database technology, with tables (virtual, in the case of EIQ Products), indexes, and typical database operation commands and queries.  It is NOT a memory-resident system; NOT a read-only system; NOT a fully inverted database system; and NOT a retrieval-based storage system.  There are, however, significant performance and capability differences that distance EIQ Products' RIQMS from other database and related technologies:

Unique Method of Query Execution

WhamTech's RIQMS has a unique method of isolating, connecting, arranging (sorting), processing (updating), and presenting (displaying) data.  This unique technology enables real-time data isolation and access, no matter the database size, number of concurrent users, or query complexity.

Unique Combination of Technologies

The speed and advanced capabilities of WhamTech's RIQMS arise from a unique combination of three binary-level methodologies, the three "Bs" of computing.  These are normally associated with static data warehousing rather than with the live data sources that WhamTech supports, and the combination would be very difficult to improve on:

• Balanced binary trees that scale to billions of records, terabytes of data
• Virtual bitmap representations of intermediate and final query result-sets (known as Collections) - these can be integer lists or actual bitmaps, depending on field data node-level "data density"
• Boolean operations on Collections

Balanced binary trees are a technology from the 1960s.  The attraction then, as now, is that binary searches are considered to be the fastest method of searching ordered lists[1].  However, there are a number of practical problems associated with balanced binary trees, all of which WhamTech has solved (and that is the main secret to our success):

• PROBLEM: Levels tend to get very deep, whereby a binary tree consisting of a billion nodes, for example, needs 30 levels; this translates into traversal time

WHAMTECH SOLUTION: WhamTech's implementation does not conform to the conventional n = log2(x + 1) balanced binary tree rule, where n = number of levels and x = number of nodes.  Instead, WhamTech's implementation "leaps" levels to make binary tree traverse-time measurable in microseconds rather than milliseconds or seconds

• PROBLEM: Rebalancing and rotation after an insert or delete can take considerable time and a very large number of nodes can be affected

WHAMTECH SOLUTION: The maximum number of nodes that need to be rotated is just over 100; however, typically only 10 to 15 nodes are rotated after an insert or delete

• PROBLEM: A worst-case scenario of deletion of a top node, which is faced by almost all tree structures

WHAMTECH SOLUTION: This is one of the easiest operations for WhamTech's RIQMS to deal with, as it fits the technology well
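
The 30-level figure quoted above for a billion-node tree follows from the conventional rule n = log2(x + 1), rounded up to a whole level.  The short Python check below is only a worked example of that textbook rule; the function name levels_for is hypothetical, and the sketch says nothing about how WhamTech's own level-"leaping" implementation works.

```python
def levels_for(nodes):
    """Levels in a conventional balanced binary tree holding `nodes` nodes.

    For nodes >= 1, int.bit_length() equals ceil(log2(nodes + 1)),
    computed exactly in integer arithmetic.
    """
    return nodes.bit_length()

small_tree_levels = levels_for(7)              # a full 3-level tree holds 7 nodes
billion_levels = levels_for(1_000_000_000)     # the billion-node case from the text
```

A full tree of 29 levels holds 2^29 - 1 = 536,870,911 nodes, which is fewer than a billion, so a 30th level is required.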

With WhamTech's RIQMS, when a balanced binary tree is subjected to a query, the result is an integer list: a set of pointers to data in the data source.  For subsequent query-related operations, this set is either used directly or is converted to a Collection; an example is shown in the following diagram:

Complex queries are rendered simple by treating them as combinations of lightning-fast queries on multiple balanced binary tree indexes.  Once single-field Collections are isolated, they are combined using the full range of Boolean arithmetic operations to provide a complex query result set, as shown in the following example:

The bits in the final complex query result set represent the final result-set record numbers in the data source or multiple data sources.  Remember that Collections need not be bitmaps and are in most cases integer lists.  Boolean operations can be performed on actual bitmaps, on virtual bitmaps (integer lists), or on both in combination.
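
As a rough illustration of Boolean operations on Collections, the Python sketch below combines three single-field result sets, once as bitmaps and once as integer lists.  The field names, record numbers and helper functions are hypothetical, and a Python integer stands in for a bitmap; this is a generic sketch of the technique, not WhamTech's code.

```python
def to_bitmap(record_numbers):
    """Pack a list of record numbers into a Python int used as a bitmap."""
    bitmap = 0
    for n in record_numbers:
        bitmap |= 1 << n
    return bitmap

def to_integer_list(bitmap):
    """Unpack a non-negative bitmap into a sorted list of record numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]

# Hypothetical single-field Collections, one per index lookup.
state_is_tx = [1, 4, 5, 9]
balance_over_1000 = [2, 4, 9, 12]
account_closed = [9]

# Complex query: state = 'TX' AND balance > 1000 AND NOT closed, on bitmaps.
bitmap_result = (
    to_bitmap(state_is_tx)
    & to_bitmap(balance_over_1000)
    & ~to_bitmap(account_closed)
)

# The same query performed directly on the integer lists, using sets.
list_result = sorted(
    (set(state_is_tx) & set(balance_over_1000)) - set(account_closed)
)

final_records = to_integer_list(bitmap_result)
```

Both representations yield the same record numbers, which would then be used as pointers back into the data source.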

WhamTech's RIQMS allows extremely fast queries and updates involving a minimal number of nodes, regardless of the size of the data source or the cardinality of the data.  WhamTech solved problems that have perplexed balanced binary tree researchers for decades and forced most companies to tend towards non-binary index tree structures, such as B+ trees, or other forms of indexes, where each node has more than two branches.  These non-binary tree structures do not allow for a simple 0 or 1 decision, but impose more complex decision-making algorithms on query processing at every node encountered, which increases traverse-time.

The core indexing technology code has remained untouched for over 20 years, as the algorithms and code are stable and bug-free.  As an example, WhamTech's legacy database product, Thunderbolt, was used at ACS (www.acs-inc.com), a NYSE-listed IT services company with revenue of more than $2 billion per year, for mission-critical 24/7 operations support for over 12 years, generating considerable revenue for ACS.

The IP associated with WhamTech's RIQMS is indirectly protected through awarded patents (see Patents).

Real-time Indexes

WhamTech's real-time indexes achieve rates ranging from hundreds to tens of thousands of records inserted or updated per second on low-end servers; a high-end example achieved a query and insert rate of 80,000 records per second on a dual 933 MHz server.  Real-time indexes establish a new method for dealing with large-scale data and information issues, from active (or real-time) data warehousing to near real-time database performance thought only possible with memory-resident databases.  Many applications are tending towards real-time, e.g., interactive customer relationship management (iCRM), inventory management, supply chain management (SCM), and decision support systems (DSS).

The largest problem faced by database vendors, in general, is enabling simultaneous queries (simple and complex) and data changes (inserts, deletes, and updates) on data and indexes for VLDBs such that they remain synchronized.

Most database systems are designed for transactions, a large number of users, and simple queries; for such systems, updates are mainly insertions and sequential.  Most data warehouses are designed to be normally static, with reduced subsets of data, a small number of users, and complex queries; for such systems, updates are typically performed in regular batches.  Complex queries on most database systems can cripple performance, particularly on VLDBs, but not WhamTech's...

... WhamTech's unique RIQMS allows for an entirely new approach to near real-time data and information integration and sharing systems.

Complementary Versus Replacement Technology

WhamTech's EIQ Products are intended to complement, not replace, existing operations database systems, which are usually transaction-oriented, to enable:

• Growth to accommodate a larger number of users
• Additional capabilities for existing systems
• Advanced capabilities not available with existing systems

EIQ Products are the ultimate non-intrusive system, as they externally index and process queries against existing data sources, including most databases.  EIQ Products also allow unstructured text search on databases as well as on unstructured data sources such as files, documents and e-mail, and on semi-structured data sources such as spreadsheets and XML.  EIQ Products have a unique approach to querying semi-structured data sources using structured queries.  EIQ Products can also execute structured queries on unstructured data if entity extraction tools are used to build indexes.

A Great Combination - Database and Search Technologies

Many professionals in the database industry are predicting the convergence of database and search technologies.  Some database vendors already bundle limited search capabilities in with their products, and some search engine companies bundle limited database capabilities in their products.  EIQ Products are a much more seamless integration of database and search technologies -- because the search technology itself is based on the same RIQMS as the database query technology.

It is estimated that 85% of all enterprise data exists outside structured databases, and with company mergers and internal reorganizations, the remaining 15% could be in separate databases, with different structures and field names.  EIQ Products potentially offer one-stop access to all corporate data, regardless of where it resides, through structured queries, unstructured text searches, or BOTH on the same data sources.  EIQ Products are truly a great combination of database and search technologies.

Reference: 1. C. William Gear, Applications and Algorithms in Computer Science, Science Research Associates, Inc, 1978, p. A107.

Copyright © 1998 - 2010 WhamTech, Inc • www.whamtech.com • 972-991-5700 • info@whamtech.com

U.S. Patents Pending