Virtual real-time data access, analytics, integration, sharing and interoperability
EIQ Products™
WhamTech EIQ Products™ are designed and built to plug and play in existing architectures, either as solutions in their own right or as part of larger solutions.
EIQ Products provide the virtual real-time data access, analytics, integration, sharing and interoperability capabilities for a number of solutions. In many cases, EIQ Products enable solutions that would be either too difficult or too expensive using conventional approaches. Organizations are awash with data challenges, and almost all of them could benefit from what EIQ Products offer. Most solutions combine EIQ Product capabilities and can be broadly categorized as follows:
Federated Data Access, Information Sharing, Virtual Operational Data Stores and Cloud Data Services
The basic EIQ Products configuration provides cleansed, transformed and standardized indexes, and high-quality query processing, without additional load on data source systems. EIQ Products overcome one of the major disadvantages of conventional federated adapters: no access to data when data source systems are unavailable. EIQ Products can optionally retrieve and assemble query results data by inverting indexes. Typically, middleware or other applications manage the interaction with adapter products through schema mapping to standard data models, and EIQ Products are no different in that respect. WhamTech also offers a sub-middleware/simple middleware product called EIQ Federation Server™ to manage the interaction with, and among, EIQ Products, including other EIQ Federation Servers.
Physical operational data stores (ODSs) are typically copies of operational/transactional data in original data formats and schemas, stored separately from operational/transactional systems. ODSs are typically used to archive data for regulatory purposes and as a source for data warehouses, which in turn are sources for data marts. EIQ Product-based virtual ODSs have two major advantages over physical ODSs:
1. DATA QUALITY - EIQ Products can clean, transform and standardize data in indexes and result sets - the original data remains untouched, thus allowing regulatory compliance, but also achieving high quality queries and results.
2. UNIVERSAL ACCESS SCHEMA - EIQ Products can present a standard data model to an application regardless of how the indexes (or data sources) are physically configured. Indexes can be mapped to more than one standard data model.
Cloud data services make data available regardless of its location. EIQ Products have a universal access schema that removes the need to directly specify individual data sources or their location. Data access occurs in the background or in the cloud. Various rules reflecting preferences can be imposed on results data, including de-duplication, ranking and LIFO/FIFO in the case that data sources receive the same queries from a different EIQ Product.
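The result rules mentioned above (de-duplication, ranking and a first-in/last-in preference) can be sketched in a few lines. This is only an illustration under stated assumptions, not WhamTech's implementation: the record layout, the `merge_results` function and the `global_id`, `score` and `source` fields are hypothetical names chosen for the example.

```python
from typing import Dict, Iterable, List


def merge_results(result_sets: Iterable[List[Dict]], key: str = "global_id",
                  rank_by: str = "score") -> List[Dict]:
    """Merge per-source result sets, keep one record per key, rank by score.

    The first record seen for a key wins (a FIFO preference); a LIFO
    preference would overwrite the earlier record instead.
    """
    merged: Dict[str, Dict] = {}
    for results in result_sets:
        for record in results:
            # setdefault keeps the first record seen for each key.
            merged.setdefault(record[key], record)
    # Highest score first.
    return sorted(merged.values(), key=lambda r: r[rank_by], reverse=True)


# Hypothetical result sets returned by two data sources for the same query.
crm = [{"global_id": "C1", "name": "Acme Corp", "score": 0.9, "source": "crm"}]
billing = [
    {"global_id": "C1", "name": "ACME Corporation", "score": 0.7, "source": "billing"},
    {"global_id": "C2", "name": "Globex", "score": 0.95, "source": "billing"},
]
merged = merge_results([crm, billing])
print([(r["global_id"], r["source"]) for r in merged])
```

The application sees one ranked list and never names the individual sources, which is the point of a universal access schema.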
Virtual Data Warehouses and Virtual Data Marts
Physical data warehouses (DWs) and data marts (DMs) have two additional major advantages over ODSs in general, whether virtual or physical (note that numbering is continued from above):
3. ARCHIVED DATA - Historic data can be retained along with current data.
4. DERIVED DATA - Pre-aggregated, pre-calculated, fuzzy, text and other indexes in addition to indexes that mirror data sources.
EIQ Products also have virtual (index) versions of the above two capabilities (3 and 4) enabling virtual DW and DM solutions that directly compare with physical DWs and DMs. WhamTech virtual DW and DM solutions enable applications such as business intelligence, data mining and logistics that were thought to be the sole domain of physical DW and DM solutions. For an interesting and controversial discussion on virtual data warehouses, read an article published by Information Management (also linked in the ARTICLES section).
Virtual or Hybrid (virtual and physical) Master Data Management
Conventional federated adapters for master data management (MDM) are difficult to implement either partially or in full because of three main problems that centralized databases/data warehouses solve:
- Data in data sources is unclean and in many cases, unusable
- Data in data sources may not have indexes available for querying, or data sources cannot accommodate improved indexes such as fuzzy matching, address clean-up, name variation, and pre-aggregated and pre-calculated fields
- Query performance, where data sources are not capable of executing more advanced queries, queries are slow, or queries are blocked due to load considerations and operational/transactional activity on data sources
However, copying large amounts of data into centralized databases/data warehouses can be expensive and time-consuming to establish and maintain, and raises data-related responsibility, accountability, security, privacy and legal issues. Perhaps the most difficult problem is keeping everything up-to-date.
WhamTech has worked with several MDM vendors to design a hybrid solution whereby more frequently accessed master data is copied to a centralized database or MDM hub, and less frequently accessed data stays in data sources. EIQ Products cleanse, transform and standardize data used to build indexes, and provide pointers as well as cross-references to data based on global IDs. Identification of the same customer, vendor or product, for example, in multiple data sources can be performed when indexes are built or once master data is in the MDM hub. EIQ Products fuzzy matching and link mapping can help in this process by finding similar data across multiple heterogeneous systems. Finally, a decision has to be made whether the data source systems are responsible for updating the master data or the master data updates the data sources; this is called harmonization.
More available soon on WhamTech Virtual or Hybrid Master Data Management solutions in a separate document.
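As a rough illustration of cross-referencing records to global IDs, the sketch below cleanses names and groups similar ones. It is an assumption-laden toy: the standard library's `difflib.SequenceMatcher` stands in for whatever fuzzy matching product is actually used, and `normalize`, `assign_global_ids`, the 0.85 threshold and the sample records are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Cleanse a name for indexing: lowercase, drop punctuation, collapse spaces."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def assign_global_ids(records: List[Tuple[str, str, str]],
                      threshold: float = 0.85) -> Dict[Tuple[str, str], int]:
    """Map (source, local_id) to a global ID, grouping similar names together."""
    representatives: List[str] = []  # one cleansed name per global ID
    xref: Dict[Tuple[str, str], int] = {}
    for source, local_id, name in records:
        clean = normalize(name)
        for gid, rep in enumerate(representatives):
            if SequenceMatcher(None, clean, rep).ratio() >= threshold:
                xref[(source, local_id)] = gid
                break
        else:
            # No similar name seen yet: start a new global ID.
            representatives.append(clean)
            xref[(source, local_id)] = len(representatives) - 1
    return xref


# Hypothetical customer records: (data source, local ID, name as stored).
records = [
    ("crm", "17", "Acme Corp."),
    ("billing", "A-204", "ACME Corp"),
    ("erp", "9", "Globex Industries"),
]
xref = assign_global_ids(records)
print(xref)
```

The original records are never modified; only the cross-reference built alongside them carries the cleansed names and global IDs.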
Virtual Access to Mainframe Data Files
As a variation of the virtual ODS, WhamTech developed a solution to virtually access archived mainframe data files. An example of the solution approach is a recently completed project for a large, publicly traded environmental company to enable standard driver and SQL relational access to their archived IBM mainframe, COBOL-generated VMS data files:
WhamTech used the original archived very large data files as the data source and developed a VMS file format reader, a parser to build both hierarchical and relational indexes, and a light version of an ODBC driver. Now, SQL is submitted to an EIQ SuperAdapter that resolves queries and in turn retrieves results data from the VMS data files. The EIQ SuperAdapter uses internal file pointer data to read specific sections of the very large data files to retrieve results data and avoid having to read the entire file. In this case, the customer company is in the process of moving all files from, and shutting down, the mainframe, with the goal of saving significant maintenance fees.
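The idea of using stored file pointers to read only the needed sections of a large file can be sketched as follows. This is a simplified illustration, not the EIQ SuperAdapter: the newline-delimited record format, the four-byte key and the `build_offset_index` and `fetch` helpers are assumptions made for the example, and a real mainframe file would need its own format reader.

```python
import io
from typing import BinaryIO, Dict, Tuple


def build_offset_index(f: BinaryIO, key_len: int = 4) -> Dict[bytes, Tuple[int, int]]:
    """Scan the file once, recording (offset, length) of each record by key."""
    index: Dict[bytes, Tuple[int, int]] = {}
    f.seek(0)
    offset = 0
    for line in f:  # one newline-terminated record per line in this toy format
        index[line[:key_len]] = (offset, len(line))
        offset += len(line)
    return index


def fetch(f: BinaryIO, index: Dict[bytes, Tuple[int, int]], key: bytes) -> bytes:
    """Read a single record using its stored file pointer, not a full scan."""
    offset, length = index[key]
    f.seek(offset)
    return f.read(length).rstrip(b"\n")


# An in-memory stand-in for a very large archived data file.
data = io.BytesIO(b"0001,SITE-A,42.5\n0002,SITE-B,17.0\n0003,SITE-C,88.1\n")
index = build_offset_index(data)
record = fetch(data, index, b"0002")
print(record)
```

The index is built once; after that, each query costs one seek and one short read regardless of the file size.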
The solution could be extended to access live data files on mainframes similar to EIQ Products accessing other data sources.
Click here to read more on Virtual Access to Archived Mainframe Data Files.
Click here to read about a proposed customer solution to transition access, and migrate data, from legacy applications and databases to modern applications and databases.
Intelligent Spider
More to come.
Virtual Link Mapping and Virtual Link Analysis
Most link analysis applications have the following limitations:
- Scalability: Data is moved in entirety into a single database
- Federated access to data sources using conventional adapters with associated poor results due to data quality
- Difficult to combine structured and unstructured data
- No near-real-time updates
- No fuzzy matching, probabilities or favorability/threat scores
WhamTech developed an option that works in conjunction with its federated data access solution based on EIQ SuperAdapters to overcome the above-mentioned limitations. This option takes advantage of content indexes and allows the capture of links in near-real-time among the same or similar entities in specialized indexes, called Link Indexes™. Similar to content indexes, Link Indexes are created and maintained at the data source level, but not on data source systems, and can scale through distributed parallel processing across multiple disparate data sources. The combination of content indexes and Link Indexes enables virtual views of entity data and of the links between entity data. User-driven, interactive visualization displays only entity data and links of interest, as needed. Link Indexes capture the following five types of links for structured data sources:
- Internal data source, multiple table joins using primary key (PK) - foreign key (FK) relationships
- Internal data source, single table self-joins using PK-FK relationships
- Internal data source, single table self-joins using same or similar data
- Internal data source, multiple table joins using same or similar data
- External data source, multiple table joins using same or similar data
...and two types for unstructured data in either structured or unstructured sources:
- Structured data captured through entity extraction using same or similar data
- Unstructured text search using same or similar data
For all the above types of links, EIQ Products can be selective about the entities used to establish links, as it may not prove of value to link all entities (types and data). EIQ Products can also apply fuzzy matching, using a product from a third-party vendor for structured data and text matching algorithms for unstructured data. Probabilities of match can be calculated and stored with links, or calculated on-the-fly as links are analyzed. In addition, when threat/favorability scores for specific entities are combined with these probabilities, threat/favorability link analysis networks can be displayed.
WhamTech recognized through its work on Web search engines that all networks and relationships can be represented through a combination of links (one-to-one). WhamTech uses its binary tree indexes to capture and maintain the links as link maps (one-to-many) in Link Indexes, and Boolean operations on bitmap representations to combine link maps for link analysis (many-to-many). In their basic form, Link Indexes are join indexes, which are pre-formed joins, both internal and external to data sources. Link Indexes can significantly cut down on the computing time and resources needed to execute queries involving joins. More complex queries, such as nested selects, can take advantage of Link Indexes to execute n-degrees-of-separation queries, again without much computing time or resources.
The more obvious use of Link Indexes is for link analysis applications, where one-to-many, many-to-one and many-to-many relationships between predefined entities can be visually represented and subjected to social network analysis. Because EIQ Products also provide federated data access in conjunction with Link Indexes, an analyst can interact with data sources through link analysis visualization without being aware of it. EIQ Products can work with both commercial and open source link visualization tools.
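A small sketch may help make the link map and bitmap idea concrete. It is not how Link Indexes are actually stored: here Python integers stand in for bitmaps, a plain dictionary stands in for the index, and the entity names and the `bitmap`, `names_of` and `within_degrees` helpers are hypothetical.

```python
from typing import Dict, List

entities = ["alice", "bob", "carol", "dave", "erin"]
pos = {name: i for i, name in enumerate(entities)}


def bitmap(names: List[str]) -> int:
    """Encode a set of entities as an integer bitmap (bit i = entities[i])."""
    bits = 0
    for name in names:
        bits |= 1 << pos[name]
    return bits


def names_of(bits: int) -> List[str]:
    """Decode a bitmap back into entity names, in index order."""
    return [name for name, i in pos.items() if (bits >> i) & 1]


# Link maps (one-to-many): each entity maps to a bitmap of directly linked entities.
link_map: Dict[str, int] = {
    "alice": bitmap(["bob"]),
    "bob": bitmap(["alice", "carol"]),
    "carol": bitmap(["bob", "dave"]),
    "dave": bitmap(["carol"]),
    "erin": 0,
}


def within_degrees(start: str, n: int) -> int:
    """Bitmap of entities reachable from start in at most n links."""
    seen = frontier = bitmap([start])
    for _ in range(n):
        reached = 0
        for name in names_of(frontier):
            reached |= link_map[name]   # OR combines link maps
        frontier = reached & ~seen       # AND NOT drops entities already visited
        seen |= frontier
    return seen & ~bitmap([start])


common = names_of(link_map["alice"] & link_map["carol"])  # AND: shared neighbours
two_hops = names_of(within_degrees("alice", 2))
print(common, two_hops)
```

Each degree of separation costs a handful of bitwise operations rather than a join against the data sources, which is the intuition behind treating link maps as pre-formed joins.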
EIQ Products keep a log of all interactions within link analysis for subsequent use in legal proceedings or for probable cause. Analysts can retain link analysis networks and resume/retrieve them in subsequent sessions. More available soon on WhamTech Virtual Link Mapping and Virtual Link Analysis solutions in a separate document.
Social Media Analytics
More to come.
Living Networks (real-time link analysis)
Once an initial link analysis is performed, analysts typically save the file for subsequent retrieval; however, when retrieved, it contains dated information. The options usually involve completely refreshing the link analysis or, with more advanced systems, incrementally updating it with data from queries to a database or federated queries to multiple data sources. Living Networks are an extension of WhamTech's virtual link mapping and link analysis that allows an analyst to subscribe to any updates occurring in near-real-time to a link analysis network. These updates are for entities or links that are either in the network or within n degrees of separation from the network. When an update is identified for a particular retained network as a whole, in part, or for specific entities and/or links, the update is automatically made available to the analyst for updating the network and/or the analyst is notified of it. Living Networks are a culmination of most of the capabilities that WhamTech's EIQ Products provide and could significantly improve an analyst's ability to find, represent, monitor and present complex information. This is particularly true when probabilities of both entities and links are combined with threat/favorability scores. More available soon on WhamTech Living Network solutions in a separate document.
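The subscription behavior described above can be sketched as a check on whether an updated entity lies within n degrees of separation of a retained network. This is a toy model under assumptions: the `LivingNetwork` class, its `on_update` method, the adjacency dictionary and the callback are hypothetical, and a plain breadth-first search stands in for index-based link lookups.

```python
from collections import deque
from typing import Callable, Dict, Set


def within_n(links: Dict[str, Set[str]], seeds: Set[str], n: int) -> Set[str]:
    """Breadth-first search: entities at most n links from any seed entity."""
    seen = set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        node, depth = queue.popleft()
        if depth == n:
            continue  # do not expand beyond n degrees of separation
        for neighbour in links.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return seen


class LivingNetwork:
    """A retained link analysis network subscribed to nearby updates."""

    def __init__(self, entities: Set[str], n: int, notify: Callable[[str], None]):
        self.entities = entities
        self.n = n
        self.notify = notify

    def on_update(self, links: Dict[str, Set[str]], entity: str) -> bool:
        """Notify the analyst if the updated entity is in or near the network."""
        if entity in within_n(links, self.entities, self.n):
            self.notify(entity)
            return True
        return False


links = {"alice": {"bob"}, "bob": {"alice", "carol"},
         "carol": {"bob", "dave"}, "dave": {"carol"}}
notified = []
network = LivingNetwork({"alice"}, n=2, notify=notified.append)
for updated in ["carol", "dave"]:
    network.on_update(links, updated)
print(notified)
```

Here "carol" is two links from the network and triggers a notification, while "dave" is three links away and does not.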
eDiscovery
WhamTech OEMs an information geometry categorization tool from a large system integrator that uses it primarily with intelligence agencies. The tool provides a powerful way to search for documents and emails that are relevant to a legal case. This capability complements WhamTech's advanced text search, link mapping and link analysis. Together, these tools comprise WhamTech's eDiscovery tool, called Teracase. Most eDiscovery tools rely heavily on text search, and some include concept searches with varying degrees of success. WhamTech recognized the leading-edge capabilities of the information geometry tool, which allows the end-user to converge quickly on a category model with relatively little input and time compared to other categorization tools. The category model is used as a filter, either on its own or in combination with other filters, against the entire set of emails and documents. WhamTech's primary market is Early Case Assessment, both after a case has been identified and for corporate preemption. WhamTech's secondary market is full eDiscovery. Other WhamTech solutions will benefit either from Teracase itself for information discovery or from the information geometry categorization tool.
Teracase version 1.3 Features
- Simple user access
- Supported data sources:
  - E-mails and attachments
  - Documents (text, compressed, PDF, Microsoft Office documents, Microsoft Works, HTML and variants, WordPerfect and others)
- De-duplication using "same as" algorithm; "similar as" coming soon
- Case Management
- Category Management
- Search/Analysis with various filters (include/exclude; keywords with various options - stemming, synonym, phrase, etc.; data source type; email filters - sent/received date-time, sender/recipient domain, sender/recipient email address, with/without attachments; document filters - type, last modified date-time, author, attachment/free-standing); save filters, apply saved filters
- Results with options (list of matching text body id, path-link to open, data source type, ordered by keyword relevancy and category model score, n results per page, next n/previous n, view email/document text, tag selected, mark for a category model)
- Create, save, refine category models
- Score entire corpus for selected category models
- Predefined tags (Hot - highly relevant emails/documents that are found early and will most likely be shown at trial; Responsive - maybe; Non-responsive - irrelevant; Privileged - files governed by attorney/client privilege that will be presented to a judge if the matter goes to trial, but can initially be set aside for review by the lawyer)
- User-assigned tag to any email/document
- Teracase Administration
- Audit log - log of all user actions at the web server level
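The "same as" de-duplication listed among the features above is not specified further, so the sketch below simply assumes it means exact matching on content after whitespace normalization. The `fingerprint` and `deduplicate` functions and the sample documents are hypothetical, and a "similar as" algorithm would need near-duplicate detection instead of a hash.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(text: str) -> str:
    """Hash of whitespace-normalized text; equal texts give equal fingerprints."""
    canonical = " ".join(text.split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(docs: List[Tuple[str, str]]) -> Tuple[List[str], Dict[str, str]]:
    """Return IDs of unique documents and a map of duplicate ID -> kept ID."""
    first_seen: Dict[str, str] = {}
    unique: List[str] = []
    duplicates: Dict[str, str] = {}
    for doc_id, text in docs:
        digest = fingerprint(text)
        if digest in first_seen:
            duplicates[doc_id] = first_seen[digest]
        else:
            first_seen[digest] = doc_id
            unique.append(doc_id)
    return unique, duplicates


# Hypothetical ingested items: (document ID, extracted text).
docs = [
    ("m1", "Please review the contract."),
    ("m2", "Please  review the\ncontract."),
    ("m3", "Contract attached."),
]
unique, duplicates = deduplicate(docs)
print(unique, duplicates)
```

Keeping the duplicate-to-original map, rather than discarding duplicates outright, preserves a record of where each copy was found.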
Near-future features:
- User management and role assignment
- Billing Information - ingest volume size (before and after unroll and unzipped) and de-duplication by volume
- Statistics and reports
- Link mapping and link analysis for emails
- Interactive visualization for both emails and documents
More available soon on WhamTech eDiscovery solutions in a separate document.