Multi-Model Database Evaluation

This presentation was given to leadership on ongoing work to identify a storage solution for a large-scale, searchable address database. The related product never went to market.

Slide deck: DB_Eval.pptx (1450.5 KB)

Multi-Model Databases: Magic or Madness?

Current State

Default Choice: PostgreSQL (RDS / Aurora on AWS)
Document DB: MongoDB / DocumentDB

Concerns about clustering and data integrity.

Graph DB: Neo4j, Neptune (TBD)

Property graph (Gremlin) or RDF (SPARQL) style; Neptune supports both.

Search DB: Elasticsearch (Solr / Lucene)
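To make the property-graph versus RDF distinction concrete, here is a minimal sketch, in plain Python with no database, of one hypothetical fact (a restaurant located at an address) stored both ways. The IDs, labels, and the `ex:` prefix are illustrative assumptions. Gremlin would traverse something like the first form; SPARQL would pattern-match something like the second.

```python
from dataclasses import dataclass, field

# Property-graph style: nodes and edges carry a label plus key/value properties.
@dataclass
class Node:
    id: str
    label: str
    props: dict = field(default_factory=dict)

@dataclass
class Edge:
    src: str
    dst: str
    label: str
    props: dict = field(default_factory=dict)

nodes = [
    Node("r1", "Restaurant", {"name": "Luigi's"}),
    Node("a1", "Address", {"street": "1 Main St", "city": "Springfield"}),
]
edges = [Edge("r1", "a1", "LOCATED_AT", {"since": 2015})]

# RDF style: everything is a (subject, predicate, object) triple. The edge
# property "since" has no direct home here; it would need reification or an
# intermediate node, which this sketch omits.
triples = [
    ("ex:r1", "rdf:type", "ex:Restaurant"),
    ("ex:r1", "ex:name", "Luigi's"),
    ("ex:r1", "ex:locatedAt", "ex:a1"),
    ("ex:a1", "rdf:type", "ex:Address"),
    ("ex:a1", "ex:street", "1 Main St"),
    ("ex:a1", "ex:city", "Springfield"),
]

def addresses_of_pg(name):
    """Follow LOCATED_AT edges out of restaurants with the given name."""
    ids = {n.id for n in nodes if n.label == "Restaurant" and n.props.get("name") == name}
    by_id = {n.id: n for n in nodes}
    return [by_id[e.dst].props for e in edges if e.src in ids and e.label == "LOCATED_AT"]

def addresses_of_rdf(name):
    """Match the pattern: ?r ex:name name . ?r ex:locatedAt ?a ."""
    subjects = {s for s, p, o in triples if p == "ex:name" and o == name}
    addrs = {o for s, p, o in triples if s in subjects and p == "ex:locatedAt"}
    return [{p: o for s, p, o in triples if s == a and p != "rdf:type"} for a in addrs]

print(addresses_of_pg("Luigi's"))
print(addresses_of_rdf("Luigi's"))
```

The query logic is nearly identical; the practical difference is where metadata about relationships can live, which matters when choosing between the two styles.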

Contenders

Multi-Model Databases: MarkLogic, ArangoDB, OrientDB, FoundationDB, Microsoft SQL Server, Oracle NoSQL Database, DataStax, Redis.

Each contender claims all-in-one capabilities for document, graph, and search, though the depth of each capability varies by product.

Evaluation Criteria

Functional and Non-functional Requirements:

Fitness for specific data types and derived data models.

Ease of Use: Supported languages and documentation.
Operational Characteristics: Deployment model, scaling, clustering, performance, security, backup and recovery, and tool support (e.g., Tableau).
Cost Considerations:

Building a POC (Proof of Concept) use case.

Challenges and Concerns

Data Population: Origin of data and AWS constraints.
Feature Analysis: Navigating through hype to find practical feature sets.
Performance: Concern that multi-model DBs may be mediocre at each individual function.
Vendor Support: Need for vendors to provide long-term engagement.

Execution Strategy

Paper Study: Documenting features against evaluation criteria.
Hands-On: Implementing small POCs with actual code.
Vendor Engagement: Engaging vendors for long-term support.
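One common way to turn the paper study into a side-by-side comparison is a weighted scoring matrix. The sketch below assumes each evaluation criterion is scored 1 to 5 and that the weights sum to 1. The weights, candidate names, and scores are hypothetical placeholders, not findings from this evaluation.

```python
# Hypothetical weights per evaluation criterion (must sum to 1.0).
WEIGHTS = {
    "fitness": 0.35,
    "ease_of_use": 0.20,
    "operations": 0.30,
    "cost": 0.15,
}

def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion scores (1-5) into a single weighted total."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    missing = weights.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[c] * scores[c] for c in weights)

def rank(candidates: dict[str, dict[str, float]]) -> list[tuple[str, float]]:
    """Return (candidate, total) pairs sorted best-first."""
    totals = {name: weighted_score(s) for name, s in candidates.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Placeholder scores for two hypothetical candidates.
candidates = {
    "Candidate A": {"fitness": 4, "ease_of_use": 3, "operations": 4, "cost": 2},
    "Candidate B": {"fitness": 3, "ease_of_use": 5, "operations": 3, "cost": 4},
}

for name, total in rank(candidates):
    print(f"{name}: {total:.2f}")
```

Keeping the weights explicit makes it cheap to rerun the ranking if priorities shift, for example if cost is weighted above ease of use.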

POC (Proof of Concept)

Data Ingestion: Ingest data from the sources into a target schema.
Search Capability: Demonstrate location-based search for restaurants, with an expanding search radius.
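Both POC steps can be prototyped in memory before committing to a database. The sketch below assumes a hypothetical source record shape (`title`, `lat`, `lng` as strings) mapped into a hypothetical `Restaurant` target schema, then searches by great-circle (haversine) distance, doubling the radius until enough results are found. The starting radius, growth factor, and cap are illustrative assumptions; in a real POC each expansion would be a query against the candidate database's geospatial index.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0

@dataclass(frozen=True)
class Restaurant:
    """Hypothetical target schema for the POC."""
    name: str
    lat: float
    lon: float

def ingest(raw: dict) -> Restaurant:
    """Map one hypothetical source record into the target schema."""
    return Restaurant(name=raw["title"].strip(), lat=float(raw["lat"]), lon=float(raw["lng"]))

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def search_expanding(restaurants, lat, lon, min_results=3,
                     start_km=1.0, factor=2.0, max_km=50.0):
    """Widen the radius until min_results are found or max_km is reached.

    Returns (radius_used_km, matches sorted nearest-first). Distances are
    computed once here; a database-backed version would re-query per radius.
    """
    dists = sorted(((haversine_km(lat, lon, r.lat, r.lon), r) for r in restaurants),
                   key=lambda pair: pair[0])
    radius = start_km
    while True:
        hits = [r for d, r in dists if d <= radius]
        if len(hits) >= min_results or radius >= max_km:
            return radius, hits
        radius = min(radius * factor, max_km)

# Usage with made-up sample records.
source = [
    {"title": " Cafe Uno ", "lat": "40.7128", "lng": "-74.0060"},
    {"title": "Deli Dos", "lat": "40.7200", "lng": "-74.0000"},
    {"title": "Bistro Tres", "lat": "40.7800", "lng": "-73.9700"},
]
restaurants = [ingest(rec) for rec in source]
radius, found = search_expanding(restaurants, 40.7130, -74.0050, min_results=2)
print(radius, [r.name for r in found])
```

Capping the radius keeps a sparse area from triggering an unbounded scan; the caller can tell from the returned radius how far the search had to reach.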

Evaluation of Specific Databases

MarkLogic:

XML-based internal representation; supports RDF, JSON, XML, and binary documents.
ACID compliance, entity matching, strong security controls.
Existing internal experience with the product and an established vendor relationship.

ArangoDB:

Native multi-model with ACID compliance.
Supports JOINs, a unified query language (AQL), and search functionality (ArangoSearch) from the IResearch acquisition.

OrientDB:

Started as document-focused, now multi-model.
ACID compliance, a SQL dialect in which links stand in for relational JOINs, and Lucene-based search.

Outcomes

Guidelines for Usage: Recommendations for when to use multi-model databases.
Guardrails: Direct users to internal resources for decision-making.
Solutions Architecture: Aid in integrating solutions into specific designs.

Challenges and Concerns

Data origin and AWS constraints.
Digging through hype to evaluate feature sets.
Performance concerns about multi-model databases.
Ensuring contenders offer features close to best-in-class.
Historical performance issues with multi-model databases.