Multi-Model Database Evaluation

This presentation was given to leadership on ongoing work to identify a storage solution for a large-scale, searchable address database. The related product never went to market.

Attachment: DB_Eval.pptx (1450.5 KB)

Multi-Model Databases - Magic or Madness

Current State

  • Default Choice: PostgreSQL (RDS / Aurora on AWS)
  • Document DB: MongoDB / DocumentDB
    • Concerns about clustering and data integrity.
  • Graph DB: Neo4j, Neptune (TBD)
    • Property graph (Gremlin) or RDF (SPARQL) model; Neptune supports both.
  • Search DB: Elasticsearch or Solr (both built on Lucene)

Contenders

  • Multi-Model Databases: MarkLogic, ArangoDB, OrientDB, FoundationDB, Microsoft SQL Server, Oracle DB NoSQL, DataStax, Redis.
    • Each contender is positioned as an all-in-one store covering document, graph, and search workloads.

Evaluation Criteria

  • Functional and Non-functional Requirements:
    • Fitness to specific data types and derivative data models.
  • Ease of Use: Supported languages and documentation.
  • Operational Characteristics: Deployment model, scaling, clustering, performance, security, backup and recovery, tools support (e.g., Tableau).
  • Cost Considerations:
    • Building a POC (Proof of Concept) use case.

Challenges and Concerns

  • Data Population: Origin of data and AWS constraints.
  • Feature Analysis: Navigating through hype to find practical feature sets.
  • Performance: Concern that multi-model DBs may be mediocre at each individual function.
  • Vendor Support: Need for vendors to provide long-term engagement.
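
One way to test the performance concern is to time the same document, graph, and search workloads against each contender. The sketch below is a minimal, driver-agnostic harness. The `measure_latency_ms` name, the run counts, and the nearest-rank p95 choice are all assumptions made for illustration; the deck does not describe a harness.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency_ms(query: Callable[[], object],
                       runs: int = 50,
                       warmup: int = 5) -> Dict[str, float]:
    """Time a zero-argument query callable and summarize latency in ms.

    `query` is a placeholder for whatever issues one request through a
    real database driver (hypothetical here, not a specific vendor API).
    """
    # Warm-up calls let caches and connections settle; they are not timed.
    for _ in range(warmup):
        query()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        query()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Running the same callable shape for a document lookup, a graph traversal, and a text search on each contender yields comparable numbers per function, which is the comparison the concern calls for.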

Execution Strategy

  1. Paper Study: Documenting features against evaluation criteria.
  2. Hands-On: Implementing small POCs with actual code.
  3. Vendor Engagement: Engaging vendors for long-term support.
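
The paper study in step 1 could be recorded as a weighted scoring matrix so that contenders are compared on the same criteria. The sketch below is illustrative only: the criterion keys mirror the evaluation criteria in this deck, but the weights, the 1 to 5 scale, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical weights; they must sum to 1.0. The keys correspond to the
# evaluation criteria listed in the deck, the numbers are placeholders.
WEIGHTS: Dict[str, float] = {
    "functional_fit": 0.35,
    "ease_of_use": 0.15,
    "operations": 0.30,
    "cost": 0.20,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-criterion scores (assumed 1 to 5) into one number."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


def rank(candidates: Dict[str, Dict[str, float]]) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs, best score first."""
    scored = [(name, weighted_score(s)) for name, s in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Refusing to score a candidate with a missing criterion keeps gaps in the paper study visible instead of silently counting them as zero.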

POC (Proof of Concept)

  • Data Ingestion: Ingest data from sources to a target schema.
  • Search Capability: Demonstrate search functionality for finding restaurants by location, with an expanding search radius.
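
The expanding-radius search can be sketched independently of any database. The example below works in memory and assumes a hypothetical `Place` record, a doubling radius, and a minimum result count; none of these details come from the deck. In a real POC the distance filter would be pushed down to each contender's geospatial index.

```python
import math
from typing import Iterable, List, NamedTuple, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


class Place(NamedTuple):
    """Hypothetical restaurant record; the real target schema is not given."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def expanding_search(places: Iterable[Place], lat: float, lon: float,
                     min_results: int = 3, start_km: float = 1.0,
                     max_km: float = 50.0, growth: float = 2.0
                     ) -> Tuple[float, List[Tuple[float, Place]]]:
    """Widen the radius until enough places match or max_km is reached.

    Returns the radius used and the (distance_km, place) hits, nearest first.
    """
    scored = sorted(
        ((haversine_km(lat, lon, p.lat, p.lon), p) for p in places),
        key=lambda pair: pair[0],
    )
    radius = start_km
    while True:
        hits = [(d, p) for d, p in scored if d <= radius]
        if len(hits) >= min_results or radius >= max_km:
            return radius, hits
        radius = min(radius * growth, max_km)
```

Capping the radius at `max_km` guarantees termination in sparse areas, at the cost of sometimes returning fewer than `min_results` matches.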

Evaluation of Specific Databases

  • MarkLogic:
    • XML internal representation, supports RDF, JSON, XML, binaries.
    • ACID compliance, entity matching, strong security controls.
    • Known internal experience and relationship with the product.
  • ArangoDB:
    • Native multi-model with ACID compliance.
    • Supports JOINs and a unified query language (AQL); search functionality comes from the IResearch acquisition.
  • OrientDB:
    • Started as document-focused, now multi-model.
    • ACID compliance, uses SQL and supports JOINs, Lucene-based search.

Outcomes

  • Guidelines for Usage: Recommendations for when to use multi-model databases.
  • Guardrails: Direct users to internal resources for decision-making.
  • Solutions Architecture: Aid in integrating solutions into specific designs.

Challenges and Concerns

  • Data origin and AWS constraints.
  • Digging through hype to evaluate feature sets.
  • Performance concerns of multi-model databases.
  • Ensuring contenders are close to best-in-class features.
  • Historical performance issues of multi-model databases.