This presentation was given to leadership on ongoing work to identify a storage solution for a large-scale, searchable address database. The related product never went to market.
DB_Eval.pptx (1450.5 KB)
Multi-Model Databases - Magic or Madness
Current State
- Default Choice: PostgreSQL (RDS / Aurora on AWS)
- Document DB: MongoDB / DocumentDB
- Concerns about clustering and data integrity.
- Graph DB: Neo4j, Neptune (TBD)
- Property graph (Gremlin) or RDF (SPARQL) style; Neptune supports both.
- Search DB: Elasticsearch (Solr / Lucene)
Contenders
- Multi-Model Databases: MarkLogic, ArangoDB, OrientDB, FoundationDB, Microsoft SQL Server, Oracle DB NoSQL, DataStax, Redis.
- Each contender provides all-in-one capabilities for document, graph, and search.
Evaluation Criteria
- Functional and Non-functional Requirements:
- Fitness to specific data types and derivative data models.
- Ease of Use: Supported languages and documentation.
- Operational Characteristics: Deployment model, scaling, clustering, performance, security, backup and recovery, tools support (e.g., Tableau).
- Cost Considerations:
- Building a POC (Proof of Concept) use case.
Challenges and Concerns
- Data Population: Origin of data and AWS constraints.
- Feature Analysis: Navigating through hype to find practical feature sets.
- Performance: Concern that multi-model DBs may be mediocre at each individual function.
- Vendor Support: Need for vendors to provide long-term engagement.
Execution Strategy
- Paper Study: Documenting features against evaluation criteria.
- Hands-On: Implementing small POCs with actual code.
- Vendor Engagement: Engaging vendors for long-term support.
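One way to make the paper study comparable across contenders is a weighted scoring matrix. The sketch below is only an illustration: the criterion names, the weights, the 1-5 scale, and the `candidate_a` / `candidate_b` placeholders are all assumptions, not figures from the evaluation.

```python
# Hypothetical weights per evaluation criterion (higher = more important).
CRITERIA_WEIGHTS = {
    "data_model_fit": 3,
    "ease_of_use": 2,
    "operations": 3,
    "cost": 2,
}


def weighted_score(scores, weights=CRITERIA_WEIGHTS):
    """Return the weighted average of per-criterion scores (assumed 1-5 scale)."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def rank(candidates, weights=CRITERIA_WEIGHTS):
    """Return candidate names ordered from highest to lowest weighted score."""
    return sorted(
        candidates,
        key=lambda name: weighted_score(candidates[name], weights),
        reverse=True,
    )


# Placeholder scores; a real study would fill these in from documentation and POCs.
candidates = {
    "candidate_a": {"data_model_fit": 5, "ease_of_use": 5, "operations": 5, "cost": 5},
    "candidate_b": {"data_model_fit": 4, "ease_of_use": 2, "operations": 3, "cost": 1},
}
ranking = rank(candidates)
```

Keeping the weights in one place makes it easy to show how sensitive the ranking is to the priorities chosen.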
POC (Proof of Concept)
- Data Ingestion: Ingest data from sources to a target schema.
- Search Capability: Demonstrate a search functionality for finding restaurants based on location, with expanding search radius.
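The two POC steps can be sketched without any database, as a baseline to compare candidate products against. This is a minimal in-memory illustration, not the POC that was built: the `Place` schema, the raw field names, the doubling radius, and the 1 km / 16 km limits are all assumptions.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass(frozen=True)
class Place:
    """Hypothetical target schema for an ingested record."""
    name: str
    category: str
    lat: float
    lon: float


def ingest(raw_records):
    """Map raw source dicts (assumed field names) onto the target schema."""
    return [
        Place(
            name=rec["name"].strip(),
            category=rec["type"].lower(),
            lat=float(rec["latitude"]),
            lon=float(rec["longitude"]),
        )
        for rec in raw_records
    ]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def expanding_search(places, lat, lon, category="restaurant",
                     start_km=1.0, max_km=16.0, min_results=1):
    """Double the search radius until enough matches are found or max_km is passed.

    Returns (radius_used, [(distance_km, place), ...]) sorted by distance,
    or (None, []) if nothing is found within max_km.
    """
    # Distances do not depend on the radius, so compute them once.
    by_distance = sorted(
        ((haversine_km(lat, lon, p.lat, p.lon), p)
         for p in places if p.category == category),
        key=lambda pair: pair[0],
    )
    radius = start_km
    while radius <= max_km:
        hits = [(d, p) for d, p in by_distance if d <= radius]
        if len(hits) >= min_results:
            return radius, hits
        radius *= 2
    return None, []


raw = [
    {"name": " Diner ", "type": "Restaurant", "latitude": "40.027", "longitude": "-75.0"},
    {"name": "Cafe", "type": "Cafe", "latitude": "40.001", "longitude": "-75.0"},
]
places = ingest(raw)
radius, hits = expanding_search(places, 40.0, -75.0)
```

In a database-backed POC the linear scan would be replaced by the product's geospatial index and query language; the expanding-radius loop stays the same.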
Evaluation of Specific Databases
- MarkLogic:
- XML internal representation, supports RDF, JSON, XML, binaries.
- ACID compliance, entity matching, strong security controls.
- Known internal experience and relationship with the product.
- ArangoDB:
- Native multi-model with ACID compliance.
- Supports JOINs, a unified query language (AQL), and search functionality from the IResearch acquisition.
- OrientDB:
- Started as document-focused, now multi-model.
- ACID compliance, a SQL-like query language that follows links between records rather than using relational JOINs, Lucene-based search.
Outcomes
- Guidelines for Usage: Recommendations for when to use multi-model databases.
- Guardrails: Direct users to internal resources for decision-making.
- Solutions Architecture: Aid in integrating solutions into specific designs.
Challenges and Concerns
- Data origin and AWS constraints.
- Digging through hype to evaluate feature sets.
- Performance concerns of multi-model databases.
- Ensuring contenders are close to best-in-class features.
- Historical performance issues of multi-model databases.