Week 4 - Databases

Discussion

Welcome to Week 4! This week is all about databases. But before we go there, as always, we'll begin this week with a discussion of last week's asynchronous content.

Review

Review & Discussion

Your web service

  • What language / framework did you use?
  • What storage mechanism did you use?
  • Tell us about the API design.
  • Did you tackle any of the extra challenges? (e.g. authentication, logging, reverse proxy, etc.)

Docker

  • What did you learn?
  • Did you use Docker Compose or a single Dockerfile?
  • How big are your Docker images for your application?

Any other interesting topics from the last week?

This Week

This week we'll talk about databases. Since all bootcamp participants have some previous experience with relational databases, we'll focus entirely on NoSQL databases. NoSQL databases are a broad category of databases. In general, NoSQL databases are designed to be more performant and scalable than relational database management systems (RDBMS). However, they come with some trade-offs and demand a new way of thinking about data modeling.

We'll begin by breaking down NoSQL into more distinct categories.

Breakdown of database categories

Document Databases are the most popular category of NoSQL databases. They are designed to store and retrieve documents, frequently stored using JSON-like data structures. Document databases are typically semi-structured and schemaless. Engineers frequently apply RDBMS patterns to document databases, which can lead to increased complexity and performance issues.
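To make the contrast concrete, here is a minimal sketch using plain Python dictionaries rather than a real database driver. The order shape, the field names, and the `load_order_normalized` helper are all hypothetical. It shows the same order stored once as a single document with embedded items, and once split RDBMS-style into two collections that the application has to join itself.

```python
# Hypothetical order stored as one document: the items are embedded,
# so a single lookup returns everything needed to render the order.
embedded_order = {
    "_id": "ORD-1001",
    "customer_name": "Ada Lovelace",
    "items": [
        {"item_name": "Keyboard", "quantity": 1, "price": 49.99},
        {"item_name": "Cable", "quantity": 2, "price": 5.00},
    ],
}

# The same data split RDBMS-style into two separate collections.
orders = {"ORD-1001": {"customer_name": "Ada Lovelace"}}
order_items = [
    {"order_id": "ORD-1001", "item_name": "Keyboard", "quantity": 1, "price": 49.99},
    {"order_id": "ORD-1001", "item_name": "Cable", "quantity": 2, "price": 5.00},
]


def load_order_normalized(order_id):
    """Rebuild the order with an application-side join over both collections."""
    order = dict(orders[order_id])
    order["_id"] = order_id
    # Second pass over another collection: this is the extra work (and, in a
    # real document store, often an extra round trip) that embedding avoids.
    order["items"] = [
        {key: value for key, value in row.items() if key != "order_id"}
        for row in order_items
        if row["order_id"] == order_id
    ]
    return order
```

Both shapes carry the same information, but the normalized one needs a second lookup on every read. Embedding is not always the right choice (large or frequently changing sub-documents can argue against it), so treat this as an illustration of the trade-off rather than a rule.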

Wide Column Databases are designed to store and retrieve data in a tabular format. However, unlike relational databases, rows in the same table need not share the same set of columns, and these databases behave differently when it comes to joins and aggregations. Cassandra, for instance, does not support joins and instead encourages denormalization, or duplication, of data.
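Denormalization can be sketched in a few lines of plain Python. This is not Cassandra code: the two in-memory "tables", the field names, and the `save_order` helper are hypothetical stand-ins. The idea it illustrates is that each query gets its own table, and every write copies the data into all of them.

```python
from collections import defaultdict

# Two hypothetical "tables", each keyed for exactly one query.
orders_by_customer = defaultdict(list)  # query: all orders for a customer
orders_by_reference = {}                # query: one order by reference number


def save_order(order):
    """Write a copy of the order into every table that serves a query."""
    orders_by_customer[order["customer_name"]].append(dict(order))
    orders_by_reference[order["order_reference_number"]] = dict(order)


save_order({"order_reference_number": "ORD-1", "customer_name": "Ada", "total_price": 20})
save_order({"order_reference_number": "ORD-2", "customer_name": "Ada", "total_price": 35})
save_order({"order_reference_number": "ORD-3", "customer_name": "Grace", "total_price": 10})
```

Reads become single-key lookups with no join, at the cost of extra storage and of keeping the copies consistent when an order changes.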

Key-Value Databases are designed to store and retrieve data using a key-value interface. They are typically very fast and are often used as a cache or as a primary data store for applications that do not require complex queries.
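The cache use case can be sketched as the common cache-aside pattern. The example below uses an in-memory dictionary with expiry as a stand-in for a real key-value store; `TTLCache`, `get_order`, and the `load_from_db` callback are hypothetical names introduced only for illustration.

```python
import time


class TTLCache:
    """Tiny in-memory key-value store whose entries expire after a fixed TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self._clock() + self._ttl, value)


def get_order(order_id, cache, load_from_db):
    """Cache-aside read: try the cache first, fall back to the slower store."""
    cached = cache.get(order_id)
    if cached is not None:
        return cached
    order = load_from_db(order_id)
    cache.set(order_id, order)
    return order
```

The whole interface is `get` and `set` by key, which is why such stores can be very fast and also why they suit workloads without complex queries.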

Reading & Async Content

Overview of NoSQL Databases

This talk by Rick Houlihan at AWS provides a great overview of how to think about NoSQL databases and how they differ from relational databases.

MongoDB

You can experiment with MongoDB by running it locally or using a free tier instance on MongoDB Atlas. The following resources will help you get started:

Cassandra

You can experiment with Cassandra by running it locally or using a free tier instance on DataStax Astra (see pricing model). The following resources can help you get started:

DynamoDB

DynamoDB is AWS's NoSQL database offering. It is a fully managed service that scales automatically and is highly available. You can use it with limitations in AWS's free tier. The following resources will help:

Neo4j

Neo4j is a graph database. You can experiment with Neo4j by running it locally or using the free tier of AuraDB, Neo4j's cloud offering. The following resources will help you get started:

Redis

Redis is a key-value database. You can experiment with Redis by running it locally or using the free tier. Redis also supports additional modules such as RedisJSON and RedisGraph to support document storage and graph storage capabilities. Some resources to get you started are:

Practice

A Shopping API.

In new pairs, build a simple ShoppingCart API with PUT and GET endpoints for orders.

  1. Orders have the following information: OrderReferenceNumber, CustomerName, CustomerAddress, Items[], TotalPrice, OrderDate, and Notes
  2. Items have the following information: ItemName, Quantity, Price, and Description
  3. Generate some test data (try 10,000 - 1,000,000 orders)
  4. Benchmark your API endpoints.
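One possible way to model the data and generate reproducible test orders is sketched below. The field names follow the list above, but everything else is an assumption made for illustration: the catalog, the customer names, the address format, the date range, and the choice to compute TotalPrice as the sum of Quantity × Price.

```python
import random
from dataclasses import asdict, dataclass
from datetime import date, timedelta


@dataclass
class Item:
    ItemName: str
    Quantity: int
    Price: float  # assumed to be the unit price
    Description: str = ""


@dataclass
class Order:
    OrderReferenceNumber: str
    CustomerName: str
    CustomerAddress: str
    Items: list
    TotalPrice: float
    OrderDate: str
    Notes: str = ""


# Hypothetical sample data used only to make the generator self-contained.
CATALOG = [("Keyboard", 49.99), ("Mouse", 19.99), ("Monitor", 199.00), ("Cable", 5.00)]
CUSTOMERS = ["Ada Lovelace", "Grace Hopper", "Alan Turing"]


def generate_orders(count, seed=42):
    """Yield `count` pseudo-random orders; the same seed gives the same orders."""
    rng = random.Random(seed)
    start = date(2024, 1, 1)
    for n in range(count):
        chosen = rng.sample(CATALOG, k=rng.randint(1, len(CATALOG)))
        items = [Item(name, rng.randint(1, 5), price) for name, price in chosen]
        total = round(sum(item.Quantity * item.Price for item in items), 2)
        yield Order(
            OrderReferenceNumber=f"ORD-{n:07d}",
            CustomerName=rng.choice(CUSTOMERS),
            CustomerAddress=f"{rng.randint(1, 999)} Example Street",
            Items=items,
            TotalPrice=total,
            OrderDate=(start + timedelta(days=rng.randint(0, 364))).isoformat(),
        )


# asdict() converts an order (including its nested items) to plain dicts,
# ready to be serialized as JSON or handed to a database driver.
first_order = asdict(next(generate_orders(1)))
```

Because `generate_orders` is a generator, a million orders can be streamed into the database in batches without holding them all in memory.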

Adjust your queries and indexing to improve your performance.
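To tell whether a query or index change actually helped, it is useful to measure latency the same way before and after. The helper below is a minimal sketch using only the standard library; `benchmark` and its `call` parameter are hypothetical names, and `call` is assumed to be any zero-argument function that performs one request against your API.

```python
import statistics
import time


def benchmark(call, runs=100):
    """Time `call` repeatedly and summarize the latencies in milliseconds."""
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies_ms.append((time.perf_counter() - start) * 1000)
    latencies_ms.sort()
    # Simple nearest-rank style 95th percentile, clamped to the last sample.
    p95_index = min(len(latencies_ms) - 1, int(0.95 * len(latencies_ms)))
    return {
        "runs": runs,
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[p95_index],
        "max_ms": latencies_ms[-1],
    }
```

Looking at the 95th percentile and the maximum alongside the median helps reveal slow outliers, such as unindexed queries that only hurt for certain filter values. Dedicated load-testing tools give more realistic numbers under concurrency; this sketch only measures sequential calls.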

Some requirements to consider:

  1. You MUST have 1 (and only one) PUT /orders endpoint for creating/replacing orders.
  2. You MUST have 1 (and only one) GET /orders endpoint for retrieving orders.
  3. Your GET endpoint MUST support query parameters for filtering by OrderReferenceNumber and CustomerName.
  4. Add a GET /reports?item={ItemName} endpoint that allows you to report the total sales and quantity of items sold for a given item.
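As a reference point for the report endpoint, the aggregation can be written in plain Python over a list of order dictionaries. This sketch assumes that "total sales" means the sum of Quantity × Price for the matching item and that Price is a unit price; the `item_report` function and the response field names are hypothetical. In a real implementation the database would ideally do this aggregation rather than the application.

```python
def item_report(orders, item_name):
    """Sum quantity and sales (Quantity * Price) for one item across all orders."""
    total_quantity = 0
    total_sales = 0.0
    for order in orders:
        for item in order["Items"]:
            if item["ItemName"] == item_name:
                total_quantity += item["Quantity"]
                total_sales += item["Quantity"] * item["Price"]
    return {
        "ItemName": item_name,
        "TotalQuantity": total_quantity,
        "TotalSales": round(total_sales, 2),
    }


sample_orders = [
    {"Items": [
        {"ItemName": "Keyboard", "Quantity": 2, "Price": 10.0},
        {"ItemName": "Cable", "Quantity": 1, "Price": 2.5},
    ]},
    {"Items": [{"ItemName": "Keyboard", "Quantity": 1, "Price": 10.0}]},
]
keyboard_report = item_report(sample_orders, "Keyboard")
```

This version scans every order on every request, which makes it a useful baseline to benchmark against an indexed or pre-aggregated approach.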

Extra Challenge

Option 1. Re-implement using a different DB type

Re-implement this using a different storage solution. This could be:

  1. Document, Key-Value, or Wide Column store (e.g. MongoDB, DynamoDB, Cassandra, etc.)
  2. Explore one of the major cloud offerings (e.g. DynamoDB, CosmosDB, Firestore, etc.)
  3. A graph database (e.g. Neo4j, Neptune, etc.)
  4. A time series database (e.g. InfluxDB, TimescaleDB, etc.)
  5. Other databases (e.g. Apache Accumulo, Apache Druid, HBase, CouchDB, etc.)

Option 2. Explore additional storage solutions

Explore additional non-database storage solutions. This could be:

  1. Archive your data using S3
  2. Implement a cache using Redis or Memcached
  3. Explore text indexing and data analysis using Elasticsearch & Kibana