Week 4 - Databases
Discussion
Welcome to Week 4! This week is all about databases. But before we go there, as always, we'll begin this week with a discussion of last week's asynchronous content.
Review
Your web service
- What language / framework did you use?
- What storage mechanism did you use?
- Tell us about the API design.
- Did you tackle any of the extra challenges? (e.g. authentication, logging, reverse proxy, etc.)
Docker
- What did you learn?
- Did you use Docker Compose or a single Dockerfile?
- How big are your Docker images for your application?
Any other interesting topics from the last week?
This Week
This week we'll talk about databases. Since all bootcamp participants have some previous experience with relational databases, we'll focus entirely on NoSQL databases. NoSQL databases are a broad category of databases. In general, NoSQL databases are designed to be more performant and scalable than relational database management systems (RDBMS). However, they come with some trade-offs and demand a new way of thinking about data modeling.
We'll begin by breaking down NoSQL into more distinct categories.
Document Databases are the most popular category of NoSQL databases. They are designed to store and retrieve documents, frequently stored using JSON-like data structures. Document databases are typically semi-structured and schemaless. Engineers often apply RDBMS patterns to document databases, which can lead to increased complexity and performance issues.
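To make the "RDBMS patterns in a document database" point concrete, here is a minimal sketch that uses plain Python dictionaries as a stand-in for a document store. The collection names, field names, and helper functions are all hypothetical and not tied to any particular database. It contrasts a normalized layout that needs an application-side join with a single embedded document that answers the same read in one lookup.

```python
# Relational habit carried over: orders and items live in separate
# collections and are stitched together on every read.
orders = [{"order_id": 1, "customer": "Ada"}]
order_items = [
    {"order_id": 1, "item": "pen", "quantity": 2},
    {"order_id": 1, "item": "ink", "quantity": 1},
]


def load_order_relational(order_id):
    """Two lookups plus a join performed in application code."""
    order = next(o for o in orders if o["order_id"] == order_id)
    items = [
        {"item": i["item"], "quantity": i["quantity"]}
        for i in order_items
        if i["order_id"] == order_id
    ]
    return {**order, "items": items}


# Document habit: data that is read together is stored together,
# so the items are embedded in the order document itself.
order_documents = {
    1: {
        "order_id": 1,
        "customer": "Ada",
        "items": [
            {"item": "pen", "quantity": 2},
            {"item": "ink", "quantity": 1},
        ],
    }
}


def load_order_document(order_id):
    """A single lookup by key returns the whole aggregate."""
    return order_documents[order_id]
```

Both functions return the same order, but the embedded version does so with one key lookup. The trade-off is that embedded data is harder to query or update independently of its parent document.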
Wide Column Databases are designed to store and retrieve data in a tabular format. However, unlike relational databases, rows in the same table need not share the same set of columns, and these databases behave differently when it comes to joins and aggregations. Cassandra, for instance, does not support joins and instead encourages denormalization, or duplication, of data.
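As a rough illustration of denormalization, the sketch below keeps one "table" per query pattern and writes every order to both. It uses in-memory Python dictionaries rather than a real wide column store, and the table names and the `save_order` helper are hypothetical. Updates and deletes, which would have to touch every copy, are deliberately left out.

```python
from collections import defaultdict

# One "table" per query: each is keyed by the value the query filters on.
orders_by_reference = {}
orders_by_customer = defaultdict(list)


def save_order(reference, customer, total):
    """Write the same order to every table that serves a query.

    The duplication replaces the join a relational database would do at
    read time. This sketch only handles inserts of new references.
    """
    row = {"reference": reference, "customer": customer, "total": total}
    orders_by_reference[reference] = row
    orders_by_customer[customer].append(dict(row))  # independent copy


save_order("A-1", "Ada", 12.5)
save_order("A-2", "Ada", 3.0)
save_order("B-1", "Grace", 7.25)
```

Each read is then a single lookup, for example `orders_by_customer["Ada"]`, at the cost of extra writes and storage.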
Key-Value Databases are designed to store and retrieve data using a key-value interface. They are typically very fast and are often used as a cache or as a primary data store for applications that do not require complex queries.
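The cache use case can be sketched with a small in-memory key-value store. This is only an illustration of the cache-aside pattern under simple assumptions: `TTLCache` and `get_with_cache` are hypothetical names, the store lives in a Python dictionary rather than in a real key-value database, and a `None` result is treated as a cache miss.

```python
import time


class TTLCache:
    """A tiny key-value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value):
        self._data[key] = (value, self._clock() + self._ttl)


def get_with_cache(cache, key, load):
    """Cache-aside: try the cache first, fall back to the slow loader."""
    value = cache.get(key)
    if value is None:
        value = load(key)
        cache.set(key, value)
    return value
```

A caller would pass its own loader, for example `get_with_cache(cache, "REF-1", load_order_from_db)`, where `load_order_from_db` is whatever function queries the primary data store.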
Reading & Async Content
Overview of NoSQL Databases
This talk by Rick Houlihan at AWS provides a great overview of how to think about NoSQL databases and how they differ from relational databases.
MongoDB
You can experiment with MongoDB by running it locally or using a free tier instance on MongoDB Atlas. The following resources will help you get started:
- MongoDB Documentation
- MongoDB: The Definitive Guide, 3rd Edition (Shannon Bradshaw, Eoin Brazil, Kristina Chodorow)
Cassandra
You can experiment with Cassandra by running it locally or using a free tier instance on DataStax Astra (see pricing model). The following resources can help you get started:
- Cassandra Documentation
- Cassandra: The Definitive Guide, 3rd Edition (Jeff Carpenter, Eben Hewitt)
DynamoDB
DynamoDB is AWS's NoSQL database offering. It is a fully managed service that scales automatically and is highly available. You can use it with limitations in AWS's free tier. The following resources will help:
Neo4j
Neo4j is a graph database. You can experiment with Neo4j by running it locally or using the free tier of AuraDB, Neo4j's cloud offering. The following resources will help you get started:
Redis
Redis is a key-value database. You can experiment with Redis by running it locally or using the free tier. Redis also supports additional modules such as RedisJSON and RedisGraph to support document storage and graph storage capabilities. Some resources to get you started are:
Practice
A Shopping API.
In new pairs, build a simple ShoppingCart API with PUT and GET endpoints for orders.
Orders have the following information: OrderReferenceNumber, CustomerName, CustomerAddress, Items[], TotalPrice, OrderDate, and Notes.
Items have the following information: ItemName, Quantity, Price, and Description.
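One possible way to model the order and item fields in code is with dataclasses, sketched below. The snake_case attribute names are a Python convention chosen for this example, and two things are assumptions rather than part of the exercise: Price is treated as a unit price, and TotalPrice is derived as the sum of quantity times price. You may prefer to store the total explicitly.

```python
from dataclasses import dataclass, asdict
from datetime import date
from decimal import Decimal


@dataclass
class Item:
    item_name: str
    quantity: int
    price: Decimal  # assumed to be the unit price
    description: str = ""


@dataclass
class Order:
    order_reference_number: str
    customer_name: str
    customer_address: str
    items: list[Item]
    order_date: date
    notes: str = ""

    @property
    def total_price(self) -> Decimal:
        """Derived total: sum of quantity * unit price (an assumption)."""
        return sum((i.price * i.quantity for i in self.items), Decimal("0"))


order = Order(
    order_reference_number="REF-1",
    customer_name="Ada",
    customer_address="1 Main St",
    items=[
        Item("pen", 2, Decimal("1.50")),
        Item("ink", 1, Decimal("4.00")),
    ],
    order_date=date(2024, 1, 1),
)

# asdict() recursively converts the order and its items to plain dicts,
# which is a convenient starting point for a document to store.
document = asdict(order)
```

Because `total_price` is a property rather than a field, it does not appear in `document`. Add it explicitly before saving if your queries need it.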
- Generate some test data (try 10,000 - 1,000,000 orders)
- Benchmark your API endpoints.
Adjust your queries and indexing to improve your performance.
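The generate, benchmark, and tune loop can be tried out in miniature before a real database is involved. The sketch below generates seeded test orders, then times a lookup by full scan against a lookup through a dictionary that stands in for an index. All function names are hypothetical, only a few order fields are generated, and real measurements should be taken against your actual API and database.

```python
import random
import time


def generate_orders(count, seed=0):
    """Build reproducible fake orders (a subset of the fields only)."""
    rng = random.Random(seed)
    customers = [f"customer-{n}" for n in range(100)]
    return [
        {
            "OrderReferenceNumber": f"REF-{n:07d}",
            "CustomerName": rng.choice(customers),
            "TotalPrice": round(rng.uniform(1, 500), 2),
        }
        for n in range(count)
    ]


def find_by_scan(orders, reference):
    """Unindexed lookup: inspect every order until one matches."""
    for order in orders:
        if order["OrderReferenceNumber"] == reference:
            return order
    return None


def build_index(orders):
    """Stand-in for a database index on OrderReferenceNumber."""
    return {o["OrderReferenceNumber"]: o for o in orders}


def time_call(fn, *args, repeat=100):
    """Average wall-clock seconds per call over `repeat` runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


orders = generate_orders(10_000)
index = build_index(orders)
target = orders[-1]["OrderReferenceNumber"]  # worst case for the scan

scan_seconds = time_call(find_by_scan, orders, target)
index_seconds = time_call(index.get, target)
print(f"scan:  {scan_seconds:.6f} s per lookup")
print(f"index: {index_seconds:.6f} s per lookup")
```

The exact timings depend on your machine, so none are shown here. The point is to measure before and after each change to your queries or indexes.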
Some requirements to consider:
- You MUST have 1 (and only one) PUT /orders endpoint for creating/replacing orders.
- You MUST have 1 (and only one) GET /orders endpoint for retrieving orders.
- Your GET endpoint MUST support query parameters for filtering by OrderReferenceNumber and CustomerName.
- Add a GET /reports?item={ItemName} endpoint that allows you to report the total sales and quantity of items sold for a given item.
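The behaviour these requirements describe can be sketched without committing to a web framework or database. The functions below are hypothetical handlers backed by an in-memory dictionary. The response field names `TotalQuantity` and `TotalSales` are made up for this example, and sales are assumed to be quantity times a unit Price. Your real implementation would replace the dictionary with database queries.

```python
# In-memory stand-in for the database, keyed by OrderReferenceNumber.
orders = {}


def put_order(order):
    """PUT /orders: create the order, or replace it if the reference exists."""
    orders[order["OrderReferenceNumber"]] = order
    return order


def get_orders(order_reference_number=None, customer_name=None):
    """GET /orders: return all orders, narrowed by any filters supplied."""
    if order_reference_number is not None:
        match = orders.get(order_reference_number)
        candidates = [match] if match is not None else []
    else:
        candidates = list(orders.values())
    if customer_name is not None:
        candidates = [o for o in candidates if o["CustomerName"] == customer_name]
    return candidates


def item_report(item_name):
    """GET /reports?item=...: total quantity and sales for one item."""
    quantity = 0
    sales = 0.0
    for order in orders.values():
        for item in order["Items"]:
            if item["ItemName"] == item_name:
                quantity += item["Quantity"]
                # Assumes Price is a unit price.
                sales += item["Quantity"] * item["Price"]
    return {"ItemName": item_name, "TotalQuantity": quantity, "TotalSales": sales}
```

Note that `item_report` scans every order, which is exactly the kind of query worth benchmarking and then optimizing with an index or a precomputed aggregate.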
Extra Challenge
Option 1. Re-implement using a different DB type
Re-implement this using a different storage solution. This could be:
- Document, Wide Column, or Key-Value store (e.g. MongoDB, DynamoDB, Cassandra, etc.)
- Explore one of the major cloud offerings (e.g. DynamoDB, CosmosDB, Firestore, etc.)
- A graph database (e.g. Neo4j, Neptune, etc.)
- A time series database (e.g. InfluxDB, TimescaleDB, etc.)
- Other databases (e.g. Apache Accumulo, Apache Druid, HBase, CouchDB, etc.)
Explore additional storage solutions
Explore additional non-database storage solutions. This could be:
- Archive your data using S3
- Implement a cache using Redis or Memcached
- Explore text indexing and data analysis using Elasticsearch & Kibana