Production & Infrastructure
Learn to deploy, operate, and scale software in production — from containers to incident response.
Programming Fundamentals
Essential programming concepts for infrastructure work.
Understanding how programs store and manage data
How programs make decisions and repeat actions
Organizing code into reusable, composable units
Catching and recovering from errors gracefully
Handling operations that take time: callbacks, promises, and async/await
Organizing code into reusable, shareable files
Operating Systems
How the OS manages processes, memory, and I/O.
Independent programs running on your computer and how the OS manages them
Lightweight units of execution within a process
How the OS and runtime allocate, use, and reclaim memory
How the OS organizes and stores data on disk
How programs communicate with the outside world through input and output
How programs wait for I/O and why non-blocking I/O matters
Networking
Protocols, load balancing, and reverse proxies.
TCP and UDP: two fundamental transport protocols and when to use each
HTTP, the protocol that powers the web
Encrypting communication to keep data secure in transit
Translating human-readable domain names into IP addresses
Distributing traffic across multiple servers for reliability and performance
Intermediary servers that handle requests on behalf of backend services
Databases
Storage fundamentals — indexing, transactions, and replication.
Organizing data into tables with rows and columns
Speeding up queries with B-tree and hash indexes
Ensuring data consistency with atomic operations
Using caches to reduce database load and improve response times
Copying data across multiple servers for reliability and performance
Production Engineering
CI/CD, deployment, containers, observability, and incident response.
Automated testing on every push and automated releases to production
Packaging applications into portable containers that run anywhere
Running code without managing servers using functions as a service
Understanding what your application is doing in production through logs, metrics, and traces
Safely reverting bad deployments and responding to production incidents
Defining reliability targets and learning from failures through blameless postmortems