Designing for high scalability, reliability and performance has always been a balancing act. Over-engineer too soon and you will waste time, effort and money. However, ignoring scalability, reliability and performance until it is too late can leave you on a “burning platform” right at the moment of breakout success.
Linear Scalability, Fault-Tolerant Systems, Self-Healing Architectures, Zero Downtime (ZDT) Installations, Automatic Virtual Machine (VM) Provisioning, MapReduce, Sharding, … Today’s technologies are easier to scale than yesterday’s. Nevertheless, many hidden pitfalls remain. For example:
- Are your web servers truly stateless, or are they retaining state at a shared chokepoint?
- Should you shard or replicate? Now or after initial growth?
- Is your ORM creating excessive joins and relationships that will be difficult to untangle when you change things?
- Will you lose or retain transactions if you bounce your servers while a transaction is in flight?
- Can you roll back a non-critical feature without breaking critical functionality?
- How long would it take you to recover if your primary data center fails? Have you ever tried this out?
Over the last 20 years we have designed and built high-scale, high-reliability architectures for Fortune 500 companies (400 million transactions per day with 99.9999% reliability) and startups (where 99.9% reliability is fine but the ability to surge capacity 10-fold in less than an hour is essential).
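To put those reliability figures in perspective, here is a minimal sketch of the arithmetic. It rests on two assumptions that the figures above do not spell out: that a reliability percentage can be read either as availability over a 365-day year or as the share of transactions that succeed. The function names are illustrative only.

```python
# Rough arithmetic behind "nines" of reliability.
# Assumption: reliability is read either as availability over a
# 365-day year, or as the fraction of transactions that succeed.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (leap years ignored)


def downtime_seconds_per_year(reliability_percent: float) -> float:
    """Seconds of downtime per year allowed at a given availability."""
    return SECONDS_PER_YEAR * (1 - reliability_percent / 100)


def failed_transactions_per_day(transactions_per_day: int,
                                reliability_percent: float) -> float:
    """Expected failed transactions per day at a given success rate."""
    return transactions_per_day * (1 - reliability_percent / 100)


if __name__ == "__main__":
    # 99.9%: roughly 8.76 hours of downtime per year
    print(f"{downtime_seconds_per_year(99.9) / 3600:.2f} hours/year")
    # 99.9999%: roughly 31.5 seconds of downtime per year
    print(f"{downtime_seconds_per_year(99.9999):.1f} seconds/year")
    # 400 million transactions/day at 99.9999%: roughly 400 failures/day
    print(f"{failed_transactions_per_day(400_000_000, 99.9999):.0f} failures/day")
```

Under these assumptions, the gap between three nines and six nines is the gap between a working day of downtime per year and about half a minute, which is why the two targets call for very different levels of investment.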
We have found that a few hours of up-front planning can save weeks of emergency “in-flight” re-architecture later. More importantly, it can often mean the difference between success and failure. We can help you avoid this risk and pain without breaking the bank up front.