Disclaimer - Trust no one, use your brain! (Work continuously in progress)
- Blueprint/template for a new service
- Documentation, standards, guides (how-to, know-how documents)
- Team support
- Understanding of the whole process by each member
- Pro-active development and support
- Accepted responsibilities and duties for each stage of a service
- Plan for the service life cycle
- Pre-production development
- Launching
- Rolling out a backward-compatible version of a service
- Hotfixing
- Rolling out a backward-incompatible version of a service
- Data migration
- Switchover
- Service rollback
- Continuous development
- Tests
- Automated
- Unit tests
- Functional tests
- Code style (lints and sniffers)
- Code quality monitoring (Sonar, Scrutinizer)
- Code coverage checks
- Manual
- Feature acceptance/Business acceptance
- A/B tests
- Conditions of integration
- Code style checks
- Test results
- Code coverage percentage
- Conditions for reverting a feature
- Error rate after deploying to live
- Healthchecks
- Storing new tested snapshots/artefacts of a service
- Artefact storage (Docker registry)
- Cleanup policy (Delete old tags with timeout)
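
The integration and revert conditions listed under Tests can be sketched as two small predicates. This is a minimal sketch in Python, not a prescribed implementation: the `BuildReport` shape, the function names, and the default thresholds (80% coverage, 1% error rate) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildReport:
    """Results of the automated checks for one build (hypothetical shape)."""
    style_ok: bool            # code style checks (lints and sniffers) passed
    tests_passed: bool        # unit and functional tests passed
    coverage_percent: float   # measured code coverage, 0..100


def may_integrate(report: BuildReport, min_coverage: float = 80.0) -> bool:
    """Integration condition: style and tests pass, coverage meets the threshold."""
    return (
        report.style_ok
        and report.tests_passed
        and report.coverage_percent >= min_coverage
    )


def should_revert(error_rate: float, healthchecks_ok: bool,
                  max_error_rate: float = 0.01) -> bool:
    """Revert condition: error rate after deploy is too high, or healthchecks fail."""
    return error_rate > max_error_rate or not healthchecks_ok


if __name__ == "__main__":
    report = BuildReport(style_ok=True, tests_passed=True, coverage_percent=84.5)
    print("integrate:", may_integrate(report))
    print("revert:", should_revert(error_rate=0.03, healthchecks_ok=True))
```

Keeping both conditions as pure functions makes the thresholds easy to review and test independently of whichever CI system evaluates them.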
- Continuous delivery of stable artefacts
- Images builder
- Services provisioning (Ansible)
- System layers
- Hardware: Servers and networks
- Scaling (adding new nodes) should not affect consistency of other layers
- Degradation (removing nodes) should not affect consistency of other layers
- Monitoring
- Hardware
- Network
- Resources and load
- Alerting policy
- Cluster: Services management system (Kubernetes, alternatives: OpenShift, Apache Mesos/Apache Karaf)
- Monitoring
- Availability of each node in the cluster
- All services up and running
- Connectivity between different pods and services
- Public endpoints accessibility
- Alerting policy
- A restart (full or partial) should bring the cluster and its systems back up without damage
- Log aggregation system - collect all logs from all containers
- Execution environment
- Meta-project with topology of the system
- Showroom + Staging
- Separate namespace for each showroom
- Fixed showroom for the staging (last stable pre-release)
- Production
- Configuration
- Secrets
- Configs should be a part of the meta-project
- Service: Application and any service
- Service itself (Docker image)
- Backward compatibility for a few generations
- Cleanup policy for deprecated/unused:
- Logic branches
- Data structures (RDBMS/NoSQL)
- One container - one process
- Segregated commands even in one image (management layer can pick any to run)
- Built-in commands
- Test service/source code (Docker Compose to set up the required test environment)
- DEV/DEBUG mode
- Logging
- Writing to stdout (rather than the container's file system) lets the cluster layer collect and keep all logs
- Monitoring
- Alerting policies (Prometheus, New Relic)
- Tracing system agent (Zipkin)
- Self-sufficiency
- Interfaces documentation
- RESTful API
- Port and service description (README.md files)
- Service should be able to set itself up
- Replication, balancing and scaling on service level
- Failover and self-reorganisation in case of:
- Service crashed
- Physical node out of cluster
- Resources problems on specific node
- Logs system
- Service to collect and access logs grabbed from Cluster layer
- ELK stack/Graylog/etc.
- Persistent volumes to keep data
- AWS EBS
- Ceph
- NFS
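
The logging rule in the Service layer (write to stdout, not to the container's file system) can be sketched as follows. This is a minimal sketch under stated assumptions: `JsonFormatter` and `build_logger` are hypothetical names, and the one-JSON-object-per-line format is an illustrative choice, not something the outline prescribes.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON line (assumed format, easy to aggregate)."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })


def build_logger(name: str, stream=None) -> logging.Logger:
    """Return a logger that writes only to the given stream (stdout by default)."""
    handler = logging.StreamHandler(stream if stream is not None else sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]   # no file handlers: nothing touches the container FS
    logger.setLevel(logging.INFO)
    logger.propagate = False      # avoid duplicate lines via the root logger
    return logger


if __name__ == "__main__":
    log = build_logger("example-service")
    log.info("service started on port %d", 8080)
```

Because the process only writes to its standard output, the cluster layer's log aggregation can pick up every line without any per-service file configuration.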
- Common services
- Tracing system (Zipkin)
- Single sign-on service
- Authentication service (JWT)
- Authorization requests from all services
- Detached processing (CQRS)
- Request-Queue-Processor schema
- Stream data addressing and processing (Reactor)
- Real-time data request processing
- Reliable data provider/API gateway (synchronous data retrieval)
- Request-Manager-Service solution
- Reliable data-bus for events
- Event-Broker-Subscriber solution (Apache Camel)
- HTTP/TCP API endpoint to accept events
- Event fulfilment (gather the information subscribers require)
- Event delivery
- Event delivery policies
- Retry
- Requeue
- Give up
- RDBMS: Postgres cluster
- DB backups: PG backupper
- Key-value + Queue: Redis cluster
- Messages system: RabbitMQ cluster
- Healthcheck system
- Alerting system
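
The event delivery policies named under the data bus (retry, requeue, give up) can be sketched as one function. This is a minimal sketch, not the Apache Camel configuration the outline refers to: `deliver`, `Outcome`, the `requeues` counter stored on the event, and the default limits are all assumptions made for illustration, and an in-memory `deque` stands in for the real broker queue.

```python
from collections import deque
from enum import Enum


class Outcome(Enum):
    DELIVERED = "delivered"
    REQUEUED = "requeued"
    GAVE_UP = "gave_up"


def deliver(event: dict, send, queue: deque,
            max_retries: int = 3, max_requeues: int = 2) -> Outcome:
    """Apply the three policies in order: retry, then requeue, then give up.

    `send` is any callable that raises ConnectionError when the subscriber
    is unreachable (a hypothetical contract for this sketch).
    """
    # Retry: attempt immediate delivery up to max_retries times.
    for _ in range(max_retries):
        try:
            send(event)
            return Outcome.DELIVERED
        except ConnectionError:
            continue

    # Requeue: put the event back for a later pass, counting how often.
    requeues = event.get("requeues", 0)
    if requeues < max_requeues:
        queue.append({**event, "requeues": requeues + 1})
        return Outcome.REQUEUED

    # Give up: the event exhausted both retries and requeues.
    return Outcome.GAVE_UP


if __name__ == "__main__":
    def unreachable(event):
        raise ConnectionError("subscriber is down")

    pending = deque()
    print(deliver({"id": 1}, unreachable, pending))
    print(deliver({"id": 2, "requeues": 2}, unreachable, pending))
```

Storing the requeue count on the event itself keeps the policy stateless, so any worker that picks the event up later can decide whether to give up.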