https://github.com/abhishektiwari/operational-excellence-primer

Operational Excellence Primer



Repository

Operational Excellence Primer

Basic Info
  • Host: GitHub
  • Owner: abhishektiwari
  • License: mit
  • Default Branch: main
  • Size: 6.84 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 4 years ago · Last pushed about 4 years ago
Metadata Files
Readme License

README.md

Operational Excellence Primer

Design Principles

  1. Perform operations as code

    • Everything as source code
    • Compliance, infrastructure, delivery pipelines, monitoring, alerting, and playbooks
  2. Make frequent, small, reversible changes

    • Feature toggles
    • Vertical vs horizontal slicing
    • Trunk-based development
    • Blue-green deployments
  3. Refine operations procedures frequently

    • Run regular game days
    • Keep the playbook up to date
    • Perform chaos engineering experiments
  4. Anticipate failure

    • Perform pre-mortem exercises to identify sources of failure
    • Murphy's Law: expect the unexpected
    • Test your failure scenarios and validate impact
  5. Learn from all operational failures

    • Run blameless postmortems
    • Create organizational memory
    • Cause-and-effect analysis: 5 Whys, fishbone diagram
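
To make principle 2 (frequent, small, reversible changes) concrete, here is a minimal sketch of a feature toggle with a percentage rollout. It assumes Python 3.9+ and an in-memory flag store. The `FeatureFlags` class, its methods, and the `new-checkout` flag are hypothetical illustrations, not a specific library's API. Each user is hashed into a stable bucket from 0 to 99, so a change can be exposed to a small slice of users first and then turned off again without a redeploy.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store: flag name -> rollout percentage (0-100)."""
    rollout: dict[str, int] = field(default_factory=dict)

    def set_rollout(self, flag: str, percent: int) -> None:
        # Clamp to 0..100 so a bad value cannot enable more than every user.
        self.rollout[flag] = max(0, min(100, percent))

    def is_enabled(self, flag: str, user_id: str) -> bool:
        percent = self.rollout.get(flag, 0)  # unknown flags default to off
        # Hash flag + user so each user lands in a stable bucket 0..99 per flag.
        digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < percent


flags = FeatureFlags()
flags.set_rollout("new-checkout", 10)  # small change: expose to roughly 10% of users
print(flags.is_enabled("new-checkout", "user-42"))
flags.set_rollout("new-checkout", 0)   # reversible: switch it off without a redeploy
print(flags.is_enabled("new-checkout", "user-42"))  # False
```

Hashing rather than random sampling is a deliberate choice here: a given user keeps seeing the same behaviour while the rollout percentage stays fixed, which keeps their experience consistent and makes issues easier to reproduce.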

How to do

  1. Preparation

    • Pre-mortem: what could go wrong?
    • Game days: chaos experiments
      • Active failure injection vs. team tabletop simulations
      • Keep the team's understanding of the system fresh
    • Read and update playbooks
  2. Risk Management

    • Frequent, small, reversible changes
    • Feature toggling
    • Blue-green/canary/rolling deployments
    • Bring high-risk items ahead in the project timeline
  3. Troubleshooting

    • Triage first
      • make the system work as well as it can under the circumstances
    • Then Examine
      • Each component of the system
      • Metrics plotted as time series
      • Logs, particularly exceptions and errors
    • Then Diagnose
      • what changed - deployment or config changes
        • See if changes correlate with system behaviour
      • divide and conquer
        • data flow between components - distributed tracing
        • divide diagnosis by layers or steps
    • Finally test and treat
    • Pitfalls
      • Looking at symptoms that aren’t relevant
      • Misunderstanding the meaning of a system metric
  4. Event Response

    • A clear and well-defined line of command
    • Delegated roles and responsibilities
      • Incident commander
      • Response team
      • Communication lead
      • Planning lead
    • Record the state and actions
      • all details of an incident
      • every action on debugging and mitigation
  5. Root Cause Analysis

    • 5 Whys - cause and effect
    • Fishbone diagram
  6. Organizational Learning

    • Creating, retaining, and transferring knowledge within an organization
    • Treat every problem as an opportunity to build a better organizational response
    • Sharing and transparency
      • applied to all systems and teams organization-wide
      • Conduct cross-team reviews of postmortems
      • Postmortems as source code (linking to improvements)
    • Run blameless postmortems
    • Review postmortem culture
    • Corrective or Preventive Actions
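
The root cause analysis and organizational learning items above can be combined by keeping each postmortem as code. This lets the 5 Whys chain and the corrective or preventive actions live in version control alongside links to the improvements they drive. The sketch below assumes Python 3.9+. The `Postmortem`, `Action`, and `ActionType` classes, the `TICKET-n` tracking IDs, and the checkout incident are all hypothetical. In keeping with a blameless review, each "why" names a system or process gap rather than a person.

```python
from dataclasses import dataclass, field
from enum import Enum


class ActionType(Enum):
    CORRECTIVE = "corrective"  # fixes the problem that occurred
    PREVENTIVE = "preventive"  # stops a similar problem from recurring


@dataclass
class Action:
    description: str
    kind: ActionType
    tracking_id: str  # hypothetical link to a ticket or pull request
    done: bool = False


@dataclass
class Postmortem:
    title: str
    problem: str
    whys: list[str] = field(default_factory=list)  # each answer explains the previous one
    actions: list[Action] = field(default_factory=list)

    def ask_why(self, answer: str) -> None:
        self.whys.append(answer)

    def root_cause(self) -> str:
        # In the 5 Whys technique, the last answer in the chain is taken as the root cause.
        if not self.whys:
            raise ValueError("no 'why' answers recorded yet")
        return self.whys[-1]

    def open_actions(self) -> list[Action]:
        return [a for a in self.actions if not a.done]

    def to_markdown(self) -> str:
        lines = [f"# {self.title}", "", f"Problem: {self.problem}", "", "## 5 Whys"]
        lines += [f"{i}. {why}" for i, why in enumerate(self.whys, start=1)]
        lines += ["", "## Actions"]
        for a in self.actions:
            box = "x" if a.done else " "
            lines.append(f"- [{box}] ({a.kind.value}) {a.description} -> {a.tracking_id}")
        return "\n".join(lines)


# Hypothetical incident, for illustration only.
pm = Postmortem("Checkout errors after deploy", "Checkout requests failed after a release")
pm.ask_why("A new release read a config key that did not exist in production")
pm.ask_why("The key was only added to the staging config")
pm.ask_why("Config changes are applied by hand per environment")
pm.ask_why("There is no config-as-code pipeline for this service")
pm.actions.append(Action("Roll back the release", ActionType.CORRECTIVE, "TICKET-1", done=True))
pm.actions.append(Action("Manage config as code and validate keys in CI",
                         ActionType.PREVENTIVE, "TICKET-2"))
print(pm.root_cause())
print(pm.to_markdown())
```

Five is a rule of thumb rather than a fixed count. This chain stops at four because the last answer points to something the team can change directly. Committing the rendered Markdown lets cross-team reviewers follow each open action through to its linked improvement.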

Owner

  • Name: Abhishek Tiwari
  • Login: abhishektiwari
  • Kind: user
  • Location: NY
  • Company: Amazon

Tech Savant, Servant Leader.


Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 4
  • Total Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • Abhishek Tiwari (a****k@a****m): 4 commits

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0