Pedro Sousa, technical lead at Valtech and a Microsoft Azure MVP. With nearly 30 years of experience in IT, Pedro has mastered the art of managing infrastructures, ranging from small mission-critical setups to organizations with over 1000 users. But that’s not all – Pedro is also a renowned public speaker and an Azure techie. In recent years, he has delved into the world of DevOps practices and embraced the role of a cloud architect. Pedro has focused on cloud-first and hybrid scenarios by leading teams in designing and implementing solutions across the Microsoft and AWS stacks. With Pedro’s expertise, we can delve into real-time experiences and gain valuable insights into monitoring approaches for Azure.
Observability, a preferred term over traditional monitoring, is focused on comprehending the intricacies of complex systems. By gathering data from diverse sources, including logs, metrics, and traces, observability provides a holistic understanding of system behavior. Unlike the conventional bottom-up approach, where monitoring starts from the infrastructure level, an alternative top-down perspective is advocated. Beginning with an assessment of business needs and value, engineers prioritize revenue generation and business continuity. Key stakeholders seek a high-level overview, akin to a semaphore, indicating the system’s operational status. Engineers then address any warnings or issues to ensure uninterrupted operations and revenue flow. This top-down methodology enhances system comprehension and promotes effective observability practices.
Azure Monitor operates on core principles that revolve around data sources, central gathering, visualization, alert configuration, incident response, and continuous improvement. Data is collected from various sources like logs, metrics, and traces, and centralized in Azure Log Analytics workspace. Visualization options include Azure Monitor, ManageEngine, workbooks, and Power BI. Alerts are configured to trigger actions, facilitating incident response by teams. Azure Monitor emphasizes continuous improvement, necessitating regular review and refinement of monitoring setups to adapt and optimize performance. By adhering to these principles, organizations can proactively monitor their Azure environments and drive operational excellence.
Data collection in Azure Monitor consists of three key components: logging, metrics, and tracing. For effective logging, structured logs are essential for query ability, metric creation, and visualization. Determining the appropriate logging level helps balance the depth of information with storage costs. Metrics selection should align with the specific workload and resource usage, with careful consideration given to setting meaningful thresholds for alerting. Distributed tracing requires the use of unique identifiers and contextual information to track system flows accurately. By focusing on these aspects of data collection, Azure Monitor enables organizations to gain comprehensive insights into their systems and ensure optimal observability.
Business analysts serve as valuable intermediaries, representing the company’s interests and engaging with customers. Their role is crucial in helping technicians and engineers comprehend the true value of their workloads to both the company and its customers. To bridge the gap between business analysts and technical teams, it is essential to adopt principles akin to the collaborative nature of DevOps. By establishing a strong partnership with business analysts, who possess a deep understanding of business needs, we can effectively translate those requirements into actionable observability strategies. This approach follows a top-down methodology, where conversations with analysts lay the foundation for aligning business objectives with effective observability practices. Working together with business analysts empowers us to navigate the layers of technology with a clear focus on meeting customer expectations and achieving business success.
In the realm of alerting, it is crucial to establish clear guidelines for the teams responsible for responding to alerts. Defining the appropriate recipients and determining when action should be taken is paramount. Equally important is identifying the key metrics to measure and setting thresholds for triggering alerts. Striking the right balance is vital to avoid overwhelming team members with excessive alerts that may lead to apathy or disregard. Managing noise and minimizing flapping, which refers to the repeated firing of alerts in a short period, helps maintain focus and effectiveness. Additionally, testing plays a critical role. Selecting moments when human attention is less prone to distractions, such as lunchtime or weekends, allows for evaluating alert responsiveness and team vigilance. Planning and executing tests, including surprise alerts, help ensure preparedness and awareness within the team.
When it comes to observability, the principles remain consistent whether it’s in an on-premises or cloud environment. If you’re migrating from on-premises to Azure, you have the flexibility to seamlessly integrate your existing on-premises systems with Azure. Alternatively, you can leverage the advantages of the cloud by reimagining your observability practices and taking advantage of Azure’s capabilities. It’s worth noting that Azure Monitor supports monitoring for both cloud and on-premises environments. Therefore, considering observability as one of the initial factors during your transition is essential for a successful and comprehensive monitoring strategy.