Preface

Thank you for picking up our book on observability engineering for modern software systems. Our goal is to help you develop a practice of observability within your engineering organization. This book is based on our experience as practitioners of observability, and as makers of observability tooling for users who want to improve their own observability practices.

As outspoken advocates for driving observability practices in software engineering, our hope is that this book can set a clear record of what observability means in the context of modern software systems. The term “observability” has seen quite a bit of recent uptake in the software development ecosystem. This book aims to help you separate facts from hype by providing a deep analysis of the following:

· What observability means in the context of software delivery and operations

· How to build the fundamental components that help you achieve observability

· The impact observability has on team dynamics

· Considerations for observability at scale

· Practical ways to build a culture of observability in your organization

Who This Book Is For

Observability predominantly focuses on achieving a better understanding of the way software operates in the real world. Therefore, this book is most useful for software engineers responsible for developing production applications. However, anyone who supports the operation of software in production will also greatly benefit from the content in this book.

Additionally, managers of software delivery and operations teams who are interested in understanding how the practice of observability can benefit their organization will find value in this book, particularly in the chapters that focus on team dynamics, culture, and scale.

Anyone who helps teams deliver and operate production software (for example, product managers, support engineers, and stakeholders) and is curious about this new thing called “observability” and why people are talking about it should also find this book useful.

Why We Wrote This Book

Observability has become a popular topic that has quickly garnered a lot of interest and attention. With its rise in popularity, “observability” has unfortunately been mischaracterized as a synonym for “monitoring” or “system telemetry.” Observability is a characteristic of software systems. Further, it’s a characteristic that can be effectively utilized in production software systems only when teams adopt new practices that support its ongoing development. Thus, introducing observability into your systems is both a technical challenge and a cultural challenge.

We are particularly passionate and outspoken about the topic of observability. We are so passionate about it that we started a company whose sole purpose is to bring the power of observability to all teams that manage production software. We spearheaded a new category of observability tools, and other vendors have followed suit.

While we all work for Honeycomb, our intention here is not to sell you on our tools. We have written this book to explain how and why we adapted the original concept of observability to manage modern software systems. You can achieve observability with different tools and in different ways. However, we believe that our dedication to advancing the practice of observability in the software industry makes us uniquely qualified to write a guide that describes—in great detail—the common challenges and effective solutions. You can apply the concepts in this book, regardless of your tool choices, to practice building production software systems with observability.

This book aims to give you a look at the various considerations, capabilities, and challenges associated with teams that practice using observability to manage their production software systems. At times, we may provide a look at what Honeycomb does as an example of how a common challenge has been addressed. These are not intended as endorsements of Honeycomb but rather as practical illustrations of abstract concepts. It is our goal to show you how to apply these same principles in other environments, regardless of the tools you use.

What You Will Learn

You will learn what observability is, how to identify an observable system, and why observability is best suited for managing modern software systems. You’ll learn how observability differs from monitoring, as well as why and when a different approach is necessary. We will also cover why industry trends have helped popularize the need for observability and how that fits into emerging spaces, like the cloud native ecosystem.

Next, we’ll cover the fundamentals of observability. We’ll examine why structured events are the building blocks of observable systems and how to stitch those events together into traces. Events are generated by telemetry that is built into your software, and you will learn about open source initiatives, like OpenTelemetry, that help jumpstart the instrumentation process. You will learn about the data-based investigative process used to locate the source of issues in observable systems, and how it differs substantially from the intuition-based investigative process used in traditional monitoring. You will also learn how observability and monitoring can coexist.

After an introduction to these fundamental technical concepts, you will learn about the social and cultural elements that often accompany the adoption of observability. Managing software in production is a team sport, and you will learn how observability can be used to help better shape team dynamics. You will learn about how observability fits into business processes, affects the software supply chain, and reveals hidden risks. You will also learn how to put both these technical and social concepts into practice when we examine how to use service-level objectives for more effective alerting and dive into the technical details behind why they make alerts both actionable and debuggable when using observability data.

Then, you’ll learn about the inherent challenges of implementing observability solutions at scale. We’ll start with the considerations you should take into account when deciding whether to buy or build an observability solution. An essential property of observability solutions is that they must provide fast answers during iterative investigations. Therefore, we will show you how to address the challenges of efficient data storage and retrieval when managing extremely large data sets. You will also learn when to introduce solutions like event sampling, how to navigate its trade-offs to find the approach that fits your needs, and how to manage extremely large quantities of data with telemetry pipelines.

Finally, we look at organizational approaches to adopting a culture of observability. Beyond introducing observability to your team, you will learn practical ways to scale observability practices across an entire organization. You will learn how to identify and work with key stakeholders, use technical approaches to win allies, and make a business case for adopting observability practices.

We started writing this book nearly three years ago. Part of the reason it has taken so long to produce is that the observability landscape has been changing rapidly and practices are advancing. We believe this book is the most up-to-date and comprehensive look at the state of the art in observability practices as of the time of its publication. We hope you find the journey as fascinating as we have.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic

Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width

Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

NOTE

This element signifies a general note.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://oreil.ly/7IcWz.

If you have a technical question or a problem using the code examples, please send an email to bookquestions@oreilly.com.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.

We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Observability Engineering by Charity Majors, Liz Fong-Jones, and George Miranda (O’Reilly). Copyright 2022 Hound Technology Inc., 978-1-492-07644-5.”

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Online Learning

For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

· O’Reilly Media, Inc.

· 1005 Gravenstein Highway North

· Sebastopol, CA 95472

· 800-998-9938 (in the United States or Canada)

· 707-829-0515 (international or local)

· 707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/observability-engineering.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://linkedin.com/company/oreilly-media

Watch us on YouTube: https://youtube.com/oreillymedia

Acknowledgments

This book would not have been possible without the support of executive sponsors at Honeycomb: many thanks to Christine Yen, Deirdre Mahon, and Jo Ann Sanders. Nor would this book have been possible without the domestic sponsors who put up with many odd hours, lost weekends, sleepless weeknights, and cranky partners: many thanks to Rebekah Howard, Elly Fong-Jones, and Dino Miranda. Without all of them, we would probably still be trying to find the time to fully develop and tie together the many ideas expressed in this book.

We’d especially like to thank additional contributors who have made the content in this book much stronger by sharing their varied perspectives and expertise. Chapter 16, “Efficient Data Storage”, was made possible by Ian Wilkes (author of Honeycomb’s Retriever engine, the basis of the case study) and Joan Smith (who reviewed references to external literature for technical accuracy). Chapter 14, “Observability and the Software Supply Chain”, was authored by Frank Chen, and Chapter 18, “Telemetry Management with Pipelines”, was authored by Suman Karumuri and Ryan Katkov. We thank all three for sharing their knowledge and the lessons they’ve learned from managing incredibly large-scale applications with observability at Slack. Many thanks to Rachel (pie) Perkins for contributions to several early chapters in this book. And thanks to the many bees at Honeycomb who, over the years, have helped us explore what’s achievable with observability.

Finally, many thanks to our many external reviewers: Sarah Wells, Abby Bangser, Mads Hartmann, Jess Males, Robert Quinlivan, John Feminella, Cindy Sridharan, Ben Sigelman, and Daniel Spoonhower. We’ve revised our takes, incorporated broader viewpoints, and revisited concepts throughout the authoring process to ensure that we’re reflecting an inclusive state of the art in the world of observability. Although we (the authors of this book) all work for Honeycomb, our goal has always been to write an objective and inclusive book detailing how observability works in practice, regardless of specific tool choices. We thank our reviewers for keeping us honest and helping us develop a stronger and all-encompassing narrative.
