We’ve learned over the past month how much site reliability engineering depends on good old-fashioned human interactions.
The New Stack Update

ISSUE 248: Site Reliability Engineering Is Made of People

Talk Talk Talk

“Site Reliability Engineers not only need to get a firm grip on the technologies involved in the system, but also on the intricacies of production deployments.”

Add It Up
Data Scientists and Machine Learning Engineers Have Many Similar Job Roles

Fifty-five percent of the 372 data engineers responding to the “2020 Kaggle Machine Learning & Data Science Survey” say building or running data infrastructure is an important part of their jobs. These data engineers support data science applications as well as other use cases. Compared to their infrastructure duties, data engineers are actually slightly more likely (58%) to analyze and interpret data in order to influence decisions as part of their job.

Data scientists focus on analysis, which is less important for machine learning (ML) engineers. Still, there are many similarities between the 2,421 data scientists and 937 ML engineers in the Kaggle survey, with about the same percentage of each improving ML models, as well as building or running an ML service to improve a product or service.

At 18%, data engineers are more than twice as likely as data scientists to use cloud-based software and APIs as their primary tool for analyzing data, and they are also more likely to analyze data in the cloud. Local development environments such as Jupyter Notebooks are the tools most likely to be used across all the job roles we reviewed. Basic statistical software, which the survey defines as spreadsheets, is very popular among software engineers. This is a reminder that just because developers know Python doesn’t mean they will use data tools for data science.

What's Happening

Infrastructure as code is a movement ready to boom. It’s also emerging as one of the three pillars of cloud security that are bringing DevOps and security together in the evolving DevSecOps market, said Varun Badhwar, senior vice president for Prisma Cloud at Palo Alto Networks, in this episode of The New Stack Makers, hosted by TNS founder and publisher Alex Williams.

Infrastructure as Code Is a Movement Ready to Boom

Site Reliability Engineering Is Made of People

So far, 2021 has been pretty slow news-wise, at least in the cloud native computing community. This has given us the chance to go back and review some of the recently posted talks from USENIX SRECon20, a virtual conference held in December.

While at first glance site reliability engineering (SRE) appears to be about removing all manual processes from operations to ensure the robustness of a system, we’ve learned over the past month how much it in fact depends on good old-fashioned human interactions.

As TNS London correspondent Jennifer Riggins explained in 2017, the priority of the SRE team is to make sure the systems stay strong and stable by spending at least half their time on development. They “think about the whole life cycle of software objects from their inception to their deployment to operation, refinement, and eventual, peaceful decommissioning,” Google researcher Chris Jones told her at the time.

The SRECon talks we heard stressed looking beyond the numbers to the users themselves, in order to get a true understanding of how a system could serve them. In one presentation, AppDynamics technology evangelist Marco Coulter noted that “whenever a measure becomes a target, it ceases to be a good measure,” a popular paraphrase of British economist Charles Goodhart, who, writing about managing U.K. monetary policy, observed that “any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes.”

The lesson Coulter shared was to look beyond the metrics. He recalled working for a hospital where the development team, responding to complaints from staff, set a strict Service Level Objective (SLO) for a message queuing system to respond to queries within 10 seconds. The bottleneck, however, was not in the queuing system, so the SLO was counterproductive. It was only when the development team met with the staff, and understood their “instinctive expectation” of when they wanted the data available, that the two sides were able to craft a Service Level Agreement (SLA) satisfactory to both.
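The mechanics of checking a latency SLO are the easy part; the hard part is pointing it at the right component. A minimal sketch of the idea (the thresholds, sample latencies and function name here are invented for illustration, not taken from the talk):

```python
# Minimal SLO compliance check: what fraction of requests
# finished within the 10-second target? (Numbers are illustrative.)

def slo_compliance(latencies_s, target_s=10.0):
    """Return the fraction of requests at or under the target latency."""
    within = sum(1 for latency in latencies_s if latency <= target_s)
    return within / len(latencies_s)

# The measured component (the queue) looks healthy...
queue_latencies = [0.4, 1.2, 2.5, 3.1, 0.9]
# ...but the end-to-end latency the hospital staff actually
# experience tells a different story.
end_to_end_latencies = [8.0, 14.5, 22.0, 9.5, 30.0]

print(slo_compliance(queue_latencies))       # 1.0: the SLO is "met"
print(slo_compliance(end_to_end_latencies))  # 0.4: users still wait
```

The queue-level SLO reports full compliance while most users still wait too long, which is exactly why Coulter argues for talking to users before choosing what to measure.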

Other SRE talks had similar themes of keeping humans in the loop. Moshe Zadka, senior site reliability engineer at Twisted Matrix Laboratories, talked about how SREs can use Jupyter Notebooks, a tool created for the scientific community, to document their findings around incidents and share them with others. And Stanford University researcher Deepti Raghavan shared information about POSH, a “data-aware” Linux shell she is helping build that could let SREs process data from the command line more easily, without the effort otherwise needed to write against an API to access data resources. Again, the theme was making SRE work easier for people, by listening and responding to their needs.
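Zadka’s point is that a notebook mixes code, its output and narrative in one shareable artifact. A rough sketch of the kind of cell an SRE might write during a postmortem (the log format and messages are invented for illustration):

```python
# A postmortem-style notebook cell: turn raw incident log lines
# into a timeline an SRE can annotate. (Log format is invented.)
from datetime import datetime

raw_log = """\
2021-01-12T09:02:11 ERROR queue consumer lag exceeds 5m
2021-01-12T09:06:40 WARN autoscaler added 2 workers
2021-01-12T09:14:03 INFO consumer lag back under 30s"""

events = []
for line in raw_log.splitlines():
    timestamp, level, message = line.split(maxsplit=2)
    events.append((datetime.fromisoformat(timestamp), level, message))

# Time from first error to recovery, computed straight from the data,
# with the notebook preserving both the code and the answer.
duration = events[-1][0] - events[0][0]
print(f"Incident duration: {duration}")  # Incident duration: 0:11:52
```

Rendered in a notebook, the prose around such a cell becomes the incident narrative, and rerunning the cell against fresh logs keeps the writeup honest.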

What Red Hat’s Purchase of StackRox Means for OpenShift

With its planned purchase of Kubernetes security provider StackRox, announced Thursday, Red Hat intends to use the company's technology to bolster what Red Hat calls a multilayer security approach for OpenShift customers running Kubernetes and containerized applications. With an emphasis on integrating security with containers and Kubernetes, Red Hat can take advantage of how StackRox’s security plugins cover applications consistently, from the beginning of the CI/CD process through to deployment.

Data Mesh Liberates Business Value from Data Lakes, Data Warehouses

Data mesh is not a type of technology, and one does not buy a data mesh or hire a data mesh provider. It is a way to describe an organizational structure. “Data mesh is just an organizational principle,” said Arsalan Tavakoli, senior vice president of field engineering at data management systems provider Databricks.

Flatcar Container Linux: The Ideal OS for Running Kubernetes at the Edge

This post, from TNS analyst Janakiram MSV, is the first in a series on Flatcar Container Linux, covering everything you need to configure and deploy the operating system. Flatcar Container Linux is an OS that began as a fork of CoreOS Container Linux when Red Hat purchased CoreOS, and today it remains one of the most compelling Linux distributions in the cloud native space.

On The Road
Building a Scalable Strategy for Cloud Security // JAN. 26 // VIRTUAL @ 9AM PST, 10AM GMT, 12PM SGT


Building a Scalable Strategy for Cloud Security

DevSecOps represents a new way to think about DevOps in 2021, and it is already an established approach to scaling security in the cloud. Register now to learn how you and your team can use DevSecOps to scale security on cloud services and beyond.

The New Stack Makers podcast is available on: Pocket Casts, Stitcher, Apple Podcasts, Overcast, Spotify and TuneIn.

Technologists building and managing new stack architectures join us for short conversations at conferences out on the tech conference circuit. These are the people defining how applications are developed and managed at scale.
Copyright © 2021 The New Stack, All rights reserved.
