There’s a point in most environments where things stop feeling simple, but nothing is obviously broken. Deployments are still working. Pipelines are still running. Nobody is blocked. From the outside, everything looks fine.
That’s usually where the problem starts.
Because what you’re seeing at that stage isn’t control. It’s momentum. The system is still moving in the direction it was set up to move, but you’ve lost a clear understanding of why it behaves the way it does. And identity is almost always at the center of that.
Early on, you can get away with a lot. Smaller teams, fewer subscriptions, less complexity. You can answer access questions off the top of your head because there just aren’t that many moving parts. Someone needs access, you grant it, and it doesn’t feel like a big deal because the impact is limited and you can still keep track of it.
That doesn’t scale.
What actually happens is that access decisions start getting made in response to immediate needs instead of being part of a design. Someone needs to deploy, so they get Contributor. Someone needs to troubleshoot, so they get broader visibility than they probably should. A pipeline needs permissions, so you reuse something that already exists instead of creating something scoped properly.
None of those decisions are wrong on their own. If anything, they’re usually the fastest way to keep things moving. The issue is that nobody circles back to look at what those decisions are creating over time.
So access accumulates.
It doesn’t stay aligned to responsibility. It doesn’t stay scoped to intent. It just grows, and it grows in a way that isn’t easy to reason about anymore. You don’t notice it immediately because nothing breaks when it happens. The environment keeps working, which reinforces the idea that everything is fine. Then you hit a point where someone asks a simple question and it takes longer than it should to answer.
Who has access to this resource?
If you have to stop and think, or worse, go look it up, that’s already a signal. Not because you should have everything memorized, but because the model itself should be predictable enough that the answer is obvious. When it isn’t, that’s not a knowledge gap. That’s a design problem.
I’ve seen this play out in environments that were otherwise well built. The management group structure was solid. Platform and workload separation made sense. Networking followed a clean hub-and-spoke model. Everything that usually gets attention was done right.
Identity wasn’t.
The issue didn’t show up until a change was made to something that wasn’t as isolated as the person making the change believed it was. It was part of a shared component, and that dependency wasn’t obvious unless you knew where to look. The person had access, so the change went through without friction. Nothing in the system pushed back on it. No policy blocked it. No boundary contained it. It wasn’t flagged because, from an access perspective, it was allowed.
The impact didn’t stay where the change was made. It moved across anything that depended on that component. What should have been a small, contained change turned into something that took time to even understand, let alone fix. And when people started asking who could make that kind of change, the answer wasn’t immediate.
That’s where things get uncomfortable. Because at that point, you’re not trying to fix infrastructure. You’re trying to understand how access was put together in the first place, and you realize pretty quickly that it wasn’t put together. It just happened.
That’s the difference between something being built and something being assembled.
A lot of the way identity gets treated comes from how people think about it. It’s often grouped into security, which makes it feel like one piece of a broader system. Networking gets its own attention. Compute gets its own attention. Data gets its own attention. Identity is just another concern to address somewhere along the way.
That mental model doesn’t hold up. Identity isn’t sitting alongside those things. It’s what allows them to function at all. Every action in Azure is evaluated through identity. Whether something can be deployed, modified, or even read depends on how identity is defined and enforced. If that model is tight, the environment behaves in a way that makes sense. If it isn’t, you start getting outcomes that are hard to predict.
Azure doesn’t correct any of this for you. It doesn’t second-guess your assignments or try to enforce a better pattern. It applies exactly what you define, and it applies it consistently. That consistency is what makes small decisions matter more than people expect. When you assign access at a higher scope, you’re not just solving the immediate problem in front of you. You’re defining behavior across everything that sits underneath that scope. That might be a handful of resources today, but it won’t stay that way.
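To make the point about higher scopes concrete, here is a minimal Python sketch of how one assignment reaches everything beneath it. The hierarchy, the scope names, and the `Assignment` type are all hypothetical, and a plain parent map stands in for Azure's real resource IDs. Treat it as a way to reason about inheritance, not as a description of how Azure evaluates access internally.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical hierarchy: each scope maps to its parent (None marks the root).
PARENT = {
    "mg-platform": None,
    "sub-workloads": "mg-platform",
    "rg-payments": "sub-workloads",
    "rg-reporting": "sub-workloads",
}


@dataclass(frozen=True)
class Assignment:
    principal: str  # illustrative group name
    role: str
    scope: str


def ancestors(scope: str) -> list[str]:
    """Return the scope followed by every scope above it, nearest first."""
    chain = []
    current = scope
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def effective_assignments(scope: str, assignments: list[Assignment]) -> list[Assignment]:
    """Assignments that apply at `scope`, directly or through inheritance."""
    lineage = ancestors(scope)
    return [a for a in assignments if a.scope in lineage]


def blast_radius(assignment: Assignment) -> list[str]:
    """Every scope an assignment reaches: its own scope plus all descendants."""
    return sorted(s for s in PARENT if assignment.scope in ancestors(s))


if __name__ == "__main__":
    broad = Assignment("grp-deployers", "Contributor", "sub-workloads")
    narrow = Assignment("grp-payments-ops", "Reader", "rg-payments")
    print(effective_assignments("rg-reporting", [broad, narrow]))
    print(blast_radius(broad))
```

In this sketch the subscription-level assignment shows up on a resource group nobody had in mind when it was granted, and `blast_radius` grows on its own every time a new scope is added under it.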
The model itself is simple.

Everything starts in Entra ID. Identities are grouped based on what they’re responsible for, not who they are. Those groups get mapped to roles, and those roles are assigned at specific scopes. Policy adds guardrails on top of that, and PIM controls when elevated access is actually active. What matters isn’t the individual pieces. It’s how cleanly they line up. If groups don’t represent real responsibility, RBAC becomes inconsistent. If RBAC is inconsistent, scope stops meaning anything. If scope doesn’t mean anything, policy becomes the only thing trying to hold the environment together.
That’s not where you want to be.
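The chain from group to role to scope can be sketched as a small data model. Everything here is an assumption made for illustration: the `Group` and `RoleAssignment` types, the group name, and the member names are hypothetical, and the lookup matches one exact scope and ignores inheritance to stay short. The point is only that when every assignment goes through a group with a stated responsibility, "who has access, and why?" becomes a lookup rather than an investigation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Group:
    name: str
    responsibility: str  # what members do, not who they are
    members: frozenset[str]


@dataclass(frozen=True)
class RoleAssignment:
    group: str
    role: str
    scope: str


def explain_access(scope, groups, assignments):
    """Answer "who has access here, and why?" for one exact scope."""
    by_name = {g.name: g for g in groups}
    answer = []
    for a in assignments:
        if a.scope != scope:
            continue
        group = by_name[a.group]
        # Each row carries the member, the role, and the reason the role exists.
        for member in sorted(group.members):
            answer.append((member, a.role, group.responsibility))
    return answer


if __name__ == "__main__":
    groups = [
        Group("grp-payments-deploy", "Deploys the payments workload",
              frozenset({"alice", "bob"})),
    ]
    assignments = [RoleAssignment("grp-payments-deploy", "Contributor", "rg-payments")]
    for row in explain_access("rg-payments", groups, assignments):
        print(row)
```

If a group's `responsibility` is hard to fill in, that is usually the same inconsistency the paragraph above describes, just surfaced earlier.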
The same idea applies to how Azure structures itself.

Control flows down. Whatever you define higher up gets inherited by everything below it. That’s what gives you consistency, but it also means you don’t get to isolate mistakes. If something is too broad at the management group level, it’s broad everywhere.
Most of the problems I’ve seen don’t come from people not understanding this. They come from the tradeoffs people make when things need to move. It’s easier to grant access at the subscription level than it is to figure out the exact resource group. It’s faster to assign a role directly to a user than it is to manage a group. It’s convenient to reuse an identity that already works instead of creating a new one that’s scoped correctly. None of that looks like a problem in the moment. It looks like progress.
But those decisions don’t stay isolated. They become part of the system, and the system starts reflecting them.
At some point, you end up with access that exists because it was needed once, not because it still makes sense. And removing it becomes harder than adding it was, so it stays. That’s how environments drift.
If you want to pull that back under control, the changes aren’t complicated, but they do require discipline. Start with how you define groups. If a group doesn’t clearly map to a responsibility, it’s not helping you. It’s just another place where access can get lost. Groups should describe what someone does in the environment, not just collect people together.
Then look at how roles are assigned. Direct user assignments are easy early on, but they don’t hold up. They make it harder to see who actually has access and why. Moving everything through groups isn’t about following a rule. It’s about making the system understandable.
Scope is where most environments get away from people. It takes more effort to keep access tight, so it’s usually where compromises happen. The problem is that scope defines impact. The broader the scope, the harder it is to contain mistakes. That’s not theoretical. You see it the moment something unexpected happens and the effect is larger than it should have been.
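A first pass at that review can be as simple as flagging the two patterns above: roles assigned directly to users, and privileged roles sitting at broad scopes. The sketch below assumes you have already exported your role assignments into records shaped like the hypothetical `Assignment` type. The field names, the set of roles treated as privileged, and the set of scope levels treated as broad are all assumptions to adjust for your environment.

```python
from dataclasses import dataclass

# Assumed for this sketch: which roles count as privileged and which
# scope levels count as broad. Adjust both to your own environment.
PRIVILEGED_ROLES = {"Owner", "Contributor", "User Access Administrator"}
BROAD_LEVELS = {"management_group", "subscription"}


@dataclass(frozen=True)
class Assignment:
    principal: str
    principal_type: str  # "user", "group", or "service_principal"
    role: str
    scope: str
    scope_level: str  # "management_group", "subscription", "resource_group", "resource"


def audit(assignments):
    """Return (assignment, reason) pairs for entries worth a second look."""
    findings = []
    for a in assignments:
        if a.principal_type == "user":
            findings.append((a, "direct user assignment; route it through a group"))
        if a.role in PRIVILEGED_ROLES and a.scope_level in BROAD_LEVELS:
            findings.append((a, "privileged role at a broad scope"))
    return findings


if __name__ == "__main__":
    sample = [
        Assignment("alice", "user", "Contributor", "sub-workloads", "subscription"),
        Assignment("grp-payments-deploy", "group", "Contributor", "rg-payments", "resource_group"),
    ]
    for assignment, reason in audit(sample):
        print(f"{assignment.principal} on {assignment.scope}: {reason}")
```

A finding is a prompt to ask whether the access still maps to a responsibility, not proof that it is wrong.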
Elevation is another place where intent gets lost. Standing access at a high privilege level might feel like it removes friction, but it also removes control. If everything is always allowed, you don’t have a way to distinguish between normal behavior and something that needs attention. That’s where PIM actually matters. Not as a feature to check off, but as a way to make elevated access something that happens with intent and visibility.
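The difference between standing access and elevation with intent can be shown with a toy model. This is not the PIM API. `ElevationLedger`, `Activation`, the one-hour default, and the names in the example are hypothetical, and the sketch captures only the idea that being eligible is not the same as being active, and that activation carries a reason and an expiry.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Activation:
    principal: str
    role: str
    scope: str
    reason: str
    expires_at: datetime


class ElevationLedger:
    """Toy model of eligible-versus-active access. Not the PIM API."""

    def __init__(self, eligible):
        self._eligible = set(eligible)  # (principal, role, scope) tuples
        self._activations = []

    def activate(self, principal, role, scope, reason, now, duration=timedelta(hours=1)):
        """Record a time-bound activation for an eligible principal."""
        if (principal, role, scope) not in self._eligible:
            raise PermissionError(f"{principal} is not eligible for {role} on {scope}")
        if not reason.strip():
            raise ValueError("an activation needs a stated reason")
        activation = Activation(principal, role, scope, reason, now + duration)
        self._activations.append(activation)
        return activation

    def is_active(self, principal, role, scope, now):
        """True only while an unexpired activation exists."""
        return any(
            a.principal == principal
            and a.role == role
            and a.scope == scope
            and now < a.expires_at
            for a in self._activations
        )


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
    ledger = ElevationLedger({("alice", "Owner", "sub-workloads")})
    ledger.activate("alice", "Owner", "sub-workloads", "rotate a key vault policy", start)
    print(ledger.is_active("alice", "Owner", "sub-workloads", start + timedelta(minutes=30)))
    print(ledger.is_active("alice", "Owner", "sub-workloads", start + timedelta(hours=2)))
```

Because every activation is recorded with a reason and an expiry, elevated activity outside that window stands out, which is the visibility standing access gives up.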
The part that tends to get ignored the most is pipeline identity.
People spend time tightening access for users, and then pipelines get broad permissions because they need to work across environments. Those identities don’t get revisited often, and they end up with more access than anything else in the system. If you think about how often pipelines execute changes compared to humans, that imbalance should stand out. In a lot of environments, the most active identities are also the least constrained. That’s not something you notice until you look for it.
Terraform can help with consistency, but only if identity is actually part of what you’re defining. If you’re only using it for infrastructure, you’re leaving one of the most important parts of the environment to manual changes. That’s where drift starts to show up.
Managed identity is one of those things that gets recommended a lot, and for good reason. Removing credentials reduces risk in a very direct way. But it doesn’t replace the need to get RBAC right. It just changes how access is used, not whether it’s appropriate.
At the end of all this, the question that matters is still simple.
If something changed right now, could you explain who had access to do it and why that access existed?
If you can, your model is probably in a good place. If you can’t, it’s not something that gets fixed by adding another layer on top. It gets fixed by going back to how identity is structured and tightening it until it makes sense again.
That’s not fast work. It usually means removing access that people are used to having and putting better patterns in place. It slows things down in the short term, which is why it gets pushed off. But the longer it’s pushed off, the harder it is to unwind.
Most teams don’t deal with this until they have to. By then, the environment has grown, dependencies are harder to trace, and every change has more impact. There’s a window where it’s still manageable.
That’s the time to deal with it.

Drop me a note and let me know what you think.