Restarting the management agents in ESXi the easy way

Restarting the management agents in ESXi is usually the first thing I try whenever a host starts acting weird or shows up as "Not Responding" in vCenter. It's one of those classic "turn it off and back on again" tricks that works surprisingly often in the world of virtualization. The best part is that you can usually do this without actually affecting the virtual machines that are running on the host. It just gives the management side of things a much-needed kick in the pants.

If you've spent any time managing a VMware environment, you know the feeling of seeing that little red exclamation mark. Your heartbeat skips a beat because you think the whole server has crashed. But then you realize the VMs are still pinging, and the applications are running fine; it's just the host itself that has stopped talking to the rest of the world. That's exactly when you need to look into restarting those agents.

Why you'd actually need to do this

Most of the time, the management agents (specifically hostd and vpxa) are pretty stable. However, they aren't perfect. Sometimes they get overwhelmed by too many API calls, or maybe a backup job hung and left a process dangling. When that happens, the host might still be doing its job of running your servers, but you can't manage it. You can't migrate VMs, you can't change settings, and you definitely can't see performance stats.

It's a bit like a captain on a ship who is doing a great job steering, but the radio is broken. The ship is moving, the crew is working, but nobody on the shore knows what's going on. Restarting the management agents is basically fixing that radio. It reconnects the host to vCenter and lets the management tools see what's actually happening under the hood.

Using the DCUI for a quick fix

If you have physical access to the server—or more likely, access through an out-of-band management tool like iDRAC, iLO, or IPMI—the Direct Console User Interface (DCUI) is the safest way to go. This is that classic yellow and grey screen that looks like it belongs in the 90s.

First, you'll want to hit F2 to log in. You'll need your root credentials for this. Once you're in, just navigate down to Troubleshooting Options. Inside that menu, you'll see an option that says Restart Management Agents. It's pretty hard to miss.

When you select it, the system will ask you to confirm by pressing F11. Once you hit that key, the host will stop and start the services. You'll see a little progress bar or a series of status messages. Usually, it takes about thirty seconds to a minute. Once it's done, you can escape out of the menus and wait for vCenter to catch up. In my experience, it's a pretty smooth process, and it clears up the large majority of host connectivity issues.

Taking the SSH route

If the DCUI isn't an option or you just prefer the command line, SSH is your best friend. I personally find this faster if I already have a terminal open. However, you have to make sure SSH is actually enabled on the host. If the host is "Not Responding" in vCenter, you won't be able to turn SSH on from there, so hopefully you left it on or have another way in. If you can reach the console, the DCUI has an Enable SSH toggle under Troubleshooting Options.

Once you've logged into the host via SSH as root, the command is simple:

services.sh restart

When you run this, you're going to see a whole bunch of text fly by. This command essentially runs a script that goes through every single service on the ESXi host and gives it a restart. It'll handle hostd, vpxa, the firewall, and a dozen other things.

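One thing to keep in mind: services.sh restart is a blunt instrument. If you're using things like vSAN or NSX, it can take a bit longer, so don't panic if the screen hangs for a second while it's cycling through the storage or networking services. VMware's own guidance also cautions against running it on hosts where NSX is configured, where LACP is in use, or where VMs rely on shared graphics, because restarting every service there can disrupt more than just the management side. On those hosts, restart hostd and vpxa individually instead.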

Restarting specific services

Sometimes, you don't want to restart everything. If you suspect it's just the vCenter agent causing the headache, you can be a bit more surgical. You can restart the main two services individually.

To restart the host agent, you'd run: /etc/init.d/hostd restart

To restart the vCenter agent (vpxa), you'd run: /etc/init.d/vpxa restart

Doing it this way is a bit more "gentle," but honestly, most people just run the full services.sh script because it's easier than guessing which specific service is being grumpy.
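If you'd rather script the surgical approach, here's a rough sketch of how the two commands above could be wrapped. It assumes the init scripts accept a status argument alongside restart. The restart_agent function and the INIT_DIR variable are names I made up for illustration; they are not something ESXi ships with.

```shell
#!/bin/sh
# Sketch only: restart hostd and vpxa one at a time and print their status.
# INIT_DIR is a hypothetical override so the function can be dry-run off-host.
INIT_DIR="${INIT_DIR:-/etc/init.d}"

# restart_agent NAME: restart one init script, then ask it for its status.
restart_agent() {
    agent="$1"
    script="$INIT_DIR/$agent"
    if [ ! -x "$script" ]; then
        echo "skip: $script not found"
        return 1
    fi
    "$script" restart && "$script" status
}

# Only act when this looks like an ESXi host (hostd init script present).
if [ -x "$INIT_DIR/hostd" ]; then
    restart_agent hostd
    restart_agent vpxa
fi
```

The agents are restarted in that order because vpxa talks to vCenter on behalf of hostd, so it makes sense to have hostd healthy first.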

Does this affect my running VMs?

This is the big question everyone asks the first time they do this. No, restarting the management agents does not reboot or stop your virtual machines.

The VMs run on the VMkernel, which is separate from the management agents. When you restart hostd or vpxa, you are only restarting the processes that allow you (and vCenter) to talk to the kernel. It's like restarting the dashboard software in a car while you're driving down the highway. The engine keeps running, the wheels keep turning, but for a few seconds, you just can't see your speed or fuel level.

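That being said, there are a couple of small caveats. Any task that is running against the host when the agents restart (a snapshot, a backup job, a migration in flight) can be interrupted, so check for those first if you can. And if you have automated tasks or third-party monitoring tools that are sensitive to timing, they might throw an alert because they lose connectivity for a minute. But as far as the actual workloads go, your SQL servers, web servers, and domain controllers won't even notice.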
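If you want some reassurance rather than taking my word for it, one option is to count the running VM processes before and after the restart. The sketch below leans on esxcli vm process list, which prints a "World ID:" line for each running VM. The count_vms and running_vm_count helpers are hypothetical names of mine. As far as I know, esxcli itself goes through hostd, so on a host where hostd is badly hung the "before" check may stall too.

```shell
#!/bin/sh
# count_vms: read "esxcli vm process list" output on stdin and count the
# "World ID:" lines, one per running VM process. Prints 0 when none match.
count_vms() {
    grep -c 'World ID:' || true
}

# running_vm_count: hypothetical helper that prints the number of running
# VMs, or a note when esxcli is not available (for example, off-host).
running_vm_count() {
    if command -v esxcli >/dev/null 2>&1; then
        esxcli vm process list | count_vms
    else
        echo "esxcli not found"
    fi
}
```

Run running_vm_count once before the restart and once after the agents are back. If the two numbers match, nothing got powered off along the way.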

When restarting agents isn't enough

Occasionally, you'll run into a situation where restarting the management agents in ESXi just doesn't cut it. Maybe the command hangs, or maybe the host reconnects for five minutes and then drops off again.

If the services.sh restart command hangs and never finishes, that's usually a sign of a deeper issue. It often means a process is stuck in an "uninterruptible sleep" state, often due to a storage issue. If the host is waiting for a response from a dead LUN or a wonky NFS share, the management agents might get stuck trying to inventory that storage.

In these cases, you might actually have to bite the bullet and reboot the host. Since you can't manage it through vCenter, that usually means logging into the VMs individually and shutting them down from inside the guest where you can, and only as a last resort reaching for the hardware reset button. But always try the agent restart first. It saves you so much time and avoids unnecessary downtime.
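Before you go that far, it's worth a quick look at whether storage really is the culprit. This is a minimal sketch, assuming esxcli storage core path list reports each path with a "State:" line and marks broken ones as dead. The count_dead_paths and check_storage_paths helpers are my own hypothetical names, and the same warning applies as with any esxcli call on a sick host: it may itself be slow to answer.

```shell
#!/bin/sh
# count_dead_paths: read path listing on stdin, count "State: dead" lines.
# Prints 0 when there are none.
count_dead_paths() {
    grep -c 'State: dead' || true
}

# check_storage_paths: hypothetical helper that summarizes dead paths.
check_storage_paths() {
    if ! command -v esxcli >/dev/null 2>&1; then
        echo "esxcli not found"
        return 0
    fi
    dead=$(esxcli storage core path list | count_dead_paths)
    if [ "$dead" -gt 0 ]; then
        echo "$dead dead storage path(s): look at storage before restarting again"
    else
        echo "no dead storage paths reported"
    fi
}
```

A non-zero count doesn't prove storage is what hung the agents, but it's a strong hint about where to look before you reach for the reset button.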

A few things to watch out for

While it's generally safe, there are a couple of "gotchas." For instance, if the host is part of a vSphere HA cluster, you might see a momentary alert in vCenter about the host's HA agent or its master status. This is normal. Once the agents are back up, vCenter will re-sync the state (and the cluster will re-elect a master if it needs to), and the green checkmarks should return.

Also, if you are running a very old version of ESXi, there were some rare bugs where restarting services could cause issues with certain drivers, but in anything modern (6.7, 7.0, 8.0), it's a very standard troubleshooting step.

Another thing to remember is the ESXi Shell. If you can't get in via SSH but you do have console access, you can switch from the DCUI to the shell with Alt+F1 and get back with Alt+F2, provided the shell has been enabled (there's a toggle for it under Troubleshooting Options). It's basically the local version of SSH. It's saved me more than once when the network was completely toast.

Wrapping it up

At the end of the day, knowing how to handle restarting the management agents in ESXi is a core skill for any sysadmin. It's the bridge between "everything is broken" and "everything is fine." It's quick, it's effective, and it's non-disruptive.

Next time you see a host acting sluggish or vCenter starts complaining that a host is disconnected, don't jump straight to the "Reboot" button. Take a breath, log into the DCUI or SSH, and give those agents a quick restart. Chances are, you'll be back in business in under two minutes, and your uptime stats will stay looking beautiful. It's a simple fix, but it's definitely one of the most useful ones in the VMware toolkit.