Start-Up Cohesity Cluster (Virtual Edition) After Graceful Shutdown

After a graceful shutdown, it is important to ensure that the Cohesity cluster restarts properly. These instructions build on the previous blog post, Graceful Shutdown Cohesity Cluster (VE).

IMPORTANT: After an unplanned outage of the environment (e.g. a power failure), the Cohesity cluster does not automatically enter “cluster halt mode”.
Because the cluster was not stopped properly in that case, significant errors can occur during or after the startup process. In the best-case scenario, however, the Cohesity cluster starts all services automatically, without the “support user” having to perform the startup routine.

Once the necessary maintenance tasks or changes have been completed, the server or hypervisor can be powered on. As mentioned in the previous article, VMware ESXi 8.0U3 is used here as the bare-metal hypervisor.
When ESXi has started, log in, open the category “Host”, select “Actions”, and choose the “Exit Maintenance Mode” function. Depending on the settings, the hosted virtual machines will start automatically. If they do not, start the virtual machine of the Cohesity node manually via the category “Virtual Machines”.
If you’re using multiple nodes (ideally spread across different physical hosts), you’ll need to repeat this process for each Cohesity VM. Select the VM and click “Power on” to start it.

Clicking on the VM screen launches the web console, where the start-up of the individual services can be tracked. The start-up phase of the Cohesity node is complete when a login screen appears in the web console. Even then, the Web-UI may take a few more minutes to become available because the web service may not be fully initialized yet.
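Instead of reloading the browser by hand, the wait for the Web-UI can be scripted. The following is a minimal sketch, not part of the original procedure: it assumes the Web-UI is served over HTTPS on TCP port 443, and the function name wait_for_port is made up for this example. A successful TCP connection only shows that the port is open, not that the web service has finished initializing.

```python
import socket
import time


def wait_for_port(host: str, port: int = 443, timeout_s: float = 600.0,
                  interval_s: float = 5.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Port 443 is an assumption (HTTPS Web-UI); adjust it to your environment.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A completed TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=3.0):
                return True
        except OSError:
            # Refused, unreachable, or timed out: the service is not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_port("192.0.2.10") with the address of your node (192.0.2.10 is only a placeholder) blocks for up to ten minutes and returns as soon as the port accepts connections.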

As soon as the Web-UI has loaded and is accessible, an SSH session can be started using the “support user”.

IMPORTANT: Each time the Cohesity nodes are rebooted, they present a new SSH host key to clients connecting as the “support user”. If the client that wants to connect to the Cohesity nodes via SSH still has an older host key stored, the connection is blocked with a security warning. This occurs because you’re using the same remote address as before but the remote computer is responding with a different fingerprint. It is therefore possible that someone is spoofing the system you previously connected to (in this case the Cohesity node), which is a security issue. Always check the circumstances before proceeding; when the change is expected and safe, as in our case, the stored host key can be updated.
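To make this check concrete, the fingerprint shown in the SSH warning can be compared with one computed from a public host key that you obtained over a trusted channel. The sketch below is an illustration under assumptions, not a Cohesity tool: the function name is made up, and it locates the key data by the “AAAA” prefix that base64-encoded SSH public keys start with. The fingerprint itself is the SHA-256 hash of the decoded key data, base64-encoded without padding, which is the format OpenSSH prints.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line: str) -> str:
    """Compute the OpenSSH-style SHA256 fingerprint of a public key line.

    Accepts a .pub line ("<type> <base64> [comment]") or a plain-text
    known_hosts line ("<host> <type> <base64>").
    """
    for field in public_key_line.split():
        # Heuristic (assumption): the base64 key data starts with "AAAA",
        # the encoding of the leading length bytes of every SSH key blob.
        if field.startswith("AAAA"):
            blob = base64.b64decode(field)
            digest = hashlib.sha256(blob).digest()
            # OpenSSH prints the digest base64-encoded without "=" padding.
            return "SHA256:" + base64.b64encode(digest).decode().rstrip("=")
    raise ValueError("no base64 key data found in line")
```

If the value computed from the trusted key matches the fingerprint in the warning, the changed key is the expected one and the old entry can be replaced.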

In the following steps I will show you how to handle this issue when connecting from a Linux client or, as in my case, from a Windows client.

LINUX CLIENT:

  • The user’s SSH key identity can be found in the home directory under “~/.ssh”
  • The keys an SSH server uses to identify itself when you log in to it are located in “/etc/ssh/” and are usually named something like “ssh_host_rsa_key”
  • You can remove the stale host key for a specific IPv4 address from “~/.ssh/known_hosts” with the command “ssh-keygen -R [HOST_IPv4]” (replace the placeholder with the address). If the server listens on a non-default port, use “ssh-keygen -R '[HOST_IPv4]:PORT'”; in this form the square brackets are typed literally, e.g. '[192.0.2.10]:2222'
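The two forms of the ssh-keygen call can be wrapped in a small helper. This is a sketch under assumptions: the function names are invented for this example, and the command is only executed when dry_run is set to False, which requires ssh-keygen on the PATH. It relies on the OpenSSH convention that hosts on a non-default port are stored in known_hosts as “[host]:port”.

```python
import subprocess
from typing import List


def known_hosts_name(host: str, port: int = 22) -> str:
    """Return the name under which OpenSSH stores a host in known_hosts."""
    # Hosts reached on a non-default port are recorded as "[host]:port".
    return host if port == 22 else f"[{host}]:{port}"


def forget_host(host: str, port: int = 22, dry_run: bool = True) -> List[str]:
    """Build (and optionally run) the ssh-keygen call that drops a stale key."""
    cmd = ["ssh-keygen", "-R", known_hosts_name(host, port)]
    if not dry_run:
        # Passing a list avoids shell quoting problems with the brackets.
        subprocess.run(cmd, check=True)
    return cmd
```

With the default dry_run=True the helper only returns the argument list, so the command can be inspected before anything in known_hosts is changed.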

WINDOWS CLIENT:

  • Open the file “C:\Users\[YOUR_USERNAME]\.ssh\known_hosts.txt” with “Notepad++” or a different text editor
  • In the file, find the entry with the IPv4 address of the Cohesity node and delete it
  • Save and close the file “known_hosts.txt”
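The manual edit above can also be scripted, which helps when several nodes have to be cleaned up. The following is a minimal sketch with invented function names. It assumes plain-text entries whose first field is a comma-separated list of host names or addresses; hashed entries (starting with “|1|”) and lines that begin with an “@” marker are left untouched. A backup copy of the file is written before it is changed.

```python
from pathlib import Path
from typing import List, Tuple


def drop_host_entries(text: str, host: str) -> Tuple[str, int]:
    """Remove plain-text known_hosts lines whose host field lists `host`."""
    kept: List[str] = []
    removed = 0
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            # First field: comma-separated host names and addresses.
            names = stripped.split()[0].split(",")
            if host in names:
                removed += 1
                continue
        kept.append(line)
    return "\n".join(kept) + ("\n" if kept else ""), removed


def clean_known_hosts(path: Path, host: str) -> int:
    """Rewrite the file without entries for `host`; keep a .bak copy."""
    original = path.read_text(encoding="utf-8")
    cleaned, removed = drop_host_entries(original, host)
    if removed:
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(cleaned, encoding="utf-8")
    return removed
```

Pass the file your SSH client actually uses, for example the path named in the list above; the OpenSSH client itself stores its entries in a file called known_hosts without an extension.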

In the next step you can log in via SSH again. Answer “yes” to confirm the connection and to save the new host key in “known_hosts.txt” (Windows) on the client. Once the login is complete, enter the command “iris_cli” to call up the command line tool of the same name for cluster management.

The “iris_cli” is the command line of the Cohesity cluster, used to manage the cluster and its services.
The “cluster start” command starts all cluster services; this may take several minutes. The state of the services can be viewed with the command “cluster status”.

Once all cluster services have been started, you can log in to the Web-UI.
If the login is successful, close the “iris_cli” by entering “exit” and then log out of the “support user” session.

NOTICE: The health status of the system should always be checked after such a restart. Some statistics may not be representative in the first few minutes; this corrects itself within the next hour or with the next job. Comparing statistics requires all services to be running at full capacity, which can take up to 10 minutes.
The Cohesity cluster is now fully functional again and the start-up process is complete.
