To confirm that your job is running correctly and that its load is spread across all of your allocated nodes, check CPU and memory usage in the Dashboard. For a quick look, see the "Utilization" value on your job card in the Dashboard (shown below).
This value is the average CPU utilization across all of your nodes. If you see 100%, that is a good indicator that your job is set up correctly. But if, for example, you're running a 4-node job and CPU usage is pegged at 25%, the entire workload is probably running on 1 node rather than across all 4. Before you try to reconfigure your job, though, dig a little deeper into the utilization statistics by clicking the Utilization button at the top of the job card (shown below).
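To make the arithmetic concrete, here is a minimal shell sketch of how an averaged figure behaves. The `avg_util` helper is hypothetical and only illustrates the averaging; it is not taken from the Dashboard and is not necessarily how the Dashboard computes its value.

```shell
# Hypothetical helper: average a list of per-node CPU percentages.
# Each argument is one node's utilization (0-100).
avg_util() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { if (NR > 0) printf "%d\n", sum / NR }'
}

# 4-node job where only one node is doing the work
avg_util 100 0 0 0          # prints 25

# 4-node job with the load spread across every node
avg_util 100 100 100 100    # prints 100
```

An average of roughly 100 divided by the node count is therefore the pattern to watch for: it suggests a single busy node and several idle ones.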
When you click this button, you'll see a new view that breaks utilization down by individual machine. The top row shows CPU utilization and the bottom row shows memory usage, as you can see below.
The picture above shows a healthy 8-node job. There are 8 blocks in the CPU row and 8 in the memory row, one per machine, each showing that machine's load. If you hover over any block, you'll get the exact CPU load or memory utilization for that particular node. This view should make it easy to pinpoint a problem node so that you can quickly assess the situation if you think there is an issue.
This page refreshes every 30 seconds, so you can monitor your job without having to connect to it to check utilization.
Because a lot can happen in 30 seconds, you can also review CPU and memory stats in real time from the command line (CLI). To do this, connect to your job as you normally would, then follow these steps.
Once you launch a multi-node job in JARVICE, start by opening a new terminal. Click the “Nimbix” button in the bottom-left corner, then click “New Terminal Emulator”.
In the terminal window, type: cat /etc/JARVICE/nodes (this text file lists all of the nodes in your job)
Find the node you'd like to monitor. Type "ssh <node name>" (no quotes), replacing <node name> with that node's name from the list.
Once connected to the node, type “top” (no quotes) to view CPU utilization; the summary at the top of its output also shows memory usage.
Type “exit” (no quotes) to return to the head node, or simply close the terminal once finished. You can repeat this process on each node as needed.
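The steps above can also be sketched as a small loop that takes one snapshot from every node in turn. This is a minimal sketch under stated assumptions: ssh between nodes works without a password prompt (as in the manual steps), and the `top` on each node supports batch mode (`-b -n 1`, as the common procps version does). The names `snapshot_nodes`, `NODES_FILE`, and `SSH_CMD` are hypothetical and are not part of JARVICE.

```shell
# Assumed defaults; the node list path is the one used in the steps above.
NODES_FILE="${NODES_FILE:-/etc/JARVICE/nodes}"
SSH_CMD="${SSH_CMD:-ssh}"

# Hypothetical helper: print the first lines of a one-shot `top` from each node.
snapshot_nodes() {
    while IFS= read -r node; do
        # Skip blank lines in the node list
        [ -n "$node" ] || continue
        echo "== $node =="
        # -n keeps ssh from reading stdin, which would swallow the rest of the node list
        $SSH_CMD -n "$node" "top -b -n 1 | head -n 5"
    done < "$NODES_FILE"
}
```

Run `snapshot_nodes` from the head node's terminal. The first five lines of `top` output include the load, CPU, and memory summary, which is usually enough to spot an idle or overloaded node before connecting to it interactively.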
If you have any questions on this feature, feel free to reach out to us at firstname.lastname@example.org and we'll be happy to assist.