## Introduction
The NVIDIA official utility nvidia-smi provides a lot of useful information about the GPU. It is built on top of the NVIDIA Management Library (NVML), which provides a set of APIs for monitoring the statistics of NVIDIA GPUs. In practice, sometimes we would like to monitor the GPU statistics in our custom applications.
In this blog post, I would like to discuss how to use the NVIDIA NVML library to monitor GPU statistics and replicate `nvidia-smi dmon` in a custom C++ application.
## NVIDIA NVML GPU Statistics
### NVIDIA-SMI DMON
`nvidia-smi dmon` will display basic GPU statistics, including power (`pwr`), GPU temperature (`gtemp`), memory temperature (`mtemp`), GPU utilization (`sm`) (the percentage of time that at least one SM is being used), memory utilization (`mem`), encoder utilization (`enc`), decoder utilization (`dec`), JPEG utilization (`jpg`), OFA utilization (`ofa`), memory clock (`mclk`), and graphics clock (`pclk`). The following is an example of `nvidia-smi dmon` output:
```
$ nvidia-smi dmon
# gpu   pwr  gtemp  mtemp    sm   mem   enc   dec   jpg   ofa   mclk   pclk
# Idx     W      C      C     %     %     %     %     %     %    MHz    MHz
    0     8     42      -     1    11     0     0     0     0    405    502
    0    14     43      -     0     1     0     0     0     0   7001   1492
    0    15     43      -     0     1     0     0     0     0   7001   1492
```
In addition, `nvidia-smi dmon` can display GPU Performance Metrics (GPM) for Hopper and later GPUs. The following example shows how to display the GPM metrics for GPU activity (`gract`) (same as the `sm` metric), SM utilization (`smutil`) (the percentage of SMs that are actively being used), and FP16 activity (`fp16`).
```
$ nvidia-smi dmon --gpm-metrics 1,2,13
# gpu   pwr  gtemp  mtemp    sm   mem   enc   dec   jpg   ofa   mclk   pclk  gract smutil   fp16
# Idx     W      C      C     %     %     %     %     %     %    MHz    MHz  GPM:%  GPM:%  GPM:%
    0    12     43      -     0     1     0     0     0     0   7001    682      -      -      -
    0    12     43      -     0     1     0     0     0     0    810    532      -      -      -
    0    11     43      -     2     7     0     0     0     0    810    495      1      1      0
    0     9     43      -     1     7     0     0     0     0    810    502      9      8      0
```
The other GPM metrics can be queried using `nvidia-smi dmon --help`.
```
$ nvidia-smi dmon --help

    GPU statistics are displayed in scrolling format with one line
    per sampling interval. Metrics to be monitored can be adjusted
    based on the width of terminal window. Monitoring is limited to
    a maximum of 16 devices. If no devices are specified, then up to
    first 16 supported devices under natural enumeration (starting
    with GPU index 0) are used for monitoring purpose.
    It is supported on Tesla, GRID, Quadro and limited GeForce products
    for Kepler or newer GPUs under x64 and ppc64 bare metal Linux.

    Note: On MIG-enabled GPUs, querying the utilization of encoder,
    decoder, jpeg, ofa, gpu, and memory is not currently supported.

    Usage: nvidia-smi dmon [options]

    Options include:
    [-i | --id]:          Comma separated Enumeration index, PCI bus ID or UUID
    [-d | --delay]:       Collection delay/interval in seconds [default=1sec]
    [-c | --count]:       Collect specified number of samples and exit
    [-s | --select]:      One or more metrics [default=puc]
                          Can be any of the following:
                              p - Power Usage and Temperature
                              u - Utilization
                              c - Proc and Mem Clocks
                              v - Power and Thermal Violations
                              m - FB, Bar1 and CC Protected Memory
                              e - ECC Errors and PCIe Replay errors
                              t - PCIe Rx and Tx Throughput
    [N/A | --gpm-metrics]: Comma-separated list of GPM metrics (no space in between) to watch
                          Available metrics:
                              Graphics Activity = 1
                              SM Activity = 2
                              SM Occupancy = 3
                              Integer Activity = 4
                              Tensor Activity = 5
                              DFMA Tensor Activity = 6
                              HMMA Tensor Activity = 7
                              IMMA Tensor Activity = 9
                              DRAM Activity = 10
                              FP64 Activity = 11
                              FP32 Activity = 12
                              FP16 Activity = 13
                              PCIe TX = 20
                              PCIe RX = 21
                              NVDEC 0-7 Activity = 30-37
                              NVJPG 0-7 Activity = 40-47
                              NVOFA 0 Activity = 50
                              NVLink Total RX = 60
                              NVLink Total TX = 61
                              NVLink L0-17 RX = 62,64,66,...,96
                              NVLink L0-17 TX = 63,65,67,...,97
                              C2C TOTAL TX = 100
                              C2C TOTAL RX = 101
                              C2C DATA TX = 102
                              C2C DATA RX = 103
                              C2C LINK0-13 TOTAL TX = 104,108,112,...,156
                              C2C LINK0-13 TOTAL RX = 105,109,113,...,157
                              C2C LINK0-13 DATA TX = 106,110,114,...,158
                              C2C LINK0-13 DATA RX = 107,111,115,...,159
                              HOSTMEM CACHE HIT = 160
                              HOSTMEM CACHE MISS = 161
                              PEERMEM CACHE HIT = 162
                              PEERMEM CACHE MISS = 163
                              DRAM CACHE HIT = 164
                              DRAM CACHE MISS = 165
                              NVENC 0-3 Activity = 166-169
                              GR0-7 CTXSW CYCLES ELAPSED = 170,175,180,...,205
                              GR0-7 CTXSW CYCLES ACTIVE = 171,176,181,...,206
                              GR0-7 CTXSW REQUESTS = 172,177,182,...,207
                              GR0-7 CTXSW ACTIVE AVERAGE = 173,178,183,...,208
                              GR0-7 CTXSW ACTIVE PERCENT = 174,179,184,...,209
    [N/A | --gpm-options]: options of which level of GPM metrics to monitor:
                              d - Display Device level GPM Metrics only
                              m - Display MIG level GPM Metrics only
                              dm - Display both Device and MIG level GPM Metrics only
                              md - Display both Device and MIG level GPM Metrics only
    [-o | --options]:     One or more from the following:
                              D - Include Date (YYYYMMDD) in scrolling output
                              T - Include Time (HH:MM:SS) in scrolling output
    [-f | --filename]:    Log to a specified file, rather than to stdout
    [-h | --help]:        Display help information
    [N/A | --format]:     Output format specifiers:
                              csv - Format dmon output as a CSV
                              nounit - Remove units line from dmon output
                              noheader - Remove heading line from dmon output
```
### GPU Stats Using NVIDIA NVML
It turns out that we can query the basic GPU statistics, including `sm`, `mem`, `enc`, `dec`, `jpg`, and `ofa`, using the [nvmlDeviceGetProcessesUtilizationInfo](https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g0fe806054e54932231596396ea8f9c12) API, and the additional GPM statistics, including every GPM metric listed in `nvidia-smi dmon --help`, using the [nvmlGpmMetricsGet](https://docs.nvidia.com/deploy/nvml-api/group__nvmlGpmFunctions.html#group__nvmlGpmFunctions_1g0f0408fc31522711493960dd7b47ba44) API. All the GPM metric IDs can be found in the [nvmlGpmMetricId_t](https://docs.nvidia.com/deploy/nvml-api/group__nvmlGpmEnums.html) definition. For example, the GPM metric ID for `gract` is `NVML_GPM_METRIC_GRAPHICS_UTIL = 1`, for `smutil` it is `NVML_GPM_METRIC_SM_UTIL = 2`, and for `fp16` it is `NVML_GPM_METRIC_FP16_UTIL = 13`.
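Before looking at the full program, the sketch below shows the core GPM flow in isolation: allocate two samples, take them a short interval apart, and then ask NVML to compute a metric over that interval. This is only a minimal sketch with error handling omitted; it assumes NVML has already been initialized, that `device` is a valid handle for a Hopper-or-later GPU, and the 100 ms sampling interval is an arbitrary choice for illustration.

```
// Minimal sketch of the two-sample GPM query flow (error handling omitted).
// Assumes nvmlInit() has already succeeded and `device` is a valid handle
// for a Hopper-or-later GPU; see the full gpu_stats program below for a
// robust version.
#include <chrono>
#include <thread>

#include <nvml.h>

double queryGraphicsActivity(nvmlDevice_t device)
{
    nvmlGpmSample_t sample1{}, sample2{};
    nvmlGpmSampleAlloc(&sample1);
    nvmlGpmSampleAlloc(&sample2);

    // GPM metrics are computed over the interval bounded by two samples.
    nvmlGpmSampleGet(device, sample1);
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    nvmlGpmSampleGet(device, sample2);

    nvmlGpmMetricsGet_t metricsGet{};
    metricsGet.version = NVML_GPM_METRICS_GET_VERSION;
    metricsGet.numMetrics = 1;
    metricsGet.sample1 = sample1;
    metricsGet.sample2 = sample2;
    metricsGet.metrics[0].metricId = NVML_GPM_METRIC_GRAPHICS_UTIL; // gract
    nvmlGpmMetricsGet(&metricsGet);

    double const gract{metricsGet.metrics[0].value};
    nvmlGpmSampleFree(sample1);
    nvmlGpmSampleFree(sample2);
    return gract;
}
```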
The following `gpu_stats` program demonstrates the usage of the NVIDIA NVML library APIs mentioned above and produces the same output format as `nvidia-smi dmon`. The source code is also available in the ["NVIDIA NVML GPU Statistics"](https://github.com/leimao/NVIDIA-NVML-GPU-Statistics) repository on GitHub.
gpu_stats.cpp
```
#include <algorithm>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <iomanip>
#include <iostream>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <thread>
#include <vector>

#include <nvml.h>

// Map from GPM metric ID to the short column name used by nvidia-smi dmon.
std::map<int, std::string> gpmMetricNames = {
    {1, "gract"},   {2, "smutil"}, {3, "smoccu"}, {4, "intact"}, {5, "tenact"},
    {6, "dfmact"},  {7, "hmmact"}, {9, "immact"}, {10, "dramac"}, {11, "fp64"},
    {12, "fp32"},   {13, "fp16"},  {20, "pcitx"}, {21, "pcirx"}, {30, "nvd0"},
    {31, "nvd1"},   {32, "nvd2"},  {33, "nvd3"},  {34, "nvd4"},  {35, "nvd5"},
    {36, "nvd6"},   {37, "nvd7"},  {40, "nvj0"},  {41, "nvj1"},  {42, "nvj2"},
    {43, "nvj3"},   {44, "nvj4"},  {45, "nvj5"},  {46, "nvj6"},  {47, "nvj7"},
    {50, "ofa0"},   {60, "nvlrx"}, {61, "nvltx"}};

struct GPUStats
{
    unsigned int power;
    unsigned int gpuTemp;
    int memTemp;
    unsigned int smUtil;
    unsigned int memUtil;
    unsigned int encUtil;
    unsigned int decUtil;
    unsigned int jpgUtil;
    unsigned int ofaUtil;
    unsigned int memClock;
    unsigned int smClock;
    std::map<int, double> gpmMetrics;
};

void printError(char const* func, nvmlReturn_t const result)
{
    std::cerr << "Error in " << func << ": " << nvmlErrorString(result)
              << std::endl;
}

bool getUtilization(nvmlDevice_t const device, GPUStats& stats)
{
    stats.smUtil = 0;
    stats.memUtil = 0;
    stats.encUtil = 0;
    stats.decUtil = 0;
    stats.jpgUtil = 0;
    stats.ofaUtil = 0;

    // Prefer per-process utilization, which reports sm, mem, enc, dec, jpg,
    // and ofa in one call. The first call returns the required buffer size.
    nvmlProcessesUtilizationInfo_t procUtilInfo{};
    memset(&procUtilInfo, 0, sizeof(procUtilInfo));
    procUtilInfo.version = nvmlProcessesUtilizationInfo_v1;
    procUtilInfo.lastSeenTimeStamp = 0;
    nvmlReturn_t result{
        nvmlDeviceGetProcessesUtilizationInfo(device, &procUtilInfo)};
    if (result == NVML_ERROR_INSUFFICIENT_SIZE &&
        procUtilInfo.processSamplesCount > 0)
    {
        std::vector<nvmlProcessUtilizationInfo_v1_t> procUtilArray(
            procUtilInfo.processSamplesCount);
        procUtilInfo.procUtilArray = procUtilArray.data();
        result = nvmlDeviceGetProcessesUtilizationInfo(device, &procUtilInfo);
        if (result == NVML_SUCCESS)
        {
            for (unsigned int i{0}; i < procUtilInfo.processSamplesCount; ++i)
            {
                stats.smUtil = std::max(stats.smUtil, procUtilArray[i].smUtil);
                stats.memUtil =
                    std::max(stats.memUtil, procUtilArray[i].memUtil);
                stats.encUtil =
                    std::max(stats.encUtil, procUtilArray[i].encUtil);
                stats.decUtil =
                    std::max(stats.decUtil, procUtilArray[i].decUtil);
                stats.jpgUtil =
                    std::max(stats.jpgUtil, procUtilArray[i].jpgUtil);
                stats.ofaUtil =
                    std::max(stats.ofaUtil, procUtilArray[i].ofaUtil);
            }
            return true;
        }
    }

    // Fall back to device-level utilization rates.
    nvmlUtilization_t utilization{};
    result = nvmlDeviceGetUtilizationRates(device, &utilization);
    if (result == NVML_SUCCESS)
    {
        stats.smUtil = utilization.gpu;
        stats.memUtil = utilization.memory;
    }

    unsigned int encoderUtil{}, encoderSamplingPeriod{};
    result = nvmlDeviceGetEncoderUtilization(device, &encoderUtil,
                                             &encoderSamplingPeriod);
    if (result == NVML_SUCCESS)
    {
        stats.encUtil = encoderUtil;
    }

    unsigned int decoderUtil{}, decoderSamplingPeriod{};
    result = nvmlDeviceGetDecoderUtilization(device, &decoderUtil,
                                             &decoderSamplingPeriod);
    if (result == NVML_SUCCESS)
    {
        stats.decUtil = decoderUtil;
    }

    return true;
}

bool getGPUStats(nvmlDevice_t const device, GPUStats& stats)
{
    nvmlReturn_t result{};

    // Power is reported in milliwatts; convert to watts.
    result = nvmlDeviceGetPowerUsage(device, &stats.power);
    if (result != NVML_SUCCESS)
    {
        stats.power = 0;
    }
    else
    {
        stats.power /= 1000;
    }

    result = nvmlDeviceGetTemperature(device, NVML_TEMPERATURE_GPU,
                                      &stats.gpuTemp);
    if (result != NVML_SUCCESS)
    {
        stats.gpuTemp = 0;
    }

    // Memory temperature is not queried; it is printed as "-".
    stats.memTemp = -1;

    getUtilization(device, stats);

    result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_MEM, &stats.memClock);
    if (result != NVML_SUCCESS)
    {
        stats.memClock = 0;
    }
    result = nvmlDeviceGetClockInfo(device, NVML_CLOCK_SM, &stats.smClock);
    if (result != NVML_SUCCESS)
    {
        stats.smClock = 0;
    }

    return true;
}

bool getGPMMetrics(nvmlDevice_t const device,
                   std::vector<int> const& metricIds, GPUStats& stats)
{
    if (metricIds.empty())
    {
        return true;
    }

    // GPM samples are opaque handles that must be released with
    // nvmlGpmSampleFree.
    auto gpmSampleDeleter = [](nvmlGpmSample_t* sample)
    {
        if (sample && *sample)
        {
            nvmlGpmSampleFree(*sample);
        }
        delete sample;
    };

    std::unique_ptr<nvmlGpmSample_t, decltype(gpmSampleDeleter)> sample1(
        new nvmlGpmSample_t{}, gpmSampleDeleter);
    nvmlReturn_t result{nvmlGpmSampleAlloc(sample1.get())};
    if (result != NVML_SUCCESS)
    {
        for (int id : metricIds)
        {
            stats.gpmMetrics[id] = -1.0;
        }
        return false;
    }

    std::unique_ptr<nvmlGpmSample_t, decltype(gpmSampleDeleter)> sample2(
        new nvmlGpmSample_t{}, gpmSampleDeleter);
    result = nvmlGpmSampleAlloc(sample2.get());
    if (result != NVML_SUCCESS)
    {
        for (int id : metricIds)
        {
            stats.gpmMetrics[id] = -1.0;
        }
        return false;
    }

    // GPM metrics are computed over the interval between two samples.
    result = nvmlGpmSampleGet(device, *sample1);
    if (result != NVML_SUCCESS)
    {
        for (int id : metricIds)
        {
            stats.gpmMetrics[id] = -1.0;
        }
        return false;
    }
    std::this_thread::sleep_for(std::chrono::milliseconds(100));
    result = nvmlGpmSampleGet(device, *sample2);
    if (result != NVML_SUCCESS)
    {
        for (int id : metricIds)
        {
            stats.gpmMetrics[id] = -1.0;
        }
        return false;
    }

    nvmlGpmMetricsGet_t metricsGet{};
    memset(&metricsGet, 0, sizeof(metricsGet));
    metricsGet.version = NVML_GPM_METRICS_GET_VERSION;
    metricsGet.numMetrics = metricIds.size();
    metricsGet.sample1 = *sample1;
    metricsGet.sample2 = *sample2;
    for (size_t i{0}; i < metricIds.size() && i < 210; ++i)
    {
        metricsGet.metrics[i].metricId =
            static_cast<nvmlGpmMetricId_t>(metricIds[i]);
    }

    result = nvmlGpmMetricsGet(&metricsGet);
    if (result == NVML_SUCCESS)
    {
        for (size_t i{0}; i < metricIds.size(); ++i)
        {
            stats.gpmMetrics[metricIds[i]] = metricsGet.metrics[i].value;
        }
    }
    else
    {
        for (int id : metricIds)
        {
            stats.gpmMetrics[id] = -1.0;
        }
    }

    return result == NVML_SUCCESS;
}

void printHeader(std::vector<int> const& gpmMetricIds)
{
    std::cout << "# gpu    pwr  gtemp  mtemp     sm    mem    enc    dec"
                 "    jpg    ofa   mclk   pclk";
    for (int id : gpmMetricIds)
    {
        if (gpmMetricNames.find(id) != gpmMetricNames.end())
        {
            std::cout << std::setw(11) << gpmMetricNames[id];
        }
    }
    std::cout << std::endl;
    std::cout << "# Idx      W      C      C      %      %      %      %"
                 "      %      %    MHz    MHz";
    for (size_t i{0}; i < gpmMetricIds.size(); ++i)
    {
        std::cout << " GPM:%";
    }
    std::cout << std::endl;
}

void printStats(unsigned int const deviceId, GPUStats const& stats,
                std::vector<int> const& gpmMetricIds)
{
    std::cout << std::setw(5) << deviceId;
    std::cout << std::setw(7) << stats.power;
    std::cout << std::setw(7) << stats.gpuTemp;
    if (stats.memTemp >= 0)
    {
        std::cout << std::setw(7) << stats.memTemp;
    }
    else
    {
        std::cout << std::setw(7) << "-";
    }
    std::cout << std::setw(7) << stats.smUtil;
    std::cout << std::setw(7) << stats.memUtil;
    std::cout << std::setw(7) << stats.encUtil;
    std::cout << std::setw(7) << stats.decUtil;
    std::cout << std::setw(7) << stats.jpgUtil;
    std::cout << std::setw(7) << stats.ofaUtil;
    std::cout << std::setw(7) << stats.memClock;
    std::cout << std::setw(7) << stats.smClock;
    for (int id : gpmMetricIds)
    {
        if (stats.gpmMetrics.find(id) != stats.gpmMetrics.end())
        {
            double const value{stats.gpmMetrics.at(id)};
            if (value < 0)
            {
                std::cout << std::setw(11) << "-";
            }
            else
            {
                std::cout << std::setw(11) << static_cast<int>(value);
            }
        }
        else
        {
            std::cout << std::setw(11) << "-";
        }
    }
    std::cout << std::endl;
}

std::vector<int> parseGpmMetrics(std::string const& str)
{
    std::vector<int> metrics{};
    std::stringstream ss{str};
    std::string token{};
    while (std::getline(ss, token, ','))
    {
        try
        {
            metrics.push_back(std::stoi(token));
        }
        catch (...)
        {
            std::cerr << "Invalid GPM metric ID: " << token << std::endl;
        }
    }
    return metrics;
}

int main(int argc, char* argv[])
{
    std::vector<int> gpmMetricIds{};
    int delay{1};
    int count{-1};

    for (int i{1}; i < argc; ++i)
    {
        std::string const arg{argv[i]};
        if (arg == "--gpm-metrics" && i + 1 < argc)
        {
            gpmMetricIds = parseGpmMetrics(argv[++i]);
        }
        else if ((arg == "-d" || arg == "--delay") && i + 1 < argc)
        {
            delay = std::atoi(argv[++i]);
        }
        else if ((arg == "-c" || arg == "--count") && i + 1 < argc)
        {
            count = std::atoi(argv[++i]);
        }
        else if (arg == "-h" || arg == "--help")
        {
            std::cout << "Usage: " << argv[0] << " [options]" << std::endl
                      << "Options:" << std::endl
                      << "  --gpm-metrics <ids>  Comma-separated list of GPM "
                         "metric IDs"
                      << std::endl
                      << "  -d, --delay <sec>    Collection delay/interval in "
                         "seconds [default=1]"
                      << std::endl
                      << "  -c, --count <n>      Collect specified number of "
                         "samples and exit"
                      << std::endl
                      << "  -h, --help           Display this help"
                      << std::endl;
            return 0;
        }
    }

    nvmlReturn_t result{nvmlInit()};
    if (result != NVML_SUCCESS)
    {
        printError("nvmlInit", result);
        return 1;
    }

    unsigned int deviceCount{};
    result = nvmlDeviceGetCount(&deviceCount);
    if (result != NVML_SUCCESS)
    {
        printError("nvmlDeviceGetCount", result);
        nvmlShutdown();
        return 1;
    }
    if (deviceCount == 0)
    {
        std::cerr << "No NVIDIA GPUs found" << std::endl;
        nvmlShutdown();
        return 1;
    }

    std::vector<nvmlDevice_t> devices(deviceCount);
    for (unsigned int i{0}; i < deviceCount; ++i)
    {
        result = nvmlDeviceGetHandleByIndex(i, &devices[i]);
        if (result != NVML_SUCCESS)
        {
            printError("nvmlDeviceGetHandleByIndex", result);
            nvmlShutdown();
            return 1;
        }
    }

    printHeader(gpmMetricIds);

    int iteration{0};
    while (count < 0 || iteration < count)
    {
        for (unsigned int i{0}; i < deviceCount; ++i)
        {
            GPUStats stats{};
            getGPUStats(devices[i], stats);
            if (!gpmMetricIds.empty())
            {
                getGPMMetrics(devices[i], gpmMetricIds, stats);
            }
            printStats(i, stats, gpmMetricIds);
        }
        iteration++;
        if (count < 0 || iteration < count)
        {
            std::this_thread::sleep_for(std::chrono::seconds(delay));
        }
    }

    nvmlShutdown();
    return 0;
}
```
The `gpu_stats` program can be built and run using the following commands.
```
$ g++ -o gpu_stats gpu_stats.cpp -I/usr/local/cuda/include -L/usr/local/cuda/lib64 -lnvidia-ml
$ ./gpu_stats --gpm-metrics 1,2,13
# gpu   pwr  gtemp  mtemp    sm   mem   enc   dec   jpg   ofa   mclk   pclk  gract smutil   fp16
# Idx     W      C      C     %     %     %     %     %     %    MHz    MHz  GPM:%  GPM:%  GPM:%
    0    19     34      -     3     2     0     0     0     0   7001    712     13     11      0
    0    19     34      -     3     2     0     0     0     0   7001    637     11      8      0
    0    18     34      -     3     2     0     0     0     0   7001    667      9     10      0
    0    19     34      -     3     2     0     0     0     0   7001    577      8     10      0
    0    18     34      -     3     2     0     0     0     0   7001    615      7      8      0
```
## References
- [NVIDIA Management Library](https://docs.nvidia.com/deploy/nvml-api/index.html)
- [NVML GPM API](https://docs.nvidia.com/deploy/nvml-api/group__GPM.html#group__GPM)