北肙


GPFS Performance Observability with Grafana: DevMon Update


GPFS Cluster Configuration

1. Install and configure Net-SNMP

1.1 Install the package with yum

yum install -y net-snmp

1.2 Add the shared-library links that GPFS needs

libnetsnmpagent.so  -- from Net-SNMP 
libnetsnmphelpers.so  -- from Net-SNMP 
libnetsnmpmibs.so  -- from Net-SNMP 
libnetsnmp.so  -- from Net-SNMP 
libwrap.so  -- from TCP Wrappers 
libcrypto.so  -- from OpenSSL  

These files are usually located in /lib64, /usr/lib64, or /usr/local/lib64. Taking libnetsnmpmibs.so as an example:

cd /usr/lib64
ln -s libnetsnmpmibs.so.5.1.2 libnetsnmpmibs.so
# the version number may differ depending on your environment
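The same linking step can be scripted when several libraries need links. The following is only a minimal Python sketch, not something shipped with GPFS or Net-SNMP: the helper name `link_unversioned` is hypothetical, and it assumes the versioned files follow the `name.X.Y.Z` pattern shown above.

```python
from pathlib import Path


def _version_key(path, name):
    """Turn the text after 'name.' into a list of ints for comparison."""
    suffix = path.name[len(name) + 1:]
    return [int(part) if part.isdigit() else -1 for part in suffix.split(".")]


def link_unversioned(lib_dir, name):
    """Create lib_dir/name as a symlink to the newest versioned file.

    Hypothetical helper. Returns the link path, or None when no versioned
    file (for example name.5.1.2) exists. An existing entry is left as-is.
    """
    lib_dir = Path(lib_dir)
    link = lib_dir / name
    if link.exists() or link.is_symlink():
        return link  # already present, nothing to do
    candidates = [p for p in lib_dir.glob(name + ".*") if p.is_file()]
    if not candidates:
        return None
    newest = max(candidates, key=lambda p: _version_key(p, name))
    # Relative target, equivalent to running `ln -s` inside lib_dir.
    link.symlink_to(newest.name)
    return link
```

Calling `link_unversioned("/usr/lib64", "libnetsnmpmibs.so")` as root would mirror the `ln -s` command above; try it on a scratch directory first.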

1.3 Configure the Net-SNMP daemon on the GPFS collector node

File: /etc/snmp/snmpd.conf
Add the following:

# Expose all OIDs to external SNMP clients under the community name public
view all included .1
rocommunity public default

# GPFS SNMPD configuration follows
master agentx
AgentXSocket tcp:localhost:705
trap2sink managementhost

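managementhost: the host to which GPFS sends SNMP traps

The official documentation recommends adding the following when a large volume of data is collected: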

agentXTimeout 60
agentXRetries 10
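If the configuration is rolled out to more than one collector candidate, appending these lines idempotently avoids duplicates. Below is a rough Python sketch under that assumption; `ensure_directives` and `GPFS_SNMPD_DIRECTIVES` are hypothetical names, and the matching is a plain whole-line comparison, so differently spaced copies of a directive are not detected.

```python
# Directives taken from the snmpd.conf snippets above.
GPFS_SNMPD_DIRECTIVES = [
    "master agentx",
    "AgentXSocket tcp:localhost:705",
    "trap2sink managementhost",
    "agentXTimeout 60",
    "agentXRetries 10",
]


def ensure_directives(conf_text, directives):
    """Return conf_text with every missing directive appended on its own line."""
    present = {line.strip() for line in conf_text.splitlines()}
    missing = [d for d in directives if d.strip() not in present]
    if not missing:
        return conf_text  # nothing to add, keep the text unchanged
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + "\n".join(missing) + "\n"
```

A caller would read /etc/snmp/snmpd.conf, pass its text through `ensure_directives`, write the result back, and then restart snmpd.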

1.4 Configure the SNMP agent on the collector node

GPFS MIB

File: /usr/lpp/mmfs/data/GPFS-MIB.txt

1.4.1 Configure the MIB

Method 1:

Copy or link the file (/usr/lpp/mmfs/data/GPFS-MIB.txt) into the local SNMP MIB directory (usually /usr/share/snmp/mibs)

install -m 0644 /usr/lpp/mmfs/data/GPFS-MIB.txt /usr/share/snmp/mibs

Method 2:

Add the following to the local SNMP agent configuration (the configuration file is usually /etc/snmp/snmp.conf)

mibdirs +/usr/lpp/mmfs/data

1.4.2 Make the MIB loadable

Configuration file: /etc/snmp/snmp.conf

mibs +GPFS-MIB

Restart the snmpd service

systemctl enable snmpd
systemctl restart snmpd

One more thing

  1. The file /etc/snmp/snmp.conf configures the SNMP agent (in Net-SNMP terms, the client library and command-line tools such as snmpwalk).
  2. The file /etc/snmp/snmpd.conf configures the SNMP daemon (snmpd).
  3. All the steps in section 1.4 apply ONLY to the agent.
  4. Any host that needs to read the GPFS OIDs must know the MIB as well, which means:
    1. Copy or link the MIB file into the local MIB directory, or add a mibdirs entry to the local SNMP agent configuration.
    2. Add a mibs ... entry to the local SNMP agent configuration, or pass the -m GPFS-MIB option when reading OID values.
    3. If neither of the two steps above is possible, you can still read the values by passing -m /path/to/GPFS-MIB.txt to snmpwalk or other commands.
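To make the two `-m` variants concrete, here is a small Python sketch that only assembles the snmpwalk argument list; nothing is executed. The function name `build_snmpwalk_cmd` is hypothetical, and the community and SNMP version simply reuse the values configured earlier in this post.

```python
import shlex


def build_snmpwalk_cmd(host, oid, community="public", mib="GPFS-MIB"):
    """Assemble an snmpwalk command line for a GPFS OID.

    `mib` may be a module name (GPFS-MIB) when the MIB file is already in a
    MIB directory, or a full path such as /path/to/GPFS-MIB.txt otherwise.
    """
    return ["snmpwalk", "-v", "2c", "-c", community, "-m", mib, host, oid]


# Module name: works once the MIB has been copied or mibdirs is configured.
by_name = build_snmpwalk_cmd("GPFS_Collector_Node_Address", "gpfsClusterName")

# Full path: works without touching any local SNMP configuration.
by_path = build_snmpwalk_cmd(
    "GPFS_Collector_Node_Address",
    "gpfsClusterName",
    mib="/usr/lpp/mmfs/data/GPFS-MIB.txt",
)

print(shlex.join(by_name))
print(shlex.join(by_path))
```

Either list can be handed to `subprocess.run(cmd, capture_output=True, text=True)` on a host that has the Net-SNMP tools installed.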

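GPFS Collector Node Management

Supported operations

  • Enable a collector node
  • Disable a collector node
  • Change the collector node

Enable a collector node and start the SNMP sub-service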

mmchnode --snmp-agent -N NodeName 

Disable a collector node and stop the SNMP sub-service

mmchnode --nosnmp-agent -N NodeName

Check the SNMP agent configuration

mmlscluster | grep snmp

Change the SNMP collector node

mmchnode --nosnmp-agent -N OldNodeName
mmchnode --snmp-agent -N NewNodeName
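The order matters here: the old node is disabled before the new one is enabled. As a hedged illustration, the Python sketch below builds the two mmchnode invocations in that order without running them; `change_collector_cmds` is a hypothetical helper, not part of GPFS.

```python
def change_collector_cmds(old_node, new_node):
    """Return the mmchnode commands that move the SNMP collector role.

    The old collector is disabled first, then the new one is enabled.
    An empty list is returned when both names are the same.
    """
    if old_node == new_node:
        return []
    return [
        ["mmchnode", "--nosnmp-agent", "-N", old_node],
        ["mmchnode", "--snmp-agent", "-N", new_node],
    ]


for cmd in change_collector_cmds("OldNodeName", "NewNodeName"):
    print(" ".join(cmd))
```

On a cluster node, each list could be passed to `subprocess.run(cmd, check=True)` so that a failure on the first command stops the second from running.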

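Starting and stopping the GPFS SNMP subsystem

  1. The SNMP subsystem starts and stops automatically;
  2. The SNMP subsystem starts and stops together with the GPFS cluster;
  3. If the GPFS cluster is already running when a collector node is enabled, the SNMP subsystem starts automatically;
  4. When a collector node is disabled, the SNMP subsystem on that node also stops automatically.

Official documentation link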

Enable SNMP for GPFS Cluster

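DevMon SNMP Entry Configuration

Example YAML file[^1]

[^1]: The SNMP read timeout can be raised a little so that a slow GPFS SNMP subsystem is not treated as unresponsive (not tested on high-spec servers, where this may be negligible).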

---  
# address of the device that runs the SNMP agent
address: GPFS_Collector_Node_Address  # the address the SNMP daemon listens on
region: SomeRegion  # the region of the device, e.g., DCA, DCB, DataCenterA...
area: SomeArea  # the business area
addr_in_cmdb: SomeAddr  # the address associated with the resource ID in the CMDB
rid: 'This is resource ID'
snmp:
  version: '2c'  # the SNMP version
  community: 'public'  # the SNMP community
  mib: 'GPFS-MIB'
  timeout: 2  # seconds before timing out; increase if reads fail on a busy OS
  retries: 1  # number of retries after a failure
  OIDs:
  # GPFS cluster information
  - id: 'gpfsClusterName'
    label: gpfsClusterName
    explanation: 'GPFS cluster name'
    show: True  # add support for group showing
  - id: 'gpfsClusterId'
    label: gpfsClusterId
    explanation: 'GPFS cluster ID'
    show: True
  - id: 'gpfsClusterMinReleaseLevel'  
    label: gpfsClusterMinReleaseLevel  
    explanation: ''  
    show: True  
  - id: 'gpfsClusterNumNodes'  
    label: 'gpfsClusterNumNodes'  
    explanation: ''  
    show: True  
  - id: 'gpfsClusterNumFileSystems'  
    label: 'gpfsClusterNumFileSystems'  
    explanation: ''  
    show: True  

  - group:  
      - gpfsDiskPerfName  
      - gpfsDiskPerfFSName  
      - gpfsDiskPerfStgPoolName  
    label: gpfsStorgePoolName  
    show: True  

  - group:  
      - gpfsDiskReadTimeL  
      - gpfsDiskReadTimeH  
      - gpfsDiskWriteTimeL  
      - gpfsDiskWriteTimeH  
      - gpfsDiskLongestReadTimeL  
      - gpfsDiskLongestReadTimeH  
      - gpfsDiskLongestWriteTimeL  
      - gpfsDiskLongestWriteTimeH  
      - gpfsDiskShortestReadTimeL  
      - gpfsDiskShortestReadTimeH  
      - gpfsDiskShortestWriteTimeL  
      - gpfsDiskShortestWriteTimeH  
    label: gpfsPerfTime  
    perf: True  

  - group:  
      - gpfsDiskReadBytesL  
      - gpfsPerfBytes  
      - gpfsDiskReadBytesH  
      - gpfsDiskWriteBytesL  
      - gpfsDiskWriteBytesH  
      - gpfsDiskReadOps  
      - gpfsDiskWriteOps  
    label: gpfsPerfOps  
    perf: True  

  - group:  
      - gpfsFileSystemPerfName  
      - gpfsFileSystemBytesReadL  
      - gpfsFileSystemBytesReadH  
      - gpfsFileSystemBytesCacheL  
      - gpfsFileSystemBytesCacheH  
      - gpfsFileSystemBytesWrittenL  
      - gpfsFileSystemBytesWrittenH  
    label: gpfsFsSystemBytes  
    perf: True  

  - group:  
      - gpfsFileSystemReads  
      - gpfsFileSystemCaches  
      - gpfsFileSystemWrites  
    label: gpfsFsIO  
    perf: True  

  - group:  
      - gpfsFileSystemOpenCalls  
      - gpfsFileSystemCloseCalls  
      - gpfsFileSystemReadCalls  
      - gpfsFileSystemWriteCalls  
      - gpfsFileSystemReaddirCalls  
    label: gpfsFSSysCalls  
    perf: True  

  - group:  
      - gpfsFileSystemInodesWritten  
      - gpfsFileSystemInodesRead  
      - gpfsFileSystemInodesDeleted  
      - gpfsFileSystemInodesCreated  
    label: gpfsFSInodeIO  
    perf: True  

  - group:  
      - gpfsFileSystemStatCacheHit  
      - gpfsFileSystemStatCacheMiss  
    label: gpfsFSCacheNo  
    perf: True
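Many OIDs in the groups above come in pairs ending in L and H. Assuming, as the naming suggests, that these are the low and high 32-bit halves of one 64-bit counter, the pairs can be merged before graphing. The following Python sketch shows that merge; `combine_hl` and `merge_hl_pairs` are hypothetical names and this is not necessarily how DevMon handles the values.

```python
def combine_hl(high, low):
    """Join two 32-bit halves into one 64-bit value (assumed layout)."""
    return ((high & 0xFFFFFFFF) << 32) | (low & 0xFFFFFFFF)


def merge_hl_pairs(sample):
    """Replace every <name>L / <name>H pair in a sample with a single <name>.

    Entries without a matching partner are copied through unchanged.
    """
    merged = {}
    for name, value in sample.items():
        base = name[:-1]
        if name.endswith("L") and base + "H" in sample:
            merged[base] = combine_hl(sample[base + "H"], value)
        elif name.endswith("H") and base + "L" in sample:
            continue  # already consumed together with its L partner
        else:
            merged[name] = value
    return merged


sample = {"gpfsDiskReadBytesL": 5, "gpfsDiskReadBytesH": 1, "gpfsDiskReadOps": 7}
print(merge_hl_pairs(sample))
```

With the sample above, the two halves become a single `gpfsDiskReadBytes` entry while `gpfsDiskReadOps` passes through untouched.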

DevMon continuously reads GPFS performance data

/path/to/venv/python3 devmon.py perf -s
# samples once every 60 s by default
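If the sampled values are cumulative counters (an assumption, since the post does not say so), a per-second rate can be derived from two consecutive samples taken 60 s apart. This Python sketch illustrates the arithmetic; `per_second_rate` is a hypothetical helper rather than a DevMon function.

```python
def per_second_rate(prev, curr, interval_s=60.0):
    """Rate of change between two cumulative counter samples.

    Returns None when the counter went backwards, which usually means it
    was reset (for example after a restart) and the delta is meaningless.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = curr - prev
    if delta < 0:
        return None
    return delta / interval_s


# 6000 operations in one 60 s sampling interval.
print(per_second_rate(1000, 7000))
```

Dropping the sample on a counter reset, instead of reporting a large negative rate, keeps dashboards free of misleading spikes.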

Installing and Configuring InfluxDB (Gentoo as an Example)

1. Add package keywords

cat >> /etc/portage/package.accept_keywords << EOF 
dev-db/influx-cli ~amd64
dev-db/influxdb ~amd64
EOF

2. Install the packages

emerge -avt influxdb influx-cli

3. Reconfigure the InfluxDB service^2

File: /etc/init.d/influxdb

-- # command_args="-config ${config} ${influxd_opts}"
++ export INFLUXDB_CONFIG_PATH="/etc/influxdb/influxdb.conf"
++ command_args="run ${influxd_opts}"

4. Add the configuration

File: /etc/influxdb/influxdb.conf

influxdb.conf on GitHub

5. Enable and start the service

rc-update add influxdb default
/etc/init.d/influxdb start

Install Grafana locally^3

emerge -avt grafana-bin

Configure the Grafana dashboard

Example

JSON template

GPFS Performance Observability Json Model
