GPFS Cluster Configuration
1. Install and configure Net-SNMP
1.1 Install the package with yum
yum install -y net-snmp
1.2 Add the shared-library links GPFS requires
libnetsnmpagent.so -- from Net-SNMP
libnetsnmphelpers.so -- from Net-SNMP
libnetsnmpmibs.so -- from Net-SNMP
libnetsnmp.so -- from Net-SNMP
libwrap.so -- from TCP Wrappers
libcrypto.so -- from OpenSSL
These files are usually located in /lib64, /usr/lib64, or /usr/local/lib64. Taking libnetsnmpmibs.so as an example:
cd /usr/lib64
ln -s libnetsnmpmibs.so.5.1.2 libnetsnmpmibs.so
# the version number may differ depending on your environment
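Before creating links by hand, it can help to see which unversioned names are actually missing. The following is a minimal sketch, not part of GPFS or Net-SNMP: the helper name missing_links is hypothetical, and the directory list simply mirrors the three directories mentioned above.

```python
from pathlib import Path

# Library names taken from the list above.
REQUIRED_LIBS = [
    "libnetsnmpagent.so",
    "libnetsnmphelpers.so",
    "libnetsnmpmibs.so",
    "libnetsnmp.so",
    "libwrap.so",
    "libcrypto.so",
]


def missing_links(libdirs, names=REQUIRED_LIBS):
    """Map each unversioned name that is absent from every directory
    to the versioned files (name + ".*") that could be linked to it."""
    missing = {}
    for name in names:
        # exists() follows symlinks, so a dangling link counts as missing.
        if any((Path(d) / name).exists() for d in libdirs):
            continue
        candidates = []
        for d in libdirs:
            candidates.extend(sorted(str(p) for p in Path(d).glob(name + ".*")))
        missing[name] = candidates
    return missing
```

Calling missing_links(["/lib64", "/usr/lib64", "/usr/local/lib64"]) returns a dictionary whose keys are the names that still need an ln -s, each with the versioned candidates found for it.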
1.3 Configure the Net-SNMP daemon on the GPFS collector node
File: /etc/snmp/snmpd.conf
Add the following:
# Expose all OIDs to external SNMP clients under the community name public
view all included .1
rocommunity public default
# GPFS-related snmpd settings
master agentx
AgentXSocket tcp:localhost:705
trap2sink managementhost
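managementhost: the host to which GPFS SNMP traps are sent
The official documentation recommends adding the following when a large amount of data is collected: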
agentXTimeout 60
agentXRetries 10
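A quick sanity check of the edited file can catch a missing AgentX line before snmpd is restarted. This is only a sketch under stated assumptions: missing_directives is a hypothetical helper, only the two AgentX directives from the snippet above are checked, and directive names are compared case-insensitively while their arguments are compared exactly.

```python
# Directives from the snippet above that the GPFS subagent relies on.
REQUIRED = ["master agentx", "AgentXSocket tcp:localhost:705"]


def _normalize(line):
    """Lower-case the directive name and collapse runs of whitespace."""
    parts = line.split()
    return " ".join([parts[0].lower()] + parts[1:])


def missing_directives(conf_text, required=REQUIRED):
    """Return the required directives that do not appear in conf_text."""
    present = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        present.add(_normalize(line))
    return [r for r in required if _normalize(r) not in present]
```

For example, missing_directives(open("/etc/snmp/snmpd.conf").read()) returns an empty list when both lines are present.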
1.4 Configure the SNMP agent on the collector node
GPFS MIB file: /usr/lpp/mmfs/data/GPFS-MIB.txt
1.4.1 Install the MIB
Method 1:
Copy or link the file (/usr/lpp/mmfs/data/GPFS-MIB.txt) into the local SNMP MIB directory (usually /usr/share/snmp/mibs)
install -m 0644 /usr/lpp/mmfs/data/GPFS-MIB.txt /usr/share/snmp/mibs
Method 2:
Add the following to the local SNMP agent configuration (usually /etc/snmp/snmp.conf)
mibdirs +/usr/lpp/mmfs/data
1.4.2 Make the MIB readable
Configuration file: /etc/snmp/snmp.conf
mibs +GPFS-MIB
Restart the snmpd service
systemctl enable snmpd
systemctl restart snmpd
One more thing
- The file /etc/snmp/snmp.conf is the configuration for the SNMP agent (the client side, e.g. snmpwalk).
- The file /etc/snmp/snmpd.conf is the configuration for the SNMP daemon.
- All the actions taken in the section above apply ONLY to the agent.
- Any host that needs to read the GPFS OID(s) must know the MIB too, which means:
  - Copying or linking the MIB file to the local MIB directory, or adding the MIB directory to the local SNMP agent configuration
  - Adding the setting mibs ... to the local SNMP agent configuration, or reading the OID value(s) with the -m GPFS-MIB option
- If neither of the two steps above can be done, you can still read the values by passing the option -m /path/to/GPFS-MIB.txt to snmpwalk or other commands.
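The two ways of passing the MIB on the command line can be sketched as a small command builder. The helper name snmpwalk_cmd and its parameters are hypothetical; only the standard snmpwalk options -v, -c and -m are used, and SNMP v2c with the community public is assumed, as in the snmpd.conf example above.

```python
import shlex


def snmpwalk_cmd(host, oid, community="public", mib="GPFS-MIB", mib_file=None):
    """Build an snmpwalk argument list for SNMP v2c.

    mib      -- module name; works once the MIB sits in a searched directory.
    mib_file -- path to GPFS-MIB.txt; used instead of the module name when
                the MIB could not be installed on this host.
    """
    return ["snmpwalk", "-v", "2c", "-c", community,
            "-m", mib_file or mib, host, oid]


if __name__ == "__main__":
    # GPFS_Collector_Node_Address is a placeholder, as in the DevMon example.
    cmd = snmpwalk_cmd("GPFS_Collector_Node_Address", "GPFS-MIB::gpfsClusterName")
    print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run directly; shlex.join is used here only to print a copy-pasteable command line.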
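Managing the GPFS collector node
Supported operations
- Enable the collector node
- Disable the collector node
- Change the collector node
Enable the collector node and start the SNMP subagent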
mmchnode --snmp-agent -N NodeName
Disable the collector node and stop the SNMP subagent
mmchnode --nosnmp-agent -N NodeName
Check the SNMP agent configuration
mmlscluster | grep snmp
Change the SNMP collector node
mmchnode --nosnmp-agent -N OldNodeName
mmchnode --snmp-agent -N NewNodeName
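Starting and stopping the GPFS SNMP subsystem
- The SNMP subsystem starts and stops automatically;
- The SNMP subsystem starts and stops together with the GPFS cluster;
- If the GPFS cluster is already running when the SNMP collector node is enabled, the SNMP subsystem starts automatically;
- When the SNMP collector node is disabled, the SNMP subsystem on that node also stops automatically.
Official documentation link
DevMon SNMP entry configuration
Example YAML file[^1]
[^1]: The SNMP read timeout can be increased slightly in case the GPFS SNMP subsystem does not respond in time (not tested on high-spec servers, where this may be unnecessary).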
---
# address of the device running the SNMP agent
address: GPFS_Collector_Node_Address # the address the SNMP daemon listens on
region: SomeRegion # the region of the device, e.g., DCA, DCB, DataCenterA...
area: SomeArea # the business area
addr_in_cmdb: SomeAddr # the address associated with the resource ID in the CMDB
rid: 'This is the resource ID'
snmp:
  version: '2c' # the SNMP version
  community: 'public' # the SNMP community
  mib: 'GPFS-MIB'
  timeout: 2 # seconds before timing out; increase the value if reads fail on a busy OS
  retries: 1 # number of retries after a failure
  OIDs:
    # GPFS cluster information
    - id: 'gpfsClusterName'
      label: gpfsClusterName
      explanation: 'GPFS cluster name'
      show: True # add support for group showing
    - id: 'gpfsClusterId'
      label: gpfsClusterId
      explanation: 'GPFS cluster ID'
      show: True
    - id: 'gpfsClusterMinReleaseLevel'
      label: gpfsClusterMinReleaseLevel
      explanation: ''
      show: True
    - id: 'gpfsClusterNumNodes'
      label: 'gpfsClusterNumNodes'
      explanation: ''
      show: True
    - id: 'gpfsClusterNumFileSystems'
      label: 'gpfsClusterNumFileSystems'
      explanation: ''
      show: True
    - group:
        - gpfsDiskPerfName
        - gpfsDiskPerfFSName
        - gpfsDiskPerfStgPoolName
      label: gpfsStorgePoolName
      show: True
    - group:
        - gpfsDiskReadTimeL
        - gpfsDiskReadTimeH
        - gpfsDiskWriteTimeL
        - gpfsDiskWriteTimeH
        - gpfsDiskLongestReadTimeL
        - gpfsDiskLongestReadTimeH
        - gpfsDiskLongestWriteTimeL
        - gpfsDiskLongestWriteTimeH
        - gpfsDiskShortestReadTimeL
        - gpfsDiskShortestReadTimeH
        - gpfsDiskShortestWriteTimeL
        - gpfsDiskShortestWriteTimeH
      label: gpfsPerfTime
      perf: True
    - group:
        - gpfsDiskReadBytesL
        - gpfsPerfBytes
        - gpfsDiskReadBytesH
        - gpfsDiskWriteBytesL
        - gpfsDiskWriteBytesH
        - gpfsDiskReadOps
        - gpfsDiskWriteOps
      label: gpfsPerfOps
      perf: True
    - group:
        - gpfsFileSystemPerfName
        - gpfsFileSystemBytesReadL
        - gpfsFileSystemBytesReadH
        - gpfsFileSystemBytesCacheL
        - gpfsFileSystemBytesCacheH
        - gpfsFileSystemBytesWrittenL
        - gpfsFileSystemBytesWrittenH
      label: gpfsFsSystemBytes
      perf: True
    - group:
        - gpfsFileSystemReads
        - gpfsFileSystemCaches
        - gpfsFileSystemWrites
      label: gpfsFsIO
      perf: True
    - group:
        - gpfsFileSystemOpenCalls
        - gpfsFileSystemCloseCalls
        - gpfsFileSystemReadCalls
        - gpfsFileSystemWriteCalls
        - gpfsFileSystemReaddirCalls
      label: gpfsFSSysCalls
      perf: True
    - group:
        - gpfsFileSystemInodesWritten
        - gpfsFileSystemInodesRead
        - gpfsFileSystemInodesDeleted
        - gpfsFileSystemInodesCreated
      label: gpfsFSInodeIO
      perf: True
    - group:
        - gpfsFileSystemStatCacheHit
        - gpfsFileSystemStatCacheMiss
      label: gpfsFSCacheNo
      perf: True
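Many of the counters listed in this configuration come in H/L pairs, such as gpfsDiskReadBytesH and gpfsDiskReadBytesL. A minimal sketch of joining such a pair follows, assuming the H object holds the upper 32 bits and the L object the lower 32 bits of one 64-bit value. That layout is an assumption to verify against GPFS-MIB.txt, and combine_counter is a hypothetical helper, not part of DevMon.

```python
MASK32 = 0xFFFFFFFF  # largest value a 32-bit SNMP counter half can hold


def combine_counter(high, low):
    """Join the assumed upper (H) and lower (L) 32-bit halves into one integer."""
    for part in (high, low):
        if not 0 <= part <= MASK32:
            raise ValueError(f"not a 32-bit unsigned value: {part}")
    return (high << 32) | low


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    print(combine_counter(1, 0))  # 4294967296
    print(combine_counter(2, 3))  # 8589934595
```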
Continuously reading GPFS performance data with DevMon
/path/to/venv/python3 devmon.py perf -s
# samples every 60 s by default
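With a fixed sampling interval, two successive readings of a counter can be turned into a per-second rate. The sketch below assumes the GPFS counters are cumulative, which the text above does not confirm, and treats a decreasing value as a reset. rate_per_second is a hypothetical helper and does not describe how DevMon itself processes samples.

```python
def rate_per_second(previous, current, interval_s=60.0):
    """Return the average per-second increase between two samples.

    Returns None when the counter went backwards, which is taken here
    to mean it was reset between the two samples.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    delta = current - previous
    if delta < 0:
        return None
    return delta / interval_s


if __name__ == "__main__":
    # Illustrative values only: 6000 bytes more after 60 s.
    print(rate_per_second(1000, 7000))  # 100.0
```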
Installing and configuring InfluxDB (Gentoo as an example)
1. Add package keywords
cat >> /etc/portage/package.accept_keywords << EOF
dev-db/influx-cli ~amd64
dev-db/influxdb ~amd64
EOF
2. Install the packages
emerge -avt influxdb influx-cli
3. Reconfigure the InfluxDB service[^2]
File: /etc/init.d/influxdb
-- # command_args="-config ${config} ${influxd_opts}"
++ export INFLUXDB_CONFIG_PATH="/etc/influxdb/influxdb.conf"
++ command_args="run ${influxd_opts}"
4. Add the configuration
File: /etc/influxdb/influxdb.conf
5. Enable and start the service
rc-update add influxdb default
/etc/init.d/influxdb start
Installing Grafana locally[^3]
emerge -avt grafana-bin