Online Disk Replacement on NetApp FAS3210 NAS Storage
Replacing a hard disk in a storage array breaks down into roughly three steps. First, check the system status and locate the failed disk. Second, swap the disk and let the system recognize it, either automatically or manually. Third, verify the storage state. For the first two steps, a failed disk is usually flagged by its status light, and virtually all storage-class products support hot-swapping. So if the goal is just to get the job done and satisfy the client, the easiest method is to look at the panel lights and swap whichever disk shows an abnormal light. If the array happens to detect the new disk and handle it correctly, all is well.
But storage is no small matter. However many or few disks a device has, it is the most basic and important building block of a production environment. Out of responsibility to yourself and to others, handle storage with extreme care; no amount of caution is excessive.
1. Checking the System State
Fas3210A> sysconfig -r
Aggregate aggr0 (online, raid_dp) (block checksums)
Plex /aggr0/plex0 (online, normal, active)
RAID group /aggr0/plex0/rg0 (normal)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.01.0 0a 1 0 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.01.1 0a 1 1 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.01.2 0a 1 2 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
Aggregate aggr1 (online, raid_dp, reconstruct) (block checksums)
Plex /aggr1/plex0 (online, normal, active)
RAID group /aggr1/plex0/rg0 (reconstruction 5% completed)
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
dparity 0a.02.0 0a 2 0 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
parity 0a.03.0 0a 3 0 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.01.3 0a 1 3 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
FAILED 0a.02.6 0a 2 6 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.03.1 0a 3 1 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
data 0a.02.7 0a 2 7 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
Spare disks (empty)
Broken disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
failed 0a.03.9 0a 3 9 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
Partner disks
RAID Disk Device HA SHELF BAY CHAN Pool Type RPM Used (MB/blks) Phys (MB/blks)
--------- ------ ------------- ---- ---- ---- ----- -------------- --------------
partner 0a.01.7 0a 1 7 SA:A - BSAS 7200 0/0 1695759/3472914816
partner 0a.01.6 0a 1 6 SA:A - BSAS 7200 0/0 1695759/3472914816
partner 0a.01.20 0a 1 20 SA:A - BSAS 7200 0/0 1695759/3472914816
partner 0a.01.13 0a 1 13 SA:A - BSAS 7200 0/0 1695759/3472914816
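As shown above, two disks have failed: 0a.02.6, marked "FAILED" inside RAID group rg0 of aggr1 (which is reconstructing), and 0a.03.9, listed as "failed" under "Broken disks". The SHELF and BAY columns give their physical locations.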
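When the output is long, the failed rows can be picked out with a short script instead of by eye. The following is a minimal Python sketch, assuming the `sysconfig -r` text has already been captured into a string (for example, pasted from a console session); the function name `failed_disks` and the row pattern are illustrative assumptions based only on the rows shown above, not a NetApp-provided tool.

```python
import re

# Matches a disk row such as:
#   "FAILED 0a.02.6 0a 2 6 SA:A - BSAS 7200 ..."
# Captured groups: role, device name, HA adapter, shelf, bay.
ROW = re.compile(r"^\s*(\S+)\s+(\w+\.\d+\.\d+)\s+(\w+)\s+(\d+)\s+(\d+)\s")

def failed_disks(sysconfig_r_output):
    """Return (device, shelf, bay) for every row whose role is 'failed'."""
    found = []
    for line in sysconfig_r_output.splitlines():
        m = ROW.match(line)
        # The role appears as "FAILED" inside a RAID group and as
        # "failed" under "Broken disks", so compare case-insensitively.
        if m and m.group(1).lower() == "failed":
            found.append((m.group(2), int(m.group(4)), int(m.group(5))))
    return found

# A few lines copied from the output above.
SAMPLE = """\
RAID group /aggr1/plex0/rg0 (reconstruction 5% completed)
data 0a.01.3 0a 1 3 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
FAILED 0a.02.6 0a 2 6 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
Broken disks
failed 0a.03.9 0a 3 9 SA:A - BSAS 7200 1695466/3472315904 1695759/3472914816
"""

print(failed_disks(SAMPLE))
```

On the sample lines this prints `[('0a.02.6', 2, 6), ('0a.03.9', 3, 9)]`, matching the two disks identified above.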
You can also obtain failed disk information by checking the status of all disks:
Fas3210A> disk show -v
DISK OWNER POOL SERIAL NUMBER HOME
------------ ------------- ----- ------------- -------------
0a.02.2 Fas3210B (1574419336) Pool0 WD-WMAY02394580 Fas3210B (1574419336)
0a.01.4 Fas3210B (1574419336) Pool0 B9KKWBZF Fas3210B (1574419336)
0a.03.7 Fas3210B (1574419336) Pool0 WD-WMAY04134224 Fas3210B (1574419336)
0a.01.1 Fas3210A (1574419769) Pool0 BFGUKM5F Fas3210A (1574419769)
0a.01.3 Fas3210A (1574419769) Pool0 BFGUKKBF Fas3210A (1574419769)
0a.02.6 Fas3210A (1574419769) FAILED WD-WMAY02391728 Fas3210A (1574419769)
0a.03.9 Fas3210A (1574419769) FAILED WD-WMAY02504662 Fas3210A (1574419769)
Once the slot of the failed disk is identified, you can replace the disk. Before doing so, check the disk's part number (PN) so that the correct replacement can be ordered:
Fas3210A> storage show disk
DISK SHELF BAY SERIAL VENDOR MODEL REV
--------------------- --------- ---------------- -------- ---------- ----
0a.02.6 2 6 WD-WMAY02391728 NETAPP X306_WMANT NA04
0a.02.7 2 7 WD-WMAY02377652 NETAPP X306_WMANT NA04
0a.02.8 2 8 WD-WMAY02501788 NETAPP X306_WMANT NA04
0a.03.8 3 8 WD-WMAY02481326 NETAPP X306_WMANT NA04
0a.03.9 3 9 WD-WMAY02504662 NETAPP X306_WMANT NA04
0a.03.10 3 10 WD-WMAY02459742 NETAPP X306_WMANT NA04
As shown above, the failed disks are listed as model "X306_WMANT", revision "NA04". The full model string, X306_WMANT02TSSM, appears in the output of the following command:
Fas3210A> sysconfig -v
NetApp Release 8.0.2 7-Mode: Mon Jun 13 14:13:45 PDT 2011
System ID: 1574419769 (Fas3210A); partner ID: 1574419336 (Fas3210B)
System Serial Number: 850000103829 (Fas3210A)
System Rev: F4
System Storage Configuration: Single-Path HA
System ACP Connectivity: NA
slot 0: System Board 2.3 GHz (System Board XVI F4)
Model Name: FAS3210
Part Number: 111-00585
Revision: F4
Serial Number: 5006779457
BIOS version: 5.1.1
Loader version: 3.2
Processors: 2
Processor type: Intel(R) Xeon(R) CPU E5220 @ 2.33GHz
Memory Size: 5120 MB
Memory Attributes: Bank Interleaving Hoisting Rank Interleaving Normal ECC
NVMEM Size: 640 MB of Main Memory Used
CMOS RAM Status: OK
Service Processor Status: Online
Firmware Version: 1.2.2
Mgmt MAC Address: 00:A0:98:18:77:C8
Ethernet Link: down
Using DHCP: no
IPv4 configuration:
IP Address: unknown
Netmask: unknown
Gateway: unknown
slot 0: Internal BGE 10/100 Ethernet Controller
e0M MAC Address: 00:a0:98:18:77:c6 (auto-100tx-fd-up)
e0P MAC Address: 00:a0:98:18:77:c7 (auto-100tx-fd-up)
Device Type: BCM5721
slot 0: Dual 10G Ethernet Controller T320E-SFP/KR
Device Type: CT-FE-3
Version Number: T3-SRAM1.1.0-BR1016-02-01-FW7.7.192-DR04
Serial Number: jb04050693
c0a MAC Address: 00:a0:98:18:77:c4 (auto-unknown-enabling)
c0b MAC Address: 00:a0:98:18:77:c5 (auto-10g_kr-fd-up)
slot 0: Dual 10/100/1000 Ethernet Controller G20
e0a MAC Address: 00:a0:98:18:77:c2 (auto-100tx-fd-up)
e0b MAC Address: 00:a0:98:18:77:c3 (auto-100tx-fd-up)
Device Type: Rev 6
slot 0: SAS Host Adapter 0a (PMC-Sierra PM8001 rev. C, SAS, <UP>)
Firmware rev: 01.10.14.00
Base WWN: 5:00a098:000808d:e0
Phy State: [0] Enabled, 3.0 Gb/s [1] Enabled, 3.0 Gb/s [2] Enabled, 3.0 Gb/s [3] Enabled, 3.0 Gb/s
QSFP Vendor: Molex Inc.
QSFP Part Number: 112-00177+A0
QSFP Type: Passive Copper 2m ID:01
QSFP Serial Number: 116620073
ID Vendor Model FW Size
01.23: NETAPP X306_HJUPI02TSSA NA01 1695.4GB (3907029168 512B/sect)
02.0 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
02.1 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
02.2 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
02.3 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
02.4 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
02.5 : NETAPP X306_HMARK02TSSM NA00 1695.4GB (3907029168 512B/sect)
02.6 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
02.7 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
03.6 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
03.7 : NETAPP X306_WMANT02TSSM 4321 1695.4GB (3907029168 512B/sect)
03.8 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
03.9 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
03.10: NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
Shelf 1: IOM3 Firmware rev. IOM3 A: 0131 IOM3 B: 0131
Shelf 2: IOM3 Firmware rev. IOM3 A: 0131 IOM3 B: 0131
Shelf 3: IOM3 Firmware rev. IOM3 A: 0131 IOM3 B: 0131
slot 0: SAS Host Adapter 0b (PMC-Sierra PM8001 rev. C, SAS, <OFFLINE (hard)>)
Firmware rev: 01.10.14.00
Base WWN: 5:00a098:000808d:e4
Phy State: [4] Disabled [5] Disabled [6] Disabled [7] Disabled
QSFP Vendor: not available
QSFP Part Number: not available
QSFP Type: not available
QSFP Serial Number: not available
slot 0: Intel ICH USB EHCI Adapter u0a (0xdf901400)
boot0 Micron Technology Real SSD eUSB 2GB, rev 2.00/11.10, addr 2 1936MB 512B/sect (0FF0022700058229)
slot 0: FC Host Adapter 0c (QLogic 2432 rev. 2, L-port, <OFFLINE (hard)>)
Firmware rev: 5.4.0
Host Loop Id: 0
FC Node Name: 5:00a:098000:808dd8
FC Port Name: 5:00a:098000:808dd8
SFP Vendor: FINISAR CORP.
SFP Part Number: FTLF8524P2BNV
SFP Serial Number: PL82Q4S
SFP Capabilities: 1, 2 or 4 Gbit
Link Data Rate: N/A
slot 0: FC Host Adapter 0d (QLogic 2432 rev. 2, L-port, <OFFLINE (hard)>)
Firmware rev: 5.4.0
Host Loop Id: 0
FC Node Name: 5:00a:098100:808dd8
FC Port Name: 5:00a:098100:808dd8
SFP Vendor: FINISAR CORP.
SFP Part Number: FTLF8524P2BNV
SFP Serial Number: PL82RFY
SFP Capabilities: 1, 2 or 4 Gbit
Link Data Rate: N/A
slot 1: Quad Gigabit Ethernet Controller 82580
e1a MAC Address: 00:1b:21:c4:36:88 (auto-unknown-down)
e1b MAC Address: 00:1b:21:c4:36:89 (auto-unknown-down)
e1c MAC Address: 00:1b:21:c4:36:8a (auto-unknown-down)
e1d MAC Address: 00:1b:21:c4:36:8b (auto-unknown-down)
Device Type: 150E, PBA E68891-015
slot 2: Quad Gigabit Ethernet Controller 82580
e2a MAC Address: 00:1b:21:c4:35:c4 (auto-unknown-down)
e2b MAC Address: 00:1b:21:c4:35:c5 (auto-unknown-down)
e2c MAC Address: 00:1b:21:c4:35:c6 (auto-unknown-down)
e2d MAC Address: 00:1b:21:c4:35:c7 (auto-unknown-down)
Device Type: 150E, PBA E68891-015
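Because the `sysconfig -v` output is long, it is easy to misread the model of the wrong disk. The following is a minimal Python sketch, assuming the output has been saved as text, that extracts the model and firmware revision only for disk lines ending in "(Failed)". The function name `failed_part_numbers` and the line pattern are assumptions derived from the disk lines shown above.

```python
import re

# Matches a disk line under a SAS adapter, for example:
#   "02.6 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (...) (Failed)"
# Captured groups: shelf, bay, vendor, model, firmware, size.
DISK = re.compile(r"^\s*(\d+)\.(\d+)\s*:\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)")

def failed_part_numbers(sysconfig_v_output):
    """Map (shelf, bay) -> (model, firmware) for lines marked (Failed)."""
    parts = {}
    for line in sysconfig_v_output.splitlines():
        m = DISK.match(line)
        if m and line.rstrip().endswith("(Failed)"):
            shelf, bay, _vendor, model, firmware, _size = m.groups()
            parts[(int(shelf), int(bay))] = (model, firmware)
    return parts

# A few lines copied from the output above.
SAMPLE = """\
Firmware rev: 01.10.14.00
02.5 : NETAPP X306_HMARK02TSSM NA00 1695.4GB (3907029168 512B/sect)
02.6 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
03.9 : NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect) (Failed)
03.10: NETAPP X306_WMANT02TSSM NA04 1695.4GB (3907029168 512B/sect)
"""

for slot, (model, firmware) in failed_part_numbers(SAMPLE).items():
    print(slot, model, firmware)
```

For the sample lines, both failed slots, (2, 6) and (3, 9), report model X306_WMANT02TSSM with firmware NA04, which is the string to quote when ordering the replacement.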
2. Replacing the Disk
Identify the disk:
Fas3210A> priv set advanced (enter advanced mode)
Fas3210A*> led_on 0a.2.6 (make the disk indicator blink)
Fas3210A*> led_off 0a.2.6 (stop the disk indicator blinking)
Fas3210A*> priv set admin (exit advanced mode)
Open the disk latch, remove the failed disk, insert the new disk, then check the new disk's status in the system:
Fas3210A*> disk show -v
DISK OWNER POOL SERIAL NUMBER HOME
------------ ------------- ----- ------------- -------------
0a.02.3 Fas3210B (1574419336) Pool0 WD-WMAY02393819 Fas3210B (1574419336)
0a.02.4 Fas3210B (1574419336) Pool0 WD-WMAY02501482 Fas3210B (1574419336)
0a.01.14 Fas3210B (1574419336) Pool0 BFGULBNF Fas3210B (1574419336)
0a.01.23 Fas3210B (1574419336) Pool0 BFGUKM4F Fas3210B (1574419336)
0a.01.5 Fas3210B (1574419336) Pool0 BFGS1EDF Fas3210B (1574419336)
0a.01.1 Fas3210A (1574419769) Pool0 BFGUKM5F Fas3210A (1574419769)
0a.01.3 Fas3210A (1574419769) Pool0 BFGUKKBF Fas3210A (1574419769)
0a.02.6 Not Owned NONE YFG95P3A
0a.02.8 Fas3210A (1574419769) Pool0 WD-WMAY02501788 Fas3210A (1574419769)
0a.02.0 Fas3210A (1574419769) Pool0 WD-WMAY02481293 Fas3210A (1574419769)
0a.03.10 Fas3210A (1574419769) Pool0 WD-WMAY02459742 Fas3210A (1574419769)
0a.03.3 Fas3210A (1574419769) Pool0 WD-WMAY02482034 Fas3210A (1574419769)
0a.02.12 Fas3210A (1574419769) Pool0 BFGB2R9F Fas3210A (1574419769)
0a.03.2 Fas3210A (1574419769) Pool0 WD-WMAY02481343 Fas3210A (1574419769)
0a.02.10 Fas3210A (1574419769) Pool0 WD-WMAY02389974 Fas3210A (1574419769)
0a.02.5 Fas3210A (1574419769) Pool0 YFH7R0BA Fas3210A (1574419769)
If, as shown above, the new disk's status is "Not Owned", run one of the following commands, substituting the name of the unowned disk:
Fas3210A*> disk assign 0a.03.9 (assign the disk to the current controller)
Fas3210B*> disk assign 0a.03.9 -o Fas3210A (assign the disk to a specific controller)
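To avoid typing the wrong disk name, the unowned disks can be read out of the `disk show -v` text and turned into ready-to-paste commands. The following is a minimal Python sketch under the assumption that the output has been captured as a string; `unowned_disks` and `assign_commands` are hypothetical helper names, and the script only prints commands, it does not run anything on the controller.

```python
def unowned_disks(disk_show_output):
    """Return disk names whose OWNER column reads 'Not Owned'."""
    disks = []
    for line in disk_show_output.splitlines():
        fields = line.split()
        # A row such as "0a.02.6 Not Owned NONE YFG95P3A" splits into
        # the disk name followed by the two words "Not" and "Owned".
        if len(fields) >= 3 and fields[1:3] == ["Not", "Owned"]:
            disks.append(fields[0])
    return disks

def assign_commands(disks, owner=None):
    """Build 'disk assign' command strings, optionally with '-o <owner>'."""
    suffix = f" -o {owner}" if owner else ""
    return [f"disk assign {disk}{suffix}" for disk in disks]

# A few lines copied from the output above.
SAMPLE = """\
0a.01.3 Fas3210A (1574419769) Pool0 BFGUKKBF Fas3210A (1574419769)
0a.02.6 Not Owned NONE YFG95P3A
0a.02.8 Fas3210A (1574419769) Pool0 WD-WMAY02501788 Fas3210A (1574419769)
"""

for command in assign_commands(unowned_disks(SAMPLE), owner="Fas3210A"):
    print(command)
```

For the sample this prints `disk assign 0a.02.6 -o Fas3210A`. Review each generated command against the physical slot you actually replaced before entering it.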
3. Checking the Storage State
After the disk has been assigned, use disk show -v to check the disk status and sysconfig -r to check the system status.
If the disk you swapped in is brand new, there usually won't be any further issues after running disk assign. If it's a used spare part, watch out for the disk showing "not zeroed" in the sysconfig -r output, or for a new aggregate being created automatically instead of the disk being added to the spare pool.
If the newly replaced disk was added to the spare pool and is "not zeroed", running the command disk zero spares will resolve it. If a new aggregate was created, run the following commands to delete it.
Check the aggregate status to determine which are pre-existing and which is newly created (assume the new aggregate is aggr1(1)):
Fas3210A*> aggr status
Change the aggregate status to offline:
Fas3210A*> aggr offline aggr1(1)
Delete the aggregate:
Fas3210A*> aggr destroy aggr1(1)
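Since destroying the wrong aggregate is irreversible, it helps to double-check which name is the stray one. The following is a minimal Python sketch of a naming heuristic only: it assumes you have already copied the aggregate names out of the `aggr status` output by hand, and it flags names ending in a parenthesized number, like the `aggr1(1)` example above. The function name `suspect_aggregates` is hypothetical.

```python
import re

# A trailing "(n)" on an aggregate name, as in the "aggr1(1)" example.
SUFFIXED = re.compile(r"^(.+)\((\d+)\)$")

def suspect_aggregates(names):
    """Return names that look like auto-created copies, e.g. 'aggr1(1)'."""
    return [name for name in names if SUFFIXED.match(name)]

print(suspect_aggregates(["aggr0", "aggr1", "aggr1(1)"]))
```

This prints `['aggr1(1)']`. The suffix is only a hint; confirm that the aggregate contains nothing but the newly inserted disk before taking it offline and destroying it.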
Check the system status again to make sure the newly replaced disk has been added to the spare pool and its status is normal:
Fas3210A*> sysconfig -r
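The final check can also be partly scripted. The following is a minimal Python sketch, assuming the `sysconfig -r` text is available as a string, that reports reconstruction progress per RAID group and whether the spare section is empty. The patterns are taken from the "reconstruction 5% completed" and "Spare disks (empty)" lines shown in step 1; the function names are hypothetical.

```python
import re

# Matches "RAID group /aggr1/plex0/rg0 (reconstruction 5% completed)".
RECON = re.compile(r"RAID group (\S+) \(reconstruction (\d+)% completed\)")

def reconstruction_progress(sysconfig_r_output):
    """Map each reconstructing RAID group to its completion percentage."""
    return {m.group(1): int(m.group(2))
            for m in RECON.finditer(sysconfig_r_output)}

def spares_empty(sysconfig_r_output):
    """True if the output contains the line 'Spare disks (empty)'."""
    return any(line.strip() == "Spare disks (empty)"
               for line in sysconfig_r_output.splitlines())

# A few lines copied from the output in step 1.
SAMPLE = """\
RAID group /aggr0/plex0/rg0 (normal)
RAID group /aggr1/plex0/rg0 (reconstruction 5% completed)
Spare disks (empty)
"""

print(reconstruction_progress(SAMPLE))
print(spares_empty(SAMPLE))
```

On the step 1 sample this prints `{'/aggr1/plex0/rg0': 5}` and `True`. After a successful replacement you would expect the spare check to return `False` once the new disk is in the spare pool, and the progress map to be empty once reconstruction has finished.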
Appendix
View log command:
Fas3210A> rdfile /etc/messages
Disable or view the automatic disk assignment option:
Fas3210A*> options disk.auto_assign off (disable)
Fas3210A*> options disk.auto_assign (view)
If a used spare disk is known to be good but shows "failed" after installation, run the following command to force the disk state to change. If it still shows "failed" afterward, the spare disk itself is faulty:
Fas3210A*> disk unfail 0a.2.6
Switch back to normal mode:
Fas3210A*> priv set