Solaris 10 SVM/SDS Mirrored Root Disk Replacement
The following is a standard process for replacing a failed boot disk mirrored with SVM on a Solaris 10 Sun server. Your hardware must support hot-swappable disks for this process to be performed without booting into single-user mode.
Environment:
Sun Fire V240
SunOS Release 5.10
UltraSPARC-IIIi
The following two root disks are mirrored with SVM:
c0t0d0 (sd3) Fujitsu MAT3073N SUN72G SCSI Disk Drive
c0t1d0 (sd0) Fujitsu MAT3073N SUN72G SCSI Disk Drive
Scenario:
c0t1d0 has failed and needs to be replaced
1) Identifying the failed disk
The failed disk can be identified by the “maint” state of its submirrors:
# /usr/sbin/metastat -ac
d6               m   20GB d16 d26 (maint)
    d16          s   20GB c0t0d0s6
    d26          s   20GB c0t1d0s6 (maint)
d3               m  4.0GB d13 d23 (maint)
    d13          s  4.0GB c0t0d0s3
    d23          s  4.0GB c0t1d0s3 (maint)
d1               m  4.0GB d11 d21 (maint)
    d11          s  4.0GB c0t0d0s1
    d21          s  4.0GB c0t1d0s1 (maint)
d0               m  4.0GB d10 d20 (maint)
    d10          s  4.0GB c0t0d0s0
    d20          s  4.0GB c0t1d0s0 (maint)
Additionally, the failed disk will show the “W” (“Write” error) state in metadb:
# /usr/sbin/metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c0t0d0s4
     a    p  luo        8208            8192            /dev/dsk/c0t0d0s4
      W   p  l          16              8192            /dev/dsk/c0t1d0s4
      W   p  l          8208            8192            /dev/dsk/c0t1d0s4
Take extra care when identifying the failed disk and the corresponding MD devices. A scripted solution, similar to the one below, may help avoid manual mistakes.
for i in `/usr/sbin/metastat -ac | grep maint | egrep "c.t." | awk '{print $4}' | awk -F's' '{print $1}' | sort | uniq`
do
    echo "Failed disk ${i} contains the following failed MD devices:"
    /usr/sbin/metastat -ac | grep maint | grep "${i}"
    echo ""
done
Output:
Failed disk c0t1d0 contains the following failed MD devices:
    d26          s   20GB c0t1d0s6 (maint)
    d23          s  4.0GB c0t1d0s3 (maint)
    d21          s  4.0GB c0t1d0s1 (maint)
    d20          s  4.0GB c0t1d0s0 (maint)
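The metadb flags shown earlier in this step can be checked the same way. The sketch below is only an illustration: the function name failed_replicas is made up for this example, and it assumes a POSIX shell and a POSIX awk (on Solaris 10 that would mean nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). It reads metadb output on standard input and prints each replica device whose flags include an upper-case “W”.

```shell
# failed_replicas (hypothetical helper): read "metadb" output on stdin and
# print each device whose flags column contains an upper-case "W".
failed_replicas() {
    awk '
        $NF ~ /^\/dev\/dsk\// {
            flags = ""
            # The last three fields are "first blk", "block count" and the
            # device path; everything before them is the flags column.
            for (i = 1; i <= NF - 3; i++) flags = flags $i
            if (flags ~ /W/) print $NF
        }
    ' | sort -u
}

# Intended use on the live system (not executed here):
#   /usr/sbin/metadb | failed_replicas
```

With the metadb output above, this would report /dev/dsk/c0t1d0s4 once, which matches the disk reported by the metastat loop.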
2) The next step is to detach and clear the MD devices:
# /usr/sbin/metadetach -f d0 d20
# /usr/sbin/metadetach -f d1 d21
# /usr/sbin/metadetach -f d3 d23
# /usr/sbin/metadetach -f d6 d26
# /usr/sbin/metaclear d20
# /usr/sbin/metaclear d21
# /usr/sbin/metaclear d23
# /usr/sbin/metaclear d26
Note: depending on the size of the partitions, the “metaclear” operation may take some time. To automate things a bit, use a simple loop as shown below. Don’t forget to substitute the correct name for the meta devices on your system:
for i in d20 d21 d23 d26 ; do /usr/sbin/metaclear $i ; done
Sample output:
# /usr/sbin/metadetach -f d6 d26
d6: submirror d26 is detached
# /usr/sbin/metadetach -f d3 d23
d3: submirror d23 is detached
# /usr/sbin/metadetach -f d1 d21
d1: submirror d21 is detached
# /usr/sbin/metadetach -f d0 d20
d0: submirror d20 is detached

d20: Concat/Stripe is cleared
d21: Concat/Stripe is cleared
d23: Concat/Stripe is cleared
d26: Concat/Stripe is cleared
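To reduce the chance of detaching the wrong submirror, the command list for this step can be derived from the metastat output instead of typed by hand. The following is a rough sketch, not a tested production script: detach_plan is a hypothetical name, a POSIX shell and POSIX awk (nawk on Solaris 10) are assumed, and the function only prints the commands so they can be reviewed before anything is run.

```shell
# detach_plan (hypothetical helper): read "metastat -ac" output on stdin and
# print, without running them, the metadetach/metaclear commands for every
# submirror that is in the "maint" state.
detach_plan() {
    awk '
        # Mirror line, e.g. "d6 m 20GB d16 d26 (maint)": remember which
        # mirror each listed submirror belongs to.
        $2 == "m" {
            for (i = 4; i <= NF; i++)
                if ($i ~ /^d[0-9]+$/) parent[$i] = $1
        }
        # Submirror line, e.g. "d26 s 20GB c0t1d0s6 (maint)".
        $2 == "s" && /\(maint\)/ && ($1 in parent) {
            n++
            dcmd[n] = "/usr/sbin/metadetach -f " parent[$1] " " $1
            ccmd[n] = "/usr/sbin/metaclear " $1
        }
        END {
            for (i = 1; i <= n; i++) print dcmd[i]
            for (i = 1; i <= n; i++) print ccmd[i]
        }
    '
}

# Intended use on the live system (review the output before running it):
#   /usr/sbin/metastat -ac | detach_plan
```

Printing rather than executing is a deliberate choice here: a wrong metaclear on the surviving submirror is not something you want a script to do unattended.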
3) Delete the state database replicas (metadb) on the failed disk, then check the remaining metadevices. The first command may take a while, so don’t panic.
# /usr/sbin/metadb -d c0t1d0s4
# /usr/sbin/metastat -ac
Sample output:
d6               m   20GB d16
    d16          s   20GB c0t0d0s6
d3               m  4.0GB d13
    d13          s  4.0GB c0t0d0s3
d1               m  4.0GB d11
    d11          s  4.0GB c0t0d0s1
d0               m  4.0GB d10
    d10          s  4.0GB c0t0d0s0
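After deleting the replicas it can be reassuring to confirm how many replicas remain on each disk. The sketch below is one possible way to summarize that; replica_counts is a hypothetical name, and a POSIX shell and POSIX awk are assumed. It reads metadb output on standard input and prints one line per disk with the number of replicas found on it.

```shell
# replica_counts (hypothetical helper): read "metadb" output on stdin and
# print "<disk> <number of replicas>" for each disk, sorted by disk name.
replica_counts() {
    awk '
        $NF ~ /^\/dev\/dsk\// {
            dev = $NF
            sub(/^\/dev\/dsk\//, "", dev)   # e.g. c0t0d0s4
            sub(/s[0-9]+$/, "", dev)        # e.g. c0t0d0
            count[dev]++
        }
        END { for (d in count) print d, count[d] }
    ' | sort
}

# Intended use on the live system (not executed here):
#   /usr/sbin/metadb | replica_counts
```

In this example, after the metadb -d command only c0t0d0 should be listed.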
4) Run cfgadm and verify the status of the failed disk
# /usr/sbin/cfgadm -al | grep c0t1d0
Output:
# /usr/sbin/cfgadm -al | grep c0t1d0
c0::dsk/c0t1d0                 disk         connected    configured   unknown
5) Remove the disk
# /usr/sbin/cfgadm -c unconfigure c0::dsk/c0t1d0
Run cfgadm again and verify that the failed disk now shows up as “unconfigured”. The second time you run the cfgadm command, it will take a minute to re-scan your disks, so be patient.
# /usr/sbin/cfgadm -al | grep c0t1d0
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown
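If this check is scripted, it is safer to match the attachment point exactly and read the Occupant column than to eyeball grep output. A minimal sketch follows, assuming the five-column cfgadm -al layout shown above (Ap_Id, Type, Receptacle, Occupant, Condition); ap_occupant is a hypothetical name, and awk -v needs a POSIX awk (nawk on Solaris 10).

```shell
# ap_occupant (hypothetical helper): read "cfgadm -al" output on stdin and
# print the Occupant column for the attachment point given as $1.
ap_occupant() {
    awk -v ap="$1" '$1 == ap { print $4 }'
}

# Intended use on the live system (not executed here):
#   state=`/usr/sbin/cfgadm -al | ap_occupant c0::dsk/c0t1d0`
#   [ "$state" = "unconfigured" ] && echo "safe to pull c0t1d0"
```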
6) Physically replace the failed disk. I guess I don’t need to remind you about the importance of unplugging the correct drive. Once the new disk is in place, check its status and configure it:
# /usr/sbin/cfgadm -al
c0::dsk/c0t1d0                 disk         connected    unconfigured unknown
# /usr/sbin/cfgadm -c configure c0::dsk/c0t1d0
Note: if you run into the error below when executing “cfgadm -c configure”, try re-running the same command a minute later and see if it works this time. The reason for this failure is that it takes the system some time to rescan SCSI paths and detect new devices. It may take a while for cfgadm to configure a large disk, so find something to do…
cfgadm: Hardware specific failure: failed to configure SCSI device: I/O error
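The “try again a minute later” advice can be wrapped in a small retry helper. This is a generic sketch rather than anything specific to cfgadm: the function name retry, the attempt count, and the delay are all assumptions to adjust, and it needs a POSIX shell (ksh or bash, not the old Solaris /bin/sh) because of the arithmetic expansion.

```shell
# retry MAX DELAY CMD...: run CMD until it succeeds, at most MAX times,
# sleeping DELAY seconds between attempts. Returns 0 on success, 1 if
# every attempt failed. (Hypothetical helper; variables are not local.)
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max failed: $*" >&2
        attempt=$((attempt + 1))
        # Only wait if another attempt is coming.
        [ "$attempt" -le "$max" ] && sleep "$delay"
    done
    return 1
}

# Intended use on the live system (not executed here); 5 attempts and a
# 60-second pause are arbitrary example values:
#   retry 5 60 /usr/sbin/cfgadm -c configure c0::dsk/c0t1d0
```

If the command still fails after several attempts, treat it as a real problem with the replacement disk or its seating rather than raising the retry count.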
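7) Run “format” and verify disk information. Then use “prtvtoc” and “fmthard” to copy the partition table (VTOC) from the good mirror disk to the replacement disk, install the boot block, and re-create the replicas and submirrors. The “prtvtoc” step may take a long time.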
# format
Searching for disks...done

AVAILABLE DISK SELECTIONS:
       0. c0t0d0 <SUN72G cyl 14087 alt 2 hd 24 sec 424>
          /pci@1c,600000/scsi@2/sd@0,0
       1. c0t1d0 <drive not available>
          /pci@1c,600000/scsi@2/sd@1,0
Specify disk (enter its number):
Note: there is a chance that “prtvtoc” may give you an error along the lines of “/dev/rdsk/c0t1d0s2: Cannot get disk geometry”. What to do: run “format”; select the disk you just replaced (in this example it appeared as “c0t1d0 <drive not available>”); from the list of “Available Drive Types”, select your drive type (or “Auto configure”, if you don’t see the correct drive type in the list); type “current” to verify you are working with the correct disk; type “format”. If you are still getting an error saying “Format failed”, then it is likely that your replacement disk is defective. It happens more often than you’d think…
# /usr/sbin/prtvtoc /dev/rdsk/c0t0d0s2 | /usr/sbin/fmthard -s - /dev/rdsk/c0t1d0s2
# /usr/sbin/installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0t1d0s0
# /usr/sbin/metadb -c 2 -a c0t1d0s4
# /usr/sbin/metainit d20 1 1 c0t1d0s0
# /usr/sbin/metainit d21 1 1 c0t1d0s1
# /usr/sbin/metainit d23 1 1 c0t1d0s3
# /usr/sbin/metainit d26 1 1 c0t1d0s6
# /usr/sbin/metattach d0 d20
# /usr/sbin/metattach d1 d21
# /usr/sbin/metattach d3 d23
# /usr/sbin/metattach d6 d26
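The metainit and metattach lines above follow a pattern, so they can be generated instead of typed. The sketch below assumes the naming scheme used in this article (mirror dN, replacement submirror d2N, built on slice N of the new disk), which will almost certainly differ on other systems; rebuild_plan is a hypothetical name, and the function only prints the commands for review.

```shell
# rebuild_plan DISK SLICE... (hypothetical helper): print, without running
# them, the metainit/metattach commands for the given slices. Assumes the
# naming scheme of this article: mirror dN, new submirror d2N on slice N.
rebuild_plan() {
    disk=$1
    shift
    for s in "$@"; do
        echo "/usr/sbin/metainit d2${s} 1 1 ${disk}s${s}"
    done
    for s in "$@"; do
        echo "/usr/sbin/metattach d${s} d2${s}"
    done
}

# Example matching the commands above:
#   rebuild_plan c0t1d0 0 1 3 6
```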
8) Run “/usr/sbin/metastat -ac” a few times until you confirm the new disk is synced up with the good mirror.
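Instead of re-running metastat by hand, the check can be put in a polling loop. The sketch below makes an assumption about the output format: it treats any “metastat -ac” line containing “resync” or “maint” as a sign that the mirrors are not yet healthy, so verify that against what your system actually prints while resyncing. The name sync_pending and the 60-second interval are hypothetical.

```shell
# sync_pending (hypothetical helper): read "metastat -ac" output on stdin.
# Exit status 0 while any line still mentions "resync" or "maint",
# exit status 1 once none do.
sync_pending() {
    awk '/resync|maint/ { found = 1 } END { exit !found }'
}

# Intended use on the live system (not executed here):
#   while /usr/sbin/metastat -ac | sync_pending; do
#       sleep 60
#   done
#   echo "all mirrors are in sync"
```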