
I am facing an issue with my RAID 1 (mdadm softraid) on an AlmaLinux/CloudLinux OS server, which is a production server with live data. Here's the chronology of events:

  1. Initially, I created a RAID 1 array with two 1TB NVMe disks (2 x 1TB).
  2. At some point, the second NVMe disk failed. I replaced it with a new 2TB NVMe disk. I then added this new 2TB NVMe disk to the RAID array, but it was partitioned/configured to match the 1TB capacity of the remaining active disk.
  3. Currently, the first 1TB disk has failed and was automatically kicked out by the RAID system when I rebooted the server. So, only the 2TB NVMe disk (which is currently acting as a 1TB member of the degraded RAID) remains.

Replacement and Setup Plan

I have already replaced the failed 1TB disk with a new 2TB NVMe disk. I want to utilize the full 2TB capacity since both disks are now 2 x 2TB.

[root@id1 ~]# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md124 : active raid5 sdd2[3] sdc2[1] sda2[0]
      62945280 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md125 : active raid5 sdd1[3] sdc1[1] sda1[0]
      1888176128 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 7/8 pages [28KB], 65536KB chunk

md126 : active raid5 sda3[0] sdc3[1] sdd3[3]
      2097152 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/3] [UUU]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 nvme1n1p1[2]
      976628736 blocks super 1.2 [2/1] [_U]
      bitmap: 8/8 pages [32KB], 65536KB chunk

unused devices: <none>

mdadm --detail /dev/md127

[root@id1 ~]# mdadm --detail /dev/md127
/dev/md127:
           Version : 1.2
     Creation Time : Tue Aug 29 05:57:10 2023
        Raid Level : raid1
        Array Size : 976628736 (931.39 GiB 1000.07 GB)
     Used Dev Size : 976628736 (931.39 GiB 1000.07 GB)
      Raid Devices : 2
     Total Devices : 1
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Thu May 29 01:33:09 2025
             State : active, degraded
    Active Devices : 1
   Working Devices : 1
    Failed Devices : 0
     Spare Devices : 0

Consistency Policy : bitmap

              Name : idweb.webserver.com:root  (local to host idweb.webserver.com)
              UUID : 3fb9f52f:45f39d12:e7bb3392:8eb1481f
            Events : 33132451

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       2     259        2        1      active sync   /dev/nvme1n1p1

lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL

[root@id1 ~]# lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT,LABEL
NAME          FSTYPE              SIZE   MOUNTPOINT LABEL
sda                              931.5G
├─sda1        linux_raid_member  900.5G            web1srv.serverhostweb.com:home2
│ └─md125     ext4                 1.8T /home2
├─sda2        linux_raid_member     30G            web1srv.serverhostweb.com:tmp
│ └─md124     ext4                  60G /var/tmp
└─sda3        linux_raid_member      1G            web1srv.serverhostweb.com:boot
  └─md126     xfs                    2G /boot
sdb           ext4                 5.5T
sdc                              931.5G
├─sdc1        linux_raid_member  900.5G            web1srv.serverhostweb.com:home2
│ └─md125     ext4                 1.8T /home2
├─sdc2        linux_raid_member     30G            web1srv.serverhostweb.com:tmp
│ └─md124     ext4                  60G /var/tmp
└─sdc3        linux_raid_member      1G            web1srv.serverhostweb.com:boot
  └─md126     xfs                    2G /boot
sdd                              931.5G
├─sdd1        linux_raid_member  900.5G            web1srv.serverhostweb.com:home2
│ └─md125     ext4                 1.8T /home2
├─sdd2        linux_raid_member     30G            web1srv.serverhostweb.com:tmp
│ └─md124     ext4                  60G /var/tmp
└─sdd3        linux_raid_member      1G            web1srv.serverhostweb.com:boot
  └─md126     xfs                    2G /boot
nvme0n1                            1.8T
nvme1n1                            1.8T
└─nvme1n1p1   linux_raid_member  931.5G            web1srv.serverhostweb.com:root
  └─md127     ext4               931.4G /

What are the steps to repair my soft RAID 1, maximize the storage to 2TB, and ensure the data remains safe?

I have drafted some example steps, but I'm not really sure. Are the steps below correct?

# Create a partition on the new disk with the full 2TB size
fdisk /dev/nvme0n1
mdadm --manage /dev/md127 --add /dev/nvme0n1p1
# Wait for first sync

# Fail and remove the old disk
mdadm --manage /dev/md127 --fail /dev/nvme1n1p1
mdadm --manage /dev/md127 --remove /dev/nvme1n1p1

# Repartition the old disk for the full 2TB
gdisk /dev/nvme1n1

# Add back to RAID
mdadm --manage /dev/md127 --add /dev/nvme1n1p1
# Wait for second sync

# Expand RAID array to maximum
mdadm --grow /dev/md127 --size=max

# Verify new size
mdadm --detail /dev/md127

# Resize ext4 filesystem
resize2fs /dev/md127

# Update mdadm.conf
mdadm --detail --scan > /etc/mdadm.conf

# Update initramfs
dracut -f

Server Spec:

  • OS: AlmaLinux/CloudLinux 8
  • I don't know if it's a good idea to expand a degraded RAID array in production. It sounds a little bit risky. Do you have a full backup in case something goes wrong? Commented May 28 at 20:11
  • Sure, I have a full backup. Commented May 30 at 1:03

1 Answer


First you'll need to extend the nvme1n1p1 partition to cover the full size of the NVMe it's on. The easiest way to do this would be with the growpart tool (depending on distribution, it may be packaged as cloud-utils-growpart or just growpart).

The command would be:

sudo growpart /dev/nvme1n1 1 

i.e. the arguments are the name of the full-disk device (which includes the namespace ID in case of NVMe drives), then the number of the partition to extend.
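
If growpart isn't installed yet, it should be available from the standard repositories; the package name below is an assumption based on other EL8 distributions:

sudo dnf install cloud-utils-growpart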

At this point, I'd duplicate the partitioning to the other new NVMe. There are many ways to do it, but assuming that nvme0n1 is exactly the same size or bigger than nvme1n1, I might do it this way:

sudo sfdisk --dump /dev/nvme1n1 >/tmp/nvme1n1.parttable
sudo sfdisk /dev/nvme0n1 < /tmp/nvme1n1.parttable
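
Afterwards it's worth sanity-checking that both disks now carry the same layout, with something like:

# Both NVMe drives should now show a ~1.8T partition 1
lsblk -o NAME,SIZE,TYPE /dev/nvme0n1 /dev/nvme1n1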

Then it's time to join the new NVMe to the RAID set:

sudo mdadm --manage /dev/md127 --add /dev/nvme0n1p1 

Syncing before growing should mean that you only need to sync ~1 TB worth of data instead of 2 TB.
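
To keep an eye on the rebuild (or simply block until it has finished), something along these lines should work:

watch -n 5 cat /proc/mdstat      # live recovery progress
sudo mdadm --wait /dev/md127     # returns once the resync is complete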

After the syncing has completed, the next step would be to tell the RAID array to grow:

sudo mdadm --grow /dev/md127 -z max 
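
You can confirm that the array size actually changed, for example with:

sudo mdadm --detail /dev/md127 | grep -E 'Array Size|Used Dev Size'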

And since you have just a simple ext4 filesystem inside the RAID array, the last step would be to resize the filesystem to take advantage of the new space:

 sudo resize2fs /dev/md127 
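
A quick check that the root filesystem now sees the extra space, for example:

df -h /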

All these steps can be done without rebooting, and would be the way to go if the safety of the data and uptime are your main priorities.

However, since you seem to have /boot, /home2 and /var/tmp as separate filesystems, having a ~2 TB root filesystem on a classic partition might not be exactly ideal.

If you wanted to reorganize your filesystems or e.g. start using LVM, this would be a good time to do it.

  • I updated my post, can you take a look? Commented May 30 at 1:06
  • Yes, your steps will do the job... but if you prefer/need to use gdisk for nvme1n1, use it instead of fdisk for nvme0n1 too. That way, you won't end up with a GPT partition table on nvme1n1 and a classic MBR on nvme0n1, which might cause confusion later. Commented May 30 at 4:16
