I have a machine with a three-disk ZFS raidz pool, and recently one of the disks started reporting faults. The system sent messages like these:

Subject: ZFS device fault for pool 0x7ECB4B30B6937099 on mymachine

The number of I/O errors associated with a ZFS device exceeded
acceptable levels. ZFS has marked the device as faulted.

 impact: Fault tolerance of the pool may be compromised.
    eid: 12959
  class: statechange
  state: FAULTED
   host: mymachine
   time: 2025-07-03 04:00:26+0200
  vpath: /dev/disk/by-id/ata-ST4000LM024-2AN17V_WCK6DX9D-part1
  vphys: pci-0000:00:17.0-ata-4.0
  vguid: 0x93C91C63FD91EC96
  devid: ata-ST4000LM024-2AN17V_WCK6DX9D-part1
   pool: 0x7ECB4B30B6937099

Subject: ZFS scrub_finish event for diskarray on mymachine

ZFS has finished a scrub:

   eid: 12962
 class: scrub_finish
  host: mymachine
  time: 2025-07-03 08:23:03+0200
  pool: diskarray
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 2M in 07:22:50 with 0 errors on Thu Jul  3 08:23:03 2025
config:

        NAME                                 STATE     READ WRITE CKSUM
        diskarray                            DEGRADED     0     0     0
          raidz1-0                           DEGRADED     0     0     0
            ata-ST4000LM024-2AN17V_WCK693CD  ONLINE       0     0     0
            ata-ST4000LM024-2AN17V_WCK69J3C  ONLINE       0     0     0
            ata-ST4000LM024-2AN17V_WCK6DX9D  FAULTED     34     0     0  too many errors

errors: No known data errors

I decided to replace the faulty disk. There was no need to take it offline first with zpool offline diskarray ata-ST4000LM024-2AN17V_WCK6DX9D, because its state was already FAULTED.
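To double-check which disk is faulted before pulling anything, the pool state can be inspected from the command line. A minimal sketch, assuming the pool name diskarray from above and guarded so it is harmless on a machine without that pool:

```shell
# Show the pool's state. A FAULTED disk is already out of service, so
# 'zpool offline' is only needed before pulling a disk that is still ONLINE.
# The guard makes the snippet a no-op on machines without this pool.
if zpool list diskarray >/dev/null 2>&1; then
    zpool status diskarray
else
    echo "pool 'diskarray' not present on this machine"
fi
```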

The symlink /dev/disk/by-id/ata-ST4000LM024-2AN17V_WCK6DX9D shows that the faulty disk is the one connected to the second SATA connector.
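This works because the entries under /dev/disk/by-id/ are symlinks to the kernel device nodes (e.g. /dev/sdb), so resolving one tells you which physical port the serial number lives on. A self-contained sketch of the mechanism, using a mock directory instead of the real /dev/disk/by-id/ (the device name sdb here is purely illustrative):

```shell
# On the real machine you would simply run:
#   ls -l /dev/disk/by-id/ata-ST4000LM024-2AN17V_WCK6DX9D
# This mock reproduces the same symlink-resolution step.
demo=$(mktemp -d)
touch "$demo/sdb"                                      # stand-in for the raw device node
ln -s "$demo/sdb" "$demo/ata-ST4000LM024-2AN17V_WCK6DX9D"
readlink -f "$demo/ata-ST4000LM024-2AN17V_WCK6DX9D"    # prints the resolved device path
```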

I replaced the disk on the second connector with a new one (same model) and then started the resilver:

zpool replace diskarray ata-ST4000LM024-2AN17V_WCK6DX9D ata-ST4000LM024-2AN17V_WCK6C5KH

The resilver took about 50 hours to finish, and the pool could be used normally the whole time.
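Progress during those hours can be followed with zpool status, whose scan: line reports the percentage done and an estimated time remaining. A hedged sketch of a simple polling loop, again assuming the pool name diskarray and guarded so it exits immediately when no resilver is running:

```shell
# Print the scan progress line once a minute until the resilver finishes.
# The loop condition is false (and the loop is skipped) on machines where
# the pool does not exist or no scan is in progress.
while zpool status diskarray 2>/dev/null | grep -q 'in progress'; do
    zpool status diskarray | grep -E 'scan:|resilver'
    sleep 60
done
```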

Links

zfs commands
Setting up root on ZFS in Debian Bullseye
Root on ZFS in Debian Bullseye
Using zfs on a separate partition
Fixing zfs file errors and degraded pool
Recovering zfs pools after a system crash
Zfs kernel panic

hashtags: #zfs