Migrating Filesystems with Active Processes
I’ve run into an interesting challenge: I needed to migrate application data from a local filesystem to NFS without stopping the processes running in the original mountpoint. Here’s a basic overview of the process. This will not work for every application.
So, let’s start a process in a directory located on a local filesystem. This will just run in `/tmp/local` and write the current timestamp to a file.
    cd /tmp/local
    df -hlP /tmp/local
    #Filesystem      Size  Used Avail Use% Mounted on
    #tmpfs            30G   17M   30G   1% /tmp
    for i in $(seq 1 1000); do date >> /tmp/local/out; sleep 5; done
Below is a sample procedure: pause the processes running in the source directory, rsync its contents to an NFS share, mount the NFS share over the original path, resume the paused processes, and refresh their working directory. The last step is important. The path did not change, but each process's working directory still refers to the old directory on the local filesystem, which is now hidden under the NFS mount. Calling `chdir()` on the same path makes the process resolve it again, this time on NFS.
    # Define source and target mountpoints
    workdir=/tmp/local
    tmpdir=/tmp/remote
    mkdir -p "${tmpdir}"
    chown --reference="${workdir}" "${tmpdir}"

    # Create an array holding PIDs for processes running in ${workdir}
    # (lsof -t prints bare PIDs; skip this shell so it does not pause itself)
    mapfile -t a < <(lsof -t "${workdir}" | grep -vx "$$" | sort -un)

    # And pause those processes
    for i in "${a[@]}"; do kill -STOP "${i}"; done

    # Mount the NFS share on ${tmpdir}
    mount nas04:/share01 "${tmpdir}"

    # Rsync local filesystem to the NFS share
    rsync -avKx --delete "${workdir}/" "${tmpdir}/"

    # Remount the NFS share on the original ${workdir}
    umount "${tmpdir}"
    mount nas04:/share01 "${workdir}"

    # Resume paused PIDs and refresh their working directory
    for i in "${a[@]}"; do
      kill -CONT "${i}"
      gdb -q <<EOF
    attach ${i}
    call (int) chdir("${workdir}")
    detach
    quit
    EOF
    done
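The `chdir()` call only refreshes the working directory. File descriptors that a process already holds open keep pointing at the old filesystem, so it may help to check for them before migrating. Here is a minimal sketch, assuming a Linux `/proc` filesystem; the helper name `pids_with_open_files` is made up for this example and is not part of the script above.

```shell
# Hypothetical helper: print the PIDs that hold open files under a directory.
# It walks /proc/<pid>/fd, so it is Linux-only, and it only sees other users'
# processes when run as root.
pids_with_open_files() {
  local dir fd target pid
  dir=$(readlink -f "$1") || return 1
  for fd in /proc/[0-9]*/fd/*; do
    # Each entry is a symlink to the open file; skip entries we cannot read
    # or that vanished because the process exited.
    target=$(readlink "$fd" 2>/dev/null) || continue
    case "$target" in
      "$dir"/*)
        pid=${fd#/proc/}      # strip the leading "/proc/"
        echo "${pid%%/*}"     # keep only the PID component
        ;;
    esac
  done | sort -un
}
```

Running `pids_with_open_files /tmp/local` before the migration lists the processes that would keep writing to the old filesystem through descriptors they already have open. The example loop from earlier is not affected, because `>>` reopens the file on every iteration.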
Now you can tail the migrated output file and see that the original process is still writing to it:
    tail -f /tmp/local/out
    #Tue May 2 11:44:47 EDT 2017
    #Tue May 2 11:44:52 EDT 2017
    #Tue May 2 11:44:57 EDT 2017
    #Tue May 2 11:45:02 EDT 2017
    #Tue May 2 11:45:07 EDT 2017
    # >>> note the time gap due to migration
    #Tue May 2 11:45:29 EDT 2017
    #Tue May 2 11:45:34 EDT 2017
    #Tue May 2 11:45:39 EDT 2017
    #Tue May 2 11:45:44 EDT 2017
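Besides tailing the file, you can check where a process's working directory actually points. This is a rough sketch that assumes Linux `/proc` and GNU `stat`; the function names `pid_cwd` and `pid_cwd_fstype` are invented for illustration.

```shell
# Hypothetical helpers for inspecting a process's working directory.
# Reading another user's /proc/<pid>/cwd generally requires root.

# Print the path the process currently uses as its working directory.
pid_cwd() {
  readlink "/proc/$1/cwd"
}

# Print the type of the filesystem backing that working directory.
# /proc/<pid>/cwd resolves to the directory the process really holds,
# even if another filesystem has since been mounted over the same path.
pid_cwd_fstype() {
  stat -f -c %T "/proc/$1/cwd/."
}
```

For a migrated PID, `pid_cwd_fstype "$pid"` should report an NFS type once the `chdir()` call has gone through. If it still reports the local filesystem type, the process is sitting in the old directory hidden under the mount.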
As I mentioned, this may not work for more complex applications. In particular, files that a process already holds open stay on the old filesystem; only paths opened after the switch land on NFS. Still, this can be useful. Say you launched a script or another process that writes to a local filesystem, and then realized you may not have enough disk space to hold the output. This may be a way to move the output file to another filesystem without relaunching the process.
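Since disk space is the motivation here, it is worth confirming that the target really has room before pausing anything. A small sketch, assuming POSIX `du` and `df`; `fits_on_target` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the data under SRC fits into the free space
# of the filesystem that holds DST. Sizes are compared in 1 KiB blocks.
# usage: fits_on_target SRC DST
fits_on_target() {
  local need avail
  # Total size of the source tree.
  need=$(du -sk "$1" | awk '{print $1}')
  # "Available" column of the POSIX-format df output (second line, 4th field).
  avail=$(df -Pk "$2" | awk 'NR==2 {print $4}')
  # Return 2 if either value could not be determined.
  [ -n "$need" ] && [ -n "$avail" ] || return 2
  [ "$need" -le "$avail" ]
}
```

With the NFS share mounted on `/tmp/remote`, something like `fits_on_target /tmp/local /tmp/remote || echo "not enough space on target"` would catch the problem before any process is stopped.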
For larger or more complex directory trees, you may want to use `lsyncd` instead of `rsync` so that the sync runs in real time. This will minimize the downtime required to remount.
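If `lsyncd` is not an option, a simpler way to shorten the pause is to run `rsync` twice: once while the processes are still running, and once more after pausing them, so the second pass only transfers what changed in between. This is a different technique from real-time syncing, sketched here under the assumption that `rsync` is installed; `two_pass_sync` is a made-up helper name.

```shell
# Hypothetical helper (two-pass rsync, not lsyncd).
# usage: two_pass_sync SRC DST PID...
two_pass_sync() {
  local src dst pid
  src=$1
  dst=$2
  shift 2
  # Pass 1: bulk copy while the processes are still running. Its exit status
  # is ignored because files may change or vanish mid-copy.
  rsync -aKx --delete "$src"/ "$dst"/
  # Pause the writers so the source stops changing.
  for pid in "$@"; do
    kill -STOP "$pid" || return 1
  done
  # Pass 2: transfer only what changed since pass 1. Its exit status is the
  # result of the function.
  rsync -aKx --delete "$src"/ "$dst"/
}
```

The processes are left paused on return, so the remount and the `kill -CONT` / `chdir()` steps from the main script would follow, for example after `two_pass_sync /tmp/local /tmp/remote "${a[@]}"`.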