Linux Tip: Using `flock` to Coordinate Synchronous and Asynchronous Jobs

Introduction

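Lately I often need to ssh into several machines at once to run jobs that involve a lot of waiting and that can run in parallel. For example:

Updating packages on all the remote machines at the same time
Sending small files to the remote machines at the same time (most of the time goes to ssh authentication)

The steps that follow, however, can only continue once all of those jobs are confirmed to have finished.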

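I used to do it like this:

# earlier steps
update_pkg_on_machine_1
update_pkg_on_machine_2
update_pkg_on_machine_3
# ... later steps

This guarantees that every job has finished before the script continues, but the jobs run one after another, so it is slow.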

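Another possible approach is:

# earlier steps
update_pkg_on_machine_1 &
update_pkg_on_machine_2 &
update_pkg_on_machine_3 &
sleep 10
# ... later steps

This runs the jobs in parallel, but if they have not finished within 10 seconds, the steps that follow may fail.

And how many seconds the jobs will actually take is hard to predict.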

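Using `flock` to manage job state

When I was teaching myself operating systems, I learned about the mutex. flock is essentially a mutex that you can use from the shell.

The official `flock` documentation

Let's first look at the flock man page on Ubuntu Lucid: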

NAME
       flock - Manage locks from shell scripts

SYNOPSIS
       flock [-sxon] [-w timeout] lockfile [-c] command...

       flock [-sxon] [-w timeout] lockdir [-c] command...

       flock [-sxun] [-w timeout] fd

DESCRIPTION
       This  utility  manages  flock(2) locks from within shell scripts or the
       command line.

       The first and second forms  wraps  the  lock  around  the  executing  a
       command,  in  a  manner  similar  to  su(1)  or  newgrp(1).  It locks a
       specified file or directory, which  is  created  (assuming  appropriate
       permissions), if it does not already exist.

       The  third form is convenient inside shell scripts, and is usually used
       the following manner:

       (
         flock -s 200
         # ... commands executed under lock ...
       ) 200>/var/lock/mylockfile

       The mode used to open the file doesn’t matter to flock; using >  or  >>
       allows  the  lockfile  to  be  created  if  it  does not already exist,
       however, write permission is required; using < requires that  the  file
       already exists but only read permission is required.

       By  default,  if  the  lock cannot be immediately acquired, flock waits
       until the lock is available.

OPTIONS
       -s, --shared
              Obtain a shared lock, sometimes called a read lock.

       -x, -e, --exclusive
              Obtain an exclusive lock, sometimes called a write  lock.   This
              is the default.

       -u, --unlock
              Drop  a  lock.   This  is  usually not required, since a lock is
              automatically dropped when the file is closed.  However, it  may
              be  required  in  special  cases,  for  example  if the enclosed
              command group may have forked a background process which  should
              not be holding the lock.

       -n, --nb, --nonblock
              Fail  (with  an  exit  code  of  1) rather than wait if the lock
              cannot be immediately acquired.

       -w, --wait, --timeout seconds
              Fail (with an exit code of 1) if the  lock  cannot  be  acquired
              within  seconds seconds.  Decimal fractional values are allowed.

       -o, --close
              Close the file descriptor on  which  the  lock  is  held  before
              executing  command.   This  is  useful if command spawns a child
              process which should not be holding the lock.

       -c, --command command
              Pass a single command to the shell with -c.

       -h, --help
              Print a help message.

AUTHOR
       Written by H. Peter Anvin <hpa@zytor.com>.

COPYRIGHT
       Copyright © 2003-2006 H. Peter Anvin.
       This is free software; see the source for copying conditions.  There is
       NO  warranty;  not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
       PURPOSE.

SEE ALSO
       flock(2)

AVAILABILITY
       The flock command is part of the util-linux-ng package and is available
       from ftp://ftp.kernel.org/pub/linux/utils/util-linux-ng/.

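Key points

With flock, a program first tries to acquire a lock (usually represented by a file) and only runs once it holds that lock. It keeps the lock for as long as it runs and releases it when it exits.

For example, suppose we write this shell script in $HOME: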

#! /bin/bash
sleep 10
date

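Save it as test.sh and make it executable (chmod 700 test.sh).

Now open two shells and run the following in both at roughly the same time:

flock /tmp/demo.lock ~/test.sh

What happens?

Both shells appear to hang. One prints the time after 10 seconds, and the other prints it 10 seconds after that: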

wush@router:~$ flock /tmp/demo.lock ./test.sh
Sat Jan 4 00:55:24 CST 2014

wush@router:~$ flock /tmp/demo.lock ./test.sh
Sat Jan 4 00:55:34 CST 2014

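Call the shell that printed first process A and the other process B. A grabbed the lock on /tmp/demo.lock first and ran test.sh. B got the lock only after A exited and released it, so B naturally finished 10 seconds after A.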
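
The same serialization can be reproduced from a single script, which may be easier to try than juggling two terminals. This is only a minimal sketch, not part of the original demo: the temporary lock file from mktemp, the 1-second sleeps, and the variable names are assumptions chosen to keep the run short.

```shell
#!/bin/bash
# Sketch: two background jobs contend for the same exclusive lock.
LOCK=$(mktemp)              # throwaway lock file (assumption, any path works)

start=$(date +%s)
flock "$LOCK" sleep 1 &     # one job takes the lock and sleeps
flock "$LOCK" sleep 1 &     # the other must wait until the lock is released
wait                        # block until both background jobs have exited
elapsed=$(( $(date +%s) - start ))

# The two sleeps ran back to back, so this reports at least 2 seconds.
echo "elapsed: ${elapsed}s"
rm -f "$LOCK"
```

If the two flock prefixes are removed, both sleeps overlap and the elapsed time drops to roughly 1 second.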

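`flock` options

Beyond the default behaviour, options adjust how flock acts. They mainly change what kind of lock is requested and what happens when the lock on lock_path cannot be acquired right away.

flock -n lock_path xxx: if the lock cannot be acquired, fail immediately (exit code 1) without running xxx.
flock -s lock_path xxx: request a shared lock on lock_path, which several processes can hold at once. All of them can start right away and hold the lock together.
flock -x lock_path xxx: request an exclusive lock on lock_path, which only one process can hold at a time. This is the default.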
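
To see how -n differs from the default, the sketch below holds an exclusive lock in the background and then tries to take it again. It also tries -w, which the man page quoted above describes as a timeout. The temporary lock file, the timings, and the variable names are illustrative assumptions.

```shell
#!/bin/bash
LOCK=$(mktemp)              # throwaway lock file (assumption)

flock -x "$LOCK" sleep 2 &  # hold an exclusive lock for 2 seconds
sleep 0.5                   # crude pause so the background job owns the lock

flock -n "$LOCK" true       # -n: give up at once, the command is not run
nonblock_status=$?

flock -w 0.5 "$LOCK" true   # -w: give up after waiting 0.5 seconds
timeout_status=$?

wait                        # let the background holder finish
flock -n "$LOCK" true       # the lock is free now, so this succeeds
free_status=$?

echo "held: -n=$nonblock_status -w=$timeout_status, free: -n=$free_status"
rm -f "$LOCK"
```

Checking the exit status this way lets a script skip a job that is already running instead of queueing behind it.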

Note: lock_path cannot be held in shared mode and in exclusive mode at the same time!
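
That mutual exclusion can be checked directly. In this sketch, three background jobs hold the lock in shared mode; a non-blocking exclusive request is then refused, while a non-blocking shared request is granted. The temporary lock file, the timings, and the variable names are assumptions made for the illustration.

```shell
#!/bin/bash
LOCK=$(mktemp)                 # throwaway lock file (assumption)
start=$(date +%s)

# Three shared holders run side by side instead of queueing.
flock -s "$LOCK" sleep 2 &
flock -s "$LOCK" sleep 2 &
flock -s "$LOCK" sleep 2 &
sleep 0.5                      # crude pause so the jobs above hold the lock

flock -n -x "$LOCK" true       # exclusive request: refused while shared holders exist
excl_status=$?

flock -n -s "$LOCK" true       # shared request: granted alongside the others
shared_status=$?

wait                           # all three sleeps overlap, so this takes about 2 seconds
elapsed=$(( $(date +%s) - start ))

echo "exclusive=$excl_status shared=$shared_status elapsed=${elapsed}s"
rm -f "$LOCK"
```

Had the three jobs used -x instead of -s, they would have run one at a time and the total would be about 6 seconds.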

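Solving the problem from the introduction

By combining these flock calls, I can run several jobs at once and wait for all of them to finish before moving on to the steps that follow:

# earlier steps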
flock -s lock_path update_pkg_on_machine_1 &
flock -s lock_path update_pkg_on_machine_2 &
flock -s lock_path update_pkg_on_machine_3 &
flock -x lock_path echo "all done!"
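# ... later steps

The key is that shared and exclusive locks are mutually exclusive, so flock -x lock_path xxx cannot take the lock while any shared holder remains. It therefore runs only after all the jobs above have finished and released lock_path. This relies on each background flock -s having already taken the lock by the time flock -x asks for it.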
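
If the window between starting the background jobs and requesting the exclusive lock is a concern, one possible variation is to take the lock once in the parent shell on a file descriptor, using the third form from the man page, and let the background jobs inherit that descriptor. This is a sketch under assumptions, not the original method: fake_update is a hypothetical stand-in for update_pkg_on_machine_N, and fd 9, the temporary files, and the variable names are arbitrary choices. Because the jobs share one inherited lock here, -s is kept only to mirror the commands above.

```shell
#!/bin/bash
LOCK=$(mktemp)                 # throwaway lock file (assumption)
LOG=$(mktemp)                  # records the order in which things finish

fake_update() {                # hypothetical stand-in for update_pkg_on_machine_N
  sleep 1
  echo "machine $1 updated" >> "$LOG"
}

exec 9>"$LOCK"                 # open the lock file on fd 9
flock -s 9                     # take the lock before any job starts

for n in 1 2 3; do
  fake_update "$n" &           # each job inherits fd 9 and keeps the lock alive
done

exec 9>&-                      # parent closes its copy; the jobs still hold the lock

# Opens the lock file afresh, so it waits until every job has exited.
flock -x "$LOCK" echo "all done!" >> "$LOG"

last_line=$(tail -n 1 "$LOG")
updated_count=$(grep -c 'updated' "$LOG")
cat "$LOG"
rm -f "$LOCK" "$LOG"
```

The lock is released only when the last process holding the inherited descriptor exits, so "all done!" ends up as the final line of the log regardless of how quickly the jobs start.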
