File system based semaphore + rsync.
copy_guardian implements a file system based semaphore to limit concurrent execution of
critical operations. In the default mode the semaphore is bound to the script that
owns it, so that operations can also be limited across parallel executions of that
script, e.g. within an HPC batch system.
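The README does not describe the locking mechanism in detail. One common way to build a file system based semaphore, sketched below under that assumption, is to give each of the N permits its own lock file and to claim a permit by creating that file with O_CREAT | O_EXCL, which fails if the file already exists. The helpers lock_dir_for, try_acquire and release are hypothetical names, and deriving the lock directory from a hash of the script path is only one guess at how "binding the semaphore to the script" could work; this is not copy_guardian's actual implementation.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def lock_dir_for(script_path: str, base_dir: str) -> Path:
    """Hypothetical: map a script to its own lock directory, so that all
    processes running the same script share one semaphore."""
    digest = hashlib.sha256(os.path.abspath(script_path).encode()).hexdigest()[:16]
    lock_dir = Path(base_dir) / digest
    lock_dir.mkdir(parents=True, exist_ok=True)
    return lock_dir


def try_acquire(lock_dir: Path, max_holders: int) -> Optional[Path]:
    """Try to claim one of `max_holders` slots; return its lock file or None."""
    for slot in range(max_holders):
        slot_file = lock_dir / f"slot_{slot}.lock"
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so at most
            # one process can create, and thereby own, each slot file.
            fd = os.open(slot_file, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            continue  # slot is taken, try the next one
        with os.fdopen(fd, "w") as fh:
            fh.write(str(os.getpid()))  # record the owner to ease debugging
        return slot_file
    return None  # all slots are taken


def release(slot_file: Path) -> None:
    """Free a slot by deleting its lock file."""
    slot_file.unlink()


if __name__ == "__main__":
    lock_dir = lock_dir_for("my_script.py", tempfile.mkdtemp())
    held = [try_acquire(lock_dir, 3) for _ in range(3)]
    print(all(h is not None for h in held))      # True: three slots granted
    print(try_acquire(lock_dir, 3))              # None: a fourth holder is refused
    release(held[0])
    print(try_acquire(lock_dir, 3) is not None)  # True: the freed slot is reused
```

A real implementation also has to handle stale lock files left behind by crashed processes, and whether exclusive file creation is atomic on a network file system depends on that file system.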
To guard a code block so that it is executed at most 3 times simultaneously, use copy_guardian.BoundedSemaphore:
import copy_guardian
with copy_guardian.BoundedSemaphore(3):
    print("I'm active")
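copy_guardian.BoundedSemaphore appears to follow the semantics of Python's standard threading.BoundedSemaphore: at most N holders may be inside the with block at the same time, and further callers wait until a holder leaves. The standard-library version below illustrates those semantics, but only for threads inside a single process; a file system based semaphore is what extends the limit across separate processes. The counters active and max_active exist only for this demonstration.

```python
import threading
import time

limit = threading.BoundedSemaphore(3)  # at most 3 holders at once
counter_lock = threading.Lock()        # protects the demo counters below
active = 0
max_active = 0


def worker() -> None:
    global active, max_active
    with limit:  # blocks while 3 other threads are inside
        with counter_lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.05)  # simulate a critical operation, e.g. a copy
        with counter_lock:
            active -= 1


threads = [threading.Thread(target=worker) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(max_active)  # never exceeds 3
```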
By default, copy_guardian stores its locks in the sub-folder .copy_guard_locks of your home folder.
You can change this folder as follows:
with copy_guardian.BoundedSemaphore(3, lock_directory="/shared/my_locks"):
    print("I'm active")
The default timeout for acquiring a lock to enter the guarded code block is 300 s. For longer-running operations you can override this default:
with copy_guardian.BoundedSemaphore(3, timeout=600):
    print("I'm active")
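A timeout like this is typically implemented by retrying a non-blocking acquire until a deadline passes. The sketch below assumes a try_acquire callable that returns a token on success and None while all permits are taken; acquire_with_timeout and poll_interval are hypothetical names, and raising TimeoutError is an assumption rather than documented copy_guardian behavior.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def acquire_with_timeout(
    try_acquire: Callable[[], Optional[T]],
    timeout: float = 300.0,
    poll_interval: float = 1.0,
) -> T:
    """Call `try_acquire` until it returns a non-None value or `timeout`
    seconds have passed; in the latter case raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = try_acquire()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"could not acquire lock within {timeout} s")
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))
```

Using time.monotonic() keeps the deadline correct even if the system clock is adjusted while a process is waiting.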
copy_guardian wraps the rsync tool to copy files between servers. To use this functionality, rsync must be
installed on all machines involved.
Furthermore, passwordless authentication must be set up using public/private key pairs. The
following example copies the file my_data.txt to the folder /remote_data on the
remote machine ssh-server.mycompany.com, keeping the same file name on the target
machine:
from copy_guardian import Connection
c = Connection(
    host="ssh-server.mycompany.com",
    user="me",
    private_key="./id_ed25519",
)
c.rsync_to("my_data.txt", "/remote_data")
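The README does not document which rsync options copy_guardian passes. As a hedged illustration, the hypothetical function below builds a command for the same transfer using standard rsync and ssh options: -a (archive mode), -e to select the remote shell, ssh -i and -p for key and port, and -o BatchMode=yes so that ssh fails instead of prompting for a password when key-based authentication is not set up.

```python
import shlex
from typing import List


def build_rsync_to_cmd(
    source: str,
    target_dir: str,
    host: str,
    user: str,
    private_key: str,
    port: int = 22,
) -> List[str]:
    """Hypothetical: build an rsync command that pushes `source` to
    `target_dir` on `host` over ssh with key-based authentication."""
    ssh = [
        "ssh",
        "-i", private_key,      # private key used for authentication
        "-p", str(port),        # ssh port on the remote machine
        "-o", "BatchMode=yes",  # fail instead of prompting for a password
    ]
    return [
        "rsync",
        "-a",                   # archive mode: recurse, keep times and permissions
        "-e", shlex.join(ssh),  # remote shell command used by rsync
        source,
        f"{user}@{host}:{target_dir}",
    ]


cmd = build_rsync_to_cmd("my_data.txt", "/remote_data",
                         host="ssh-server.mycompany.com", user="me",
                         private_key="./id_ed25519")
print(shlex.join(cmd))
# The command could then be run with subprocess.run(cmd, check=True).
```

Passing the command as a list to subprocess.run avoids shell quoting problems with unusual file names.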
You can also use a different port and copy multiple files using wildcards:
c = Connection(
    host="ssh-server.mycompany.com",
    user="me",
    private_key="./id_ed25519",
    port=2222,
)
c.rsync_to("./local_data/*.txt", "/remote_data")
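If a wrapper starts rsync without a shell (for example subprocess.run with a list of arguments), a pattern such as ./local_data/*.txt reaches rsync unexpanded and is treated as a literal file name. Such a wrapper would need to expand local wildcards itself; the hypothetical helper below does this with Python's glob module. Whether copy_guardian works this way is an assumption.

```python
import glob
from typing import List


def expand_local_sources(pattern: str) -> List[str]:
    """Hypothetical: expand shell-style wildcards in a local source path.

    Without shell=True, subprocess passes '*.txt' to rsync literally, so the
    pattern is expanded here. Raise early if nothing matches instead of
    handing rsync a path that does not exist."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return matches
```

The resulting paths can then be placed before the destination in the argument list, e.g. ["rsync", "-a", *sources, "me@ssh-server.mycompany.com:/remote_data"].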
You can also copy folders:
c = Connection(
    host="ssh-server.mycompany.com",
    user="me",
    private_key="./id_ed25519",
    port=2222,
)
c.rsync_to("./local_data/", "/remote_data")
To copy from a remote server to the local computer, use rsync_from:
c = Connection(
    host="ssh-server.mycompany.com",
    user="me",
    private_key="./id_ed25519",
    port=2222,
)
c.rsync_from("/remote_data/*.txt", "/local_data")
To speed up copying many files on a parallel file system, you can use copy_local_folder:
from copy_guardian import copy_local_folder
copy_local_folder("/remote_data/results", "/local_data")
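The README does not say how copy_local_folder achieves its speedup. On parallel file systems such as Lustre or GPFS, copying many small files one after another is often limited by per-file latency rather than bandwidth, so issuing several copies concurrently can help. The hypothetical parallel_copy_folder below sketches that idea with a thread pool; it is not copy_guardian's implementation, and the default of 8 workers is arbitrary.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def parallel_copy_folder(source: str, target: str, workers: int = 8) -> int:
    """Hypothetical: copy all files below `source` into `target`, keeping
    the relative directory layout, using `workers` threads. Returns the
    number of files copied."""
    src_root = Path(source)
    dst_root = Path(target)
    files = [p for p in src_root.rglob("*") if p.is_file()]

    # Create the directory tree up front so the workers only copy files.
    for f in files:
        (dst_root / f.relative_to(src_root)).parent.mkdir(parents=True, exist_ok=True)

    def copy_one(f: Path) -> None:
        shutil.copy2(f, dst_root / f.relative_to(src_root))  # also copies timestamps

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() consumes the results, re-raising any exception from a worker.
        list(pool.map(copy_one, files))
    return len(files)
```

Empty directories are not recreated by this sketch.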
If you have any suggestions or questions about copy_guardian, feel free to email me at uwe.schmitt@id.ethz.ch.
If you encounter any errors or problems with copy_guardian, please let me know!