Rsync is a fast and versatile command line utility that synchronizes files and folders between two locations over a remote shell, or from/to a remote Rsync daemon. It provides fast incremental file transfer by transferring only the differences between the source and the destination.
In this guide we explain how to set up and automate an rsync script that pulls data from a source server to a destination (backup) server.
1. Generate Keypair
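Generate a public and private key pair on the destination server, that is, the server that pulls the backup data from the source. We recommend this pull method. The push method has a number of security implications that need to be considered: for example, a compromised source server holding a key with write access to the backup server could also tamper with the backups. The cron job in step 4 runs unattended, so leave the passphrase empty when prompted.
ssh-keygen -t rsa -b 4096
Enter passphrase (empty for no passphrase):
Enter same passphrase again: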
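Cron needs a key it can use without anyone typing a passphrase. The sketch below generates one non-interactively and assumes OpenSSH's ssh-keygen. The -N "" option sets an empty passphrase and -f chooses the output file. The helper name make_backup_key is hypothetical. The demo writes to a temporary directory rather than /root/.ssh, so it cannot overwrite a real key.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: create an RSA key pair with an empty passphrase so that
# cron can use it unattended. Refuses to overwrite an existing key.
make_backup_key() {
  local key_file="$1"
  if [ -e "$key_file" ]; then
    echo "refusing to overwrite existing key: $key_file" >&2
    return 1
  fi
  mkdir -p "$(dirname "$key_file")"
  chmod 700 "$(dirname "$key_file")"
  # -t rsa -b 4096: 4096-bit RSA key (keeps the id_rsa naming used in this guide)
  # -N "": empty passphrase, -q: quiet, -f: output path (.pub is added for the public key)
  ssh-keygen -q -t rsa -b 4096 -N "" -f "$key_file"
}

# Demo in a throwaway directory. On the destination server you would run
# make_backup_key /root/.ssh/id_rsa instead.
DEMO_DIR="$(mktemp -d)"
if command -v ssh-keygen >/dev/null 2>&1; then
  make_backup_key "$DEMO_DIR/.ssh/id_rsa"
  ls -l "$DEMO_DIR/.ssh"
else
  echo "ssh-keygen not found; skipping demo" >&2
fi
```

The refusal to overwrite is deliberate. Replacing an existing id_rsa would break every server that already trusts the old public key.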
2. Copy the Public Key to the Source Server
ssh-copy-id -i ~/.ssh/id_rsa.pub root@ip.of.data.source
Alternatively, you can manually copy the contents of /root/.ssh/id_rsa.pub, which you just generated on the destination server, and append them to the /root/.ssh/authorized_keys file on the source server. This may come in handy if the source server does not allow password authentication for SSH, since ssh-copy-id normally has to log in with a password to install the key.
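The manual method can be scripted so that it is safe to repeat. This is a minimal sketch that assumes Bash. The helper name install_pubkey is hypothetical. It appends the key only if it is not already present. It also sets the permissions sshd expects, because sshd's StrictModes check ignores key files in directories that other users can write to.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: add a public key to <ssh_dir>/authorized_keys exactly once.
install_pubkey() {
  local pubkey_file="$1" ssh_dir="$2"
  local auth="$ssh_dir/authorized_keys" key
  key="$(cat "$pubkey_file")"
  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"   # only the owner may access ~/.ssh
  touch "$auth"
  chmod 600 "$auth"      # only the owner may read or write authorized_keys
  # -x: whole-line match, -F: fixed string; skip the append if the key is present
  if ! grep -qxF -- "$key" "$auth"; then
    printf '%s\n' "$key" >> "$auth"
  fi
}

# Demo with throwaway files and a dummy key line.
DEMO_DIR="$(mktemp -d)"
printf 'ssh-rsa AAAAB3NzaDemoKeyOnly demo@destination\n' > "$DEMO_DIR/id_rsa.pub"
install_pubkey "$DEMO_DIR/id_rsa.pub" "$DEMO_DIR/.ssh"
install_pubkey "$DEMO_DIR/id_rsa.pub" "$DEMO_DIR/.ssh"   # second call is a no-op
```

To use it for real, copy id_rsa.pub to the source server (for example with scp). Then call install_pubkey there with /root/.ssh as the directory.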
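3. Set Up the Rsync Script
The example below pulls data from a remote server over SSH, using port 22 as set with -e 'ssh -p 22'. It synchronizes that data into the configured local folder. Because the source path ends in a slash, the contents of /source/folder are copied into /destination/folder rather than into a nested folder. The --delete flag removes files from the destination that no longer exist on the source, so the destination becomes an exact mirror. Bandwidth is limited with --bwlimit=50000, measured in units of 1024 bytes per second, which is roughly 49 MiB/s. Any directory whose name starts with "backup" is excluded and therefore not synchronized. Finally, a log file named after the current date is written to /home.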
rsync -av --delete --bwlimit=50000 --exclude 'backup*/' --log-file=/home/rsync-backup-log-$(date +"%Y-%m-%d").log -e 'ssh -p 22' root@ip.of.data.source:/source/folder/ /destination/folder/
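The same command is easier to maintain and test as a small Bash script. In this sketch, the file name pull-backup.sh, the variable names and the DRY_RUN switch are all hypothetical. The defaults mirror the placeholders used in this guide. The arguments are collected in a Bash array so that values containing spaces, such as "ssh -p 22", stay intact.

```shell
#!/usr/bin/env bash
# pull-backup.sh (hypothetical name): wrapper around the rsync pull command
set -euo pipefail

SRC_HOST="${SRC_HOST:-root@ip.of.data.source}"
SRC_DIR="${SRC_DIR:-/source/folder/}"       # trailing slash: sync the folder's contents
DEST_DIR="${DEST_DIR:-/destination/folder/}"
LOG_DIR="${LOG_DIR:-/home}"
SSH_PORT="${SSH_PORT:-22}"
DRY_RUN="${DRY_RUN:-0}"                     # 1 = report what would change, touch nothing

# Fill the global RSYNC_ARGS array with the options used in step 3.
build_rsync_args() {
  RSYNC_ARGS=(
    -av --delete --bwlimit=50000
    --exclude 'backup*/'
    "--log-file=${LOG_DIR}/rsync-backup-log-$(date +%Y-%m-%d).log"
    -e "ssh -p ${SSH_PORT}"
  )
  if [ "$DRY_RUN" = "1" ]; then
    RSYNC_ARGS+=(--dry-run)
  fi
  RSYNC_ARGS+=("${SRC_HOST}:${SRC_DIR}" "$DEST_DIR")
}

run_backup() {
  build_rsync_args
  rsync "${RSYNC_ARGS[@]}"
}

# Only start a transfer when explicitly asked to: ./pull-backup.sh run
if [ "${1:-}" = "run" ]; then
  run_backup
fi
```

A cautious first run would be DRY_RUN=1 ./pull-backup.sh run, followed by ./pull-backup.sh run. Calling such a script from cron also avoids the need to escape % signs in the crontab, because they live in the script file instead of the cron command field.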
4. Set Up a Cron Job
Navigate to the /etc/cron.d/ folder and create a file named backup-cron. In it, add the rsync command from step 3, preceded by the cron schedule and the user to run it as. Cron treats an unescaped % in the command field as a newline, so each % in the date format must be escaped as \%. The file must also end with a newline.
00 10,20 * * * root rsync -av --delete --bwlimit=50000 --exclude 'exclude/folder/of/choice*/' --log-file=/home/rsync-backup-log-$(date +"\%Y-\%m-\%d").log -e 'ssh -p 22' root@ip.of.data.source:/source/folder/ /destination/folder/
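Make sure to adjust the schedule to meet your requirements. In the example above, the first five fields are the schedule: minute 00, hours 10,20, and * for day of month, month and day of week. Together they make the job run twice a day, at 10:00 and at 20:00. The sixth field specifies that the job runs as the root user.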
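If you prefer to generate the cron file rather than edit it by hand, the sketch below escapes every % and guarantees the trailing newline. It assumes Bash and a standard sed. The helper name write_cron_file is hypothetical. The demo writes to a temporary file; the real target would be /etc/cron.d/backup-cron, which requires root.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a single-entry cron.d file.
# cron turns an unescaped % in the command field into a newline, so every %
# is rewritten as \%. printf's trailing \n gives the file its final newline.
write_cron_file() {
  local path="$1" schedule="$2" user="$3" cmd="$4"
  local escaped
  escaped="$(printf '%s' "$cmd" | sed 's/%/\\%/g')"
  printf '%s %s %s\n' "$schedule" "$user" "$escaped" > "$path"
  chmod 644 "$path"
}

# The step-3 command exactly as you would type it in a shell (quoted heredoc:
# nothing is expanded here).
CMD="$(cat <<'EOF'
rsync -av --delete --bwlimit=50000 --exclude 'backup*/' --log-file=/home/rsync-backup-log-$(date +"%Y-%m-%d").log -e 'ssh -p 22' root@ip.of.data.source:/source/folder/ /destination/folder/
EOF
)"

# Schedule fields: minute hour day-of-month month day-of-week
OUT="$(mktemp)"   # real target: /etc/cron.d/backup-cron
write_cron_file "$OUT" '00 10,20 * * *' root "$CMD"
cat "$OUT"
```

The sed call escapes every %, so pass the command unescaped. A command that already contains \% would end up double-escaped.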
Commonly used rsync flags
-a | archive mode; equals -rlptgoD (no -H,-A,-X). Mandatory for backup usage. Recurses into directories and preserves symlinks, permissions, modification times, group, owner and device/special files, but not hard links, ACLs or extended attributes. |
-c | skip based on checksum, not mod-time & size. More trustworthy, but slower. Omit this flag if you want faster backups, but be aware that a file whose content changed while its size and modification time stayed the same will not be detected and included in the backup. |
-h | output numbers in a human-readable format. |
-v | increase verbosity for logging. |
-n or --dry-run | perform a trial run that makes no changes, so you can double-check your arguments before executing the real command. Add -v (verbose) to see what would be transferred: rsync -anv dir1/ dir2 |
-R | use relative path names: the full path given on the command line is recreated at the destination, instead of only its last component. |
-P | combines the flags --progress and --partial. The first shows a progress indicator for each transfer and the second keeps partially transferred files so that interrupted transfers can be resumed. |
-z | compress file data during the transfer. Less data is transmitted, but compression costs CPU time. Omit this flag when the backup target is a local device or a machine on the local network (or when you have high bandwidth to a remote machine). |
--progress | show progress per file during transfer. Only for interactive usage. |
--timeout | set I/O timeout in seconds. If no data is transferred for the specified time, the backup will be aborted. |
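--delete | delete extraneous files from the destination directories. Mandatory for mirror-style (master-slave) backups. Note that files deleted on the source are then also deleted from the backup. |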
--link-dest | hardlink to files in specified directory when unchanged, to reduce storage usage by duplicated files between backups. |
--log-file | log what we're doing to the specified file. Example: --log-file=$HOME/public_html/rsynclogs/rsync-backup-log-$(date +"%Y-%m-%d").log |
--chmod | affect file and/or directory permissions. |
--exclude | exclude files matching pattern. |
--exclude-from | same as --exclude, but getting patterns from specified file. |
--bwlimit | limits I/O bandwidth. Without a suffix, the value is in units of 1024 bytes per second (KiB/s). For example, to limit the bandwidth to 10000 KiB/s (about 9.8 MiB/s), enter: rsync -a --bwlimit=10000 root@ip.of.data.source:/source/folder/ /destination/folder/ |
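Several flags from the table above can be tried safely on local directories before pointing them at a server. The sketch below is a throwaway sandbox and assumes Bash with rsync installed. The function name demo_flags and all file names are hypothetical. It shows -n with -v, --delete, --exclude and --link-dest, where a second snapshot hard-links unchanged files instead of storing them again.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo: only touches directories under a fresh temp directory.
demo_flags() {
  local base="$1"
  mkdir -p "$base/src/backup-old" "$base/dst"
  echo "report" > "$base/src/report.txt"
  echo "old"    > "$base/src/backup-old/skip.txt"  # matched by --exclude 'backup*/'
  echo "stale"  > "$base/dst/stale.txt"            # exists only at the destination

  # Dry run: -n changes nothing, -v lists the transfer and the planned deletion
  rsync -anv --delete --exclude 'backup*/' "$base/src/" "$base/dst/"

  # Real run: copies report.txt, deletes stale.txt, skips backup-old/
  rsync -a --delete --exclude 'backup*/' "$base/src/" "$base/dst/"

  # Second snapshot: unchanged files are hard-linked to the copies in dst/
  rsync -a --delete --exclude 'backup*/' --link-dest="$base/dst" \
    "$base/src/" "$base/snap2/"
}

DEMO="$(mktemp -d)"
if command -v rsync >/dev/null 2>&1; then
  demo_flags "$DEMO"
else
  echo "rsync not found; skipping demo" >&2
fi
```

The --link-dest path is given as an absolute path here. A relative path would be interpreted relative to the destination directory, which is easy to get wrong.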
Used only for remote backups
--no-W | forces rsync's delta-transfer algorithm, so files already present at the target are updated by sending only the changed parts instead of the whole file. Delta transfer is already the default for remote transfers, so this flag mainly makes that explicit. If you have very high bandwidth to the target, whole-file transfers (-W) may be faster. |
--partial-dir | keep partially transferred files in the specified directory so an interrupted transfer can be resumed, instead of leaving an incomplete file under its final name at the destination. Mandatory for allowing partial transfers without confusing them with complete files. |
Used only for local backups
-W | disables rsync's delta-transfer algorithm, so whole files are always transferred. This is the default when both source and destination are local paths. When you have high bandwidth to the target (local filesystem or LAN), the backup may be faster. |
Used only for system backups
-A | preserve ACLs (implies -p). |
Used only for log sending
-r | recurse into directories. |
--remove-source-files | the sender removes each source file after it has been synchronized successfully (directories are kept). |
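For log sending, -r and --remove-source-files together move files rather than copy them. The sketch below is a local demo and assumes Bash with rsync installed. The function name and directory layout are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo of log shipping with throwaway local directories.
demo_log_shipping() {
  local base="$1"
  mkdir -p "$base/logs/app" "$base/archive"
  echo "first"  > "$base/logs/app/app.log.1"
  echo "second" > "$base/logs/app/app.log.2"
  # -r recurses; --remove-source-files deletes each source file once it has
  # been transferred successfully. Directories are left in place.
  rsync -r --remove-source-files "$base/logs/" "$base/archive/"
}

DEMO="$(mktemp -d)"
if command -v rsync >/dev/null 2>&1; then
  demo_log_shipping "$DEMO"
else
  echo "rsync not found; skipping demo" >&2
fi
```

The rsync manual advises using --remove-source-files only on source files that are no longer being written. Exclude the log file the application currently appends to.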