What about backintime from the official repository? Some time ago I found roughly this idea implemented with rsync, specifically in the first answer. Both source and destination must be on the same filesystem to take advantage of that approach.
I can snapshot them, roll them back, perform incremental backups, live migrate them between hypervisors — all the good stuff. And because ZFS is a copy-on-write filesystem, the snapshots, backups, and clones are all extremely efficient.
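For a concrete picture, here is a minimal sketch of ZFS incremental replication; the dataset tank/vms, the snapshot names, and the host backuphost are all made-up placeholders:

    # Take an initial snapshot of the dataset
    zfs snapshot tank/vms@monday

    # Later, snapshot again and send only the blocks changed since the first snapshot
    zfs snapshot tank/vms@tuesday
    zfs send -i tank/vms@monday tank/vms@tuesday | ssh backuphost zfs receive backup/vms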
You can write the backup out, incremental or otherwise, as a regular file on any target filesystem. The way that is usually done is to organize the backups into one directory per date and use --link-dest to share unchanged files with the previous backup.
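A minimal sketch of that dated-directory layout; /backup, /home, and the date handling are illustrative and assume GNU date:

    # Each run writes into a new dated directory; files unchanged since
    # yesterday's backup are hard-linked instead of copied again.
    TODAY=$(date +%F)
    YESTERDAY=$(date -d yesterday +%F)
    rsync -a --link-dest=/backup/$YESTERDAY /home/ /backup/$TODAY/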
While this can certainly be done manually, and I sometimes do so, at the very least I will capture the command in a backup script.
The rsync algorithm addresses this problem in a lovely way, as we all might know. After this introduction to rsync, back to the story!

Problem 1: Thin provisioning. There were two things that would help the friend understand what was going on.
Problem 2: Updating files. The second problem appeared when sending over an updated file.
That was my understanding too; could it be that the author has it wrong? The SSH option -c arcfour selects a faster encryption algorithm than the default. Robocopy is great for things like this: it retries after network timeouts, and it also lets you set an inter-packet gap delay so as not to swamp the pipe.
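A note on the cipher tip above: arcfour has been removed from current OpenSSH releases, so a present-day equivalent is to pick one of the lighter ciphers your build still offers; the host and paths below are placeholders:

    # See which ciphers the local ssh client supports
    ssh -Q cipher

    # Have rsync run ssh with an explicitly chosen (typically faster) cipher
    rsync -a -e "ssh -c aes128-ctr" /data/ user@backuphost:/data/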
I know this may be stupid - but have you thought of just copying them onto an external disk and carrying it over to the other server?
It may actually be the simplest and most efficient solution. There are already tons of good suggestions, but I wanted to throw in Beyond Compare.
I recently transferred a large number of files, each between 5 KB and 20 MB, from one server to another over a gigabit switch. It didn't even hiccup. Granted, it took a while, but I'd expect that with so much data. We are currently investigating this issue. We need to transfer about 18 million small files: about 3 days from one server to another, and about 2 weeks to an external drive!
Through another process, we needed to duplicate the server. This was done with Acronis, and it took about 3 hours! We will be investigating this some more. The dd suggestion above would probably provide similar results. In a similar situation, I tried using tar to batch up the files.
I wrote a tiny script to pipe the output of the tar command across to the target machine, directly into a receiving tar process which unbundled the files (the pattern is sketched below). The archive created contains all the files in the current directory; it is piped into the remsh command, which sends it to the box2 machine. I had six of these tar commands running simultaneously to ensure the network link was saturated with data, although I suspect that disk access may have been the limiting factor.
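A sketch of the tar-over-remote-shell pattern described above; box2 is the hostname from the text, while the paths and flags are illustrative:

    # Pack the current directory and stream it straight to the remote machine,
    # where a second tar unpacks it into /target/dir.
    tar cf - . | remsh box2 "cd /target/dir && tar xf -"

    # The same idea over ssh, compressing on the wire:
    tar czf - . | ssh box2 "cd /target/dir && tar xzf -"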
Are you able to unmount the partition that the files live on, or mount it read-only? Do that, then take an image of the whole partition with something like dd. You can then loop-mount the disk image on the destination, or, if you are really courageous, dd it directly back into a partition on the destination side. I don't recommend that.

By doing this, there is NO overhead for the directory iteration or compression, because that was done at the time the files were written. There is only one file to move: the VHD. This means less IP header overhead. One thing I've run into, though, is that it's best to keep the size of individual transfer files down for a network or USB transfer.
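A sketch of the image-the-partition idea; the device /dev/sdb1, the mount points, and the image name are all placeholders:

    # On the source: unmount (or remount read-only), then image the whole partition.
    umount /dev/sdb1
    dd if=/dev/sdb1 of=/mnt/external/diskimage.img bs=4M status=progress

    # On the destination: loop-mount the image and copy the files out of it.
    mkdir -p /mnt/img
    mount -o loop,ro /mnt/external/diskimage.img /mnt/img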
I use Rar. It works like a champ. This is the equivalent of 'dd' in Linux. The concept of mounting a compressed filesystem onto a directory is normal for Linux as well, so the same logic applies. You ought to ensure all files are closed before the operation starts, as with the other methods. This has the added benefit of making it possible to put a size quota on a folder.
If the VHD is a fixed size, going over that limit will not bring down the server; it will just cause an error when creating or writing the file. Maybe -c (checksumming) is enabled, or some other flag that reads more than just the directories and inode metadata.
Is rsync with no files changed nearly as fast as ls -Llr? Then you've tuned rsync as well as you can. Very few file systems and system tools handle such large directories well. If the filenames are something like abcdefg, consider storing them in subdirectories keyed on the first characters of the name, for example ab/abcdefg.
This breaks the directories up into smaller ones, but doesn't require a huge change to the code. If you have 80k files in a directory, maybe your developers are working around the fact that what they really want is a database.
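A rough sketch of that sharding idea, assuming a flat directory named flat whose files get moved under sharded/ keyed on the first two characters of each name (all paths invented for the example):

    # abcdefg.txt ends up as sharded/ab/abcdefg.txt, and so on.
    mkdir -p sharded
    for f in flat/*; do
        name=$(basename "$f")
        prefix=${name:0:2}
        mkdir -p "sharded/$prefix"
        mv "$f" "sharded/$prefix/"
    done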
Lastly, is 5 minutes really so bad? If you run this backup once a day, 5 minutes is not a lot of time. Yes, I love speed. However, if 5 minutes is "good enough" for your customers, then it is good enough for you.
If you don't have a written SLA, how about an informal discussion with your users to find out how fast they expect the backups to be? I assume you wouldn't have asked this question if there weren't a need to improve the performance. However, if your customers are happy with 5 minutes, declare victory and move on to other projects that need your efforts.

Update: After some discussion we determined that the bottleneck is the network.
I'm going to recommend two things before I give up. You can also try lsyncd, which will rsync only when changes are detected on the filesystem, and only the changed subdirectories. I've been using it for directories with up to two million files on a decent server.
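A minimal sketch of driving lsyncd from the command line; /data and backuphost are placeholders, and real deployments usually use a small config file instead:

    # Watch /data with inotify and push only the changed subtrees to the
    # remote machine, using rsync over ssh under the hood.
    lsyncd -rsyncssh /data backuphost /backup/data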
My explanation for the problem lies in the way rsync builds and transmits its file list. The documentation says: "While it is being built, each entry is transmitted to the receiving side in a network-optimised way." This leads to a write-stop-write-stop-write pattern when sending over the network, which is supposedly inferior to preparing the full file list first and then sending it over the network at full speed.
The write-stop-write-stop-write sequence may require many more network round trips, in the worst case possibly even 80k of them (see the notes on TCP packet handling, Nagle's algorithm, and so on). I did a practical test with a sync program that indeed works batch-wise: the local synchronizer Zaloha.sh. It works by running find on the directories to obtain CSV files. The find on the remote directory runs in an ssh session, and after it finishes, the CSV file is downloaded to the local system by scp.
The find on the local directory runs locally. I chose one of my directories to most closely match the 80k files; in fact it is nearly 90k files and 3k directories. The hardware used during the test is nothing special or "bleeding edge": an eight-year-old notebook running Linux, and a desktop PC of approximately the same age, also running Linux, serving as the remote backup host. The link between them is a plain vanilla home Wi-Fi network. The notebook keeps its data on a USB-connected (!) HDD. The data is in a synchronized state (the same situation as yours), except for one out-of-sync file, included to prove that Zaloha2.sh really detects it.
The find scan of the internal HDD took 14 seconds. The scp transfer of the CSV files over the Wi-Fi, plus their sort and mawk processing, took 34 seconds. Interestingly, when running the whole test again, both finds finished almost immediately; I assume this is due to caching of directory data in the Linux kernels of both machines.
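The same batch principle can be sketched with plain GNU find over ssh (this only illustrates the idea, it is not Zaloha2.sh itself; the host and paths are placeholders):

    # One round trip: produce a sorted listing (path, size, mtime) of the remote tree...
    ssh backuphost 'find /data -printf "%p\t%s\t%T@\n" | sort' > remote.lst

    # ...do the same locally, then compare the two listings without touching the network.
    find /data -printf "%p\t%s\t%T@\n" | sort > local.lst
    diff remote.lst local.lst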
Normally, rsync compares only file modification times and sizes. Your approach would force it to read and checksum the content of every file twice (once on the local and once on the remote system) just to find changed directories. For synchronisation of large numbers of files where little has changed, it is also worth setting noatime on the source and destination partitions. This saves writing an access-time update to disk for every unchanged file.
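To make the difference concrete, a small sketch (host and paths are placeholders):

    # Default quick check: compares only size and modification time.
    rsync -a /data/ backuphost:/data/

    # Forcing checksums (-c) reads every file on both sides, far slower on big trees.
    rsync -ac /data/ backuphost:/data/

    # Avoid access-time writes for all the files that get stat()ed during the scan.
    mount -o remount,noatime /data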
Note it isn't encrypted, but it may be possible to tunnel it without losing the listing performance improvement.
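That fragment appears to refer to rsync's native daemon protocol (rsync://), which skips the ssh layer; a sketch under that assumption, with the host and module names made up:

    # On the server, an rsyncd module (e.g. in /etc/rsyncd.conf):
    #   [data]
    #   path = /data
    #   read only = yes
    # From the client, list or pull over the unencrypted daemon protocol:
    rsync -a rsync://backuphost/data/ /backup/data/

    # If encryption is required, tunnel the daemon port (873) through ssh:
    ssh -N -L 8730:localhost:873 backuphost &
    rsync -a rsync://localhost:8730/data/ /backup/data/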
Faster rsync of huge directory which was not changed

We use rsync to back up servers. Unfortunately the network to some servers is slow. I guess that the rsync client sends data for each of the 80k files. Since the network is slow, I would like to avoid sending information about each of the 80k files individually.
Is there a way to tell rsync to make a hash sum of a subdirectory tree? That way the rsync client would send only a few bytes for a huge unchanged directory tree.

Update: Up to now my strategy is to use rsync. Update 2: There are 80k files in one directory tree.

You might read up on zsync. I have not used it myself, but from what I read, it pre-renders the metadata on the server side and might speed up transfers in your case.
It might be worth testing anyway.
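For reference, a minimal zsync workflow looks roughly like this; zsync works per file over HTTP, so it suits single large files better than whole trees, and the URL and filenames are made up:

    # On the server: generate a .zsync metadata file to publish next to the payload.
    zsyncmake -u http://backuphost/files/big.tar big.tar

    # On the client: download only the blocks that differ from the local seed copy.
    zsync -i big.tar http://backuphost/files/big.tar.zsync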