Questions:
I restore/save all files but dar reported some files have been ignored, what are those ignored files?
Dar hangs when using it with pipes, why?
Why, when I restore 1 file, dar reports 3 files have been restored?
While compiling dar I get the following message : " g++: /lib/libattr.a: No such file or directory", what can I do?
I cannot find the binary package for my distro, where to look for?
Can I use different filters between a full backup and a differential backup? Would not dar consider some file not included in the filter to be deleted?
Once in action dar makes all the system slower and slower, then it stops with the message "killed"! How to overcome this problem?
I have a backup I want to change the size of slices?
I have a backup in one slice, how can I split it in several slices?
I have a backup in several slices, how can I stick them all in a single file?
I have a backup, how can I change its encryption scheme?
I have a backup, how can I change its compression algorithm?
Which options can I use with which options?
Why dar reports corruption for the archive I have transferred with FTP?
Why DAR does save UID/GID instead of plain usernames and usergroups?
Dar_Manager does not accept encrypted archives, how to workaround this?
How to overcome the lack of static linking on MacOS X?
Why cannot I test, extract file, list the contents of a given slice from an archive?
Why cannot I merge two isolated catalogues?
Why cannot dar use the full power of my multi-processor computer?
Is libdar thread-safe, which way do you mean it is?
How to solve "configure: error: Cannot find size_t type"?
Why dar became much slower since release 2.4.0?
Why dar became yet slower since release 2.5.0?
How to search for questions (and their answers) about known problems similar to mine?
Why dar tells me that it failed to open a directory, while I have excluded this directory?
Dar reports a "SECURITY WARNING! SUSPICIOUS FILE" what does that mean!?
Can dar help copy a large directory tree?
Does dar compress per file or the whole archive?
What slice size can I use with dar?
Is there a dar fuse filesystem?
How dar compares to tar?
Why when comparing an archive with filesystem, dar does not report new files found on filesystem?
Why does dar report truncated filenames under Windows, especially with cyrillic filenames?
I have a 32 bits windows system, which binary package can I use?
lzo compression is slower with dar than with lzop, why?

Answers:

I restore/save all files but dar reported some files have been ignored, what are those ignored files?
When restoring or saving, all files are considered by default. But if you specify some files to restore or save, all other files are "ignored"; this is the case when using -P, -X, -I or -g.

Dar hangs when using it with pipes, why?
Dar can produce an archive on its standard output if you give '-' as basename. But it cannot read an archive from its standard input in direct access mode. To feed an archive to dar through pipes, you need dar_slave and two pipes, or you must use the sequential mode (--sequential-read option, which is very slow compared to the default direct access mode).
To use dar with dar_slave over pipes in direct access mode (which is the more efficient way to proceed), see the detailed notes, or more precisely the dar and ssh note.

Why, when I restore 1 file, dar reports 3 files have been restored?
If you restore, for example, the file usr/bin/emacs, dar will first restore usr (if the directory already exists, it will get its date and ownership restored, and all existing files will be preserved), then usr/bin will be restored, and last usr/bin/emacs will be restored. Thus 3 inodes have been restored or modified while only one file has been asked for restoration.

While compiling dar I get the following message : " g++: /lib/libattr.a: No such file or directory", what can I do?
The problem comes from an inconsistency in your distro (Redhat and Slackware seem(ed) concerned at least): Dar (Libtool) finds the /usr/lib/gcc-lib/i386-redhat-linux/3.3.3/../../../libattr.la file to link with. This file defines where the static and dynamic libattr libraries are located, but it expects both of them to be found under /lib. While the dynamic libattr is there, the static version has been moved to /usr/lib. A workaround is to make a symbolic link:
ln -s /usr/lib/libattr.a /lib/libattr.a

I cannot find the binary package for my distro, where to look for?
For any binary package, ask your distro maintainer to include dar (if not already done), and check the web site of your preferred distro for a dar package.

Can I use different filters between a full backup and a differential backup? Would not dar consider some file not included in the filter to be deleted?
Yes, you can. No, there is no risk of dar deleting the files that were not selected for the differential backup. Here is the way dar works:
During a backup process, when a file is ignored due to filter exclusion, an "ignored" entry is added to the catalogue. At the end of the backup, dar compares both catalogues, the one of reference and the new one built during the backup process, and adds a "detruit" (destroyed in English) entry when an entry of the reference is not present in the new catalogue. Thus, if an "ignored" entry is present, no "detruit" will be added for that name. Then all "ignored" entries are removed and the catalogue is dumped in the archive.
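
The comparison described above can be simulated with plain text lists. This is only a sketch of the logic, not dar's actual code: the entry names and the variables `reference`, `saved` and `ignored` are made up for the illustration.

```shell
#!/bin/sh
# Hypothetical catalogue of reference, one entry name per line.
reference='etc/fstab
home/alice/notes.txt
home/alice/old.txt
var/log/syslog'

# Entries saved by the new backup.
saved='etc/fstab
home/alice/notes.txt'

# Entries skipped by a filter: they get an "ignored" entry in the new catalogue.
ignored='var/log/syslog'

# The new catalogue holds both the saved and the "ignored" entries.
new_catalogue=$(printf '%s\n%s\n' "$saved" "$ignored")

# An entry of the reference becomes "detruit" only when it appears nowhere
# in the new catalogue (-F fixed strings, -x whole line, -v invert the match).
detruit=$(printf '%s\n' "$reference" | grep -Fxv -- "$new_catalogue")

printf 'detruit: %s\n' "$detruit"
```

In this simulation var/log/syslog was filtered out but is still protected by its "ignored" entry, so only home/alice/old.txt, which is really gone, is marked as "detruit".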

Once in action dar makes all the system slower and slower, then it stops with the message "killed"! How to overcome this problem?
Dar needs virtual memory to work. Virtual memory is the RAM + SWAP space. Dar's memory requirement grows with the number of files saved, not with the amount of data saved. If you have a few huge files you will have little chance to see any memory limitation problem. On the contrary, saving a plethora of files (either big or small) will make dar request a lot of virtual memory. Dar needs this memory to build the catalogue (the contents) of the archive it creates. The same holds for a differential backup, except that dar also needs to load in memory the catalogue of the archive of reference, which most of the time makes dar use twice as much memory for a differential backup as for a full backup.
Anyway, the solution is:
There is still a workaround, which is to make several smaller archives of the files to backup. For example, make a backup for all in /usr/local, another for all in /var and so on. These backups can be full or differential. The drawback is not big, as you can store these archives side by side and use them at will. Moreover, you can feed a unique dar_manager database with all these different archives. This will hide from you the fact that there are several full archives and several differential archives concerning different sets of files.
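
As a rough sketch of this workaround, the loop below only prints one `dar` command per directory instead of running it, so it can be reviewed first. The destination directory `/backup` and the archive naming scheme are assumptions made for the illustration; `-c` (create) and `-R` (filesystem root) are the options listed in the table of this FAQ.

```shell
#!/bin/sh
# Hypothetical destination directory for the archives.
dest=/backup

# Build one backup command per directory tree (dry run: nothing is executed).
plan=$(for dir in /usr/local /var; do
    # Derive an archive basename from the path: /usr/local -> usr_local
    name=$(printf '%s' "$dir" | sed -e 's|^/||' -e 's|/|_|g')
    # -c gives the basename of the archive to create, -R the directory to save
    printf 'dar -c %s/%s -R %s\n' "$dest" "$name" "$dir"
done)

printf '%s\n' "$plan"
```

Each printed line can then be run on its own, which keeps the catalogue that dar has to hold in memory limited to one directory tree at a time.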

I have a backup I want to change the size of slices?
dar_xform is your friend!
dar_xform -s <size> original_archive new_archive
dar_xform will create a new archive with slices of the requested size (you can also make use of the -S option for the first slice). Note that you don't need to decrypt the archive, nor will dar uncompress it, so this is a very fast operation. See the dar_xform man page for more.

I have a backup in one slice, how can I split it in several slices?
dar_xform is your friend!
dar_xform -s <size> original_archive new_archive
See above for more.

I have a backup in several slices, how can I stick them all in a single file?
dar_xform is your friend!
dar_xform original_archive new_archive
dar_xform without the -s option creates a single-sliced archive. See the dar_xform man page for more.

I have a backup, how can I change its encryption scheme?
The merging feature lets you do that. Merging has two roles: putting the contents of two archives in one archive, and at the same time filtering file contents so that certain files are not copied to the resulting archive. The merging feature can take two archives but also only one archive as input, so we will use it in that special way here:
dar -+ new_archive -A original_archive -K "<new_algo>:new pass" -ak
If the original archive was not in clear, you need to add the -J option to provide its encryption key. If you don't want to have the password in clear on the command line (a command that can be seen with top or ps by other users), simply provide "<algo>:" and dar will ask you for the password on the fly. If using blowfish you can then just provide ":" for the keys:
dar -+ new_archive -A original_archive -K ":" -J ":" -ak
Note that you can also change the slicing of the archive at the same time thanks to the -s and -S options:
dar -+ new_archive -A original_archive -K ":" -J ":" -ak -s 1G

I have a backup, how can I change its compression algorithm?
Same thing as above: we will use the merging feature.
To use bzip2 compression:
dar -+ new_archive -A original_archive -zbzip2
To use gzip compression:
dar -+ new_archive -A original_archive -zgzip
To use lzo compression:
dar -+ new_archive -A original_archive -zlzo
To use no compression at all:
dar -+ new_archive -A original_archive
Note that you can also change the encryption scheme and the slicing at the same time you change compression:
dar -+ new_archive -A original_archive -zbzip2 -K ":" -J ":" -s 1G

Which options can I use with which options?
DAR provides seven commands (-c, -x, -l, -d, -t, -C and -+). The table below shows, for each option, the commands it can be used with:

| short option | long option | -c | -x | -l | -d | -t | -C | -+ |
|---|---|---|---|---|---|---|---|---|
| -v | --verbose | OK | OK | OK | OK | OK | OK | OK |
| -vs | --verbose=s | OK | OK | -- | OK | OK | -- | OK |
| -b | --beep | OK | OK | OK | OK | OK | OK | OK |
| -n | --no-overwrite | OK | OK | -- | -- | -- | OK | OK |
| -w | --no-warn | OK | OK | -- | -- | -- | OK | OK |
| -wa | --no-warn=all | -- | OK | -- | -- | -- | -- | -- |
| -A | --ref | OK | OK | -- | OK | OK | OK | OK |
| -R | --fs-root | OK | OK | -- | OK | -- | -- | -- |
| -X | --exclude | OK | OK | OK | OK | OK | -- | OK |
| -I | --include | OK | OK | OK | OK | OK | -- | OK |
| -P | --prune | OK | OK | OK | OK | OK | -- | OK |
| -g | --go-into | OK | OK | OK | OK | OK | -- | OK |
| -] | --exclude-from-file | OK | OK | OK | OK | OK | -- | OK |
| -[ | --include-from-file | OK | OK | OK | OK | OK | -- | OK |
| -u | --exclude-ea | OK | OK | -- | -- | -- | -- | OK |
| -U | --include-ea | OK | OK | -- | -- | -- | -- | OK |
| -i | --input | OK | OK | OK | OK | OK | OK | OK |
| -o | --output | OK | OK | OK | OK | OK | OK | OK |
| -O | --comparison-field | OK | OK | -- | OK | -- | -- | -- |
| -H | --hour | OK | OK | -- | -- | -- | -- | -- |
| -E | --execute | OK | OK | OK | OK | OK | OK | OK |
| -F | --ref-execute | OK | -- | -- | -- | -- | OK | OK |
| -K | --key | OK | OK | OK | OK | OK | OK | OK |
| -J | --ref-key | OK | -- | -- | -- | -- | OK | OK |
| -# | --crypto-block | OK | OK | OK | OK | OK | OK | OK |
| -* | --ref-crypto-block | OK | -- | -- | -- | -- | OK | OK |
| -B | --batch | OK | OK | OK | OK | OK | OK | OK |
| -N | --noconf | OK | OK | OK | OK | OK | OK | OK |
| -e | --empty | OK | -- | -- | -- | -- | OK | OK |
| -aSI | --alter=SI | OK | OK | OK | OK | OK | OK | OK |
| -abinary | --alter=binary | OK | OK | OK | OK | OK | OK | OK |
| -Q | | OK | OK | OK | OK | OK | OK | OK |
| -aa | --alter=atime | OK | -- | -- | OK | -- | -- | -- |
| -ac | --alter=ctime | OK | -- | -- | OK | -- | -- | -- |
| -am | --alter=mask | OK | OK | OK | OK | OK | OK | OK |
| -an | --alter=no-case | OK | OK | OK | OK | OK | OK | OK |
| -acase | --alter=case | OK | OK | OK | OK | OK | OK | OK |
| -ar | --alter=regex | OK | OK | OK | OK | OK | OK | OK |
| -ag | --alter=glob | OK | OK | OK | OK | OK | OK | OK |
| -z | --compression | OK | -- | -- | -- | -- | OK | OK |
| -s | --slice | OK | -- | -- | -- | -- | OK | OK |
| -S | --first-slice | OK | -- | -- | -- | -- | OK | OK |
| -p | --pause | OK | -- | -- | -- | -- | OK | OK |
| -@ | --aux | OK | -- | -- | -- | -- | -- | OK |
| -$ | --aux-key | -- | -- | -- | -- | -- | -- | OK |
| -~ | --aux-execute | -- | -- | -- | -- | -- | -- | OK |
| -% | --aux-crypto-block | -- | -- | -- | -- | -- | -- | OK |
| -D | --empty-dir | OK | OK | -- | -- | -- | -- | OK |
| -Z | --exclude-compression | OK | -- | -- | -- | -- | -- | OK |
| -Y | --include-compression | OK | -- | -- | -- | -- | -- | OK |
| -m | --mincompr | OK | -- | -- | -- | -- | -- | OK |
| -ak | --alter=keep-compressed | -- | -- | -- | -- | -- | -- | OK |
| -af | --alter=fixed-date | OK | -- | -- | -- | -- | -- | -- |
| | --nodump | OK | -- | -- | -- | -- | -- | -- |
| -M | --no-mount-points | OK | -- | -- | -- | -- | -- | -- |
| -, | --cache-directory-tagging | OK | -- | -- | -- | -- | -- | -- |
| -k | --deleted | -- | OK | -- | -- | -- | -- | -- |
| -r | --recent | -- | OK | -- | -- | -- | -- | -- |
| -f | --flat | -- | OK | -- | -- | -- | -- | -- |
| -ae | --alter=erase_ea | -- | OK | -- | -- | -- | -- | -- |
| -T | --list-format | -- | -- | OK | -- | -- | -- | -- |
| -as | --alter=saved | -- | -- | OK | -- | -- | -- | -- |
| -ad | --alter=decremental | -- | -- | -- | -- | -- | -- | OK |
| -q | --quiet | OK | OK | OK | OK | OK | OK | OK |
| -/ | --overwriting-policy | -- | OK | -- | -- | -- | -- | OK |
| -< | --backup-hook-include | OK | -- | -- | -- | -- | -- | -- |
| -> | --backup-hook-exclude | OK | -- | -- | -- | -- | -- | -- |
| -= | --backup-hook-execute | OK | -- | -- | -- | -- | -- | -- |
| -ai | --alter=ignore-unknown-inode-type | OK | -- | -- | -- | -- | -- | -- |
| -at | --alter=tape-marks | OK | -- | -- | -- | -- | -- | OK |
| -0 | --sequential-read | OK | OK | OK | OK | OK | OK | -- |
| -; | --min-digits | OK | OK | OK | OK | OK | OK | OK |
| -1 | --sparse-file-min-size | OK | -- | -- | -- | -- | -- | OK |
| -ah | --alter=hole-recheck | -- | -- | -- | -- | -- | -- | OK |
| -^ | --slice-mode | OK | -- | -- | -- | -- | OK | OK |
| -_ | --retry-on-change | OK | -- | -- | -- | -- | -- | -- |
| -asecu | --alter=secu | OK | -- | -- | -- | -- | -- | -- |
| -. | --user-comment | OK | -- | -- | -- | -- | OK | OK |
| -3 | --hash | OK | -- | -- | -- | -- | OK | OK |
| -2 | --dirty-behavior | -- | OK | -- | -- | -- | -- | -- |
| -al | --alter=lax | -- | OK | -- | -- | -- | -- | -- |
| -alist-ea | --alter=list-ea | -- | -- | OK | -- | -- | -- | -- |
| -4 | --fsa-scope | OK | OK | -- | OK | -- | -- | OK |
| -5 | --exclude-by-ea | OK | -- | -- | -- | -- | -- | -- |
| -7 | --sign | OK | -- | -- | -- | -- | OK | OK |

Why DAR does save UID/GID instead of plain usernames and usergroups?
The properties of a file contain neither the name of its owner nor the name of its group owner. Instead they contain two numbers, the user ID and the group ID (UID & GID in short). In the /etc/passwd file these numbers are associated with names and other properties, like the login shell, the home directory and the password (see also /etc/shadow).
Thus, when you list a directory (with the 'ls' command or with any GUI program, for example), the listing application opens each directory, where it finds a list of names with an associated inode number. It then fetches the inode attributes of each file and looks, among other information, for the UID and the GID. To be able to display the real user name and group name, the listing application uses a well-defined standard C library call that does the lookup in /etc/passwd, possibly in NIS if configured, and in any other additional system [this way applications do not have to bother with the many possible system configurations, the same API is used whatever the system is]. The lookup returns the name if it exists, and the listing application displays, for each file found in a directory, the attributes and the user name and group name as returned by the system. As you can see, the user name and group name are not part of any file attribute, but UID and GID *are*.
Dar is mainly a backup tool; it preserves the file properties as much as possible to be able to restore files as close as possible to their original state. Thus a file saved with UID=3 will be restored with UID=3. The name corresponding to UID 3 may exist or not, may exist and be the same, or may exist and be different; the file will be restored with UID 3 anyway.

Scenario with dar's way of restoring
Thus, when doing the backup and restoration of a crashed system, you can be confident that the restoration will not interfere with the bootable system you have used to launch dar to restore your disk. Assume UID 1 is labeled 'bin' in your real crashed system, but this UID 1 is labeled 'admin' in the boot system, while UID 2 is labeled 'bin' in this boot system: files owned by bin in the system to restore will be restored under UID 1, not under UID 2, which is the one used by the temporary boot system. At that time, after the restoration and while still running from the boot system, if you do an 'ls' you will see that the files originally owned by 'bin' are now owned by user 'admin'. This is really a mirage: in your restoration you will also restore the /etc/passwd file and other system configuration files (like NIS configuration files if they have been used). Then, at reboot time on the newly restored real system, UID 1 will be associated with user 'bin' again and files originally owned by user bin will be listed as owned by bin, as expected.

Scenario with plain name way of restoring
If dar had done otherwise, restoring the files owned by 'bin' to the UID corresponding to 'bin', these files would have been given UID 2 (the one used by the temporary bootable system used to launch dar). But once the real restored system had been launched, this UID 2 would have become some other user and not 'bin', which is mapped to UID 1 in the restored /etc/passwd.

Now, if you want to change some UID/GID when moving a set of files from one live system to another, there is no problem if you are not running dar under the 'root' account. Accounts other than 'root' are usually not allowed to modify UID/GID, thus files restored by dar will have the user and group ownership of the dar process, which is the one of the user that has launched dar. But if you really need to move a directory tree containing a set of files with different ownerships, and you want to preserve these different ownerships from one live system to another while the corresponding UID/GID do not match between the two systems, dar can still help you:
find /path/to/restored/archive -uid <old UID> -print -exec chown <new name> {} \;
find /path/to/restored/archive -gid <old GID> -print -exec chgrp <new name> {} \;
The first command lets you remap a UID to another for all files under the /path/to/restored/archive directory.
The second command lets you remap a GID to another for all files under the /path/to/restored/archive directory.

Example of how to globally modify ownership of a directory tree user by user
For example, on the source system you have three users: Jacques (UID 100), Pierre (UID 101) and Paul (UID 102), but on the destination system these same users are mapped to different UIDs: Pierre has UID 100, Paul has UID 101 and Jacques has UID 102. We temporarily need an unused UID on the destination system; we will assume UID 680 is not used. Then, after the archive restoration in the directory /tmp/A, we do the following:
find /tmp/A -uid 100 -print -exec chown 680 {} \;
find /tmp/A -uid 101 -print -exec chown pierre {} \;
find /tmp/A -uid 102 -print -exec chown paul {} \;
find /tmp/A -uid 680 -print -exec chown jacques {} \;
which is:
change files of UID 100 to UID 680 (the files of Jacques are now under the temporary UID 680 and UID 100 is now freed)
change files of UID 101 to UID 100 (the files of Pierre get their UID of the destination live system, UID 101 is now freed)
change files of UID 102 to UID 101 (the files of Paul get their UID of the destination live system, UID 102 is now freed)
change files of UID 680 to UID 102 (the files of Jacques, which had been temporarily moved to UID 680, are now set to their UID on the destination live system, UID 680 is no longer used).
You can then move the modified files to the appropriate destination, or make a new dar archive to be restored in the appropriate place if you want to use some of dar's features, for example to only restore files that are more recent than those present on the filesystem.
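
The role of the temporary UID can be checked without touching any real file. The sketch below replays the four steps listed above (100 to 680, 101 to 100, 102 to 101, 680 to 102) on a simulated ownership table; the file names and the `remap` helper are hypothetical and only stand in for the `find ... -exec chown` commands.

```shell
#!/bin/sh
# Simulated ownership table, one "path uid" pair per line (made-up file names).
files='a.txt 100
b.txt 101
c.txt 102'

# remap OLD NEW: give every entry owned by OLD to NEW, mimicking
#   find DIR -uid OLD -exec chown NEW {} \;
remap() {
    files=$(printf '%s\n' "$files" |
        awk -v old="$1" -v new="$2" '$2 == old { $2 = new } { print }')
}

remap 100 680   # park the first owner on the spare UID, freeing UID 100
remap 101 100   # UID 100 is free, so this cannot mix two owners
remap 102 101   # UID 101 is free now
remap 680 102   # bring the parked files to their final UID

printf '%s\n' "$files"
```

Running the steps in another order, or without the spare UID, would at some point put the files of two different users under the same UID, after which they could no longer be told apart.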

Dar_Manager does not accept encrypted archives, how to workaround this?
Yes, that's true, dar_manager does not accept encrypted archives. The first reason is that, as long as a dar_manager database cannot be encrypted, it does not make much sense to add encrypted archives to it. The second reason is that the dar_manager database would have to hold the key of each encrypted archive, making this database the weakest point in your data security: breaking the database encryption would then provide access to every encryption key and, together with access to the original archives, to the data of any archive added to the database.
OK, there is however a feature in the pipeline to let dar_manager encrypt its database, then another feature to let dar_manager store the different archive keys, then another feature to have keys passed from dar_manager to dar outside of the command-line (which would expose the keys to other users on your multi-user system), then yet another feature to feed the database with the archive keys, also without using the command-line. ... Well, there are a lot of features to add and test before you can expect to find this in a released version of dar. In the meantime, you can proceed as follows:
Note that, as the database is not encrypted, this will expose the file listing (not the files' contents) of your encrypted archives to anyone able to read the database, thus it is recommended to set restrictive permissions on this database file.
When the time comes to use dar_manager to restore some files, you will have to make dar_manager pass the key to dar for it to be able to restore the needed files from the archive. This can be done in several ways: dar_manager's command-line, the dar_manager database or a dar.dcf file.
Note that you must prevent other users from reading any file holding the archive key; this covers the dar_manager database as well as the DCF files you could temporarily use. Second note: in this workaround we have assumed that all encrypted archives share the same key.

How to overcome the lack of static linking on MacOS X?
The answer comes from Dave Vasilevsky in an email to the dar-support mailing-list. I let him explain how to do it:
Pure-static executables aren't used on OS X. However, Mac OS X does have other ways to build portable binaries.
HOWTO build portable binaries on OS X?
First, you have to make sure that dar only uses operating-system libraries that exist on the oldest version of OS X that you care about. You do this by specifying one of Apple's SDKs, for example:
export CPPFLAGS="-isysroot /Developer/SDKs/MacOSX10.2.8.sdk"
export LDFLAGS="-Wl,-syslibroot,/Developer/SDKs/MacOSX10.2.8.sdk"
Second, you have to make sure that any non-system libraries that dar links to are linked in statically. To do this edit dar/src/dar_suite/Makefile, changing LDADD to '../libdar/.libs/libdar.a'. If any other non-system libs are used (such as gettext), change the makefiles so they are also linked in statically. Apple should really give us a way to force the linker to do this automatically!
Some caveats:
* If you build for 10.3 or lower, you will not get EA support, and therefore you will not be able to save special Mac information like resource forks.
* To work on both ppc and x86 Macs, you need to build a universal binary. For instructions, use Google :-)
* To make a 10.2-compatible binary, you must build with GCC 3.3.
* These instructions won't work for the 10.1 SDK, that one is harder to use.

Why cannot I test, extract file, list the contents of a given slice from an archive?
Well, this is due to dar's design. Since release 2.4.0 two features can help you get close to that point, namely the --sequential-read option, which asks dar to read the archive sequentially, and the -al option, which asks dar to be relaxed on sanity and coherence checks. You can put a single slice into a given directory, and create as many empty files as necessary to simulate the slices of that archive that have lower numbers than the real slice(s) remaining from a partially lost archive. Then, using sequential reading (--sequential-read option) and lax mode (-al option), you will get the requested information:
mkdir tempo
cd tempo
ln -s ../sowhere/backup.3.dar
touch backup.1.dar
touch backup.2.dar
dar -l backup --sequential-read -al
Note however that the lax mode skips a lot of sanity checks. This method is to be used as a last resort upon heavy archive corruption. It is still a good option to test your archive once it is on the destination medium and, if possible, to add redundancy data using Parchive in order to be able to repair an archive corrupted due to a media problem.
Alternative: once the missing slices have been replaced by empty files (using the touch command for example), if you have the last slice of the archive, you can avoid the --sequential-read mode and only use the lax mode (-al option). You can then use the testing operation to know which files can be retrieved from the archive. If you do not have the last slice, you must use the --sequential-read mode in addition to the lax mode (-al option).
If you want to know which particular files a slice contains, you can add the following option:
-E "echo '************* Opening slice %N **********'"
All in one:
touch <archive>.<missing slice>.dar
dar -t <archive> -al -E "echo '****** Opening slice %N ******'" -v > result.txt
less result.txt

Why cannot I merge two isolated catalogues?
Since version 2.4.0, an isolated catalogue can also be used to rescue a corrupted internal catalogue of the archive it has been isolated from. For that feature to be possible, a mechanism lets dar know whether a given isolated catalogue and a given archive correspond to the same contents. Merging two isolated catalogues would break this feature, as the resulting archive would not match any real archive and could only be used as reference for a differential backup.

Why cannot dar use the full power of my multi-processor computer?
Parallel computing programming is a science by itself. Having specialized in that area during my studies, I can briefly explain the constraints here. A program can use several processors if the algorithm it uses can be parallelized. Such an algorithm can be cut either statically (at programming time) or dynamically (at execution time) into several independent execution threads. These execution threads must be as autonomous as possible from one another if you don't want to have one thread waiting for another (which is not what we want). The constraint is this: if you cannot have different threads with no or very little communication and dependence between them, then parallelization is not worth it.
Back to dar. From a very abstract point of view, dar works by fetching files from the filesystem and appending their data to a single file (the archive). For each file, dar records in memory the location of the data and, once all files have been treated, this location information (contained in the so-called "catalogue") is added at the end of the archive.
One could say that, to parallelize file treatment, instead of proceeding file by file, let's treat all files at the same time (or rather N files at the same time). OK, but first you would have an important loss of performance at disk level, as the disk heads would spend most of their time seeking from the data of one of the N files to the data of another. The second point is that, to add a file to the archive, you must know the position of the end of the last added file, which is not possible to know in advance because of compression and/or encryption. Thus a given thread would have to wait for another one to finish before being able to drop in turn the data of the file it owns... As you can guess, parallelizing this way would bring worse performance than the sequential algorithm.
Another possibility is to have several threads doing:
OK, you may also have found another possibility: having N threads for compression and M threads for encryption. Assuming encryption is faster than compression, we could choose N > M. We could also have a fixed value for N and a dynamic value for M depending on how fast compression is running. Well, this would let dar compress and encrypt several files at the same time and, assuming that the time spent reading and writing data is negligible compared to the compression time (which must be demonstrated, as several files potentially have to be read at the same time), we could maybe have a real performance gain. But... while several files can now be compressed at the same time, only one can be written to disk at a given time. Thus, between the time the compression of a file starts and the time it finishes, all other threads have to keep their compressed data in memory. Then the next thread can drop its data to the archive while all the others keep compressing to memory (RAM). We will quickly run out of RAM! Either your computer will start to swap, or you have to store the data back to disk in a temporary file, which will have to be read again and written back to the archive. Doing so would bring a huge disk performance degradation, as the disk would be used for reading the file's data, writing its compressed data to a temporary file, reading back its compressed data and writing its compressed data to the archive.
Last, when using parallelization there is always a cost due to inter-process communication and concurrent I/O operations on the hardware (here, hard disks are used at the same time to read the files to backup and to write them into the archive). This cost becomes negligible when the number of parallel threads increases, assuming all threads are kept busy... but here there is a bottleneck, the archive creation, which seems to prevent any really impressive parallelization.
Conclusion: unless you can find another way to parallelize dar, a parallelized version of dar will not bring a noticeable improvement. Parallelization is strongly related to the algorithm used; some algorithms are well adapted to it, some others are not.

Is libdar thread-safe, which way do you mean it is?
libdar is the part of dar's source code that has been rewritten to be used by external programs (like kdar). It has been modified to be used in a multi-threaded environment, thus, *yes*, libdar is thread-safe. However, thread-safe does not mean that you do not have to take some precautions in your programs while using libdar (or any other library).
Let's take an example, considering a simple library that provides two functions that both receive the address of an integer as argument. The first increments the given integer until a specific key is pressed by the user, while the second decrements the given integer until another key is pressed. This library is thread-safe in the sense that it has no static variable, nor does it have any given state at a particular time. It is just a set of two functions. Now, your multi-threaded program is the following: at a given time you have one thread running the first library function while another runs the other library function. All will work fine unless you provide the same integer to both threads. One thread would then increment it while the other would decrement it, and you would not get the behavior you could expect in a non multi-threaded environment. The problem would be the same if, instead of using an external library, you were accessing this same integer from two different threads at the same time.
Care must thus be taken that two different threads do not act on the same variables at the same time. Sharing is however possible with the use of a posix mutex, which defines a portion of code (known as a critical section) that cannot be entered by a thread while another one is in it (such a thread is suspended until the other thread exits the critical section).
For libdar, this is the same: you must pay attention not to have two or more different threads acting on the same data. Libdar provides a set of classes, which can be seen as a set of types (like a C struct) with associated functions (known as methods in the object oriented world). From these classes, your program will create objects: each object *is* a variable. Technically, invoking a method on an object is exactly the same as invoking a function and giving it, as a hidden argument, a pointer to the object; while semantically, invoking a method is a way to read or modify this variable (= the object). Thus, if you plan to act on a given object from several threads at the same time, you must use a posix mutex or any other means to mutually exclude the access to this object between all your threads. This way only one thread may read or modify this variable (= this object) at a given time.
Note that internally libdar uses some static variables. By static variables, I mean variables that exist even when no thread is running a libdar function or method. These variables are enclosed in critical sections so that libdar's users may use it normally. In other words, this is transparent to you. For example, to cancel a libdar call, the mechanism uses an array holding the tid (thread ID) of each thread whose libdar call must be canceled: if you wish to cancel a libdar call run by thread 10, another thread will add the tid 10 to this list. At regular checkpoints, all libdar functions check whether this list contains the tid the call is run from. If so, the call aborts/returns and the thread can continue its execution out of libdar code. As you see, several threads may read or write this array of tids at the same time. Thanks to a set of mutexes this is transparent to you, and for this reason libdar can be said to be thread-safe.

How to solve "configure: error: Cannot find size_t type"?
This error shows up when you lack support for C++ compilation. Check that the gcc compiler has been compiled with C++ support activated or, if you are using a gcc binary from a distro, double check that you have installed the C++ support for gcc.

Why dar became much slower since release 2.4.0?
This is the drawback of new features!

Why dar became yet slower since release 2.5.0?
This is again the drawback of new features!
How to search for questions (and their answers) about known problems similar to mines?

Before sending an email to the dar-support mailing-list, you are welcome to first look through the emails already sent to see whether your problem has been exposed and solved. This is the fastest way for you to get an answer to your problem, and for me a way to preserve time for development. But yes, there are now tons of email subjects to read to have a chance of finding the answer to your problem. The simplest way is to use the search engine at gmane: the dar-support mailing-list is archived at sourceforge *and* at gmane.org, and only this second archive has a search engine (look there for the green box at the bottom of the page). This search engine is available for all the mailing lists around dar that are archived at gmane.

Why dar tells me that he failed to open a directory, while I have excluded this directory?

Reading the contents of a directory is done using the usual system calls (opendir/readdir/closedir). The first call (opendir) lets dar designate which directory to inspect, then dar calls readdir to get the next entry in the opened directory. Once nothing is left to read, closedir is called. The problem here is that dar cannot start reading a directory, do some treatment and start reading another directory. In brief, the opendir/readdir/closedir system calls are not re-entrant.
This is particularly critical for dar as it performs a depth-first lookup in the directory tree. In other words, if the root has two directories A and B, dar reads A's contents and the contents of its subdirectories, then, once finished, reads the next entry of the root directory (which is B), then reads the contents of B and of each of its subdirectories, then, once finished with B, must go back to the root again and read the next entry. In the meantime dar had to open many directories to get their contents. For this reason dar caches the directory contents (when it first meets a directory, it reads its whole content and stores it in RAM). Only afterwards does dar decide whether or not to include a given directory. But at that point its contents have already been read, so you may get the message that dar failed to read a given directory's contents while you explicitly asked not to include that particular directory in the backup.

Dar reports a "SECURITY WARNING! SUSPICIOUS FILE" what does that mean!?

When dar reports the following message:

    SECURITY WARNING! SUSPICIOUS FILE <filepath>: ctime changed since archive of reference was done, while no inode or data changed

you should look for an explanation of the root cause that triggered dar to ring this alarm. As you probably know, a unix file has three dates:

- atime: the last time the file's data was accessed [1]
- mtime: the last time the file's data was modified
- ctime: the last time the file's inode was changed (permissions, ownership, ...), which is also updated when the data is modified

A program can set atime and mtime to arbitrary values (this is what touch does), but there is no standard call to set ctime.
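The situation dar warns about can be reproduced by hand. The sketch below is only an illustration and assumes GNU coreutils `stat` (`-c %Y` prints mtime and `-c %Z` prints ctime, both in seconds since the epoch). It overwrites a file with data of the same size, then puts the original mtime back with `touch -r`: mtime looks untouched while ctime has moved forward.

```shell
f=$(mktemp)
ref=$(mktemp)
printf 'good' > "$f"
touch -r "$f" "$ref"               # keep a copy of the original atime/mtime
mtime_before=$(stat -c %Y "$f")
ctime_before=$(stat -c %Z "$f")

sleep 2                            # let the clock advance past one-second resolution
printf 'evil' > "$f"               # change the data, same size
touch -r "$ref" "$f"               # put the original atime/mtime back

mtime_after=$(stat -c %Y "$f")
ctime_after=$(stat -c %Z "$f")
echo "mtime: $mtime_before -> $mtime_after"
echo "ctime: $ctime_before -> $ctime_after"
rm -f "$f" "$ref"
```

The same ctime-only change is produced by harmless operations such as `chmod` or `chown`, which is why the message calls for an investigation rather than proving an intrusion.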
However, some rootkits and other nasty programs that tend to hide themselves from the system administrator use this trick and modify the mtime to become more difficult to detect. Thus, the ctime keeps track of the date and time of their infamy. However, ctime may also change while neither mtime nor atime do, in several rare but normal situations. Thus, if you are faced with this message, you should first verify the following points before concluding that your system has been infected by a rootkit:

Well, if you cannot find a valid explanation among the ones presented above, you'd better consider that your system has been infected by a rootkit or virus and use all the necessary tools (see below for examples) to find some evidence of it.

Last point: if you can explain the cause of the alarm and are annoyed by it (you have hundreds of files concerned, for example), you can disable this feature by adding the "-asecu" switch to the command-line.

[1] atime may also not be updated at all if the filesystem is mounted with the relatime or noatime option.

Can dar help copy a large directory tree?

The answer is "yes", and even for more than one reason:
Using the following command will do the trick without relying on a temporary file or archive:

    dar -c - -R <srcdir> --retry-on-change 3 -N | dar -x - --sequential-read -N -R <dstdir>

The contents of <srcdir> will be copied to <dstdir>. Both must exist before running this command, and <dstdir> should be an empty directory.

Here is an example: we will copy the content of /home/my to /home2/my. First we create the destination directory:

    mkdir /home2/my

then we run dar:

    dar -c - -R /home/my --retry-on-change 3 | dar -x - --sequential-read -R /home2/my

The "--retry-on-change" option lets dar retry the copy of a file up to three times if that file has changed while dar was reading it. You can increase this number at will. If a file fails to be copied correctly after more than the allowed number of retries, a warning is issued about that file and it is flagged as dirty in the data flow; the second dar command will then ask you whether you want it to be restored (here copied) or not.

"Piping" ('|' shell syntax) the first dar's output to the second dar's input means the operation does not require any temporary storage: only virtual memory is used to perform this copy. Compression is thus not requested, as it would only slow down the whole process.

Last point: you should compare the copied data to the original before removing the latter, as no backup file has been dropped to the filesystem. This can simply be done using:

    diff -r <srcdir> <dstdir>

Note however that diff will not check Extended Attributes, File Forks, Posix ACL, hard linked inodes, etc.

If you want a more controllable way of copying a large directory, simply use dar with a real archive file: compare the archive with the original filesystem, restore the archive contents to their new place, and compare the restored filesystem with the archive.

Any better idea? Feel free to contact dar's author for an update of this documentation!

Does dar compress per file or the whole archive?

Dar uses compression (gzip, lzo, bzip2, xz, ...) with different levels of compression (from 1 for quick but low compression up to 9 for best but slower compression) on a file by file basis. In other words, the compression engine is reset for each new file added to the archive. When a corruption occurs in a file like a compressed tar archive, it is not possible to decompress the data past that corruption: with tar you lose all files stored after such data corruption. With compression per file, a corruption only impacts one file inside the archive, and all files stored before or after it can still be restored from the corrupted archive. The drawback is that the overall compression ratio is slightly lower. But note that compressing per file opens the possibility of not compressing all files in the archive, in particular already compressed files (like *.jpeg, *.mpeg, some *.avi files and of course *.gz, *.bz2 or *.lzo files).
Avoiding compressing already compressed files saves CPU cycles (in other words it speeds up the backup process). And while compressing an already compressed file takes time for nothing, it also requires more storage space than if that same file was not compressed a second time. In brief, besides the possibility of not compressing already compressed files, compressing file by file gives an overall compression ratio quite equivalent to what you get when compressing the archive globally, while it may be faster (depending on the data) and allows recovering any file before or after the one impacted by a data corruption within the archive.

How to activate compression with dar?

Use the --compression option (or -z in short), giving the algorithm to use and the compression level (--compression=bzip2:9 or -zgzip:7 for example). You may omit the compression level (which defaults to 9) and even the compression algorithm (which defaults to gzip). Thus -z or -zlzo are correct.

To select which files to compress or not, several options are available: --exclude-compression (or -Z in short, with an uppercase Z) and --include-compression (or -Y in short). Both take as argument a mask that defines, based on their names, the files that have to be compressed or not. For example -Z "*.avi" -Z "*.mp?" -Z "*.mpeg" will avoid compressing MPEG, MP3, MP2 and AVI files. Note that dar provides in its /etc/darrc default configuration file a long list of -Z options to avoid compressing the most common compressed files, which you can activate by simply adding compress-exclusion on the dar command-line.

In addition to excluding/including files from compression based on their name, you can also exclude small files (for which the compression ratio is usually poor) using the --mincompr option, which takes a size as argument: --mincompr 1k will avoid compressing files whose size is less than or equal to 1024 bytes. You should find all details about these options in dar man page.
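To get a feel for which names the three masks quoted above would select, here is a small sketch that does not call dar at all. It relies on the shell's own `case` pattern matching, on the assumption that it behaves like dar's glob masks for such simple patterns; the function name and the sample filenames are made up for the illustration.

```shell
# Succeeds when NAME matches one of the masks "*.avi", "*.mp?" or "*.mpeg".
excluded_from_compression() {
    case "$1" in
        *.avi|*.mp?|*.mpeg) return 0 ;;
        *)                  return 1 ;;
    esac
}

for name in holiday.avi song.mp3 clip.mpeg clip.mpg report.txt; do
    if excluded_from_compression "$name"; then
        echo "$name: stored without compression"
    else
        echo "$name: compressed"
    fi
done
```

Note that "*.mp?" matches any single character after "mp", so it also covers names such as clip.mpg, but not clip.mpeg, which is why the separate "*.mpeg" mask is needed.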
Check also the -am and -ar options to understand how --exclude-compression and --include-compression interact with each other, or how to use regular expressions in place of glob expressions in masks.

What slice size can I use with dar?

The minimum slice size is around 20 bytes, but you will only be able to store 3 to 4 bytes of information per slice, due to the slice header that needs around 15 bytes in each slice. But there is no maximum slice size! In other words you can give the -s and -S options an arbitrarily large positive integer: thanks to its own internal integer type named "infinint", dar is able to handle arbitrarily large integers. This has a slight memory and CPU penalty compared to using the computer's native 32 or 64 bits integers, but has the advantage of providing a long term implementation in dar.
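As a rough illustration of the header overhead, the number of slices can be estimated with a ceiling division. This is only a sketch under stated assumptions: the 15-byte header is the approximate figure quoted above, the total is the size of the archive itself (not of the saved files), and the function name `slices_needed` is made up for the example.

```shell
# Estimate how many slices an archive of TOTAL bytes needs with slices of SLICE bytes.
# HEADER defaults to 15 bytes, the approximate per-slice header size quoted above.
slices_needed() {
    total=$1 slice=$2 header=${3:-15}
    payload=$((slice - header))                  # bytes of archive data per slice
    if [ "$payload" -le 0 ]; then
        echo "slice size too small" >&2
        return 1
    fi
    echo $(( (total + payload - 1) / payload ))  # ceiling division
}

slices_needed 4294967296 734003200   # 4 GiB archive, 700 MiB slices
slices_needed 10 20                  # tiny slices: most of each slice is header
```

With realistic slice sizes the header is negligible; it only matters near the minimum slice size.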
You can make use of suffixes like 'k' for kilo, 'M' for mega, 'G' for giga, etc. (all suffixes are listed in the documentation) to simplify your work. See also the -aSI and -abinary options to swap the meaning of these suffixes between kB (= 1000 bytes) and KiB (= 1024 bytes).

Last point: dar/libdar can be compiled with the --enable-mode=64 option given to ./configure while building dar. This replaces the "infinint" type by 64 bits integers, for better performance and reduced memory usage. However this has some drawbacks on archive size and dates. See the limitations for more details.

Is there a dar fuse filesystem?

You can find several applications relying on dar or directly on libdar to manage dar archives. These are referred to here as external software because they have neither been created nor are maintained by the author of dar and libdar. AVFS is such an external software: it provides a virtual file system layer for transparently accessing the content of archives and remote directories just like local files.
how dar compares to tar?

Here follows a table that provides a comparison of the main points between tar and dar. If you find errors or inconsistencies, please report them to the dar maintainer.
Why when comparing an archive with filesystem, dar does not report new files found on filesystem?

Archive comparison (-d option) is to be seen as a step further than archive testing (-t option), where dar checks the archive's internal structure and usability. The step further here is to check not only that each part of the archive is readable and has a correct associated CRC, but also that it matches what is present on the filesystem. So yes, if new files are present on the filesystem, nothing has to be reported. If a file has changed, dar reports that the file does not match what is in the archive; if a file is missing, dar cannot compare it with the filesystem and reports an error too.
So you want to know what has changed on your filesystem? No problem, do a differential backup! OK, you don't want to have a new backup or do not have the space for that: just output the archive to /dev/null and request on-fly isolation as follows:

    dar -c - -A <ref archive> -@ <isolated> ... other options ... > /dev/null

<ref archive> is the archive of reference or an isolated catalogue from it, and <isolated> is the name of the isolated catalogue to produce. Once the operation has completed, you can list the isolated catalogue using the following command:

    dar -l <isolated> -as
It will give you the exact difference between your current filesystem and the filesystem at the time the "<ref archive>" was done: modified files and new files are reported with [inref] for either data, EA or both, while deleted files are reported with the "[ --- REMOVED ENTRY ----]" information, followed by the estimated removal date and the type of the removed file ([-] for plain file, [d] for directory, and so on; more details in dar man page about the listing command).

Why do dar reports truncated filenames under Windows, especially with cyrillic filenames?

Dar/libdar was first developed for Linux. It has later been ported to many other operating systems. On Unix-like systems (FreeBSD, Solaris, ...), it can run as a native program by just recompiling it for the target OS and processor. On Windows systems it cannot, because Unix and Windows systems do not provide the same system calls at all. The easiest way to have dar running under Windows was to rely on Cygwin, which translates the Unix system calls to Windows system calls. However Cygwin brings some limitations. One of them is that it cannot provide filenames longer than 256 bytes, while today's Windows can have much longer filenames.
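Because this limit is expressed in bytes and not in characters, the script used for a filename matters. The sketch below, which assumes the script itself is saved in UTF-8, counts the bytes of two sample names of the same length in characters (both names are made up for the illustration).

```shell
# Number of bytes a name occupies once encoded (wc -c counts bytes, not characters).
name_bytes() {
    printf '%s' "$1" | wc -c | tr -d ' '
}

latin="document.txt"
cyrillic="документ.txt"
echo "$latin: $(name_bytes "$latin") bytes"
echo "$cyrillic: $(name_bytes "$cyrillic") bytes"
```

Both names are 12 characters long, yet the second one uses two bytes for each of its eight Cyrillic letters in UTF-8, so it gets closer to the byte limit for the same visible length.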
What is the point with cyrillic filenames? Cyrillic characters, unlike most latin ones, are not stored as a single byte: they usually use several bytes per character, thus this maximum filename length is reached much quicker than with latin filenames, but the problem also exists with them. The consequence is that when dar reads a directory that contains a long filename, the Cygwin layer is not able to provide it entirely: the filename is truncated. When dar wants to read information about that file, most of the time no file with such a truncated name exists, and dar reports the message from the system that this file does not exist (which might sound strange from the user's point of view). Since release 2.5.4 dar reports instead that the filename has been truncated and that it will be ignored.

I have a 32 bits windows system, which binary package can I to use?

Up to release 2.4.15 (included) the dar/libdar binaries for windows were built on a 32 bits windows (XP) system. After that release, binaries for windows have been built using a 64 bits windows system (7, now 8 and probably 10 soon). Unfortunately, the filenames of the binary packages for windows did not reflect that change and were still labeled "i386", while the included binaries no longer support the i386 CPU family (which are 32 bits CPUs). This oversight went unseen until Adrian Buciuman's remark on the dar-support mailing-list on September 23rd, 2016. In consequence, after that date binary packages for windows will receive an additional field corresponding to the windows flavor they have been built against.

Some may still need 32 bits windows binaries of dar. Unfortunately I no longer have access to such a system, but if you have such a windows ISO image and a valid license to give me, I could install it into a virtual machine and provide binary packages for 32 bits too. Until then, you can build the binary for windows yourself. Here follows the recipe: install Cygwin on windows including at least the following packages:
For clarity let's assume you have extracted the dar source package for version x.y.z into the C:\Temp directory, thus you now have the directory C:\Temp\dar-x.y.z

Run a cygwin terminal and "cd" into that directory:

    cd /cygdrive/c/Temp/dar-x.y.z

In the previous command, note that from within a cygwin shell the path uses slashes, not windows backslashes; note also that the 'c' is lowercase while windows shows an upper case letter for drives... But don't worry, we are almost finished. Run the following script:

    misc/batch_cygwin x.y.z

Starting with release 2.5.7 the syntax will change / has changed:

    misc/batch_cygwin x.y.z win32

The new "win32" or "win64" field will be used to label the zip package containing the dar/libdar binary for windows; it is up to you to choose the value corresponding to your OS 32/64 bits flavor.

At the end of the process you will get a dar zip file for windows in the C:\Temp\dar-x.y.z directory.

Feel free to ask for support on the dar-support mailing-list if you encounter any problem building the dar binary for windows; this FAQ will be updated accordingly.

lzo compression is slower with dar than with lzop, why?

When using the "lzo" compression algorithm, dar/libdar always uses the "lzo1x_999" algorithm with the requested compression level (from 1 to 9) as argument. Dar thus provides 9 different compression/speed levels with lzo.

On the other side, as of today (2017), lzop uses a very degraded lzo algorithm for level 1 (lzo1x_1_15) and one and the same algorithm for levels 2 to 6 (lzo1x_1). This reduces memory requirements and improves compression speed at the cost of the resulting compression ratio. Lzop compression levels 7 to 9 use the same algorithm as dar/libdar (lzo1x_999). In total lzop only provides 5 different compression levels/algorithms.

So now you know why dar is slower than lzop when using lzo compression at levels 1 to 6.

Besides the lzo algorithm, dar/libdar provides two additional lzo-based compression algorithms: lzop-1 and lzop-3. As you may guess, lzop-1 uses the lzo1x_1_15 algorithm as lzop does for compression level 1, and lzop-3 uses the lzo1x_1 algorithm as lzop does for its compression levels 2 to 6. For lzop-1 and lzop-3 the compression level is not used, so you can keep the default or change its value: this will not change dar's behavior.
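The correspondence described above can be written down as a small lookup. This is only a restatement of the text in shell form (lzop behaviour as of 2017, per the paragraph above); the two function names are made up for the illustration and neither dar nor lzop is called.

```shell
# Algorithm used by lzop for a given compression level, per the description above.
lzop_algorithm() {
    case "$1" in
        1)     echo lzo1x_1_15 ;;
        [2-6]) echo lzo1x_1 ;;
        [7-9]) echo lzo1x_999 ;;
        *)     echo "invalid level: $1" >&2; return 1 ;;
    esac
}

# Algorithm used by dar with "lzo" compression: always lzo1x_999, whatever the level.
dar_lzo_algorithm() {
    case "$1" in
        [1-9]) echo lzo1x_999 ;;
        *)     echo "invalid level: $1" >&2; return 1 ;;
    esac
}

for level in 1 2 3 4 5 6 7 8 9; do
    echo "level $level: lzop=$(lzop_algorithm "$level") dar=$(dar_lzo_algorithm "$level")"
done
```

The two columns only coincide for levels 7 to 9, which is where dar and lzop are expected to perform alike.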
|