Copyright © 2010-2013 Konstantin Livitski. License terms apply.
Please read file NOTICE.md or browse http://www.livitski.name/projects/data-bag/license for details.
This manual is a work in progress.
Last modified: June 4, 2013
Data-bag is a tool that helps keep files consistent across multiple devices, track the updates, and restore lost or changed files.
It will create and maintain a database with copies of your files on a shared medium, such as a USB drive, memory stick, smartphone, tablet, or network server. As long as you have access to that medium (e.g. carry your phone or memory stick), you can create up-to-date copies of your files on any device capable of running data-bag. Changes made to a local copy will be stored on the shared medium next time you synchronize the copy. Data-bag will keep track of your changes and update other local copies of your files when you synchronize those copies. It will also store histories of changes to synchronized files, allowing you to review past changes, resolve conflicting updates, and restore corrupt or deleted files.
The data-bag executable is usually a single file named databag.jar
that you can run on different machines. A machine can run that file if it has a
Java runtime installed. The tool is compatible with:
OpenJDK Runtime Environment for JDK 6 or 7 that may be installed on, or available as an option for your system; or
Java runtime version 5, 6, or 7 (JRE 1.5+) downloadable from Oracle
Throughout this guide, we assume that your machine runs a (mostly) POSIX-compliant operating system, such as GNU/Linux, BSD, or Mac OS X. Data-bag is designed to run on various platforms, and you may be able to use it on a system that does not comply with POSIX specifications. In that case you have to adjust the syntax of your commands and file locations according to the system's conventions, or install a POSIX compatibility kit, to be able to follow this guide.
We also assume that you use a single shared medium, mounted at /mnt/, containing a copy of the data-bag executable databag.jar, and that your shell can find the Java runtime executable on its search path. These conditions are not necessary, and data-bag will work in other environments if you make adjustments to the commands you enter.
To run the standard data-bag executable, use the -jar option of the Java runtime command:
$ java -jar /mnt/databag.jar
Initially, there is no bag for data-bag to work with, and the tool informs you of that and prints its usage summary:
Data-bag: file synchronization, backup, and change tracking tool, v.1.05.130122
Copyright 2010-13 Konstantin Livitski and others. See file "LICENSE/data-bag.md"
for applicable terms, http://data-bag.org to learn more about the project.
Jan 21, 2013 6:50:45 PM name.livitski.databag.cli.Launcher run
WARNING: Shared storage not found on /home/user
Usage: java -jar databag.jar [options] command [arguments]
Common options: [-d medium] [--create] [-C path [--default]] [-A action]
[--fn file-id] [--vn version-id] [-o output-file] [-N]
[-F filter-name [--default | --invert]] [-v] [database-options]
Command: -? | -l | -h | -r | --drop | --log | --purge | [-s]
.....
To create a bag in the current directory, add the --create option to the previous command line. To work with a medium at another location, add the --medium switch (or its shorthand -d) with the path to that medium:
$ java -jar /mnt/databag.jar -d /mnt --create
Given this command line, data-bag will create a bag and tell you that there is no local replica defined for your user account:
Data-bag: file synchronization, backup, and change tracking tool, v.1.05.130120
Copyright 2010-13 Konstantin Livitski and others. See file "LICENSE/data-bag.md"
for applicable terms, http://data-bag.org to learn more about the project.
Jan 21, 2013 7:05:39 PM name.livitski.databag.db.Manager create
INFO: Created database jdbc:h2:file:/mnt/databag/databag;LOCK_MODE=1;COMPRESS_LOB=DEFLATE;MAX_LENGTH_INPLACE_LOB=3500
Jan 21, 2013 7:05:39 PM name.livitski.databag.cli.Launcher run
WARNING: Local replica of shared storage /mnt not found for user@d1.data-bag.org
The bag for the shared medium at /mnt/ is now stored in the /mnt/databag/ directory:
$ ls -l /mnt/databag/
total 48
-rw-r--r-- 1 user users 47104 2013-01-21 19:05 databag.h2.db
To synchronize files, data-bag needs both a bag and a replica directory that stores local copies of those files. The bag remembers locations of its replicas for each machine and user account combination it is used with. A user can have multiple replicas of the same bag on any machine, but only one of these replicas is treated as the default replica for the user's account. The first replica you create on a machine is designated as its default replica for your account unless you switch the default to another replica.
A directory becomes a replica of your bag when you run data-bag with the --local option (or its shorthand -C) and the path to that directory. The same option is used to operate on a non-default replica for your account:
$ java -jar /mnt/databag.jar -d /mnt/ -C /tmp/demo
If the directory that you designate for a replica does not exist, data-bag will create it for you. If that directory exists, its files are automatically synchronized, i.e. added to your new bag.
Note that the first several lines of data-bag's output are always the same as long as you are using the same executable. These lines contain general information about the tool and the project. When running data-bag from a script, you may want to omit those lines from the output. To do that, add the --nobanner option to the command line. In the following examples, we quote the tool's output with that option present, even though it is not shown.
Jan 21, 2013 8:04:18 PM name.livitski.databag.app.maint.ReplicaManager registerNewReplica
INFO: Created a new replica #1 for user@d1.data-bag.org at /tmp/demo
Jan 21, 2013 8:04:18 PM name.livitski.databag.app.sync.SyncService synchronize
INFO: Synchronizing replica #1 for user@d1.data-bag.org at /tmp/demo with database at /mnt/databag using filter "all"
...
Once the bag and the default replica are set up, running data-bag without arguments in the bag's directory (or, with the -d option, in any directory) will automatically synchronize them:
$ java -jar /mnt/databag.jar -d /mnt/
Jan 21, 2013 8:35:54 PM name.livitski.databag.app.sync.SyncService synchronize
INFO: Synchronizing replica #1 for user@d1.data-bag.org at /tmp/demo with database at /mnt/databag using filter "all"
...
You can also request synchronization explicitly by entering the --sync command (or its shorthand -s). The following command does the same as above:
$ java -jar /mnt/databag.jar -d /mnt/ -s
With an explicit --sync command, you can also tell data-bag which files to synchronize by appending a location pattern argument to it:
$ java -jar /mnt/databag.jar -d /mnt/ -s 'doc/*.txt'
This will synchronize only those files in /tmp/demo/doc that have the suffix .txt.
During synchronization, unmatched files from the bag are copied to the replica (except for the files deleted in that replica) and unmatched files from the replica are stored in the bag. Files at the same location relative to both containers are subject to version tracking and, in some cases, conflict resolution.
A typical scenario of everyday data-bag use consists of three steps:
If you know that the bag's contents haven't changed since you last synchronized the replica, you may skip step 1.
Data-bag logs the operations that affect the bag's contents. To review the log entries, enter the --log command, followed by an optional date or two dates constraining the time frame of interest. Without arguments, the command will display all the log entries for your bag. With a single date argument, the output will cover the period starting at that date. For example, the command
$ java -jar /mnt/databag.jar -d /mnt/ --log 2013-01-23
will display the log entries made on or after January 23, 2013.
The arguments that you pass to data-bag on the command line always belong to a group that begins with a switch: a literal string with the -- prefix or a shorthand that begins with a dash. The switches, including shorthands, are case-sensitive.
To data-bag, every switch is either a command or an option. The difference between the two is that you can enter multiple options on a command line, but no more than one command. The command determines the main action that data-bag will take when it runs. Some options may conflict with others, and some may not be compatible with the command you run. In that case you may encounter a warning or error when running data-bag. If you enter two or more commands on the command line, you will get an error, and none of the commands will run.
For example, the
--log
and
--sync
switches shown above
are commands, while
--medium
and
--local
are
options. Thus, you can either display the log of
operations, or synchronize the bag; but in both cases you are able to tell
data-bag where your shared medium is mounted.
As noted above, data-bag runs the synchronization automatically when its command line has --medium and --local options and nothing else. In fact, the program treats --sync as the default command and runs it whenever no other commands are present. In some cases, such as when you are configuring a bag, you may want to skip the automatic synchronization. To do that, add the --nosync option (shorthand -N) to the command line. For example,
$ java -jar /mnt/databag.jar -d /mnt/ -C /tmp/demo1 -N
will register /tmp/demo1 as a new replica for the bag, but will not synchronize that directory.
You can establish a replica in a directory, using the --local option or its -C shorthand. This is how you un-bag your files on a new machine. If the new replica's directory does not exist, it will be created for you. However, its parent directory must exist for the operation to succeed.
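Because the parent directory must already exist, a script that sets up a replica on a new machine may want to check for it first. The following is a minimal sketch and not part of data-bag: the function name register_replica and the DATABAG_CMD override variable are hypothetical, while the options -d, -C, and -N are the ones described in this guide.

```shell
# Hypothetical helper, not part of data-bag.
# DATABAG_CMD is an assumed override for the launcher command; it is split
# on whitespace, so it must not contain paths with spaces.
# The body runs in a subshell so its variables do not leak to the caller.
register_replica() (
    replica=$1
    medium=${2:-/mnt/}
    parent=$(dirname "$replica")
    if [ ! -d "$parent" ]; then
        echo "Parent directory $parent does not exist" >&2
        return 1
    fi
    # -N registers the replica without synchronizing it right away.
    ${DATABAG_CMD:-java -jar /mnt/databag.jar} -d "$medium" -C "$replica" -N
)
```

With this definition, register_replica /tmp/demo1 runs the same command as the -N example shown earlier, but stops with an error message when /tmp is missing.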
A user can have multiple replicas of the same bag on any machine. One of these replicas is designated as the default replica for the user's account. Commands that work with a replica use the default replica unless there is a --local option on the command line. The first replica that a user creates becomes the default replica for his or her account. To make another replica the default replica for your account, enter the --default argument after --local and the respective replica's path:
$ java -jar /mnt/databag.jar -d /mnt/ -C /tmp/demo1 --default
Note that this command will also synchronize /tmp/demo1 unless you add the --nosync option.
To find out the locations of replicas defined for the current user's account, use the --list command (or the shorthand -l) followed by the replicas keyword:
$ java -jar /mnt/databag.jar -d /mnt/ -l replicas
The output of this command will look like:
  /tmp/demo
* /tmp/demo1
The line with an asterisk denotes the current user's default replica.
To remove information about a replica from the bag, log on as that replica's user and run the --drop replica command. The contents of the dropped replica's directory will remain intact, but information about the past operations with that replica will be lost.
$ java -jar /mnt/databag.jar -d /mnt/ -C /tmp/demo1 --drop replica
There are many practical reasons to avoid tracking and synchronizing certain files, or to synchronize them on different schedules. Some files may be too large to fit on the shared medium. Others are generated automatically and can be rebuilt on any machine. Yet others you may not care about or have access to. Conversely, you may only be interested in tracking files that follow a certain name pattern, but unable to group them in a separate directory.
To address these concerns, data-bag offers a mechanism of filters.
Filters apply to locations of files relative to the bag or
replica that stores them. A filter consists of two sets of
patterns. The first set (includes) limits eligible files to those
matching any of its constituent patterns. The second set
(excludes) removes files matching its patterns from the tentative
list of files. Thus, a file passes the filter when its relative location
matches one or more patterns on the list of includes and none of the patterns
on the list of excludes. As an exception, when the list of includes is empty,
all files that do not match excludes patterns pass the filter.
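The rule above can be sketched in a few lines of shell. This is only an illustration and not data-bag's implementation: the function names matches_any and passes_filter are hypothetical, the pattern lists are assumed to be colon-separated, and shell case globbing stands in for data-bag's own pattern syntax. In particular, * in a shell pattern also matches /, which may differ from how data-bag treats subdirectories.

```shell
# Hypothetical sketch of the include/exclude rule, not data-bag's own code.

# matches_any LOCATION LIST
# Succeeds when LOCATION matches at least one pattern on the
# colon-separated LIST. Runs in a subshell so that the IFS and
# globbing changes stay local.
matches_any() (
    location=$1
    IFS=:
    set -f                      # do not expand patterns against real files
    for pattern in $2; do
        [ -n "$pattern" ] || continue
        case $location in
            $pattern) return 0 ;;
        esac
    done
    return 1
)

# passes_filter LOCATION INCLUDES EXCLUDES
passes_filter() {
    case $2 in
        # A non-empty include list must match the location.
        *[!:]*) matches_any "$1" "$2" || return 1 ;;
    esac
    # No exclude pattern may match the location.
    ! matches_any "$1" "$3"
}
```

For example, passes_filter 'Welcome.odt' '*.odt:*.ods' 'Welcome*' fails because an exclude pattern matches, while passes_filter 'notes.txt' ':' '*.tmp' succeeds because the include list is empty and no exclude pattern applies.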
Filters are stored in a bag, so when you need to apply one of
them, you don't have to re-enter all its components. Each filter has a unique
name within the bag. Filter names are case-insensitive. When choosing a
name for your filter, you may want to use a short descriptive string so you
can remember what it does by looking at the name. Note that filter names listed
in the table below have special meanings to data-bag:
Name | Meaning |
---|---|
all | The built-in filter that matches all files. This filter cannot be changed. |
default | User-defined filter that data-bag applies to replicas that do not have their own default filters when no filter is explicitly selected for an operation. This filter does not exist in a new bag until you create it. If there is no filter named default, and no default filter is designated for the replica, data-bag falls back to the built-in filter all. |
The easiest way to define a filter and store it in a bag is to use the
--set
option and place all the filter's patterns on the command line:
$ java -jar /mnt/databag.jar -d /mnt -F 'Documents and spreadsheets' \
--set '*.odt:*.ods:*.doc:*.xls' 'Welcome*'
This command will create a filter called "documents and spreadsheets" (the name is not case-sensitive) that matches files with suffixes .odt, .ods, .doc, and .xls in the replica's directory (and none of its subdirectories), but excludes any files with names beginning with Welcome.
Case sensitivity of the filter's patterns depends on the underlying file system. POSIX file systems are usually case-sensitive. Operations that do not affect any replicas (i.e. work with files in a bag only) perform case-sensitive pattern matching.
The above command will also synchronize the default replica unless you add the
--nosync
option:
Jan 25, 2013 12:24:08 AM name.livitski.databag.cli.Launcher setFilter
INFO: Updating filter "documents and spreadsheets"
Jan 25, 2013 12:24:08 AM name.livitski.databag.app.sync.SyncService synchronize
INFO: Synchronizing replica #1 for user@d1.data-bag.org at /tmp/demo with database at /mnt/databag using filter "documents and spreadsheets"
...
Note that the filter name follows the --filter option, or its shorthand -F. Use that option to select a filter to manipulate, display, or apply to an operation.
The --set option expects two arguments: a list of include patterns, and a list of exclude patterns. The patterns on each list are separated by the system-specific path separator string. On POSIX-compliant systems, that string is a single colon character (:). When a pattern on a list contains white space, you have to escape it to make sure the entire list is interpreted by the shell as a single argument. You may also have to escape the ? and * characters within patterns to prevent their expansion by the shell. To make one of the lists empty, you can follow up the --set option with an empty-string argument, if your shell allows that, or use a single path separator string otherwise. For example, the command line
$ java -jar /mnt/databag.jar -d /mnt -F 'No temp files' --set : '**/*.tmp' -N
will create a filter that excludes all files with suffix .tmp anywhere in the bag's or replica's hierarchy.
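When a script assembles the pattern lists, quoting each pattern separately and joining them afterwards keeps the shell from expanding ? and * too early. The helper below is a small sketch under the assumption that the path separator is a colon, as on POSIX systems; the name join_patterns is hypothetical and not part of data-bag.

```shell
# Hypothetical helper: join its arguments with the POSIX path separator ':'.
# The body runs in a subshell so the IFS change stays local.
join_patterns() (
    IFS=:
    # "$*" joins all arguments using the first character of IFS.
    printf '%s\n' "$*"
)

# Possible use with the --set option described above:
#   java -jar /mnt/databag.jar -d /mnt -F 'Documents and spreadsheets' \
#       --set "$(join_patterns '*.odt' '*.ods' '*.doc' '*.xls')" 'Welcome*'
```

Because the result is passed inside double quotes, patterns that contain white space still reach data-bag as part of a single argument.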
To apply a named filter to an operation that supports filters, use the --filter option, or its shorthand -F. You can append the --invert literal as the second argument to --filter to make the filter work in reverse, rejecting files that match it and accepting files that don't. For example, having defined the previous filter, you can list all files with suffix .tmp that are already in the bag by entering the command:
$ java -jar /mnt/databag.jar -d /mnt -F 'No temp files' --invert -l
To list filters in the bag, use the --list command (or its shorthand -l) with the filters keyword:
$ java -jar /mnt/databag.jar -d /mnt -l filters
Depending on what your default replica is, the output of this command may be:
* all
  documents and spreadsheets
  no temp files
Note the asterisk on the first line. It marks the default filter applied to the current replica. Since our default replica does not have a default filter, and no filter named default exists in the bag, this replica will have the built-in filter all applied to it. To change the default filter for a replica, append the --default literal to the --filter option:
$ java -jar /mnt/databag.jar -d /mnt -F 'No temp files' --default -N
Note that you may want to use the --local option as well unless you are configuring the default replica. Now, the --list command line shown above will result in this output:
  all
  documents and spreadsheets
* no temp files
TODO: explain how to display a filter
TODO: explain how to save a filter
TODO: explain how to load a filter (note that --load is an option)
You can list paths to all files in the bag using the --list command (or its shorthand -l) followed by the files keyword, or without the keyword:
$ java -jar /mnt/databag.jar -d /mnt -l
Jan 25, 2013 11:03:45 AM name.livitski.databag.cli.Launcher listFiles
INFO: Applying filter "all"
About_these_files.odt
Derivatives_of_Ubuntu.doc
Maxwell's_equations.odt
Payment_schedule.ods
Trigonometric_functions.xls
The listing will include relative locations of both current and deleted files matching the current filter. Data-bag sends the diagnostic output (two lines at the top) to the standard error stream. You can separate it from the list output by redirecting either or both output streams.
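In a script, the redirection might be wrapped as follows. This is a sketch only: the function name list_bag_files and the DATABAG_CMD override variable are hypothetical, and the --nobanner option is included on the assumption that the banner lines would otherwise be mixed into the output.

```shell
# Hypothetical helper, not part of data-bag.
# DATABAG_CMD is an assumed override for the launcher command; it is split
# on whitespace, so it must not contain paths with spaces.
list_bag_files() {
    # Keep standard output (the listing); discard standard error (diagnostics).
    ${DATABAG_CMD:-java -jar /mnt/databag.jar} -d "${1:-/mnt}" --nobanner -l 2>/dev/null
}

# Possible use: save the list of bagged files for later processing.
#   list_bag_files /mnt > bagged-files.txt
```

To keep the diagnostics for troubleshooting, redirect the standard error stream to a file instead of /dev/null.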
With the --list command you can also display other data records stored in a bag, such as replicas and filters, and view detailed information about a filter. That is done by placing a keyword, such as replicas, filters, or filter, after the command. A keyword that follows the --list command can be typed in upper case, lower case, or a mix of both.
TODO: example of the --save option with a list command
When data-bag synchronizes a replica, it detects changes made to the local files. A changed local file is matched against the file at the same location in the bag as explained below. If there is no match, the local file is considered a different version of the bagged file. This usually happens when someone makes changes to the local file and saves them. If the clocks on all machines hosting replicas are set correctly, the changed file will have a later modification date than the same file in the bag. With the default settings, data-bag saves changes made to the file in the bag as a new version of that file and retains all its prior versions.
Thus, data-bag remembers the history of each file that it stores. Note that a file's history contains only those versions of the file that were available when it was synchronized. For instance, if you edit a document in the morning, save it, then edit it again in the afternoon, and only then synchronize it, the history of that document's file will not contain a record of the document as it looked during your lunch time. To have changes to your files recorded separately from later changes, synchronize the files that you are editing often.
To review the history of a shared file, run the --history command (shorthand -h) followed by that file's path relative to the replica's root directory. For example,
$ java -jar /mnt/databag.jar -d /mnt -h 'About_these_files.odt'
yields this output:
Jan 25, 2013 11:54:37 AM name.livitski.databag.cli.Launcher listVersions
INFO: Applying filter "all"
=== File # 1 with name 'About_these_files.odt' ===
Version: Id Parent Size Timestamp
          1 (none) 181512 2010-03-26 08:21:16
(current) 2 1 182192 2013-01-25 11:54:03
Found 1 existing, 0 renamed, and 0 deleted file(s)
Note that the file has a record number associated with it. These numbers uniquely identify files in the bag. Also note that each version record has a number, too (labeled Id in the output). Those numbers are unique within the file's history.
TODO: cover the common arguments and options of the --history
command
During synchronization, if there is a file at the same location relative to the bag and the replica, the local file's attributes are compared to those of the prior versions of the bagged file. If there is a match, the local file is replaced by the most recent version from the bag. If the local file is newer than all known versions of the bagged file and the replica has been synchronized to the most recent version of that file in the bag before, the local file is added to the bag as a new version. Otherwise, if there is no match, data-bag detects a conflict. A conflict is a situation of ambiguity with respect to the order or content of versions in a file's history or detection and propagation of a file's deletion.
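The decision described above can be summarized as a small sketch. It only restates the three outcomes and is not data-bag's implementation: the function name classify_local_file, its yes/no arguments, and the outcome labels are all hypothetical.

```shell
# Hypothetical sketch of the outcomes described above, not data-bag's code.
# classify_local_file MATCHES_PRIOR NEWER_THAN_ALL SYNCED_TO_LATEST
# Each argument is "yes" or "no":
#   MATCHES_PRIOR     the local file's attributes match a prior version
#   NEWER_THAN_ALL    the local file is newer than all known versions
#   SYNCED_TO_LATEST  the replica was synchronized to the latest version before
classify_local_file() {
    if [ "$1" = yes ]; then
        # The local copy is an old version: replace it with the latest one.
        echo replace-local
    elif [ "$2" = yes ] && [ "$3" = yes ]; then
        # The local copy continues the history: store it as a new version.
        echo add-version
    else
        echo conflict
    fi
}
```

For instance, classify_local_file no yes no prints conflict: the local file is newer than every stored version, but the replica had not yet received the most recent version from the bag.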
By default, data-bag does not attempt to resolve conflicts and simply exits with an error message:
Jan 25, 2013 18:46:00 AM name.livitski.databag.cli.Launcher run
SEVERE: Please specify how to resolve the conflict between local file /tmp/demo1/About_these_files.odt (size = 181875, modified at Fri Jan 25 18:44:53 EST 2013) and version 2 (file=1, base=1, size=182192, modified=2013-01-25 11:54:03.0). The file has a synchronization record of file #1 in replica #2 to version #1
To proceed with synchronization, you must tell data-bag how to resolve the conflict. Add the --default-action option (or its shorthand -A) followed by one of the keywords from the table below:
Keyword | Meaning |
---|---|
NONE | Means that no action will be taken to synchronize the affected file. In other words, with the -A NONE switch, files with a conflict are simply skipped. |
UPDATE | Tells data-bag to make the local file the most recent version of the bagged file and add it to that file's history as such. The side effect is that the local file may have its modification time adjusted to follow the modification time of the conflicting file in the bag. |
DISCARD | Asks data-bag to replace all conflicting local files with recent versions from the shared medium. This mode can cause data loss, so use it with caution. |
TODO: describe the granularity of actions and the use of filters and patterns to address that
Data-bag detects files that have been deleted from the replica after it was last synchronized. During the next synchronization of that replica, those files are marked deleted in the bag. The deletion will propagate to other replicas as they are synchronized. Note that a bag retains histories of its files even after they are marked deleted to allow you to restore such files later. You can also create a new file in a replica at the same location as the previously deleted file. The new file will have a separate history if you synchronize the replica that you create it in after the old file's deletion and before the new file's creation.
The
--history
command with a location argument lists versions of the existing
file and histories of all deleted files at the same relative location. For
example,
TODO: example with multiple histories for the same name
Note that deleted file records have their own numbers. Therefore, when you enter a file's record number, you will be shown only the history of that particular file.
When you rename or move a file within a replica, data-bag currently treats such an event as two operations:
In other words, continuity of the file's history is not preserved. There are plans to implement detection of file renames and moves in the future, even though such detection often cannot be done reliably without the user's intervention.
Data-bag allows you to restore older versions of existing files and the files you deleted after a synchronization. There are two different commands that restore files, each operating in its own way.
The first method is useful when you need to obtain historic files for temporary use. For example, you may want to check how a certain document looked a week ago, make a copy of it, or compare it with the current version. To obtain such temporary images of historic files, use the --restore command described in this chapter.
The other method is needed when you decide to discard unwanted changes to your
files after you have synchronized them with a bag. For example, you might
have made a computation in a spreadsheet, saved and synchronized it. A week
later you may realize that the formulas that you used were incorrect, and
decide to start from scratch. To roll back changes that are already stored in
a bag you may want to run the
--undo
command, described
later.
Restoration of historic files for temporary use requires knowledge of two things: the original locations and the version numbers of the files being restored. For simplicity, consider a single file restoration first. If you want to see how the file About_these_files.odt looked before it was modified, and you know that there are two versions of that file in the bag, you can run the --restore command (or its shorthand -r) as follows:
$ java -jar /mnt/databag.jar -d /mnt -r 'About_these_files.odt' --vn 1
This will result in a copy of that file's version with number 1 restored to the default replica in /tmp/demo:
Jan 25, 2013 8:39:50 PM name.livitski.databag.cli.PointInTimeAbstractCommand resolveFileSpec
INFO: Applying filter "no temp files"
Jan 25, 2013 8:39:50 PM name.livitski.databag.app.sync.RestoreService restore
INFO: Restoring version 1 (file=1, base=0, size=181512, modified=2010-03-26 08:21:16.0) to /tmp/demo/About_these_files.odt
...
The file's relative location follows the -r command on the command line. The --vn option specifies the version number to be restored. Observe that the historic version is written to its replica location, replacing the current file.
$ ls -l /tmp/demo/About_these_files.odt
-rw-r--r-- 1 user users 181512 2010-03-26 08:21 /tmp/demo/About_these_files.odt
Data-bag will synchronize the file with the bag before overwriting it, so that file can be retrieved later. Since the --restore command performs a temporary restore, the file's old version will be replaced again with the current one next time you synchronize the replica.
If you want to compare the current version of the file with the original version, or otherwise avoid confusion between current and historic files, you may tell data-bag to restore that file elsewhere. To achieve that, add the --save option (or its shorthand -o) to the command line:
$ java -jar /mnt/databag.jar -d /mnt -r 'About_these_files.odt' --vn 1 \
-o /tmp/about.odt
The argument of the
--save
option is either an absolute location of the
restored file or a location relative to the current directory. If you restore
a single file into a different location in the current replica, data-bag
will add it to the bag right away, without the need for synchronization.
However, data-bag will not overwrite any existing files when restoring with
the
--save
option.
When the above command is run without a version number, data-bag restores the most recent version of the file, i.e.
$ java -jar /mnt/databag.jar -d /mnt -r 'About_these_files.odt'
will cause the subsequent command
$ ls -l /tmp/demo/About_these_files.odt
to produce this output:
-rw-r--r-- 1 user users 182192 2013-01-25 11:54 /tmp/demo/About_these_files.odt
You can also restore the most recent version of a file as of a specific moment in the past, by adding the --as-of option (or its shorthand -a). Note that the --as-of option cannot be used with --vn.
$ java -jar /mnt/databag.jar -d /mnt -r 'About_these_files.odt' \
-a 2012-12-31 22:12:12
Note that the time argument at the end of this line is optional. If you omit it, data-bag assumes 00:00:00 (midnight at the beginning of the calendar day) as the target time.
TODO: show how to restore deleted files
TODO: explain the use of file numbers when restoring files
When you don't have a fixed version number to restore, you can restore multiple files with relative locations matching a pattern. You can use the --save option with such a command if the argument points to an empty directory that is neither the current replica's root nor any of its descendants. For example, the commands
$ mkdir -p ~/Desktop/2012
$ java -jar /mnt/databag.jar -d /mnt -r '*.odt' -a 2013-01-01 -o ~/Desktop/2012
will take all files with suffix .odt in the root directory of your bag that had versions modified before the end of 2012 and restore copies of those versions to the new directory Desktop/2012 in the home area of your user account.
If any of the restored files have to be written to a location within the current replica, e.g. by following a symbolic link, the operation will fail.
TODO: describe the undo operation
TODO: explain how undo deletes and un-deletes files
When data-bag updates a bag, it retains all deleted files and historical
versions of existing files. Thus, bags tend to grow in size
during each synchronization. To avoid running out of space on the shared
medium, you may occasionally want to purge old versions and deleted files.
To do that, use the
--purge
command followed by a date in yyyy-mm-dd
format:
$ java -jar /mnt/databag.jar -d /mnt --purge 2012-01-01
The date on the above command line denotes the beginning of a new epoch. To specify the epoch more precisely, you may add a time argument following the date. The time format is hh:mm:ss[.f], where the hour is taken from a 24-hour clock, and fractions of a second may be omitted:
$ java -jar /mnt/databag.jar -d /mnt --purge 2013-01-24 22:02:01
When no time is specified, 00:00:00
(midnight at the beginning of the calendar
day) is assumed.
Bags that were never purged have their epochs beginning at the earliest modified time of any version they store. Data-bag is not required to retain versions, deleted files, or log records beyond the current epoch, and attempts to delete that data permanently. Notable exceptions are the current versions of existing files that haven't been modified since the epoch began, and, in some cases, version records needed to restore other versions modified during the current epoch. The purge operation is irreversible, so use it with caution.
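Since a purge cannot be undone, a script may want to check the shape of the date and time arguments before passing them to data-bag. The following sketch is not part of data-bag: the function names is_purge_date, is_purge_time, and purge_args are hypothetical, and the checks cover only digits and separators, not calendar validity.

```shell
# Hypothetical checks, not part of data-bag. They test only the shape of
# the arguments (digits and separators), not whether the date exists.

# yyyy-mm-dd
is_purge_date() {
    case $1 in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) return 0 ;;
        *) return 1 ;;
    esac
}

# hh:mm:ss with an optional fraction, e.g. 22:02:01 or 22:02:01.5
is_purge_time() {
    case $1 in
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9]) return 0 ;;
        # Reject a fraction that contains anything but digits.
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9].*[!0-9]*) return 1 ;;
        # Accept a fraction of one or more digits.
        [0-9][0-9]:[0-9][0-9]:[0-9][0-9].?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the arguments for --purge if they look well-formed, or fail.
purge_args() {
    is_purge_date "$1" || { echo "Bad date: $1" >&2; return 1; }
    if [ $# -gt 1 ]; then
        is_purge_time "$2" || { echo "Bad time: $2" >&2; return 1; }
        printf '%s %s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

# Possible use (the unquoted $args is split into date and time on purpose):
#   args=$(purge_args 2013-01-24 22:02:01) &&
#       java -jar /mnt/databag.jar -d /mnt --purge $args
```

The purge command itself runs only when the check succeeds, so a mistyped date such as 01/24/2013 stops the script before anything is deleted.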
To help you prevent unauthorized access to bags, data-bag supports encryption of data on the shared medium. Data-bag uses symmetric cryptography to encrypt bags and, currently, applies a single key to each bag in its entirety. Therefore, you should use long randomized keys for bag encryption and store them securely.
Data-bag's interface allows you to set up encryption and establish a key when you create a bag. Then, you have to feed the same key to the software every time you run an operation on the encrypted bag. In both cases, you need to add the --encrypt option (or the shorthand -E) to the command line.
The arguments to the
--encrypt
option differ depending on how you provide the
key to data-bag:
To submit the key via standard input, enter the stdin
keyword following
the
--encrypt
option. Note that your terminal will echo the key unless you
redirect the standard input.
$ java -jar /mnt/databag.jar -d /mnt --create -E stdin
To enter the key on the terminal without echo, use the ask keyword. This option requires Java runtime version 6 or newer and cannot be used with input or output redirection.
To enter the key on the command line, or use a shell variable, append the
key
keyword and the key argument or variable to the command line after the
--encrypt
option. This is the only method that allows you to add line
separator characters to your key.
$ java -jar /mnt/databag.jar -d /mnt -l -E key "$BAG_KEY"
The
--encrypt
option is insensitive to the letter case of its keywords.
Given the --encrypt option without arguments, data-bag defaults to asking the user to enter the key via the terminal (as in the ask mode) in environments that support it. Where terminal input is not supported, the software issues a warning message and falls back to the standard input method.
Data-bag does not accept the space character (code 32
) in the encryption
keys. To use a passphrase for bag encryption (recommended), you need an
alternative way to separate its words. One option is to delimit them with
punctuation marks.
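As a minimal sketch of that suggestion, the snippet below turns a phrase typed with spaces into a key that uses hyphens as delimiters. The phrase and the `PHRASE` variable are hypothetical, and the hyphen is only one possible delimiter.

```shell
# Hypothetical passphrase typed with spaces (do not reuse this one).
PHRASE="correct horse battery staple"
# Replace every space (code 32) with a hyphen so data-bag can accept the key.
BAG_KEY=$(printf '%s' "$PHRASE" | tr ' ' '-')
printf '%s\n' "$BAG_KEY"
# The key could then be passed with the key keyword, for example:
# java -jar /mnt/databag.jar -d /mnt --create -E key "$BAG_KEY"
```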
Data-bag currently supports two encryption algorithms: AES and XTEA. The
default encryption algorithm for bags is AES. To use the alternative
algorithm, append the keywords --cipher xtea
to the
--encrypt
option:
$ java -jar /mnt/databag.jar -d /mnt/teabag --create -E ask --cipher xtea
Although data-bag's interface does not yet allow you to encrypt or decrypt an existing bag, or change the encryption algorithm or key, you can still do that by calling the database management library embedded in the software directly:
$ java -cp /mnt/databag.jar org.h2.tools.ChangeFileEncryption \
-dir /mnt/teabag/databag/ -db databag -decrypt "password" -cipher XTEA
For details about the encryption management tool embedded in data-bag, please
refer to the ChangeFileEncryption
command reference in the
H2 database documentation, available online at
http://h2database.com/html/features.html#file_encryption. You can obtain the
tool's usage summary by running it without arguments:
$ java -cp /mnt/databag.jar org.h2.tools.ChangeFileEncryption
The directory argument to the ChangeFileEncryption
tool must point to the
databag
directory of the medium with your bag. The database argument
must be databag
, too.
Syntax: --help
Prints the command line syntax summary and exits.
Syntax: --drop
type [ --force
]
Removes a record from the bag. The type
argument communicates the type of a record to be removed.
Supported types are REPLICA
and FILTER
. REPLICA
type must be
used in conjunction with the
--local
option to select the
replica to drop. FILTER
type requires a
--filter
option
that tells data-bag what filter to drop. Built-in filter all
cannot be dropped. If there are replicas that use the filter
being dropped as their default filter, the command will fail
unless followed by the --force
switch.
Syntax: --history
[ location ]
Lists all versions of a file. The argument is a relative location
of the file in the bag. The location must be exact, which means it
cannot contain wildcard characters. When a location is specified,
data-bag applies the current filter to it. If the location does
not satisfy the filter, no histories are displayed. If there were
deleted files with the same name, their histories are listed too.
If you omit the location argument, you must enter a file number on the
command line using the
--fn
option.
Syntax: --list
[ type ]
Lists items in the bag. The case-insensitive
argument designates the type of items that will be listed.
It can take values FILES
, REPLICAS
, FILTER
, or FILTERS
. The
default is FILES
. The output will contain a header and
may be formatted to accommodate a standard terminal. If you redirect
the output to a file with the
--save
option, it will
contain neither the header nor the terminal formatting. With the
FILTER
argument, the output file is formatted to allow loading
it into a filter with the
--load
option.
Syntax: --log
[ time-frame ]
Displays the log of operations that might have changed
contents of the bag. Optional time-frame arguments
formatted as yyyy-mm-dd[ hh:mm:ss[.f...]]
specify the
beginning (inclusive) and the end (exclusive) of the log
fragment to print. If only one argument is present, it is
treated as the beginning of the time frame and infinity is
assumed to be the end. Note that the white space between the
date and time parts of each argument must be included in the
argument. You may have to escape or quote that white space
when running data-bag in a shell. If you omit the time part
of an argument, data-bag will assume 00:00:00.0 local time
on the date you enter. Note that
--purge
erases the log
entries that precede the epoch.
Syntax: --purge
epoch
Purges the versions of files in the bag
modified before the beginning of an epoch. The epoch argument
has the yyyy-mm-dd
format followed by an optional hh:mm:ss[.f...]
part. The optional part is a separate argument on the command line. In
other words, you must not escape the white space between the parts of
the epoch argument. When run with the built-in filter "all"
, this
command also purges the log of operations with
the bag prior to the new epoch.
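The quoting rules for date and time differ between the two commands above: the log command takes the date and time as one argument, while the purge command takes them as two. The sketch below illustrates the difference with a hypothetical helper, `count_args`, that only reports how many arguments the shell passes. The data-bag command lines appear as comments and use an example date.

```shell
# Helper for illustration only: prints the number of arguments it receives.
count_args() { echo "$#"; }

# --log: date and time form ONE argument, so quote the white space.
count_args "2013-05-01 08:30:00"
# --purge: the time is a SEPARATE argument, so leave the white space unquoted.
count_args 2013-05-01 08:30:00

# Corresponding data-bag command lines (remember that a purge is irreversible):
# java -jar /mnt/databag.jar --medium /mnt --log "2013-05-01 08:30:00"
# java -jar /mnt/databag.jar --medium /mnt --purge 2013-05-01 08:30:00
```

The first call reports one argument and the second reports two.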
Syntax: --restore
[ file-or-pattern ]
Restores file(s) from the bag. The argument
following this command must either be the relative location
of a file in a bag, or a relative location pattern.
Single-file lookup by name will only succeed if there was
just one file having that name, i.e. there were no histories
of deleted or renamed files with the same name in the
bag. Alternatively, you can specify a file number
using the
--fn
option. To restore a historic version of a
file, enter the
--vn
option with a version number or the
--as-of
option with a date. To restore the file to a
different location or under a different name than its
current replica, use the
--save
option to enter the
intended destination. If the destination is a descendant of the
current replica directory, the restored file will be
automatically added to the bag. When restoring
multiple files, the argument to
--save
option should point
to an empty directory that is neither the current replica's
directory nor any of its descendants. Multiple-file restore
to a target directory will fail if any of the restored files
have to be written to a location within the current replica.
If you don't enter a target directory on the command line,
files that match the pattern and the current filter are
restored to their locations in the current replica. The
replica may become out of sync with the bag if
restored versions are not the current ones. When restoring
files matching a pattern, you cannot enter
--fn
or
--vn
options. To obtain historic versions of files, use the
--as-of
option with a date of interest. Without that option,
the files will be restored to their current versions.
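As an illustration, a single-file restore of a historic version to a location outside the replica might be assembled as follows. This is a sketch, not a verified invocation: the helper `restore_as_of`, the replica path `/home/user/work`, and the file names are hypothetical, and the helper prints the command instead of running it so that you can review it first.

```shell
# Hypothetical helper: prints (does not run) a restore command.
#   $1 = relative location of the file in the bag
#   $2 = date in yyyy-mm-dd format
#   $3 = destination outside the current replica
restore_as_of() {
  echo java -jar /mnt/databag.jar --medium /mnt --local /home/user/work \
    --restore "$1" --as-of "$2" --save "$3"
}

restore_as_of docs/report.txt 2013-05-01 /tmp/report-2013-05-01.txt
```

Remove the `echo` to execute the command once the printed line looks right.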
Syntax: --sync
[ location-pattern ]
Synchronizes file(s) in the bag with the current
replica. This command runs by default if you have selected
the bag with
--medium
, informed data-bag about the
current replica, either using
--local
or by designating the
default replica, and did not enter any other command on the
command line. When you enter this command explicitly, you may add a
pattern argument to limit the operation to a subset of files within
the replica. You can also synchronize a specific file by entering its
number after the
--fn
option. This works
regardless of whether
--sync
is explicitly entered on the
command line. Note that you cannot use the
--nosync
option
with this command.
Syntax: --undo
[ file-or-pattern ]
Reverts file(s) in the bag to a historic state.
This command retains the undone changes to files in a bag as
branches of those files' version trees. The argument
following this command must either be the relative location
of a file in a bag, or a relative location pattern.
Single-file lookup by name will only succeed if there was
just one file having that name, i.e. there were no histories
of deleted or renamed files with the same name in the
bag. Alternatively, you can specify the file number
using the
--fn
option. To return to the file's
version with a certain number, enter the
--vn
option with that number. To return to the file's
contents as of a specific date, enter the
--as-of
option with that date. When reverting files
matching a pattern, you can only use the
--as-of
option
to select the files' versions. Without that option, the files will
be reverted to their state as of the current date. Such an operation has no effect on
local files unless their histories have future-dated versions. By
default, --undo
will synchronize the file(s) matching the name or
pattern (and the effective filter), or the numbered file regardless of
the filter, with the current replica. To prevent such synchronization,
use the
--nosync
option.
Syntax: --default-action action
Sets the default action to take in case of a
version conflict. Allowed values are NONE
, UPDATE
,
and DISCARD
.
Syntax: --as-of
[ date [time] ]
Specifies the moment in time to look up in files' histories.
Use this option with
--restore
to obtain a copy of the file's
data as of a certain time in the past, or with
--undo
to
return a file in the bag to a historic state. Note that time-bound
commands may produce correct results only within a certain range of
dates. For instance, you may not be able to restore a file
to a state beyond the initial synchronization time or beyond the epoch if
--purge
has been run. Use the
--log
command to
determine the feasible date range for a bag. The
argument must be in yyyy-mm-dd
date format followed by an
optional hh:mm:ss[.f...]
part. The optional part is a
separate argument on the command line. In other words, you
must not escape the white space between the parts of the
argument.
Syntax: --allow-time-diff
threshold
Sets the threshold for the difference between files' time stamps beyond which they are considered distinct. The threshold is measured in milliseconds. The default is 2999 milliseconds (3 seconds minus one millisecond).
Syntax: --local
path
Sets the root path of the replica to work with. A user may create
multiple replicas of the same bag and synchronize them one at a
time. Append --default
to make this replica the default replica
for your user account.
Syntax: --cds
percentage
Adjusts the program's memory utilization allowance. The less memory data-bag is allowed to use, the more disk space it will need to store files. This parameter limits the size of a structure describing differences between versions of any file that data-bag is allowed to keep in memory. The boundary is set as a percentage or a fraction of the JVM's maximum heap size. The default value of this parameter is 10%.
Syntax: --compress
mode
Selects a compression algorithm to be used for files stored
in the bag. Supported values are NO
, LZF
, and
DEFLATE
. Defaults to DEFLATE
. This setting is stored in the
bag and affects future invocations. It does not change
the format of existing data in the bag.
Syntax: --create
Asks data-bag to create a new bag. Use the
--medium
option
to choose the new bag's location. To have the bag encrypted, add the
--encrypt
option.
Syntax: --medium
root [path]
Points to a medium or directory containing the bag. The default is the current directory. The optional path argument points to a subdirectory on the selected medium if it stores multiple bags.
Syntax: --dcs
percentage
Limits the amount of data that data-bag will have to read when it restores a file. A lower limit reduces the time it will take to synchronize and restore files at the expense of additional storage space used by the bag. The parameter is the maximum total size of all incremental differences between the complete image of a file and any new version stored in the bag. Once data-bag exceeds that limit, it stores the new version of a file in its entirety. The boundary is set as a percentage or a fraction of the file's size. The default value of this parameter is 50%.
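The arithmetic behind this limit can be sketched as follows. The file size and the sizes of the differences are hypothetical, and the bookkeeping (a running total that restarts after a complete copy is stored) is an assumption made for illustration, not a description of data-bag's internal algorithm.

```shell
FILE_SIZE=10485760    # hypothetical file size: 10 MiB
DCS_PERCENT=50        # the default limit
# Largest total size of differences allowed before a complete copy is stored.
LIMIT=$(( FILE_SIZE * DCS_PERCENT / 100 ))
echo "limit: $LIMIT bytes"

TOTAL=0
# Three hypothetical updates, each producing a 2 MiB difference.
for DELTA in 2097152 2097152 2097152; do
  TOTAL=$(( TOTAL + DELTA ))
  if [ "$TOTAL" -gt "$LIMIT" ]; then
    echo "differences total $TOTAL bytes: over the limit, store a complete copy"
    TOTAL=0   # assumption: the chain of differences starts over
  else
    echo "differences total $TOTAL bytes: within the limit, store a difference"
  fi
done
```

With these numbers the limit is 5242880 bytes, so the first two updates are stored as differences and the third is stored as a complete copy.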
Syntax: --encrypt
[ key-source ... ] [ --cipher AES
| --cipher XTEA
]
Tells data-bag to use encryption when creating or opening
the bag. To enable encryption of a bag, use this option when
you create it with the
--create
switch. Once a bag is encrypted,
the key and cipher remain the same. You have to include the
--encrypt
option with the correct key and cipher every time
you use that bag. To change encryption parameters, use the
org.h2.tools.ChangeFileEncryption
utility included
with the data-bag distribution. That utility also allows you to
encrypt or decrypt an existing bag. You can place the
encryption key on the command line, have it read from
standard input, or enter it interactively when data-bag
starts. An optional argument that follows
--encrypt
selects
an encryption key or its source. If that argument is the
word key
, data-bag will use the next command line argument
as the key. If you enter the ask
string as the argument,
data-bag will attempt to ask you for the key interactively.
That only works with Java 6 or newer when data-bag is run
from a shell without input or output redirection. Finally,
you may have data-bag read the key from standard input by
entering the stdin
argument. If you do that, your input may
be shown on screen. By default, data-bag will try to use the
console and fall back to the standard input if the console
is unavailable. Regardless of how data-bag obtains the
encryption key, it will not accept keys that contain a space
character (ASCII 32). If the password is entered
interactively or read from the standard input, it cannot
contain end-of-line sequences either. You can append the --cipher
switch to
--encrypt
to select an encryption
algorithm. Supported algorithms are AES
and XTEA
. The
default cipher is AES
.
Syntax: --filter
name [ --default
| --invert
]
Selects a filter to apply to the set of files before
performing the requested command. Files that satisfy the
filter will be processed, while those that don't will be
ignored. Filters apply to both local files and files in a bag.
You can add the --invert
modifier following the filter name to
reverse the filter's effect. You can designate a default
filter for the current replica that will apply when no other
filter is selected. You do that by entering the --default
modifier after the filter name. Replicas that do not have a
default filter assigned will use the filter named default
,
if it exists, or the built-in filter all
otherwise. The --filter
option is also used to designate a filter to load, display,
save, or delete, when applicable.
Syntax: --fn
file-id
Chooses a file in a bag by its number. Use this option with
commands like
--history
or
--restore
to
resolve ambiguity among the file records. When a file is specified
by number, normal filtering rules are ignored during the
file lookup.
Syntax: --load
from-file
Loads a filter definition from a file. Use it in conjunction
with the
--filter
option that specifies the name of the filter to
load. A file name must follow the --load
option and point to
a file with a valid filter definition. Filter definition files are
created by running the
--list filter
command with the
--save
option. Note that you cannot load the built-in filter
all
, but you can load the filter named default
.
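A save-and-reload round trip for a filter definition might look like the sketch below. The filter name `photos`, the file path, and the helper `databag_preview` are hypothetical. The helper only prints each command so that it can be reviewed before running, and adding the no-sync option to the second step is an assumption meant to avoid an unintended synchronization.

```shell
# Hypothetical helper: prints the data-bag command instead of running it.
databag_preview() {
  echo java -jar /mnt/databag.jar --medium /mnt "$@"
}

# Step 1: write the definition of the filter "photos" to a file.
databag_preview --filter photos --list filter --save /tmp/photos.filter
# Step 2: after editing the file, load the definition back into the filter.
databag_preview --filter photos --load /tmp/photos.filter --nosync
```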
Syntax: --lob-size
bytes
Adjusts the storage policy that data-bag applies to its binary data. The parameter is the size threshold for a binary object, such as the contents of a file. Objects larger than the threshold are stored in separate files on the medium containing the bag. Smaller objects are stored in the bag's main file. The default threshold is 3500 bytes. This setting is stored in the bag and affects future invocations. It does not change the storage strategy for existing data in the bag.
Syntax: --nosync
Disables automatic synchronization of the current replica. Use this option when you want to do additional setup before using the replica, or to change settings without synchronizing.
Syntax: --nobanner
Instructs data-bag to omit the header from its output. This option simplifies the output parsing when running data-bag from a script.
Syntax: --save
file
Writes the program's output to a file. When used in conjunction with the
--restore
command, this option causes data-bag to
restore files to locations other than their current replicas.
With commands that display lists or other information, this
option redirects the output and suppresses some formatting. With
--list filter
, this option instructs data-bag to create
a file that you can later load into a filter using the
--load
option.
Syntax: --set
include exclude
Updates a filter definition from the command line. Use it in
conjunction with the
--filter
option that specifies the
name of a filter to change. Note that you cannot change the built-in
filter all
, but you can change the filter named default
.
There must be two arguments following this option. The first
argument is expected to list the location patterns to
include in filtered results, while the second argument should
list the patterns to exclude. Both lists must use the
system-dependent path delimiter (for example, :
on Unix
and Mac, or ;
on Windows) to separate their elements.
Lists that contain spaces must be properly escaped to
prevent the operating system from treating them as multiple
arguments. To omit one of the lists, use either an empty
argument or a single path delimiter. If the inclusion list
is omitted or empty, data-bag implies an include-all
pattern.
Syntax: --upgrade-db
Enables schema evolution for bags created by previous versions of data-bag. Please remember to back up your bag before using this option. That will help you recover from problems that you may encounter during the upgrade.
Syntax: --verbose
[ level ]
Runs in verbose mode, logging additional status information. The level
argument is optional. -vv
makes data-bag run in debug mode.
Syntax: --vn
version-id
Selects a file's version by its number. Use this
option with
--restore
to restore an older version of a file.
--purge
command.
You are welcome to submit feedback as well as your suggestions about the software. If you would like to contribute to the project, we are looking forward to working with you!
You can send a message to the project's team via the Contact page at http://www.livitski.name/ or via the project's page on GitHub.
Thank you for using data-bag!