Matthew Brett edited this page Oct 2, 2014 · 8 revisions

Data packages

  • Should not be in the code source repos;
  • Should be easily available for testing;
  • Should be optional for testing;
  • Should not be installed with the package.

Here are some terms:

  • PACKAGE_PATH – a directory containing a data package. The directory will contain a datapkg.json file (see below);
  • CONTAINER_PATH – a directory whose subdirectories are PACKAGE_PATHs.
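
To make the relationship between the two terms concrete, here is a minimal sketch that lists the PACKAGE_PATHs inside a CONTAINER_PATH by looking for datapkg.json files. The function name find_package_paths is hypothetical and is not part of nibabel.

```python
import os


def find_package_paths(container_path):
    """Return sorted sub-directories of `container_path` holding a datapkg.json.

    Hypothetical helper for illustration; not an existing nibabel function.
    """
    package_paths = []
    for entry in sorted(os.listdir(container_path)):
        candidate = os.path.join(container_path, entry)
        # A PACKAGE_PATH is a directory that contains a datapkg.json file
        if os.path.isfile(os.path.join(candidate, 'datapkg.json')):
            package_paths.append(candidate)
    return package_paths
```

Directories without a datapkg.json are ignored, so a container can hold unrelated directories alongside data packages.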

Making data available

You might want to make a suite with git submodules containing the data packages you want:

mkdir nibabel-data
cd nibabel-data
git init
git submodule add https://github.com/yarikoptic/nitest-balls1
git submodule add https://github.com/yarikoptic/nitest-balls2
git commit -m "Added some data packages"

Register by recording the enclosing path as a CONTAINER_PATH:

cd nibabel-data
nib-data container-path add .

or specify that the record should go in the user configuration (see below):

nib-data container-path add --user .

or in the system configuration (see below):

nib-data container-path add --system .

Unregister with:

nib-data container-path rm --user .

You can instead add the contained directories as PACKAGE_PATHs with:

nib-data pkg-path add nitest-balls1
nib-data pkg-path add nitest-balls2

Using data

For a package name nipy-templates:

>>> from nibabel.data import get_package
>>> templates = get_package('nipy-templates')
>>> templates.path
'/usr/local/share/data/nipy-templates'
>>> templates.version
'0.1'

You can also specify a version string:

>>> templates = get_package('nipy-templates', '>=0.3')

Without a version string, get_package returns the package with the highest version.
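
The selection rule can be sketched as follows. This is an illustration only: select_version and version_key are hypothetical names, the set of comparison operators is an assumption, and version_key is a simplified stand-in that compares only the numeric fields of a version string.

```python
import operator
import re

# Comparison operators assumed for requirement strings such as '>=0.3'
_OPS = {'>=': operator.ge, '<=': operator.le, '==': operator.eq,
        '>': operator.gt, '<': operator.lt}


def version_key(version):
    """Simplified sort key: the numeric fields of a version string.

    A stand-in for a real version comparison; plain dotted numbers only.
    """
    return tuple(int(part) for part in re.findall(r'\d+', version))


def select_version(available, requirement=None):
    """Return the highest version in `available` meeting `requirement`.

    Returns None when no version qualifies.
    """
    candidates = list(available)
    if requirement is not None:
        match = re.match(r'\s*(>=|<=|==|>|<)\s*(.+?)\s*$', requirement)
        if match is None:
            raise ValueError('Cannot parse requirement %r' % requirement)
        op = _OPS[match.group(1)]
        wanted = version_key(match.group(2))
        candidates = [v for v in candidates if op(version_key(v), wanted)]
    return max(candidates, key=version_key) if candidates else None
```

With no requirement the function falls through to the plain maximum, which mirrors the behavior described for get_package without a version string.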

You can get a package path from the command line too:

nib-data pkg-path get 'nipy-templates>=0.3'

Making a data package

There is a utility to make data packages from files in a directory:

nib-data make-pkg .

This writes a default datapkg.json file (see below).

A data package is a directory with a configuration file called datapkg.json. This file must specify the package name:

{
    "name" : "nipy-templates"
}

It may also specify a version:

{
    "name" : "nipy-templates",
    "version" : "0.1"
}

If there is no "version", or the version is null, then the library should get this from version control of the package directory, or fail. So this:

{
    "name" : "nipy-templates",
    "version" : null
}

would cause nibabel to try git describe in the first instance to get the package version. If this fails, the package is not valid.
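
A minimal sketch of these rules, assuming a hypothetical read_package_info helper. The describe argument is also an assumption: it stands for any callable that returns a version string from version control, for example one that wraps git describe.

```python
import json
import os


def read_package_info(package_path, describe=None):
    """Return (name, version) for the data package at `package_path`.

    `describe` is a hypothetical callable taking the package path and
    returning a version string from version control.
    """
    with open(os.path.join(package_path, 'datapkg.json')) as fobj:
        info = json.load(fobj)
    if 'name' not in info:
        raise ValueError('datapkg.json must specify "name"')
    # A missing "version" key and a JSON null both give None here
    version = info.get('version')
    if version is None:
        if describe is None:
            raise ValueError('No version in datapkg.json or version control')
        version = describe(package_path)
    return info['name'], version
```

Passing the version control lookup in as a callable keeps the sketch independent of any particular version control system.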

Version comparisons use distutils.version.LooseVersion:

>>> from distutils.version import LooseVersion
>>> LooseVersion('1.3.1') > LooseVersion('1.3.0-519-ga1b925f')
True

By default nibabel will strip an initial v before digits from the output of git describe – for example git describe output of v0.1 will give version 0.1.

If you want a more complicated rule relating git describe to version, use vcs_version_regex:

{
    "name" : "nipy-templates",
    "version" : null,
    "vcs_version_regex" : "rel-(.*)"
}

vcs_version_regex is matched against the output of git describe and must contain a single group that captures the version string, as in:

>>> import re
>>> git_describe_output = 'rel-0.1-111-g1234567'
>>> re.match('rel-(.*)', git_describe_output).groups()[0]
'0.1-111-g1234567'

This allows the package author to have their own preferred tag naming scheme.
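
Both rules, the default stripping of an initial v and the optional vcs_version_regex, can be combined in one function. This is a sketch under the assumptions stated in the text; version_from_describe is a hypothetical name, not a nibabel function.

```python
import re


def version_from_describe(describe_output, vcs_version_regex=None):
    """Turn ``git describe`` output into a version string."""
    if vcs_version_regex is not None:
        match = re.match(vcs_version_regex, describe_output)
        if match is None:
            raise ValueError('%r does not match %r'
                             % (describe_output, vcs_version_regex))
        # The regex is expected to capture the version in its first group
        return match.group(1)
    # Default rule: drop a single leading "v" when a digit follows it
    return re.sub(r'^v(?=\d)', '', describe_output)
```

The lookahead in the default rule leaves tags such as vanilla untouched, because the v is removed only when it comes directly before a digit.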

datapkg.json can also give MD5 hashes for the files in the archive:

{
    "name" : "nipy-templates",
    "version" : null,
    "md5sums" : {
        "mni/T1.img" : "1ea8f4f1e41bc17a94602e48141fdbc8",
        "mni/T2.img" : "f41f2e1516d880547fbf7d6a83884f0d"
        }
}

Paths are always in Unix (/) format; the data package application will adapt Unix paths when validating MD5 hashes on Windows.

The verify command checks the MD5 sums if present:

nib-data verify nipy-templates

Or, from Python:

>>> templates = get_package('nipy-templates')
>>> result, message = templates.verify()
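
The check itself might look like the sketch below. It is not the implementation behind verify: verify_md5sums is a hypothetical name, and returning a list of failing paths (rather than a result and message) is a choice made for brevity.

```python
import hashlib
import os


def verify_md5sums(package_path, md5sums):
    """Return the relative paths whose MD5 hash does not match.

    `md5sums` maps Unix-style relative paths to expected hex digests.
    Files that cannot be read are also reported as failures.
    """
    failures = []
    for unix_path, expected in sorted(md5sums.items()):
        # Convert 'mni/T1.img' to the local separator (a no-op on Unix)
        local_path = os.path.join(package_path, *unix_path.split('/'))
        try:
            with open(local_path, 'rb') as fobj:
                actual = hashlib.md5(fobj.read()).hexdigest()
        except OSError:
            actual = None
        if actual != expected:
            failures.append(unix_path)
    return failures
```

The sketch reads each file into memory in one go; for large images, hashing in chunks would be the more careful choice.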

A data package will usually have both a Unix register executable and a Windows register.bat executable. Running these will register the PACKAGE_PATH with the specified application configuration file (see below). For example, register might be:

#!/bin/bash
# Register the directory containing this script, passing on any extra arguments
nib-data pkg-path add "$(dirname "${BASH_SOURCE[0]}")" "$@"

Data configuration file(s)

The default locations for configuration files are (in order of decreasing precedence):

  • Contents of the file named in the NIPY_DATA_CONFIG environment variable;
  • Contents of data.json in $HOME/.nipy (more generally, directory returned by nibabel.environment.get_nipy_user_dir());
  • Contents of data.json in /etc/nipy (more generally, directory returned by nibabel.environment.get_nipy_system_dir()).
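
The lookup order above can be sketched as a function that returns candidate file paths. The name candidate_config_files is hypothetical, and the sketch hard-codes the default directories instead of calling the nibabel.environment functions.

```python
import os


def candidate_config_files(environ=None):
    """Return configuration file paths, highest precedence first."""
    environ = os.environ if environ is None else environ
    candidates = []
    if environ.get('NIPY_DATA_CONFIG'):
        candidates.append(environ['NIPY_DATA_CONFIG'])
    # Fixed default directories; assumed stand-ins for the directories
    # that the nibabel.environment functions would return
    user_dir = os.path.join(os.path.expanduser('~'), '.nipy')
    candidates.append(os.path.join(user_dir, 'data.json'))
    candidates.append(os.path.join('/etc', 'nipy', 'data.json'))
    return candidates
```

Taking the environment as an argument makes the function easy to exercise without changing the real process environment.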

In general, values in files with higher precedence override values in files with lower precedence.

If values are lists, files with higher precedence prepend values to the list, so the files with higher precedence put values earlier in the list.
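
A minimal sketch of the two merging rules, assuming the configuration files have already been parsed into dictionaries. The function name merge_configs is hypothetical.

```python
import copy


def merge_configs(high, low):
    """Return a new dict merging `low` into `high`; `high` has precedence."""
    merged = copy.deepcopy(high)
    for key, low_value in low.items():
        if key not in merged:
            merged[key] = copy.deepcopy(low_value)
        elif isinstance(merged[key], dict) and isinstance(low_value, dict):
            merged[key] = merge_configs(merged[key], low_value)
        elif isinstance(merged[key], list) and isinstance(low_value, list):
            # Higher-precedence entries stay at the front of the list
            merged[key] = merged[key] + copy.deepcopy(low_value)
        # Otherwise keep the higher-precedence value unchanged
    return merged
```

For example, merging {"paths": ["/a"]} (higher precedence) with {"paths": ["/b"]} gives {"paths": ["/a", "/b"]}, while a scalar value present in both files keeps the higher-precedence value.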

The configuration file can have a data field, with optional subfields package_containers and package_paths:

{
    "data" : {
        "package_containers" : [
            "/usr/local/share/nipy/dipy",
            "/usr/share/nipy/dipy" ],
        "package_paths" : [
            "/usr/local/share/data/nipy-templates",
            "/usr/local/share/data/nipy-data" ]
        }
}

package_paths take precedence over paths found in package_containers within the same file, but a path in a package_containers list in a file with higher precedence overrides package_paths in files with lower precedence. So, assuming this is a file with lower precedence than the JSON above:

{
    "data" : {
        "package_paths" : [
            "/usr/share/dipy/nipy-dicom"
            ]
        }
}

– then if /usr/share/nipy/dipy/ contains the same nipy-dicom package, this package will override a package with the same name and version contained in /usr/share/dipy/nipy-dicom above.
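
One way to picture this rule is as a single search order built file by file. The sketch below assumes the configuration files are already parsed and sorted from highest to lowest precedence; search_order is a hypothetical name.

```python
def search_order(configs):
    """Return directories to search, highest precedence first.

    `configs` is a list of parsed configuration dicts, highest
    precedence first. Entries from package_paths are returned as
    ('package', path) and entries from package_containers as
    ('container', path).
    """
    order = []
    for config in configs:
        data = config.get('data', {})
        # Within one file, explicit package paths beat containers
        order.extend(('package', p) for p in data.get('package_paths', []))
        order.extend(('container', p)
                     for p in data.get('package_containers', []))
    return order
```

With the two example files, the container /usr/share/nipy/dipy from the higher-precedence file appears before the package path /usr/share/dipy/nipy-dicom from the lower-precedence file, which matches the override described above.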

The configuration files can also include other configuration files:

{
    "data" : {
        "include": [
            "~/data/other_data.json",
            "~/data/more_data.json" ],
        "package_paths" : [
            "/usr/share/dipy/nipy-dicom"
            ]
        }
}

Values in included files take lower precedence than values in the file including them.

Tilde ~ will be expanded to the path of the user's home directory for all paths in the configuration file.
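
Include handling and tilde expansion could be sketched as follows. Several details are assumptions rather than documented behavior: the name load_config_chain, skipping missing files and include loops silently, and ranking included files in the order they are listed.

```python
import json
import os


def load_config_chain(path, _seen=None):
    """Return parsed configs for `path` and its includes.

    The result is ordered from highest to lowest precedence.
    """
    seen = set() if _seen is None else _seen
    path = os.path.abspath(os.path.expanduser(path))
    if path in seen or not os.path.isfile(path):
        return []  # assumption: skip include loops and missing files
    seen.add(path)
    with open(path) as fobj:
        config = json.load(fobj)
    data = config.get('data', {})
    # Expand "~" in every listed path
    for key in ('package_containers', 'package_paths'):
        if key in data:
            data[key] = [os.path.expanduser(p) for p in data[key]]
    chain = [config]  # the including file outranks what it includes
    for included in data.get('include', []):
        chain.extend(load_config_chain(included, seen))
    return chain
```

The returned list keeps each file separate, so precedence between files remains visible to whatever code consumes it.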

Default package container paths

The default package container paths have the lowest precedence. The default package container paths are:

  • $HOME/.nipy/data (more generally, data subdirectory of directory returned by nibabel.environment.get_nipy_user_dir());
  • /usr/share/nipy/data and /usr/local/share/nipy/data (more generally, data subdirectories of directories returned by nibabel.environment.get_nipy_share_dirs()).