scitex_io

scitex-io — Universal scientific data I/O with plugin registry.

Functionalities

save(obj, "path.ext") / load("path.ext"): extension-dispatched one-call I/O for 30+ formats (CSV, Parquet, Feather, NumPy, pickle, YAML, JSON, HDF5, Zarr, MATLAB, images, matplotlib figures, PyTorch, MNE, EDF, video).
register_saver(".ext") / register_loader(".ext"): plugin hooks for user-defined formats; dispatch lookup follows the same registry.
load_configs(): collects every <project-root>/config/*.yaml into a single DotDict with UPPER_CASE normalisation and DEBUG_ overrides.
glob / parse_glob: natural-sorted globbing with {placeholder} parsing; cache / reload / flush: load-cache management.

IO

Reads: any registered extension; ./config/*.yaml; $SCITEX_DIR cache; figure metadata (PNG tEXt, JPEG EXIF, SVG XML, PDF XMP).
Writes: relative paths resolve under {caller}_out/ (script / notebook) or $SCITEX_DIR/io/runtime/cache/ (REPL); absolute paths pass through unchanged.

Dependencies

Hard: tqdm, PyYAML, ruamel.yaml, mne, numpy, pandas, click, rich, natsort, scitex-dev, scitex-logging.
Optional ([scientific]): scipy, h5py, zarr>=3, numcodecs, matplotlib. ([mcp]): fastmcp.

Register custom handlers:

from scitex_io import register_saver, register_loader

@register_saver(".myformat")
def save_myformat(obj, path, **kw): ...

@register_loader(".myformat")
def load_myformat(path, **kw): ...

Top-level imports are PEP 562 lazy — import scitex_io is cheap. Public symbols load on first attribute access. See _skills/general/03_interface_01_python-api/04_lazy-imports-and-optional-deps.md.
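The PEP 562 pattern mentioned above can be sketched roughly as follows. This is not scitex_io's actual __init__; the _LAZY map, make_lazy_module and lazy_pkg are hypothetical names, and the lazy targets are stdlib functions so the sketch runs anywhere.

```python
import importlib
import types

# Hypothetical map: public name -> (module path, attribute name).
_LAZY = {"dumps": ("json", "dumps"), "sqrt": ("math", "sqrt")}


def make_lazy_module(name, lazy_map):
    """Build a module whose attributes are imported on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        if attr not in lazy_map:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}")
        module_path, target = lazy_map[attr]
        value = getattr(importlib.import_module(module_path), target)
        setattr(mod, attr, value)  # cache: later lookups skip __getattr__
        return value

    mod.__getattr__ = __getattr__  # PEP 562 hook lives in the module dict
    return mod


lazy_pkg = make_lazy_module("lazy_pkg", _LAZY)
print("dumps" in vars(lazy_pkg))  # False: nothing resolved yet
print(lazy_pkg.dumps({"a": 1}))   # {"a": 1}
print("dumps" in vars(lazy_pkg))  # True: cached after first access
```

In a real package the same __getattr__ function would simply be defined at the top level of __init__.py.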

scitex_io.register_saver(ext, fn=None, *, builtin=False)[source]

Register a save handler for a file extension.

Can be used as a decorator or called directly:

@register_saver(".json")
def my_json_saver(obj, path, **kwargs): ...

register_saver(".json", my_json_saver)

Parameters:

ext (str) – File extension (e.g. ".json" or "json"; the leading dot is optional).
fn (Callable, optional) – Handler function (obj, path, **kwargs) -> None. If None, returns a decorator.
builtin (bool) – If True, registers as built-in (lower priority). User registrations always override built-ins.

scitex_io.register_loader(ext, fn=None, *, builtin=False)[source]

Register a load handler for a file extension.

Same API as register_saver().

Parameters:

ext (str) – File extension (e.g. ".json" or "json"; the leading dot is optional).
fn (Callable, optional) – Handler function (path, **kwargs) -> Any.
builtin (bool) – If True, registers as built-in (lower priority).

scitex_io.get_saver(ext)[source]

Look up a save handler. User overrides take priority.

Lazy builtin specs ((module_path, attr_name) tuples) are resolved on first access and memoised in place.

Return type: Optional[Callable]

scitex_io.get_loader(ext)[source]

Look up a load handler. User overrides take priority.

Lazy builtin specs ((module_path, attr_name) tuples) are resolved on first access and memoised in place.

Return type: Optional[Callable]

scitex_io.list_formats()[source]

List all registered formats.

Returns: A dict with keys "save" and "load", each containing "builtin" and "user" format lists.
Return type: dict

Notes

Builtin entries are listed regardless of whether they have been lazy-resolved yet — registration is what counts.

scitex_io.unregister_saver(ext)[source]

Remove a user-registered saver. Returns True if found.

Return type: bool

scitex_io.unregister_loader(ext)[source]

Remove a user-registered loader. Returns True if found.

Return type: bool

scitex_io.save(obj, specified_path, makedirs=True, verbose=True, symlink_from_cwd=False, symlink_to=None, dry_run=False, no_csv=False, use_caller_path=True, env_detector=None, **kwargs)[source]

Save obj by extension; specified_path is caller-anchored.

The file format is selected from specified_path’s extension via the plugin registry — .csv, .npy, .pkl, .yaml, .png, .h5, … 30+ formats are built in; custom extensions can be added with register_saver.

Path resolution rules (when specified_path is relative):

Called from a script /path/to/analysis.py → /path/to/analysis_out/<specified_path>.
Called from a notebook /path/to/exp.ipynb → /path/to/exp_out/<specified_path>.
Called from python -i / IPython / interactive REPL → $SCITEX_DIR/io/runtime/cache/<specified_path> (default ~/.scitex/io/runtime/cache/). Honours the canonical scitex local-state convention; see scitex-dev skills/general 01_ecosystem_06_local-state-directories.md.
Absolute path → used as-is, no routing.

Intermediate directories are created automatically — callers do not need os.makedirs() / Path.mkdir().
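The routing rules above can be sketched as a pure function. This is an illustration under stated assumptions: the real save detects its caller and environment itself, whereas the hypothetical resolve_save_path below takes the caller file explicitly (None stands for an interactive session) and does not expand ~.

```python
from pathlib import PurePosixPath


def resolve_save_path(specified_path, caller_file=None, scitex_dir="~/.scitex"):
    """Route a save path; caller_file=None stands for an interactive session."""
    path = PurePosixPath(specified_path)
    if path.is_absolute():
        return path  # absolute paths are used as-is, no routing
    if caller_file is None:
        return PurePosixPath(scitex_dir) / "io" / "runtime" / "cache" / path
    caller = PurePosixPath(caller_file)
    # analysis.py -> analysis_out/, exp.ipynb -> exp_out/
    return caller.with_name(caller.stem + "_out") / path


print(resolve_save_path("sub/dir/file.csv", "/path/to/analysis.py"))
# /path/to/analysis_out/sub/dir/file.csv
print(resolve_save_path("fig.png", "/path/to/exp.ipynb"))
# /path/to/exp_out/fig.png
print(resolve_save_path("x.npy"))
# ~/.scitex/io/runtime/cache/x.npy
print(resolve_save_path("/tmp/x.npy", "/path/to/analysis.py"))
# /tmp/x.npy
```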

Parameters:

obj (Any) – The object to be saved.
specified_path (Union[str, Path]) – The filename or relative path under which to save obj. May contain subdirectories ("sub/dir/file.csv"); intermediates are auto-created. Absolute paths bypass routing.
makedirs (bool, optional) – Create parent directories on demand. Default True.
verbose (bool, optional) – Print a one-line success message. Default True.
symlink_from_cwd (bool, optional) – Drop a symlink at ./<specified_path> pointing into the auto-routed location. Default False.
symlink_to (Union[str, Path], optional) – Plant a symlink at this custom path pointing to the saved file.
dry_run (bool, optional) – Print the resolved path without writing. Default False.
no_csv (bool, optional) – Skip the auto-CSV sidecar for figure saves. Default False.
use_caller_path (bool, optional) – Resolve the anchor from the calling script rather than the immediate caller; needed when save is wrapped by a library. Default True.
**kwargs – Passed through to the per-format handler.

Returns:

Path to saved file on success, None/False on error.

Return type:

Path or None

scitex_io.load(lpath, ext=None, show=False, verbose=False, cache=True, **kwargs)[source]

Public wrapper around _load_impl() that fires post-load hooks.

Glob expansion is handled inside _load_impl; the inner per-file recursion already routes through load so each match fires its own hook. Only the outer non-glob path fires once here.

Return type: Any

scitex_io.load_configs(IS_DEBUG=None, show=False, verbose=False, config_dir=None)[source]

Load and merge every YAML under config_dir into one DotDict.

Filename stems become top-level keys; YAML keys become nested attributes. Every string key (filename stem and every nested key) is normalised to UPPER_CASE at load time so the in-memory tree is case-stable regardless of source casing — model.yaml with hidden_dim: 256 lands at CONFIG.MODEL.HIDDEN_DIM. Lookups on the returned DotDict are case-insensitive for string keys, so CONFIG.SEIZURE.STR2COLOR["seizure"] resolves the stored "SEIZURE" entry — no surprise KeyError for the lowercase key the author wrote (non-string keys are matched exactly).

If two keys inside one mapping fold to the same UPPER form (e.g. MODEL.yaml next to model.yaml, or HIDDEN_DIM next to hidden_dim, or "seizure" next to "SEIZURE" in one string-mapping), a loud ValueError is raised at load time naming the source file, the mapping path, and both offending keys. The collision is never silently merged or dropped.
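As a rough sketch of the normalisation and collision check on plain dicts (no YAML involved): upper_keys and its source / path arguments are hypothetical, and the error wording is illustrative only.

```python
def upper_keys(mapping, source="<config>", path="CONFIG"):
    """Return a copy of mapping with string keys upper-cased, recursively."""
    out, seen = {}, {}
    for key, value in mapping.items():
        folded = key.upper() if isinstance(key, str) else key
        if folded in seen:
            # Two source keys fold to the same UPPER form: refuse to guess.
            raise ValueError(
                f"{source}: keys {seen[folded]!r} and {key!r} in {path} "
                f"both fold to {folded!r}"
            )
        seen[folded] = key
        if isinstance(value, dict):
            value = upper_keys(value, source, f"{path}.{folded}")
        out[folded] = value
    return out


print(upper_keys({"model": {"hidden_dim": 256}}))
# {'MODEL': {'HIDDEN_DIM': 256}}

try:
    upper_keys({"hidden_dim": 256, "HIDDEN_DIM": 512}, source="model.yaml")
except ValueError as err:
    print(err)
```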

Debug mode promotes any DEBUG_<KEY> sibling over its non-debug counterpart, so a single IS_DEBUG.yaml flips the whole project between production and debug values. Equivalent triggers: IS_DEBUG.yaml with IS_DEBUG: true, the IS_DEBUG=True kwarg, or running under CI=True.
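The promotion step might look like the sketch below, which works on an already upper-cased plain dict. promote_debug is a hypothetical name, and keeping the DEBUG_ entries in the result is an assumption; the library may drop them.

```python
def promote_debug(mapping, is_debug):
    """Return a copy where DEBUG_<KEY> values replace <KEY> when is_debug."""
    out = {}
    for key, value in mapping.items():
        # Recurse first so nested mappings are promoted too.
        out[key] = promote_debug(value, is_debug) if isinstance(value, dict) else value
    if is_debug:
        for key in list(out):  # list(): out is modified inside the loop
            if isinstance(key, str) and key.startswith("DEBUG_"):
                out[key[len("DEBUG_"):]] = out[key]
    return out


cfg = {"MODEL": {"HIDDEN_DIM": 256, "DEBUG_HIDDEN_DIM": 32}}
print(promote_debug(cfg, False)["MODEL"]["HIDDEN_DIM"])  # 256
print(promote_debug(cfg, True)["MODEL"]["HIDDEN_DIM"])   # 32
```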

Parameters:

IS_DEBUG (bool, optional) – Force debug mode. If None (default), inferred from IS_DEBUG.yaml inside config_dir or from the CI env var.
show (bool) – Echo the DEBUG_<KEY> -> <KEY> substitutions to stdout.
verbose (bool) – Print detailed information.
config_dir (Union[str, Path], optional) – Directory containing the YAML files. Defaults to "./config".

Returns:

Merged configuration tree with UPPER_CASE keys throughout.

Return type:

DotDict

Raises:

ValueError – If two keys inside one mapping fold to the same UPPER form (a case collision). Raised at load time, naming the file, the mapping path, and both offending keys.
ConfigLoadError – If reading or processing any YAML file under config_dir fails for any reason other than a case collision (malformed YAML, missing required file under categories/, an apply_debug_values walker crash on a malformed mapping, …). The message names the offending file path; the original exception is chained as __cause__ so the traceback shows the root error. Replaces the prior swallow-and-return-empty-DotDict behaviour, which made every config bug surface as a baffling 'DotDict' object has no attribute 'X' three frames away from the actual failure.

Examples

>>> CONFIG = load_configs()                       # ./config/*.yaml
>>> CONFIG.MODEL.HIDDEN_DIM                       # 256
>>> CONFIG = load_configs(IS_DEBUG=True)
>>> CONFIG.MODEL.HIDDEN_DIM                       # 32 (DEBUG_ promoted)

scitex_io.glob(expression, parse=False, ensure_one=False)[source]

Perform a glob operation with natural sorting and extended pattern support.

This function extends the standard glob functionality by adding natural sorting and support for curly brace expansion in the glob pattern.

Parameters:

expression (str or Path) – The glob pattern. Supports standard glob and {a,b} expansion.
parse (bool, optional) – Whether to parse the matched paths. Default is False.
ensure_one (bool, optional) – Ensure exactly one match is found. Default is False.

Returns:

If parse=False: naturally sorted file paths. If parse=True: tuple of (paths, parsed_results).

Return type:

list or tuple

Examples

>>> glob('data/*.txt')
['data/file1.txt', 'data/file2.txt', 'data/file10.txt']

>>> glob('data/{a,b}/*.txt')
['data/a/file1.txt', 'data/a/file2.txt', 'data/b/file1.txt']

>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True)
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]

>>> paths, parsed = glob('data/subj_{id}/run_{run}.txt', parse=True, ensure_one=True)
AssertionError  # unless exactly one file matches
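Natural sorting and {placeholder} parsing can be sketched with the standard library alone. This is not the library's implementation (its dependency list includes natsort); natural_key and template_to_regex are hypothetical helpers, and matching each placeholder against one path segment is an assumption.

```python
import re


def natural_key(text):
    """Split digits from non-digits so 'run_10' sorts after 'run_2'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", text)]


def template_to_regex(template):
    """Turn 'run_{run}.txt' into a regex with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", template)
    # Even indices are literal text, odd indices are placeholder names.
    pieces = [
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    ]
    return re.compile("".join(pieces))


paths = sorted(
    ["data/subj_001/run_10.txt", "data/subj_001/run_2.txt"], key=natural_key
)
pattern = template_to_regex("data/subj_{id}/run_{run}.txt")
parsed = [pattern.fullmatch(p).groupdict() for p in paths]
print(paths)   # ['data/subj_001/run_2.txt', 'data/subj_001/run_10.txt']
print(parsed)  # [{'id': '001', 'run': '2'}, {'id': '001', 'run': '10'}]
```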

scitex_io.parse_glob(expression, ensure_one=False)[source]

Convenience function for glob with parsing enabled.

Parameters:

expression (str or Path) – The glob pattern.
ensure_one (bool, optional) – Ensure exactly one match is found. Default is False.

Returns:

Matched paths and parsed results.

Return type:

tuple

Examples

>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt')
>>> paths
['data/subj_001/run_01.txt', 'data/subj_001/run_02.txt']
>>> parsed
[{'id': '001', 'run': '01'}, {'id': '001', 'run': '02'}]

>>> paths, parsed = parse_glob('data/subj_{id}/run_{run}.txt', ensure_one=True)
AssertionError  # unless exactly one file matches

scitex_io.reload(module_or_func, verbose=False)[source]

Reload a module or the module containing a given function.

This function attempts to reload a module directly if a module is passed, or reloads the module containing the function if a function is passed. This is useful during development to reflect changes without restarting the Python interpreter.

Parameters:

module_or_func (module or function) – The module to reload, or a function whose containing module should be reloaded.
verbose (bool, optional) – If True, print additional information during the reload process. Default is False.

Returns:

None

Raises:

Exception – If the module cannot be found or if there’s an error during the reload process.

Notes:

Reloading modules can have unexpected side effects, especially for modules that maintain state or have complex imports. Use with caution.
This function modifies sys.modules, which affects the global state of the Python interpreter.

Examples:

>>> import my_module
>>> reload(my_module)

>>> from my_module import my_function
>>> reload(my_function)
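A minimal sketch of the module-or-function dispatch, built on importlib.reload and inspect.getmodule. reload_target is a hypothetical name, and returning the reloaded module is a choice made for the sketch (the documented function returns None).

```python
import importlib
import inspect
import textwrap
import types


def reload_target(module_or_func):
    """Reload a module, or the module that defines the given function."""
    if isinstance(module_or_func, types.ModuleType):
        module = module_or_func
    else:
        module = inspect.getmodule(module_or_func)
        if module is None:
            raise ValueError(f"cannot find the module of {module_or_func!r}")
    return importlib.reload(module)


reloaded = reload_target(textwrap.dedent)  # function -> its defining module
print(reloaded.__name__)  # textwrap
```

Names bound earlier with from my_module import my_function keep pointing at the old function object after a reload, so they need to be fetched again from the module.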

scitex_io.flush(sys=<module 'sys' (built-in)>, *, sync_fn=None)[source]

Flushes the system’s stdout and stderr, and syncs the file system. This ensures all pending write operations are completed.

sync_fn lets callers (typically tests) substitute os.sync() with a no-op or recording callable. Defaults to os.sync().
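The sync_fn injection point can be sketched as follows. flush_sketch is a hypothetical stand-in, and falling back to a no-op where os.sync is unavailable (it is POSIX-only) is an assumption of the sketch.

```python
import os
import sys


def flush_sketch(sys_module=sys, *, sync_fn=None):
    """Flush stdout and stderr, then ask the OS to sync pending writes."""
    sys_module.stdout.flush()
    sys_module.stderr.flush()
    if sync_fn is None:
        sync_fn = getattr(os, "sync", lambda: None)
    sync_fn()


calls = []
flush_sketch(sync_fn=lambda: calls.append("sync"))  # recording stub, no disk sync
print(calls)  # ['sync']
```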

scitex_io.cache(id, *args, cache_root=None)[source]

Store or fetch data using a pickle file.

This function provides a simple caching mechanism for storing and retrieving Python objects. It uses pickle to serialize the data and stores it in a file with a unique identifier. If the data is already cached, it can be retrieved without recomputation.

Parameters:

id (str) – A unique identifier for the cache file.
*args (str) – Variable names to be cached or loaded.
cache_root (Path or None, optional) – Explicit cache directory. Defaults to $SCITEX_DIR/io/runtime/cache/ (~/.scitex/io/runtime/cache/ fallback), honouring the canonical scitex local-state convention.

Returns:

A tuple of cached values corresponding to the input variable names.

Return type:

tuple

Raises:

ValueError – If the cache file is not found and not all variables are defined.

Example

>>> import scitex
>>> import numpy as np
>>>
>>> # Variables to cache
>>> var1 = "x"
>>> var2 = 1
>>> var3 = np.ones(10)
>>>
>>> # Saving
>>> var1, var2, var3 = scitex.io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
>>>
>>> # Loading when not all variables are defined and the id exists
>>> del var1, var2, var3
>>> var1, var2, var3 = scitex.io.cache("my_id", "var1", "var2", "var3")
>>> print(var1, var2, var3)
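The store-or-fetch flow in the example can be approximated by the sketch below. It is an assumption-laden stand-in: the namespace is passed explicitly instead of being read from the caller, the save-before-load ordering is a guess, and cache_sketch is a hypothetical name.

```python
import pickle
import tempfile
from pathlib import Path


def cache_sketch(cache_id, *names, namespace, cache_root):
    """Save the named variables if all exist in namespace, else load them."""
    path = Path(cache_root) / f"{cache_id}.pkl"
    if all(name in namespace for name in names):
        path.parent.mkdir(parents=True, exist_ok=True)
        with path.open("wb") as fh:
            pickle.dump({name: namespace[name] for name in names}, fh)
        return tuple(namespace[name] for name in names)
    if path.exists():
        with path.open("rb") as fh:
            stored = pickle.load(fh)
        return tuple(stored[name] for name in names)
    raise ValueError(f"no cache file for {cache_id!r} and not all of {names} are defined")


with tempfile.TemporaryDirectory() as root:
    saved = cache_sketch("my_id", "var1", "var2",
                         namespace={"var1": "x", "var2": 1}, cache_root=root)
    loaded = cache_sketch("my_id", "var1", "var2", namespace={}, cache_root=root)
    print(saved, loaded)  # ('x', 1) ('x', 1)
```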

scitex_io.configure_cache(enabled=None, max_size=None, verbose=None)[source]

Configure cache settings.

Parameters:

enabled (Optional[bool]) – Enable or disable caching
max_size (Optional[int]) – Maximum number of files to cache
verbose (Optional[bool]) – Enable verbose logging

Return type:

None

scitex_io.get_cache_info()[source]

Get cache statistics and configuration.

Returns: Cache information including stats and config
Return type: Dict[str, Any]

scitex_io.clear_load_cache()

Clear all cached data.

Return type: None

class scitex_io.DotDict(dictionary=None)[source]

Bases: object

A dictionary-like object that allows attribute-like access (for valid identifier keys) and standard item access for all keys (including integers, etc.).

Case-insensitive on string-key lookup, storage-stable

Keys are stored exactly as set (load_configs separately normalises every config key to UPPER on load). Lookups, however, are case-insensitive for string keys: d["seizure"], d["SEIZURE"], d.seizure and d.SEIZURE all resolve to the same stored value regardless of the stored case, and "seizure" in d matches a stored "SEIZURE" (and vice versa).

This means a config written STR2COLOR: {"seizure": "red"} — which load_configs stores as {"SEIZURE": "red"} — can still be looked up with the lowercase key the user wrote (CONFIG.X.STR2COLOR["seizure"]) without a surprise KeyError.

keys() / values() / items() / iteration return the stored (canonical) form — they are NOT case-folded. Non-string keys (ints, etc.) are left untouched and matched exactly.

__init__(dictionary=None)[source]

_resolve_key(key)[source]

Return the stored key matching key case-insensitively.

Resolution order, designed so the common (UPPER-stored) path stays O(1) and the case-insensitive scan runs only on a genuine miss:

Exact match — covers non-string keys and same-case lookups.
For string keys, key.upper() — covers lowercase lookup of an UPPER-stored key (the load_configs case).
For string keys, a case-insensitive scan over stored string keys — covers any other case mix (e.g. lowercase storage).

Raises KeyError (carrying the original lookup key) when nothing matches, so callers see the key they actually asked for.
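The three-step resolution order can be sketched against a plain dict. resolve_key is a hypothetical free function, and comparing with casefold() in the final scan is an assumption.

```python
def resolve_key(store, key):
    """Return the key actually stored in store that matches key."""
    if key in store:                  # 1. exact match (any key type)
        return key
    if isinstance(key, str):
        upper = key.upper()
        if upper in store:            # 2. UPPER-stored key, still O(1)
            return upper
        folded = key.casefold()
        for stored in store:          # 3. linear scan, only on a real miss
            if isinstance(stored, str) and stored.casefold() == folded:
                return stored
    raise KeyError(key)               # carries the key the caller asked for


print(resolve_key({"SEIZURE": "red"}, "seizure"))  # SEIZURE
print(resolve_key({"seizure": "red"}, "SeIzUrE"))  # seizure
print(resolve_key({1: "a"}, 1))                    # 1
```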

get(key, default=None)[source]

to_dict(include_private=False)[source]: Recursively convert to plain dict.

keys()[source]

values()[source]

items()[source]

update(dictionary)[source]

setdefault(key, default=None)[source]

pop(key, *args)[source]

copy()[source]

scitex_io.register_post_save_hook(fn)[source]

Register a function to run after every successful scitex_io.save.

Hooks fire in registration order. They MUST NOT raise — exceptions are swallowed with logger.debug. A misbehaving observer must never break the host’s I/O.

Return type: None

scitex_io.register_post_load_hook(fn)[source]

Register a function to run after every successful scitex_io.load.

Return type: None

scitex_io.iter_io_bypass_targets()[source]

Yield canonical (module_path, attr, rule_id, severity) targets.

One row per distinct (real module path, attr): aliases in call_rules that resolve to the same canonical module collapse to a single row. Skips rules with no module-level call shape (receiver-agnostic method rules like .savefig()).

Return type: Iterator[IOBypassTarget]

scitex_io.save_image(obj, spath, **kwargs)[source]

scitex_io.save_text(obj, spath)

Save text content to a file.

Parameters:

obj (str) – The text content to save.
spath (str) – Path where the text file will be saved.

Return type:

None

scitex_io.save_mp4(fig, spath_mp4)

Create an MP4 animation from a matplotlib figure.

matplotlib is lazy-imported inside the function body so that import scitex.io (and the parent import scitex) does not fail on venvs without matplotlib installed (todo#443, same class as #279 / #441 / #442). Without lazy-import, the eager top-level from matplotlib import animation propagated ModuleNotFoundError up the _save_modules.__init__ walk and broke any caller of scitex.io.save(dict, "out.json") who had no business needing matplotlib.

scitex_io.save_listed_dfs_as_csv(listed_dfs, spath_csv, indi_suffix=None, overwrite=False, verbose=False)

listed_dfs: [df1, df2, df3, …, dfN]. The DataFrames are written one below another, in list order.
spath_csv: Output CSV path, e.g. /hoge/fuga/foo.csv.
indi_suffix: Label written in the top-left cell of each DataFrame's block in the output CSV: '{}'.format(indi_suffix[i]), where i is the index of the DataFrame. When indi_suffix=None is passed, '{}'.format(i) is written instead.

scitex_io.save_listed_scalars_as_csv(listed_scalars, spath_csv, column_name='_', indi_suffix=None, round=3, overwrite=False, verbose=False): Puts the scalars into a DataFrame and saves it as CSV.

scitex_io.save_optuna_study_as_csv_and_pngs(study, sdir)[source]

scitex_io.json2md(obj, level=1)[source]

Core I/O

scitex_io.save()

scitex_io.load()

scitex_io.load_configs()

scitex_io.glob()

scitex_io.reload()

scitex_io.flush()

scitex_io.cache()

Registry

scitex_io.register_saver()

scitex_io.register_loader()

scitex_io.get_saver()

scitex_io.get_loader()

scitex_io.list_formats()

scitex_io.unregister_saver()

scitex_io.unregister_loader()

Cache Control

scitex_io.get_cache_info()

scitex_io.configure_cache()

scitex_io.clear_load_cache()

Dict Utilities

scitex_io.DotDict

Metadata

scitex_io.embed_metadata()

scitex_io.read_metadata()

scitex_io.has_metadata()

Explorers

scitex_io.H5Explorer: alias of None

scitex_io.ZarrExplorer: alias of None