Improve floating point precision documentation in Xarray's NetCDF I/O #10252

SoumitAddanki · 2025-04-26T00:49:33Z

This pull request improves the documentation on floating-point precision when saving datasets to NetCDF with `to_netcdf()`.

Changes:

* Added examples demonstrating how float formatting can affect saved files.
* Clarified precision control through encoding options.

This documentation should help users better manage numeric precision when exporting scientific datasets.
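As a rough illustration of what precision control through encoding means numerically (a sketch only, not taken from this PR's diff): in xarray, a per-variable encoding such as `{"dtype": "float32"}` or `{"dtype": "int16", "scale_factor": ..., "add_offset": ...}` can be passed to `to_netcdf()` via its `encoding` argument. The standard-library snippet below emulates the arithmetic behind those two choices. The helper names, the sample value, and the chosen `scale_factor`/`add_offset` are hypothetical, and the rounding shown is an assumption about CF-style packing rather than a copy of xarray's internals.

```python
import struct


def to_float32(x: float) -> float:
    """Round-trip a Python float (float64) through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def pack_int16(x: float, scale_factor: float, add_offset: float) -> int:
    """CF-style packing: stored = round((x - add_offset) / scale_factor)."""
    stored = int(round((x - add_offset) / scale_factor))
    if not -32768 <= stored <= 32767:
        raise OverflowError("packed value does not fit in int16")
    return stored


def unpack_int16(stored: int, scale_factor: float, add_offset: float) -> float:
    """CF-style unpacking: value = stored * scale_factor + add_offset."""
    return stored * scale_factor + add_offset


# Hypothetical sample: a temperature in kelvin with more digits than needed.
value = 273.15678912345

# Effect of storing the value as float32 instead of float64.
as_float32 = to_float32(value)
float32_error = abs(as_float32 - value)

# Effect of packing into int16 with a resolution of 0.01.
packed = pack_int16(value, scale_factor=0.01, add_offset=273.15)
as_packed = unpack_int16(packed, scale_factor=0.01, add_offset=273.15)
packed_error = abs(as_packed - value)

print(f"float32 round trip: {as_float32!r} (error {float32_error:.3e})")
print(f"int16 packed:       {as_packed!r} (error {packed_error:.3e})")
```

The float32 round trip keeps roughly seven significant digits, while the packed form can only recover the value to within half of `scale_factor`. Which trade-off is acceptable depends on the dataset.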

@kmuehlbauer

* Allow passing a CFTimedeltaCoder instance to decode_timedelta * Updates based on @kmuehlbauer's branch https://github.com/kmuehlbauer/xarray/tree/split-out-coders * Increment what's new PR number * Add FutureWarning for change in decode_timedelta behavior * Include a note about opting out of timedelta decoding Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix typing * Fix typo * Fix doc build * Fix order of arguments in filterwarnings * Switch to :okwarning: * Fix missing :okwarning: --------- Co-authored-by: Kai Mühlbauer <kai.muehlbauer@uni-bonn.de> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…#9999) * Fix infer_freq, check for subdtype "datetime64"/"timedelta64" * update infer_freq test * add whats-new.rst entry * add typing to test function * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

pydata#9940) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

* check that aggregations result in array objects * don't consider numpy scalars as arrays * changelog [skip-ci] * retrigger CI * Update xarray/tests/test_namedarray.py --------- Co-authored-by: Kai Mühlbauer <kmuehlbauer@wradlib.org>

* FIX: do not sort datasets in combine_by_coords * add test * add whats-new.rst entry * use groupby_defaultdict * Apply suggestions from code review Co-authored-by: Michael Niklas <mick.niklas@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix typing * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Update xarray/core/combine.py * fix typing, replace other occurrence * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix groupby * fix groupby --------- Co-authored-by: Michael Niklas <mick.niklas@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

…9855) * new blank whatsnew * FAQ answer on API stability * link from API docs page * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * whatsnew * Update doc/getting-started-guide/faq.rst Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> * use hyphen in target names --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Maximilian Roos <5635139+max-sixty@users.noreply.github.com> Co-authored-by: Kai Mühlbauer <kai.muehlbauer@uni-bonn.de>

* finalize release notes * add contributors * Tweak main what's new entry for time coding (pydata#4) --------- Co-authored-by: Spencer Clark <spencerkclark@gmail.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

…10016) * `map_over_datasets`: fix error message for wrong result type * newline for result

…#10017) When lazily encoding non-nanosecond times, the appropriate optimal integer encoding units are resolution-dependent. This PR updates our encoding pipeline accordingly. Note that due to our internal reliance on pandas for date string parsing, we are still not able to round trip times outside the range -9999-01-01 to 9999-12-31 with pandas / NumPy, but this at least should pick more natural default units than nanoseconds for chunked arrays of non-nanosecond precision times. This gives users another way of addressing pydata#9154 (i.e. use non-nanosecond time arrays).

…pydata#10035) * use mean of min/max years as offset in calculation of datetime64 mean * reinstate _datetime_nanmin as it is used downstream in flox<0.10.0 * add whats-new.rst entry * add whats-new.rst entry

* Fix DataArray().drop_attrs(deep=False) * Add DataArray().drop_attrs() tests * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Apply small cosmetics * Add support for attrs to DataArray()._replace * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove testing relict * Fix (try) incompatible types mypy error * Fix (2.try) incompatible types mypy error * Update whats-new * Fix replacing simultaneously passed variable * Add DataArray()._replace() tests --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Start splitting up `dataset.py`

Currently, `dataset.py` is 10956 lines long. This makes doing any work with current LLMs basically impossible: with Claude's tokenizer, the file is 104K tokens, or >2.5x the size of the _per-minute_ rate limit for basic accounts. Most of xarray touches it in some way, so you generally want to give it the file for context.

Even if you don't think "LLMs are the future, let's code with vibes!", the file is still really long; can be difficult to navigate (though OTOH it can be easy to just grep, to be fair...).

So I would propose:

- We start breaking it up, while also being cognizant that big changes can cause merge conflicts
- Start with the low-hanging fruit
- For example, this PR moves code outside the class (but that's quite limited)
- Then move some of the code from the big methods into functions in other files, like `curve_fit`
- Possibly (has tradeoffs; needs discussion) build some mixins so we can split up the class, if we want to have much smaller files
- We can also think about other files: `dataarray.py` is 7.5K lines. The tests are also huge (`test_dataset` is 7.5K lines), but unlike with the library code, we can copy out & in chunks of tests when developing.

(Note that I don't have any strong views on exactly what code should go in which file; I made a quick guess and am very open to any suggestions; also easy to change later, particularly since this code doesn't change much so is less likely to cause conflicts)

* .

Seems easy locally; maybe too easy and something will break in CI...

* add kwargs to map_over_datasets (similar to apply_ufunc), add test. * try to fix typing * improve typing and simplify kwargs-handling per review suggestions * apply changes to DataTree.map_over_datasets * add whats-new.rst entry * Update xarray/core/datatree_mapping.py Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com> * add suggestions from review. --------- Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>

* don't install `dask` reason: `dask-expr` depends on `pyarrow`, which doesn't support python 3.13 yet * don't install `pydap` reason: depends on `webob`, which makes use of `cgi`, a stdlib that got removed in python 3.13 * run CI on python 3.13 * same for windows * classifier * whats-new * fix bad merge * try installing `dask` + `distributed` * move the whats-new entry * Update .github/workflows/ci.yaml * explicitly install `pyarrow` * install `numba` and packages depending on it * More to 3.13, prep for 3.14, bump all-but-dask to 3.12 * comment out sparse * fix whats-new --------- Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Deepak Cherian <deepak@cherian.net>

* add Coordinates.from_xindex method * doc: refactor Coordinates API reference Make it more consistent with ``Dataset``, ``DataArray`` and ``DataTree``. The ``Coordinates`` class is 2nd order compared to the former ones, but it is public API and useful (for creating coordinates from indexes and merging coordinates together) so it deserves its own (expanded) section + summary tables in the API reference doc. * add tests * update what's new * fix doc build? * docstring tweaks * doc (api): add missing Coordinates.sizes property * update what's new (documentation) * improve docstrings on Coordinates creation * doc: move what's new entries after last release --------- Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

* Add types stubs to optional dependencies * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Add coordinate transform classes from prototype * lint, public API and docstrings * missing import * sel: convert inverse transform results to ints * sel: add todo note about rounding decimal pos * rename create_coordinates -> create_coords More consistent with the rest of Xarray API where `coords` is used everywhere. * add a Coordinates.from_transform convenient method * fix repr (extract subset values of any n-d array) * Apply suggestions from code review Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com> * remove specific create coordinates methods In favor of the more generic `Coordinates.from_xindex()`. * fix more typing issues * remove public imports: not ready yet for public use * add experimental notice in docstrings * add coordinate transform tests * typing fixes * update what's new --------- Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com> Co-authored-by: Max Jones <14077947+maxrjones@users.noreply.github.com>

* Upgrade mypy to 1.15 Mypy 1.15 includes fix for <python/mypy#9031>, allowing several "type: ignore" comments to be removed. * Add type annotations to DataTree.pipe tests * More precisely type `pipe` methods. In addition, enhance mypy job configuration to support running it locally via `act`. Fixes pydata#9997 * Pin mypy to 1.15 in CI * Revert mypy CI job changes * Add pytest-mypy-plugin and typestub packages * Add pytest-mypy-plugins to all conda env files * Remove dup pandas-stubs dep * Revert pre-commit config changes * Place mypy tests behind pytest mypy marker * Set default pytest numprocesses to 4 * Ignore pytest-mypy-plugins for min version check

* Default to phony_dims="access" in open_datatree for h5netcdf backend. Warn user about behaviour change. * relocate * duplicate as needed in both places * ignore warning * conditionally warn users if phony_dims are found * add test for warning * add whats-new.rst entry * remove unneeded assignment to fix typing * Update doc/whats-new.rst * use phony_dims="access" per default also in open_dataset for h5netcdf backend * fix test * fix whats-new.rst

* Index.isel: more permissive return type This allows an index to return a new index of another type, e.g., a 1-dimensional CoordinateTransformIndex to return a PandasIndex when a new transform cannot be computed (selection at arbitrary locations). * Index.equals: more permissive `other` type. Xarray alignment logic is such that Xarray indexes are always compared with other indexes of the same type. However, this is not necessarily the case for "meta" indexes (i.e., indexes encapsulating one or more index objects that may have another type) that are dispatching `equals` to their wrapped indexes.

* Move chunks-related functions to a new file Part of pydata#10089 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Move fit computation code to dedicated new file Part of pydata#10089 * . * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix GroupBy first, last with flox Closes pydata#10169 * fix test * parallelize upstream tests

* Allow setting `fill_value` on Zarr format 3 arrays Closes pydata#10064 * fix * fix format detection * fix * Set use_zarr_fill_value_as_mask=False

* DataTree: sel & isel add error context * add test * changelog --------- Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

The repo was renamed from `zarr` to `zarr-python`. The old name still worked because of GitHub's redirect, but it is better to be explicit about which repo this targets.

) * Add test to check units appear in FacetGrid plot - appended test to `TestFacetGrid` class inside test_plot.py - checks that units are added to the plot axis labelling * fix: ensure axis labels include units in FacetGrid plots - Fixed an issue where axis labels for FacetGrid plots did not display units when provided. - Now, both the dimension name and its corresponding unit (if available) are shown on the axis label. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * added whats-new documentation --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

xref pydata#9661

) * bug: fix write_empty_chunks for zarr v3 * future proof write_empty_chunks in append flow * test: fix write_empty_test for zarr 2 * typing: fix typing for write_empty_chunks * small edits --------- Co-authored-by: Deepak Cherian <deepak@cherian.net>

…data#10192) Bumps the actions group with 1 update: [scientific-python/upload-nightly-action](https://github.com/scientific-python/upload-nightly-action). Updates `scientific-python/upload-nightly-action` from 0.6.1 to 0.6.2 - [Release notes](https://github.com/scientific-python/upload-nightly-action/releases) - [Commits](scientific-python/upload-nightly-action@82396a2...b36e8c0) --- updated-dependencies: - dependency-name: scientific-python/upload-nightly-action dependency-type: direct:production update-type: version-update:semver-patch dependency-group: actions ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Fixes pydata#10196

RUF046 Value being cast to `int` is already an integer

* add `scipy-stubs` as extra `[types]` dependency * add changelog entry for pydata#10202

* Update pre-commit hooks updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.9 → v0.11.4](astral-sh/ruff-pre-commit@v0.9.9...v0.11.4) - [github.com/abravalheri/validate-pyproject: v0.23 → v0.24.1](abravalheri/validate-pyproject@v0.23...v0.24.1) - [github.com/crate-ci/typos: dictgen-v0.3.1 → v1](crate-ci/typos@dictgen-v0.3.1...v1) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix references to core classes in docs * what's new

) * Fixes dimension order in xarray.Dataset.to_stacked_array * corrected dummy variable name to satisfy mypy * added type annotation to satisfy mypy * corrected type annotation to satisfy mypy

* Revert "Remove PR labeler bot (pydata#8525)" This reverts commit ce1af97. * Remove redundant globs There was some duplication here as `**` would capture `*` etc * Update actions/labeler to v5 * Label all PRs as "needs triage" * Try my best to update config to match current structure * patch needs triage regex * run pre-commit * Update labeler.yml to match current project structure * pr feedback

* DatasetView.map fix keep_attrs * fix comment * add changelog

* fix missing colon in sphinx roles * add space after comma for two broken roles

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

…ydata#10238)

spencerkclark and others added 30 commits January 29, 2025 17:59

Revert "Use flox for grouped first, last (pydata#9986)" (pydata#10001)

8ccf1cb

Migrate Zarr region="auto" tests to a class (pydata#9990)

5fdceff

Fix the push method when the limit parameter is bigger than the chunk… (

d7ac79a

pydata#9940) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Use freq='D' instead of freq='d' in pd.timedelta_range (pydata#10004)

e84e421

fix typing (pydata#10006)

97a4a71

release notes: 2025.01.2 (pydata#10007)

d8d1d9e

* finalize release notes * add contributors * Tweak main what's new entry for time coding (pydata#4) --------- Co-authored-by: Spencer Clark <spencerkclark@gmail.com>

add new section in whats-new.rst (pydata#10011)

c252152

Update pre-commit hooks (pydata#10021)

d924d93

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

spelling fix (pydata#10023)

2658c00

Duck array ops for all and any (pydata#9883)

4b48cf7

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

map_over_datasets: fix error message for wrong result type (pydata#…

160cced

…10016) * `map_over_datasets`: fix error message for wrong result type * newline for result

DOC: Fix 404 (pydata#10029)

d57f05c

use mean of min/max years as offset in calculation of datetime64 mean (…

df2ecf4

…pydata#10035) * use mean of min/max years as offset in calculation of datetime64 mean * reinstate _datetime_nanmin as it is used downstream in flox<0.10.0 * add whats-new.rst entry * add whats-new.rst entry

Upgrade mypy to 1.15 (pydata#10041)

c8f7dc6

Seems easy locally; maybe too easy and something will break in CI...

max-sixty and others added 30 commits March 25, 2025 04:59

Fix GroupBy first, last with flox (pydata#10173)

ec88c28

* Fix GroupBy first, last with flox Closes pydata#10169 * fix test * parallelize upstream tests

Allow setting fill_value on Zarr format 3 arrays (pydata#10161)

7ffdcc7

* Allow setting `fill_value` on Zarr format 3 arrays Closes pydata#10064 * fix * fix format detection * fix * Set use_zarr_fill_value_as_mask=False

DataTree: sel & isel add error context (pydata#10154)

66f6c17

* DataTree: sel & isel add error context * add test * changelog --------- Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

DOC: Update docstring to reflect renamed section (pydata#10180)

fdfe15e

Use explicit repo name in upstream wheels (pydata#10181)

d008e33

The repo was renamed from `zarr` to `zarr-python`. The old name still worked because of GitHub's redirect, but it is better to be explicit about which repo this targets.

Vendor pandas to xarray conversion tests (pydata#10187)

6167eaa

xref pydata#9661

release 2025.03.1 (pydata#10188)

2aa2e73

Add new whats-new section (pydata#10190)

e8b1daf

DOC: Remove mention of netcdf pypi package (pydata#10197)

5a77a6e

Fixes pydata#10196

Apply ruff preview rule RUF046 (pydata#10199)

6f354e2

RUF046 Value being cast to `int` is already an integer

Fix sparse dask repr test (pydata#10200)

81fe55a

add scipy-stubs as extra [types] dependency (pydata#10202)

08fa7b9

* add `scipy-stubs` as extra `[types]` dependency * add changelog entry for pydata#10202

Fix references to core classes in docs (pydata#10207)

05072ed

* fix references to core classes in docs * what's new

Fixes dimension order in xarray.Dataset.to_stacked_array (pydata#10205

eb2ff69

) * Fixes dimension order in xarray.Dataset.to_stacked_array * corrected dummy variable name to satisfy mypy * added type annotation to satisfy mypy * corrected type annotation to satisfy mypy

Add datatree repr asv (pydata#10214)

aa9e2bd

DatasetView.map fix keep_attrs (pydata#10219)

430d642

* DatasetView.map fix keep_attrs * fix comment * add changelog

Fix broken Sphinx Roles (pydata#10225)

72ffff5

* fix missing colon in sphinx roles * add space after comma for two broken roles

Fix doctests (pydata#10230)

a66d5b5

Fix mypy (pydata#10232)

969d991

Add RangeIndex (pydata#10076)

3816901

Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

opendap / dap4 support for pydap backend (pydata#10182)

e39f59e

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Deepak Cherian <dcherian@users.noreply.github.com>

Fix: Docs generation create temporary files that are not cleaned up. (p…

ee862fe

…ydata#10238)

Remove test_dask_layers_and_dependencies (pydata#10242)

888dfdb
