The Wayback Machine - https://web.archive.org/web/20200825030854/https://github.com/pydata/xarray/issues/3248
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

combine_by_coords fails with DataArrays #3248

Open
dcherian opened this issue Aug 23, 2019 · 5 comments
Open

combine_by_coords fails with DataArrays #3248

dcherian opened this issue Aug 23, 2019 · 5 comments

Comments

@dcherian
Copy link
Contributor

@dcherian dcherian commented Aug 23, 2019

MCVE Code Sample

da1 = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
da2 = xr.DataArray([3, 4, 5], dims='x', coords={'x': [2, 3, 4]})
xr.combine_by_coords([da1, da2])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-133-d27216ba5688> in <module>
      1 da1 = xr.DataArray([1, 2, 3], dims='x', coords={'x': [0, 1, 2]})
      2 da2 = xr.DataArray([3, 4, 5], dims='x', coords={'x': [2, 3, 4]})
----> 3 xr.combine_by_coords([da1, da2])

~/work/python/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    619         compat=compat,
    620         fill_value=fill_value,
--> 621         join=join,
    622     )
    623 

~/work/python/xarray/xarray/core/merge.py in merge(objects, compat, join, fill_value)
    588             )
    589 
--> 590         obj = obj.to_dataset() if isinstance(obj, DataArray) else obj
    591         dict_like_objects.append(obj)
    592 

~/work/python/xarray/xarray/core/dataarray.py in to_dataset(self, dim, name)
    478             return self._to_dataset_split(dim)
    479         else:
--> 480             return self._to_dataset_whole(name)
    481 
    482     @property

~/work/python/xarray/xarray/core/dataarray.py in _to_dataset_whole(self, name, shallow_copy)
    426         if name is None:
    427             raise ValueError(
--> 428                 "unable to convert unnamed DataArray to a "
    429                 "Dataset without providing an explicit name"
    430             )

ValueError: unable to convert unnamed DataArray to a Dataset without providing an explicit name

To get what I want, I need
xr.combine_by_coords([da1.to_dataset(name='a'), da2.to_dataset(name='a')]).a

I think the issue is that the code uses to_dataset instead of _to_temp_dataset

@shoyer
Copy link
Member

@shoyer shoyer commented Aug 24, 2019

+1 we should make this work!

@TomNicholas
Copy link
Contributor

@TomNicholas TomNicholas commented Aug 29, 2019

@dcherian is this a merge problem, or a combine_by_coords problem?

@dcherian
Copy link
Contributor Author

@dcherian dcherian commented Aug 29, 2019

What do you think of:

  1. If combine receives a list of unnamed DataArrays, we convert them to temp datasets and then pass datasets down to merge.

  2. If combine receives a list of named DataArrays, then merge will do the right thing.

  3. If combine receives a list containing both Datasets and unnamed DataArrays then merge will raise the right error.

So really it's just (1) that needs a special case in the combine functions?

@friedrichknuth
Copy link

@friedrichknuth friedrichknuth commented Sep 14, 2019

Some additional information on the topic:

Combining named 1D data arrays works.

da1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [1, 2, 3])])
da2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [5, 6, 7])])
xr.combine_by_coords([da1, da2])

<xarray.Dataset>
Dimensions:  (x: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
Data variables:
    foo      (x) float64 1.443 0.4889 0.9233 0.1946 -1.639 -1.455

However, when combining 2D gridded data...

da1 = xr.DataArray(name='foo', 
                  data=np.random.rand(3,3), 
                  coords=[('x', [1, 2, 3]),
                          ('y', [1, 2, 3])])

da2 = xr.DataArray(name='foo', 
                  data=np.random.rand(3,3), 
                  coords=[('x', [5, 6, 7]),
                          ('y', [5, 6, 7])])

xr.combine_by_coords([da1, da2])

...the method fails, despite passing a data variable name.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-145-77ae89136c1f> in <module>
      9                           ('y', [5, 6, 7])])
     10 
---> 11 xr.combine_by_coords([da1, da2])

~/xarray/xarray/core/combine.py in combine_by_coords(datasets, compat, data_vars, coords, fill_value, join)
    580 
    581     # Group by data vars
--> 582     sorted_datasets = sorted(datasets, key=vars_as_keys)
    583     grouped_by_vars = itertools.groupby(sorted_datasets, key=vars_as_keys)
    584 

~/xarray/xarray/core/combine.py in vars_as_keys(ds)
    465 
    466 def vars_as_keys(ds):
--> 467     return tuple(sorted(ds))
    468 
    469 

~/xarray/xarray/core/common.py in __bool__(self)
    119 
    120     def __bool__(self: Any) -> bool:
--> 121         return bool(self.values)
    122 
    123     def __float__(self: Any) -> float:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

Again, converting to a dataset bypasses the issue.

ds1 = da1.to_dataset()
ds2 = da2.to_dataset()
xr.combine_by_coords([ds1, ds2])

<xarray.Dataset>
Dimensions:  (x: 6, y: 6)
Coordinates:
  * x        (x) int64 1 2 3 5 6 7
  * y        (y) int64 1 2 3 5 6 7
Data variables:
    foo      (x, y) float64 0.5078 0.8981 0.8707 nan ... 0.4172 0.7259 0.8431
@dcherian
Copy link
Contributor Author

@dcherian dcherian commented Sep 16, 2019

Thanks @friedrichknuth .

It looks like all the combine functions expect lists of Datasets, not DataArrays (see docstrings).

It should be relatively easily to make this work like merge which can take a list of Datasets or DataArrays. See this code in merge:

xarray/xarray/core/merge.py

Lines 593 to 602 in 756c941

dict_like_objects = list()
for obj in objects:
if not (isinstance(obj, (DataArray, Dataset, dict))):
raise TypeError(
"objects must be an iterable containing only "
"Dataset(s), DataArray(s), and dictionaries."
)
obj = obj.to_dataset() if isinstance(obj, DataArray) else obj
dict_like_objects.append(obj)

to_dataset will fail for unnamed DataArrays, but _to_temp_dataset() will succeed. It seems to me like we want lists of unnamed DataArrays to work (like in the first example), so call _to_temp_dataset when needed. Maybe @shoyer has an idea on how to implement that cleanly.

Can you send in a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
4 participants
You can’t perform that action at this time.