Maybe wrong default axis with operators (add, sub, mul, div) between datetime-indexed df and series 1.0.0 #31487

giuliobeseghi · 2020-01-31T10:20:31Z

Code Sample, a copy-pastable example if possible

import pandas as pd

index = pd.date_range(start='2020', periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=['a', 'b', 'c'], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

print(df + series)

	2020-01-01 00:00:00	2020-01-02 00:00:00	2020-01-03 00:00:00	2020-01-04 00:00:00	2020-01-05 00:00:00	a	b	c
2020-01-01	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2020-01-02	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2020-01-03	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2020-01-04	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN
2020-01-05	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN

Problem description

According to the docs (https://pandas.pydata.org/pandas-docs/stable/getting_started/dsintro.html#data-alignment-and-arithmetic):

When doing an operation between DataFrame and Series, the default behavior is to align the Series index on the DataFrame columns, thus broadcasting row-wise

In the special case of working with time series data, if the DataFrame index contains dates, the broadcasting will be column-wise

It seems to me that in both cases now the broadcasting is row-wise.
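A minimal sketch of what the operator form appears to do with the data from the report, assuming the Series index is matched against the DataFrame columns. The variable name `result` is introduced here for illustration only.

```python
import pandas as pd

index = pd.date_range(start="2020", periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=["a", "b", "c"], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

# The operator form matches the Series index (dates) against the
# DataFrame columns ('a', 'b', 'c').
result = df + series

# No label is shared between the two, so the result carries the union of
# both label sets as columns (5 dates + 3 letters) and every cell is NaN.
print(result.shape)               # (5, 8)
print(result.isna().all().all())  # True
```

This matches the all-NaN table shown in the code sample: eight columns, five rows, no overlapping labels.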

Is this an expected change for pandas 1.0.0 (I hope not - I never saw any FutureWarnings about it)? If so, the docs (and the examples) must be updated.

The same happens for the operators -, /, *, %

Expected Output

Not sure if this is the expected output anymore, but it used to be equivalent to:

df.add(series, axis=0)

	a	b	c
2020-01-01	11	12	13
2020-01-02	21	22	23
2020-01-03	31	32	33
2020-01-04	41	42	43
2020-01-05	51	52	53

Although I can't replicate it, I'm pretty sure this was the behaviour until pandas 0.25.3
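For reference, each operator mentioned above has a method form that accepts an `axis` argument, so index-to-index alignment can be requested explicitly regardless of what the default is. A short sketch using the same data; the result variable names are illustrative only.

```python
import pandas as pd

index = pd.date_range(start="2020", periods=5)
df = pd.DataFrame([[1, 2, 3]] * 5, columns=["a", "b", "c"], index=index)
series = pd.Series([10, 20, 30, 40, 50], index=index)

# axis=0 (equivalently axis="index") matches the Series index against
# the DataFrame index instead of its columns.
added = df.add(series, axis=0)
subtracted = df.sub(series, axis=0)
multiplied = df.mul(series, axis=0)
divided = df.div(series, axis=0)
remainder = df.mod(series, axis=0)

print(added)
```

Being explicit about `axis` also keeps the code independent of whichever default a given pandas version applies.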

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.0
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0.post20200127
Cython : 0.29.14
pytest : 5.3.4
hypothesis : 4.54.2
sphinx : 2.3.1
blosc : None
feather : None
xlsxwriter : 1.2.7
lxml.etree : 4.4.2
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.11.1
pandas_datareader: None
bs4 : 4.8.2
bottleneck : 1.3.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.2
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : None
pytables : None
pytest : 5.3.4
pyxlsb : None
s3fs : 0.4.0
scipy : 1.3.2
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.7
numba : 0.47.0

jorisvandenbossche · 2020-01-31T12:36:59Z

That documentation may have been out of date for a long time. I went back as far as pandas 0.18 with your example above, and it still gives the same result as we have now on 1.0.

Could you try to find a reproducible example and show the result you get? (If you still have an environment with an older version of pandas, you could use that; otherwise you could try to recreate one.)

jbrockmendel · 2020-02-01T03:17:03Z

I think I started refactoring the arithmetic code about 2 years ago, and I don't remember the described behavior existing at the time.

The described behavior does seem analogous to the special-casing in slicing; xref #31476.

giuliobeseghi · 2020-02-01T15:39:18Z

I couldn't find an example, I'll post it if I get an error at some point :( you can close the issue in the meantime if you want. Thanks for letting me know about the documentation (it probably needs updating then?).

By the way, what is the rationale for aligning a series to the columns of a dataframe in arithmetic operations? Is it to replicate the behavior of numpy arrays?
I guess that aligning index to index would be more intuitive.
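On the numpy comparison: the thread does not answer the rationale question, but the two behaviors can at least be put side by side. The sketch below uses made-up values (`values`, `per_column`) to show that NumPy broadcasts a 1-D array along the last axis of a 2-D array, and that the pandas operator form gives the same numbers when the Series is labelled by the column names.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2, 3]] * 5)      # shape (5, 3)
per_column = np.array([100, 200, 300])  # shape (3,), one value per column

# NumPy matches the 1-D array against the last axis (the columns),
# repeating it for every row.
np_result = values + per_column

# pandas does the analogous thing by label: the Series index is matched
# against the DataFrame columns.
df = pd.DataFrame(values, columns=["a", "b", "c"])
series = pd.Series(per_column, index=["a", "b", "c"])
pd_result = df + series

print(np.array_equal(np_result, pd_result.to_numpy()))  # True
```

This only illustrates that the defaults line up; whether that was the original design motivation is not established here.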

jbrockmendel added the Numeric Operations Arithmetic, Comparison, and Logical operations label Feb 25, 2020

jbrockmendel added the Docs label Oct 1, 2020

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Oct 1, 2020

DOC: remove outdated doc closes pandas-dev#31487

471870a

jbrockmendel mentioned this issue Oct 1, 2020

DOC: remove outdated doc closes #31487 #36797

Merged

5 tasks

jreback added this to the 1.2 milestone Oct 2, 2020

jreback closed this as completed in #36797 Oct 2, 2020

jreback pushed a commit that referenced this issue Oct 2, 2020

DOC: remove outdated doc closes #31487 (#36797)

810d8cd

kesmit13 pushed a commit to kesmit13/pandas that referenced this issue Nov 2, 2020

DOC: remove outdated doc closes pandas-dev#31487 (pandas-dev#36797)

c92a09e
