Issue2121
This issue tracker has been migrated to GitHub,
and is currently read-only.
For more information,
see the GitHub FAQs in the Python's Developer Guide.
Created on 2008-02-15 08:15 by christian.heimes, last changed 2022-04-11 14:56 by admin. This issue is now closed.
| Files | ||||
|---|---|---|---|---|
| File name | Uploaded | Description | Edit | |
| nan_parse.patch | cdavid, 2008-12-31 11:34 | |||
| Messages (21) | |||
|---|---|---|---|
| msg62422 - (view) | Author: Christian Heimes (christian.heimes) * ![]() |
Date: 2008-02-15 08:15 | |
This is a reminder. The issue makes the complex(repr(...)) round-trip impossible when either the real or imag part is nan or infinite. |
|||
| msg63169 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2008-03-01 21:38 | |
Signed zeros cause difficulties with round-tripping, too: >>> z = complex(-0.0, -0.0); w = complex(repr(z)) >>> z -0j >>> w -0j >>> z.real -0.0 >>> w.real 0.0 |
|||
| msg78592 - (view) | Author: Cournapeau David (cdavid) | Date: 2008-12-31 07:41 | |
I started a patch against the trunk to handle nan/inf/infinite (I have not yet tackled the problem of negative zero). The patch is a bit big, because I found the function quite difficult to follow, so I refactored it a bit first (replacing the state machine with the big switch with a sequential parsing). One potential problem is that I do some computation with inf to make signed infinite, I don't know if it is safe. I don't know is setting a variable to Nan is safe either. Although I tested the function in a simple main under valgrind for various legit and bogus input, there is no tests in this patch. I would like to add some, but I am not familiar with python testing, so I would be glad to receive some indications on how to do it properly. |
|||
| msg78593 - (view) | Author: Cournapeau David (cdavid) | Date: 2008-12-31 07:53 | |
Of course, I notice two bugs just after sending the patch... New patch to fix those two issues (no check for closing bracket if opening ones are there and a bug when only imaginary part is given). |
|||
| msg78594 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2008-12-31 09:05 | |
-1 on this feature request. IMO, it adds no value to any real world applications and isn't worth the added code complexity, the on-going maintenance burden, and the risk of introducing new bugs. |
|||
| msg78595 - (view) | Author: Cournapeau David (cdavid) | Date: 2008-12-31 09:13 | |
I disagree the feature is not needed, for several reasons: - complex(repr(..)) roundtrip should work, whatever the value of complex is - it is supported for float, so why not for complex ? - I believe on the contrary it solves a very real problem: incidently, I got interested in this patch while working on numpy, and it is certainly useful to be able to parse nan and inf (for example when we create arrays from strings). Nan may be seen as non useful for so called real usage of python, but for scientific people, it is a crucial to have proper support of nan (which may mean missing data depending on the context) and inf. - it does not add complexity: I would argue that independantly of nan/inf support, my patch makes the function simpler to follow (no state machine with done/sw_error/etc... state). |
|||
| msg78596 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2008-12-31 10:05 | |
> complex(repr(..)) roundtrip should work, Nice-to-have but not a requirement that the entire input domain be supported. > it is supported for float, so why not for complex ? It made sense for floats because of prevalence of use cases and because we wanted to match IEEE-754R as much as possible. Unfortunately, the support for Infs and NaNs has already negatively impacted the float world by making the math module harder to maintain and test. I would not like to see that extended to cmath or complex() unless compelling real-world use cases arise. > ... for scientific people, it is a crucial to have proper support > of nan (which may mean missing data depending on the context) and inf. Mark, does Inf have a standard interpretation for complex numbers? Do all infinities meet or do they radiate, each with their own phase angle? Also, do you want to stick with the 754 interpretation of NaNs as the result of invalid operations or are you comfortable with the MatLab/Octave notion of using NaNs to indicate missing values (something they do as an optimization because it is the only way to flag a non-numeric value in a floating point register or C double)? > it does not add complexity: That depends on whether handling of NaNs and Infs creeps into cmath. Mainly, I'm just questioning whether there exist compelling use cases for parsing NaNs and Infs in the context of complex numbers. |
|||
| msg78597 - (view) | Author: Cournapeau David (cdavid) | Date: 2008-12-31 10:38 | |
> Nice-to-have but not a requirement that the entire input domain be > supported. Ok. > It made sense for floats because of prevalence of use cases and > because we wanted to match IEEE-754R as much as possible. But why shouldn't this apply to complex numbers ? I am biased because I mainly use python for scientific work, but complex numbers are not more special than float in my mind. > Unfortunately, the support for Infs and NaNs has already negatively > impacted the float world by making the math module harder to maintain > and test. Yes, it is difficult to handle nan and inf, there are a lot of corner cases. But I fail to see how this applies here: my patch is essentially a rewrote of the parsing, and the code to handle nan/inf is only 7 lines. This is independent of the handling of how nan/inf is handled by math operations on it. > Mark, does Inf have a standard interpretation for complex numbers? > Do all infinities meet or do they radiate, each with their own phase > angle? Hm, not sure what you mean here by standard interpretation, but infinities "do not meet", in the sense that (x,inf) and (inf,x) for example can never been 'near' from each other (in the distance sense), except if x is inf. It does not make more sense to talk about phase of complex numbers with inf than for 0. But again, I don't see how this is relevant to the proposed feature. > Mainly, I'm just questioning whether there exist compelling use cases > for parsing NaNs and Infs in the context of complex numbers. For any task where parsing complex makes sense. Since many numerical operations may end up with nan or inf, this is relatively common I would say. It this can make the review easier, I can split the patch in two: first the refactoring (+ tests once someone tells me how to), and then inf/nan handling (with additional tests). |
|||
| msg78598 - (view) | Author: Cournapeau David (cdavid) | Date: 2008-12-31 11:34 | |
Ok, I found out how to make tests, and I found some problems while using this on numpy. A third version of the patch, with unit tests: all tests in test_complex.py pass. |
|||
| msg78824 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-02 15:57 | |
[Raymond]
> I would
> not like to see that extended to cmath or complex() unless compelling
> real-world use cases arise.
Hmm. Have you looked at the cmath module recently? You may be in for a
nasty surprise...
> Mark, does Inf have a standard interpretation for complex numbers? Do
> all infinities meet or do they radiate, each with their own phase
> angle?
Shrug. Mathematically, by far the most common and useful model is the
complex plane plus a single extra point at infinity. But when complexes
are implemented as pairs of floats things look a little strange. Kahan,
in his 'branch cuts' paper identifies 9 'different' complex infinities
in this model, and C99 Annex G specifies exactly how functions should
behave for various different infinities.
It's never really been clear to me how useful it is to be able to
distinguish these infinities. But the cmath module *does* make an
effort to return the 'correct' (according to C99 Annex G) values for
inputs with a special real or imaginary component. On balance, I'd
support making complex('nan + nan*j') do the right thing.
(Though I can't help feeling that the repr of complex(nan, nan)
should have been 'nan + nan*1j' rather than the current 'nan + nan*j'.)
> Also, do you want to stick with the 754 interpretation of NaNs as the
> result of invalid operations
I've never much liked the use of NaN to represent missing values, and
I don't think Python should embrace that usage.
|
|||
| msg78827 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-02 16:08 | |
cdavid, in your application, how hard is it to work around the problem by simply printing and parsing pairs of floats rather than complex numbers? E.g., <get real part and imaginary part from string> z = complex(float(real_part), float(imag_part)) |
|||
| msg78833 - (view) | Author: Cournapeau David (cdavid) | Date: 2009-01-02 16:26 | |
It is not really for an application, but for numpy. Of course, one can always get around the problem - but since this is really a limitation which can be easily fixed, why not fixing the problem :) ? |
|||
| msg79250 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-06 11:56 | |
[Mark]
> inputs with a special real or imaginary component. On balance, I'd
> support making complex('nan + nan*j') do the right thing.
Having thought about this a bit more, I take this back. Some reasons are
given below.
[David]
> - complex(repr(..)) roundtrip should work, whatever the value of complex
is
I disagree. *Why* do you think it should work? It fails for many other
types:
>>> def roundtrip(x): return type(x)(repr(x))
...
>>> roundtrip("abc")
"'abc'"
>>> roundtrip([1,2,3])
['[', '1', ',', ' ', '2', ',', ' ', '3', ']']
>>> roundtrip((1,))
('(', '1', ',', ')')
>>> roundtrip(Fraction(1, 2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in roundtrip
File "/Users/dickinsm/python_source/trunk/Lib/fractions.py", line 73, in
__new__
raise ValueError('Invalid literal for Fraction: %r' % input)
ValueError: Invalid literal for Fraction: 'Fraction(1, 2)'
>>> roundtrip(Decimal(1))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in roundtrip
File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 545, in
__new__
"Invalid literal for Decimal: %r" % value)
File "/Users/dickinsm/python_source/trunk/Lib/decimal.py", line 3721, in
_raise_error
raise error(explanation)
decimal.InvalidOperation: Invalid literal for Decimal: "Decimal('1')"
In general, type(repr(x)) == x only works for simple types (ints, floats),
not compound types like lists, strings, tuples, ... I think it's
reasonable to regard complex as such a compound type.
There *is* a general Python guideline that eval(repr(x)) == x should work
where possible. Note that this fails even for floats:
>>> eval(repr(float('nan')))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
NameError: name 'nan' is not defined
The *real* problem here is that repr(complex) is a mess when it comes to
niceties like signed zeros, infinities and nans:
>>> complex(-0., 0.) # throws away sign of 0 on real part
0j
>>> eval(repr(complex(2., -0.))) # throws away sign on imag. part
(2+0j)
>>> nan = float('nan')
>>> complex(0, nan)
nan*j
>>> nan*j # can't even eval this...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'j' is not defined
>>> nan*1j # not much better: real part should be 0, not nan
(nan+nan*j)
>>> inf = float('inf')
>>> inf*1j # ouch: nan in real part!
(nan+inf*j)
I think the *right* solution is to define repr of a complex number
z to be:
"complex(%r, %r)" % (z.real, z.imag)
but that's not possible before the mythical, compatibility-breaking,
Python 4.0. And even then your desired complex(repr(z)) roundtripping
wouldn't work, though eval(repr(z)) would.
While we're daydreaming, another possibility is to follow C99's lead, and
introduce a type of 'imaginary' numbers, and make 1j be an imaginary
literal rather than a complex literal. That solves some of the above
eval(repr(.)) problems. Again, I don't see how this could happen before
4.0, and I'm not even sure that it's desirable.
In the circumstances, repr does the best it can, which is to output a
whole expression which more-or-less describes the given complex number,
and will usually eval back to the right thing. I think it's not the
business of the complex() constructor to parse such expressions.
To summarize: it's a mess. I'd recommend that you and/or numpy don't use
repr for roundtrip-capable conversions to string. Instead, output the
real and imaginary parts separately, and use float-from-string and the
complex-from-pair-of-floats constructors to reconstruct.
|
|||
| msg79251 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-06 12:05 | |
Christian, any comments? Is it okay to close this as a 'won't fix'? |
|||
| msg79257 - (view) | Author: Cournapeau David (cdavid) | Date: 2009-01-06 13:16 | |
> I disagree. *Why* do you think it should work? It fails for many other
> types:
I don't understand the rationale: why not making something work better
if possible ? Also, I don't understand the comparison with Decimal or
Fraction; there is a big difference between making the roundtrip not
possible at all for a type, and making it possible only for some values.
> I think the *right* solution is to define repr of a complex number
> z to be:
Why ? I don't understand those examples: maybe I am missing your
argument, but I understand it as we should not fix one bug because there
are another bugs related to inf and nan in complex. For me, something like:
a = complex(0)
a *= float('inf') * 1j
-> a = nan+nanj
Is really not desirable. That's actually a more serious bug than this
one. Are you saying that python does not care about inf/nan for complex
numbers ? If so, be it, but it would a bit unfortunate.
|
|||
| msg79261 - (view) | Author: Christian Heimes (christian.heimes) * ![]() |
Date: 2009-01-06 13:30 | |
Definitely! I don't see a reason the make the code more complex. David, we are against the patch because we have to keep the code maintainable. The code is already long and spreads over lots of lines. Your use case isn't common enough to warrant even more complicated parsing code. |
|||
| msg79266 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-06 15:44 | |
As Christian says, it's about not increasing code complexity without a
good reason. For me, it's also about mental complexity: the set of valid
inputs to the complex constructor should be small, and should be easy to
understand and to describe.
The addition of the '*1j' part of the expression bothers me particularly,
as does allowing complex('j') to be valid.
I have a compromise proposal:
1. For nans and infs, get rid of the * on output: change repr so that
e.g., repr(complex(0, nan)) is simply 'nanj', and repr(complex(inf,-inf))
is '(inf-infj)'. (Yes, I know it looks ugly, but bear with me.)
2. On input, allow strings of the form "complex-string" or "(complex-
string)" where (in pseudo BNF):
sign ::= '+' | '-'
real-part ::= float-string
imag-part ::= float-string 'j'
complex-string ::= real-part [sign imag-part] | imag-part
and float-string is any string currently accepted by float
(excluding leading or trailing whitespace). Nothing else would be
permitted. (Well, okay, we still have to allow whitespace around the
complex string, both inside and outside the parentheses.)
I think this would simplify the parsing, and remove need for special
casing of nans and infs. It might even allow the code to become simpler,
by sharing some of it with stuff in floatobject.c.
The above would allow double signs: e.g. '2+-1j', with the first sign
being the one explicitly described above and the second being part of the
float-string. I don't think this is a terrible thing, if it simplifies
parsing. It might even be helpful. For example: "%r+%rj" % (z.real,
z.imag) would always be valid input.
The current 'nan*j' is certainly more attractive than 'nanj', but it's
just too suggestive of the actual expression float('nan')*1j, which it
turns out doesn't produce the same thing.
If you could come up with a patch that does something like this I'd take a
look.
By the way, "0 + inf*1j" giving nan + inf*j isn't really a bug; it's
pretty much unavoidable: 1j is really complex(0, 1), and when multiplying
by inf we get complex(inf*0, inf*1). With the usual IEEE 754 semantics
inf*0 is nan.
I think it's exactly to fix this sort of behaviour that C99 introduced its
imaginary type (see Annex G of the standard). It doesn't seem to have
caught on, though: I don't think gcc implements this, and I'd be very
surprised if Visual Studio does.
|
|||
| msg79268 - (view) | Author: Cournapeau David (cdavid) | Date: 2009-01-06 16:05 | |
@ Mark
Concerning float('inf') * 1j: you're right, my rambling did not make any
sense, sorry.
I agree that adding complexity may be a good reason to warrant an
arbitrary feature; actually, I did not manage to handle nan/inf at first
because of the complexity :) But as I mentioned in a previous comment, I
think my patch simplifies the parsing as well. I suggested to split the
patch in to: one whithout any new feature (no handling of nan/inf), and
then anoter handling nan/inf on the top of it. Would that be acceptable ?
Also, how should I proceed to share implementation between floatobject.c
and complexobject.c ? Would creating a new file for the common code be
acceptable ?
|
|||
| msg79272 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2009-01-06 16:28 | |
David, at this point, I recommend dropping this one. It has become a time waster. Possibly, there is a refactoring of the code that makes parsing a bit simpler or shares some common code but it is not worth destabilizing the battle-tested code and risking the introduction of subtle new parsing errors. |
|||
| msg79273 - (view) | Author: Raymond Hettinger (rhettinger) * ![]() |
Date: 2009-01-06 16:35 | |
Mark, I'm somewhat uncomfortable with your proposal also. It changes what the parser will accept and the potential benefits are almost nil. Also, putting NaNs and Infs in complex numbers is probably not something that should be pursued. I see no utility in pairs like (NaN, 3.0) or (-2.0, Inf). The whole request is use case challenged. AFAICT, the current implementation suffice for real cases and there is no compelling need to rearrange and redesign this code. |
|||
| msg80284 - (view) | Author: Mark Dickinson (mark.dickinson) * ![]() |
Date: 2009-01-20 21:40 | |
Closing this one as won't fix. |
|||
| History | |||
|---|---|---|---|
| Date | User | Action | Args |
| 2022-04-11 14:56:30 | admin | set | github: 46374 |
| 2009-01-20 21:40:12 | mark.dickinson | set | status: open -> closed resolution: wont fix messages: + msg80284 |
| 2009-01-06 16:35:42 | rhettinger | set | messages: + msg79273 |
| 2009-01-06 16:28:37 | rhettinger | set | priority: normal -> low messages: + msg79272 versions: + Python 3.1, Python 2.7, - Python 2.6, Python 3.0 |
| 2009-01-06 16:05:00 | cdavid | set | messages: + msg79268 |
| 2009-01-06 15:44:30 | mark.dickinson | set | messages: + msg79266 |
| 2009-01-06 13:30:07 | christian.heimes | set | messages: + msg79261 |
| 2009-01-06 13:16:41 | cdavid | set | messages: + msg79257 |
| 2009-01-06 12:05:01 | mark.dickinson | set | messages: + msg79251 |
| 2009-01-06 11:56:28 | mark.dickinson | set | messages: + msg79250 |
| 2009-01-02 16:26:11 | cdavid | set | messages: + msg78833 |
| 2009-01-02 16:08:18 | mark.dickinson | set | messages: + msg78827 |
| 2009-01-02 15:57:20 | mark.dickinson | set | messages: + msg78824 |
| 2008-12-31 11:35:12 | cdavid | set | files: - nan_parse.patch |
| 2008-12-31 11:35:08 | cdavid | set | files: - nan_parse.patch |
| 2008-12-31 11:34:56 | cdavid | set | files:
+ nan_parse.patch messages: + msg78598 |
| 2008-12-31 10:38:24 | cdavid | set | messages: + msg78597 |
| 2008-12-31 10:05:16 | rhettinger | set | assignee: mark.dickinson messages: + msg78596 |
| 2008-12-31 09:14:01 | cdavid | set | messages: + msg78595 |
| 2008-12-31 09:05:25 | rhettinger | set | type: enhancement messages: + msg78594 nosy: + rhettinger |
| 2008-12-31 07:53:18 | cdavid | set | files:
+ nan_parse.patch messages: + msg78593 |
| 2008-12-31 07:41:56 | cdavid | set | files:
+ nan_parse.patch keywords: + patch messages: + msg78592 nosy: + cdavid |
| 2008-03-01 21:38:49 | mark.dickinson | set | nosy:
+ mark.dickinson messages: + msg63169 |
| 2008-02-15 08:15:16 | christian.heimes | create | |
