Message56604
Thanks for persevering!!!
The dangers of switching between fileno(fp) and fp are actually well
documented in the C and/or POSIX standards. The problem is caused in
PyFile_FromFileEx() -- it creates a Python file object from the file
descriptor. The fix actually only works because we're not using the
FILE struct once PyTokenizer_FindEncoding() is called. I think it
would be better to move the lseek() into call_find_module() so the
FILE abstraction is not broken by PyTokenizer_FindEncoding().
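
To make the hazard concrete, here is a minimal POSIX sketch, not the CPython code itself. The helper names (write_sample, compare_offsets, read_after_handoff) and the sample file contents are made up for illustration. After one fgets() the stream's logical position is just past the first line, but on a typical buffered implementation the descriptor's offset has already moved further, so a consumer that takes fileno(fp) has to lseek() before reading.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write a two-line sample source file. */
int write_sample(const char *path)
{
    assert(path != NULL);
    FILE *out = fopen(path, "w");
    if (out == NULL)
        return -1;
    if (fputs("# -*- coding: latin-1 -*-\nprint('hi')\n", out) == EOF) {
        fclose(out);
        return -1;
    }
    return fclose(out);
}

/*
 * Hypothetical helper: read the first line through the FILE buffer, then
 * report the stream's logical position (ftell) and the offset of the
 * underlying descriptor (lseek by 0 from SEEK_CUR).
 */
int compare_offsets(const char *path, long *stream_pos, long *fd_pos)
{
    char line[128];
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    *stream_pos = ftell(fp);
    *fd_pos = (long)lseek(fileno(fp), 0, SEEK_CUR);
    fclose(fp);
    return 0;
}

/*
 * Hypothetical helper: read one line through fp, then read n bytes directly
 * from the descriptor. With rewind_fd set, the descriptor is first moved
 * back to offset 0; without it, the read starts wherever the FILE buffer
 * left the descriptor. Returns the byte count from read(), or -1 on error.
 */
long read_after_handoff(const char *path, int rewind_fd, char *buf, size_t n)
{
    char line[128];
    long got;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(line, sizeof line, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    int fd = fileno(fp);
    if (rewind_fd && lseek(fd, 0, SEEK_SET) == (off_t)-1) {
        fclose(fp);
        return -1;
    }
    got = (long)read(fd, buf, n);
    fclose(fp);  /* fp is not used for I/O after the descriptor takes over */
    return got;
}
```

With rewind_fd set to 1, the first byte read from the descriptor is the '#' that starts the file. With it set to 0, the result depends on how much the stdio buffer already consumed, which is exactly why mixing the two views is fragile.
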
I think there's still a bug or two lurking in this area: first, each
time you call imp.find_module() you leak a FILE object; second, the
encoding allocated in PyTokenizer_FindEncoding() is leaked.
You're right that a lot of this could be avoided if we used file
descriptors consistently. It seems find_module() itself doesn't read
the file; it just needs to know that it's possible to open the file.
Rewriting everywhere that uses PyFile_FromFile[Ex] to use file
descriptors doesn't seem too hard; there are only a few places.
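
As a rough illustration of the descriptor-only approach, and not a description of how the interpreter actually does it, the sketch below checks that a file can be opened and peeks at its first line without creating a FILE at all. The names open_source_fd and peek_first_line are hypothetical. It relies on POSIX pread(), which reads at an explicit offset and leaves the descriptor's own offset where it was.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a read-only descriptor for path, or -1 if
 * the file cannot be opened. No stdio buffer is involved.
 */
int open_source_fd(const char *path)
{
    assert(path != NULL);
    return open(path, O_RDONLY);
}

/*
 * Hypothetical helper: copy the first line of the file (without its
 * newline) into buf. pread() is given offset 0 explicitly, so the
 * descriptor's current offset is not changed by the peek.
 * Returns the length of the line, or -1 on error.
 */
ssize_t peek_first_line(int fd, char *buf, size_t cap)
{
    if (buf == NULL || cap == 0)
        return -1;
    ssize_t n = pread(fd, buf, cap - 1, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    char *nl = strchr(buf, '\n');
    if (nl != NULL) {
        *nl = '\0';
        n = nl - buf;
    }
    return n;
}
```

A caller that later wraps the same descriptor in a higher-level file object starts from an offset it fully controls, since nothing was buffered behind its back.
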

| Date | User | Action | Args |
|---|---|---|---|
| 2007-10-20 15:36:08 | gvanrossum | set | spambayes_score: 0.0525689 -> 0.052568886; recipients: + gvanrossum, brett.cannon, christian.heimes |
| 2007-10-20 15:36:07 | gvanrossum | link | issue1267 messages |
| 2007-10-20 15:36:07 | gvanrossum | create | |