Message358458
I took a look at the code to see where the fix needs to go, and I could use some help.
The parse tree for the header looks something like this:
ContentDisposition([
    Token([ValueTerminal('attachment')]),
    ValueTerminal(';'),
    MimeParameters([
        Parameter([
            Attribute([
                CFWSList([WhiteSpaceTerminal(' ')]),
                ValueTerminal('filename')]),
            ValueTerminal('='),
            Value([
                QuotedString([
                    BareQuotedString([
                        EncodedWord([ValueTerminal('Schulbesuchsbestättigung.')]),
                        WhiteSpaceTerminal(' '),
                        EncodedWord([ValueTerminal('pdf')])])])])])])])
The offending piece of code, which seems to be working as designed, is this loop in get_bare_quoted_string() in email/_header_value_parser.py:
    while value and value[0] != '"':
        if value[0] in WSP:
            token, value = get_fws(value)
        elif value[:2] == '=?':
            try:
                token, value = get_encoded_word(value)
                bare_quoted_string.defects.append(errors.InvalidHeaderDefect(
                    "encoded word inside quoted string"))
            except errors.HeaderParseError:
                token, value = get_qcontent(value)
        else:
            token, value = get_qcontent(value)
        bare_quoted_string.append(token)
It just loops and parses the values one token at a time. We cannot drop the FWS until we know that the tokens both before and after it are encoded words. I can't find a clean way to either look ahead (which could perhaps be done in get_parameters()) or look back (which could be done after parsing the entire bare_quoted_string) in the parse tree to delete the offending whitespace.
Any example of this kind of parse-tree manipulation in the code base would be awesome!
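To make the look-back idea concrete, here is a minimal sketch of a post-processing pass that runs after the whole bare quoted string has been tokenized. It does not use the real parser classes: `Token` and `drop_fws_between_encoded_words` are hypothetical names, and the `token_type` strings 'fws' and 'encoded-word' are assumed to match what the parser assigns to whitespace and encoded-word tokens.

```python
from collections import namedtuple

# Hypothetical stand-in for parser tokens; only token_type matters here.
Token = namedtuple("Token", ["token_type", "text"])


def drop_fws_between_encoded_words(tokens):
    """Return a new list without whitespace that sits between two encoded words."""
    cleaned = []
    last = len(tokens) - 1
    for i, tok in enumerate(tokens):
        # Neighbours are checked against the original list, so a run such as
        # encoded-word, fws, encoded-word, fws, encoded-word loses both gaps.
        between = (
            tok.token_type == "fws"
            and 0 < i < last
            and tokens[i - 1].token_type == "encoded-word"
            and tokens[i + 1].token_type == "encoded-word"
        )
        if not between:
            cleaned.append(tok)
    return cleaned


# Tokens shaped like the BareQuotedString children in the parse tree above.
tokens = [
    Token("encoded-word", "Schulbesuchsbestättigung."),
    Token("fws", " "),
    Token("encoded-word", "pdf"),
]
filename = "".join(t.text for t in drop_fws_between_encoded_words(tokens))
```

Whitespace next to ordinary quoted content is left alone, since it is only insignificant when both neighbours are encoded words. Whether such a pass belongs in get_bare_quoted_string() or further up in get_parameters() is left open here.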
| Date | User | Action | Args |
| 2019-12-15 23:05:17 | maxking | set | recipients: + maxking, barry, r.david.murray, mkaiser |
| 2019-12-15 23:05:17 | maxking | set | messageid: <[email protected]> |
| 2019-12-15 23:05:17 | maxking | link | issue39040 messages |
| 2019-12-15 23:05:17 | maxking | create | |