Key | Summary | Assignee | Reporter | Status | Resolution
TIKA-4329 | Release tika-3.0.0's docker image | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4311 | Avoid potential ClassCastException in angle detection PDF extraction | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed
TIKA-4310 | Add CloseShield to JSoupParser | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4308 | ExecutableParser: PE 0x14c and 0x8664 both yield MACHINE_x86_32 | Tilman Hausherr | Alexey Pelykh | Resolved | Fixed
TIKA-4299 | Clean up pagination in AbstractPDF2XHTML | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4298 | Failed to detect charset for zip entry with short non-Unicode file name | Tilman Hausherr | Mingchun Zhao | Closed | Fixed
TIKA-4296 | "Parameter must be 1-based, but is -1" when using Tika with PDFBox 2.0.32 | Tilman Hausherr | Thomas Mortagne | Resolved | Fixed
TIKA-4295 | Allow bypass of emitKey in AbstractEmbeddedDocumentBytesHandler | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4294 | Simplify serialization/deserialization of ParseContext | Tim Allison | Tim Allison | Resolved | Fixed
TIKA-4293 | Mismatched type in contains() calls in StreamingDetectContext | Tilman Hausherr | Dmitrii Kriukov | Resolved | Fixed
TIKA-4292 | Mismatched type in contains() calls in OneNoteTreeWalker | Tilman Hausherr | Dmitrii Kriukov | Resolved | Fixed
TIKA-4291 | In JDBCEmitter, local var dateFormats shadows class field with the same name | Tilman Hausherr | Dmitrii Kriukov | Resolved | Fixed
TIKA-4290 | Fix code inspection anomalies | Unassigned | Tilman Hausherr | Resolved | Fixed
TIKA-4289 | Further improvements to the metadata filter and serialization | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4288 | Allow user configuration of MetadataFilters in PipesServer | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4287 | Improve PDFParserConfig serialization | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4283 | Add detection for JKS Keystore | Tilman Hausherr | Lonzak | Resolved | Fixed
TIKA-4282 | Syntax error with h2 version 2.3.230 | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed
TIKA-4280 | Tasks for the 3.0.0 release | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4279 | Setter/Adder for spoolToTemp non-existent in org.apache.tika.pipes.fetchers.microsoftgraph.MicrosoftGraphFetcher | Unassigned | Bartek Ciszkowski | Closed | Invalid
TIKA-4278 | TextAndCSVParser doesn't detect semicolon-separated file | Unassigned | Tilman Hausherr | Resolved | Fixed
TIKA-4274 | Improve ExtractReaderException | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed
TIKA-4268 | Use title for embedded resource path in embedded msg files | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4259 | Decouple XML parser stuff from ParseContext | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4257 | Tika detect() recognizes some p7m files as format x-dbf | Unassigned | Luca Bentivoglio | Resolved | Fixed
TIKA-4256 | Allow inlining of OCR'd text in container document | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4252 | PipesClient#process - seems to lose the Fetch input metadata? | Unassigned | Nicholas DiPiazza | Resolved | Fixed
TIKA-4249 | EML file is treated as a text file in version 2.9.2 | Unassigned | Tika User | Resolved | Fixed
TIKA-4247 | HttpFetcher - add ability to send request headers | Unassigned | Nicholas DiPiazza | Resolved | Fixed
TIKA-4244 | Tika identifies MIME type of ics files with html content as text/html | Unassigned | Kartik Jain | Resolved | Fixed
TIKA-4243 | Tika configuration overhaul | Unassigned | Nicholas DiPiazza | Resolved | Fixed
TIKA-4238 | Replace some deprecated code | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed
TIKA-4236 | tika-parser-nlp-module has an unnecessary Guava dependency | Tilman Hausherr | Manfred Baedke | Resolved | Fixed
TIKA-4234 | Further improvements to JDBC pipes reporter | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4225 | Add detection for AMF | Unassigned | Robin Schimpf | Resolved | Fixed
TIKA-4224 | Add detection for 3MF | Unassigned | Robin Schimpf | Resolved | Fixed
TIKA-4223 | STL file exported with OpenSCAD not detected correctly | Unassigned | Robin Schimpf | Resolved | Fixed
TIKA-4222 | Add detection for OpenSCAD | Unassigned | Robin Schimpf | Resolved | Fixed
TIKA-4221 | Regression in pack200 parsing in commons-compress | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4220 | Commons-compress too lenient on headless tar detection | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4217 | Require newline in image/x-portable-graymap and image/x-portable-bitmap magic | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4216 | Avoid checking for ImageMagick if image processing is disabled in TesseractOCRParser | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4215 | Avoid loading resources in Tika() just to get version number | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4213 | Improvements to JDBC pipes reporter | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4211 | Tika extractor fails to extract embedded Excel from PPTX | Unassigned | Xiaohong Yang | Resolved | Fixed
TIKA-4207 | PipesParser should have option to extract raw bytes of embedded files | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4205 | Add more columns to profiles table in tika-eval Profile mode | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4204 | ChmExtractor unable to decompress file | Tim Allison | Robert Fromholz | Resolved | Fixed
TIKA-4203 | Add @deprecated annotation where needed | Tilman Hausherr | Tilman Hausherr | Resolved | Fixed
TIKA-4202 | Add page count of OCR'd pages in metadata for PDF files | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4199 | commons-compress 1.26.0 breaks Apache Tika 2.9.1 | Tilman Hausherr | Alexander Veit | Resolved | Fixed
TIKA-4198 | Skip blob fields in geopkg files | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4195 | JSoupParser conceals null from the EncodingDetector | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4193 | Add num common tokens to tika-eval metadatafilter | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4188 | Add support for ARC files | Unassigned | Gregory Lepore | Resolved | Fixed
TIKA-4187 | Add detection for geopackage | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4184 | Fix a small handful of broken assertNotNulls | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4176 | DRM EPUBs should cause encrypted document exception | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4174 | SIP pcap MIME type being misidentified as x-robots | Unassigned | Nissim Shiman | Resolved | Fixed
TIKA-4173 | Fix dev version in main branch | Unassigned | Tim Allison | Resolved | Fixed
TIKA-4166 | Dependency updates for Tika 3.0 | Unassigned | Tilman Hausherr | Resolved | Fixed
TIKA-4119 | Return media type "text/javascript" instead of "application/javascript" to follow RFC 9239 | Unassigned | Matthias Juchmes | Resolved | Fixed
TIKA-3604 | Upgrade to PDFBox 3 | Unassigned | mannixli | Resolved | Fixed
TIKA-1907 | Big PDF parsing to text - out of memory | Tilman Hausherr | Nicolas Daniels | Closed | Fixed