ASF JIRA

Tika
3.0.0
Key descending
1-64 of 64 as at: 25/Jul/25 00:33
Type Patch Info Key Summary Assignee Reporter Priority Status Resolution Created Updated Due Development
Task TIKA-4329

Release tika-3.0.0's docker image

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4311

Avoid potential ClassCastException in angle detection PDF extraction

Tilman Hausherr Tilman Hausherr Major Resolved Fixed  
Task TIKA-4310

Add CloseShield to JSoupParser

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4308

ExecutableParser: PE 0x14c and 0x8664 both yield MACHINE_x86_32

Tilman Hausherr Alexey Pelykh Trivial Resolved Fixed  
Task TIKA-4299

Clean up pagination in AbstractPDF2XHTML

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-4298

Failed to detect charset for zip entry with short non-Unicode file name

Tilman Hausherr Mingchun Zhao Major Closed Fixed  
Bug TIKA-4296

"Parameter must be 1-based, but is -1" when using Tika with PDFBox 2.0.32

Tilman Hausherr Thomas Mortagne Major Resolved Fixed  
Task TIKA-4295

Allow bypass of emitKey in AbstractEmbeddedDocumentBytesHandler

Unassigned Tim Allison Minor Resolved Fixed  
Task TIKA-4294

Simplify serialization/deserialization of ParseContext

Tim Allison Tim Allison Trivial Resolved Fixed  
Bug TIKA-4293

Mismatched type in contains() calls in StreamingDetectContext

Tilman Hausherr Dmitrii Kriukov Major Resolved Fixed  
Bug TIKA-4292

Mismatched type in contains() calls in OneNoteTreeWalker

Tilman Hausherr Dmitrii Kriukov Major Resolved Fixed  
Bug TIKA-4291

In JDBCEmitter local var dateFormats shadows class field with the same name

Tilman Hausherr Dmitrii Kriukov Major Resolved Fixed  
Bug TIKA-4290

Fix code inspection anomalies

Unassigned Tilman Hausherr Minor Resolved Fixed  
Task TIKA-4289

Further improvements to the metadata filter and serialization

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-4288

Allow user configuration of MetadataFilters in PipesServer

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-4287

Improve PDFParserConfig serialization

Unassigned Tim Allison Trivial Resolved Fixed  
New Feature TIKA-4283

Add detection for JKS Keystore

Tilman Hausherr Lonzak Major Resolved Fixed  
Bug TIKA-4282

Syntax error with h2 version 2.3.230

Tilman Hausherr Tilman Hausherr Minor Resolved Fixed  
Task TIKA-4280

Tasks for the 3.0.0 release

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-4279

Setter/Adder for spoolToTemp non existent in org.apache.tika.pipes.fetchers.microsoftgraph.MicrosoftGraphFetcher

Unassigned Bartek Ciszkowski Major Closed Invalid  
Bug TIKA-4278

TextAndCSVParser doesn't detect semicolon separated file

Unassigned Tilman Hausherr Major Resolved Fixed  
Improvement TIKA-4274

Improve ExtractReaderException

Tilman Hausherr Tilman Hausherr Minor Resolved Fixed  
Task TIKA-4268

Use title for embedded resource path in embedded msg files

Unassigned Tim Allison Trivial Resolved Fixed  
Task TIKA-4259

Decouple xml parser stuff from ParseContext

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-4257

Tika detect() recognizes some p7m files as format x-dbf

Unassigned Luca Bentivoglio Major Resolved Fixed  
Task TIKA-4256

Allow inlining of ocr'd text in container document

Unassigned Tim Allison Major Resolved Fixed  
Bug TIKA-4252

PipesClient#process - seems to lose the Fetch input metadata?

Unassigned Nicholas DiPiazza Major Resolved Fixed  
Bug TIKA-4249

EML file is treated as a text file in version 2.9.2

Unassigned Tika User Blocker Resolved Fixed  
New Feature TIKA-4247

HttpFetcher - add ability to send request headers

Unassigned Nicholas DiPiazza Major Resolved Fixed  
Bug TIKA-4244

Tika identifies MIME type of ics files with html content as text/html

Unassigned Kartik Jain Major Resolved Fixed  
New Feature TIKA-4243

tika configuration overhaul

Unassigned Nicholas DiPiazza Major Resolved Fixed  
Task TIKA-4238

replace some deprecated code

Tilman Hausherr Tilman Hausherr Minor Resolved Fixed  
Bug TIKA-4236

tika-parser-nlp-module has an unnecessary Guava dependency

Tilman Hausherr Manfred Baedke Major Resolved Fixed  
Improvement TIKA-4234

Further improvements to jdbc pipes reporter

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-4225

Add detection for AMF

Unassigned Robin Schimpf Major Resolved Fixed  
Improvement TIKA-4224

Add detection for 3MF

Unassigned Robin Schimpf Major Resolved Fixed  
Improvement TIKA-4223

STL file exported with OpenSCAD not detected correctly

Unassigned Robin Schimpf Major Resolved Fixed  
Improvement TIKA-4222

Add detection for OpenSCAD

Unassigned Robin Schimpf Major Resolved Fixed  
Task TIKA-4221

Regression in pack200 parsing in commons-compress

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-4220

Commons-compress too lenient on headless tar detection

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-4217

Require newline in image/x-portable-graymap and image/x-portable-bitmap magic

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-4216

Avoid checking for ImageMagick if image processing is disabled in TesseractOCRParser

Unassigned Tim Allison Minor Resolved Fixed  
New Feature TIKA-4215

Avoid loading resources in Tika() just to get version number

Unassigned Tim Allison Major Resolved Fixed  
New Feature TIKA-4213

Improvements to jdbc pipes reporter

Unassigned Tim Allison Trivial Resolved Fixed  
Bug TIKA-4211

Tika extractor fails to extract embedded excel from pptx

Unassigned Xiaohong Yang Major Resolved Fixed  
New Feature TIKA-4207

PipesParser should have option to extract raw bytes of embedded files

Unassigned Tim Allison Major Resolved Fixed  
New Feature TIKA-4205

Add more columns to profiles table in tika-eval Profile mode

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4204

ChmExtractor unable to decompress file

Tim Allison Robert Fromholz Blocker Resolved Fixed  
Task TIKA-4203

Add @deprecated annotation where needed

Tilman Hausherr Tilman Hausherr Trivial Resolved Fixed  
New Feature TIKA-4202

Add page count of OCR'd pages in metadata for PDF files

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4199

commons-compress 1.26.0 breaks Apache Tika 2.9.1

Tilman Hausherr Alexander Veit Major Resolved Fixed  
Improvement TIKA-4198

Skip blob fields in geopkg files

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-4195

JSoupParser conceals null from the EncodingDetector

Unassigned Tim Allison Minor Resolved Fixed  
Improvement TIKA-4193

Add num common tokens to tika-eval metadatafilter

Unassigned Tim Allison Trivial Resolved Fixed  
Improvement TIKA-4188

Add support for ARC files

Unassigned Gregory Lepore Minor Resolved Fixed  
New Feature TIKA-4187

Add detection for geopackage

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4184

Fix small handful of broken assertNotNulls

Unassigned Tim Allison Trivial Resolved Fixed  
New Feature TIKA-4176

DRM EPUBs should cause encrypted document exception

Unassigned Tim Allison Minor Resolved Fixed  
Bug TIKA-4174

sip pcap mime type being misidentified as x-robots

Unassigned Nissim Shiman Major Resolved Fixed  
Bug TIKA-4173

Fix dev version in main branch

Unassigned Tim Allison Major Resolved Fixed  
Task TIKA-4166

dependency updates for Tika 3.0

Unassigned Tilman Hausherr Minor Resolved Fixed  
Improvement TIKA-4119

Return media type "text/javascript" instead of "application/javascript" to follow RFC-9239

Unassigned Matthias Juchmes Major Resolved Fixed  
Improvement TIKA-3604

Upgrade pdfbox3

Unassigned mannixli Minor Resolved Fixed  
Bug TIKA-1907

Big Pdf parsing to text - Out of memory

Tilman Hausherr Nicolas Daniels Major Closed Fixed