Apache Avro integration

TODO: Add a description and examples of how to use parquet-avro

Available options via Hadoop Configuration

Configuration for reading

| Name | Type | Description |
|------|------|-------------|
| `parquet.avro.data.supplier` | `Class` | The implementation of the interface org.apache.parquet.avro.AvroDataSupplier. Available implementations in the library: GenericDataSupplier, ReflectDataSupplier, SpecificDataSupplier.<br/>The default value is org.apache.parquet.avro.SpecificDataSupplier. |
| `parquet.avro.read.schema` | `String` | The Avro schema to be used for reading. It must be compatible with the file schema. If not set, the file schema is used directly. |
| `parquet.avro.projection` | `String` | The Avro schema to be used for projection. |
| `parquet.avro.compatible` | `boolean` | Flag for compatibility mode. `true` for materializing Avro IndexedRecord objects, `false` for materializing the related objects for either generic, specific, or reflect records.<br/>The default value is `true`. |
| `parquet.avro.readInt96AsFixed` | `boolean` | Flag for handling the INT96 Parquet type. `true` converts INT96 values to the Avro fixed type; `false` leaves INT96 unhandled, so reading such a column throws an exception.<br/>The default value is `false`.<br/>NOTE: The INT96 Parquet type is deprecated. This option is only to support old data. |
| `parquet.avro.serializable.classes` | `String` | Comma-separated list of fully qualified class names that may be referenced from the Avro schema by "java-class" or "java-key-class" and are allowed to be loaded. |
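
As a rough illustration of how the reading options fit together, the sketch below collects three of them as plain key/value strings using only the Java standard library. The class `AvroReadOptions` and the method `readOptions` are hypothetical names chosen for this example and are not part of parquet-avro. With Hadoop on the classpath, each entry would typically be applied to a `Configuration` through `conf.set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of parquet-avro.
public class AvroReadOptions {

    // Collects a few of the reader options as plain key/value strings.
    static Map<String, String> readOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        // Materialize generic records instead of using the default SpecificDataSupplier.
        options.put("parquet.avro.data.supplier",
                "org.apache.parquet.avro.GenericDataSupplier");
        // Turn off compatibility mode (the default is true).
        options.put("parquet.avro.compatible", "false");
        // Convert deprecated INT96 columns to the Avro fixed type instead of failing.
        options.put("parquet.avro.readInt96AsFixed", "true");
        return options;
    }

    public static void main(String[] args) {
        // With Hadoop available, each entry would be passed to conf.set(key, value).
        readOptions().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The values chosen here are only one possible combination; any option left out keeps the default listed in the table.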

Configuration for writing

| Name | Type | Description |
|------|------|-------------|
| `parquet.avro.write.data.supplier` | `Class` | The implementation of the interface org.apache.parquet.avro.AvroDataSupplier. Available implementations in the library: GenericDataSupplier, ReflectDataSupplier, SpecificDataSupplier.<br/>The default value is org.apache.parquet.avro.SpecificDataSupplier. |
| `parquet.avro.schema` | `String` | The Avro schema to be used for generating the Parquet schema of the file. |
| `parquet.avro.write-old-list-structure` | `boolean` | Flag whether to write list structures in the old way (2 levels) or the new one (3 levels). With the 2-level structure, null values cannot be represented at the element level.<br/>The default value is `true`. |
| `parquet.avro.add-list-element-records` | `boolean` | Flag whether to assume that any repeated element in the schema is a list element.<br/>The default value is `true`. |
| `parquet.avro.write-parquet-uuid` | `boolean` | Flag whether to write the Parquet UUID logical type when an Avro UUID type is present.<br/>The default value is `false`. |
| `parquet.avro.writeFixedAsInt96` | `String` | Comma-separated list of paths pointing to Avro schema elements that are to be converted to INT96 Parquet types.<br/>Each path is a '.'-separated list of field names and contains neither the name of the schema nor the namespace. The referenced schema elements must be of type fixed with a size of 12 bytes.<br/>NOTE: The INT96 Parquet type is deprecated. This option is only to support old data. |
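
The path format expected by `parquet.avro.writeFixedAsInt96` can be sketched as follows, again using only the Java standard library. The class `AvroWriteOptions`, its methods, and the field names `event`, `created_at`, and `updated_at` are hypothetical and serve only to show how field names are joined with '.' and whole paths with ','. As with the reading options, each entry would typically be applied to a Hadoop `Configuration` through `conf.set(key, value)`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of parquet-avro.
public class AvroWriteOptions {

    // Joins the field names of each path with '.' and the paths themselves with ','.
    static String int96Paths(List<List<String>> fieldPaths) {
        return fieldPaths.stream()
                .map(path -> String.join(".", path))
                .collect(Collectors.joining(","));
    }

    // Collects a few of the writer options as plain key/value strings.
    static Map<String, String> writeOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        // Use the 3-level list structure so that list elements may be null.
        options.put("parquet.avro.write-old-list-structure", "false");
        // Write the Parquet UUID logical type for Avro UUID fields.
        options.put("parquet.avro.write-parquet-uuid", "true");
        // Hypothetical 12-byte fixed fields: one nested, one at the top level.
        options.put("parquet.avro.writeFixedAsInt96", int96Paths(List.of(
                List.of("event", "created_at"),
                List.of("updated_at"))));
        return options;
    }

    public static void main(String[] args) {
        // With Hadoop available, each entry would be passed to conf.set(key, value).
        writeOptions().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that the paths start at the first field name; the schema name and namespace are left out, as the table requires.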