pyarrow.dataset.DirectoryPartitioning

class pyarrow.dataset.DirectoryPartitioning

Bases: pyarrow._dataset.Partitioning

A Partitioning based on a specified Schema.

The DirectoryPartitioning expects one segment in the file path for each field in the schema (all fields are required to be present). For example, given schema<year:int16, month:int8>, the path “/2009/11” would be parsed to (“year” == 2009 and “month” == 11).

Parameters: schema (Schema) – The schema that describes the partitions present in the file path.
Returns: DirectoryPartitioning

Examples

>>> import pyarrow as pa
>>> from pyarrow.dataset import DirectoryPartitioning
>>> partitioning = DirectoryPartitioning(
...     pa.schema([("year", pa.int16()), ("month", pa.int8())]))
>>> print(partitioning.parse("/2009/11"))
((year == 2009:int16) and (month == 11:int8))
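
The positional rule described above (one path segment per schema field, in schema order) can be illustrated without pyarrow. The following is only a sketch of the idea, not the library's implementation: `parse_directory_path` is a hypothetical helper, and the list of `(name, converter)` pairs is an assumed stand-in for an Arrow schema.

```python
def parse_directory_path(path, fields):
    """Map each path segment to the field at the same position.

    `fields` is a list of (name, converter) pairs standing in for an
    Arrow schema. This helper is hypothetical and not part of pyarrow.
    """
    # Drop the empty strings produced by leading or trailing slashes.
    segments = [segment for segment in path.split("/") if segment]
    # Mirror the documented requirement that every field is present.
    if len(segments) != len(fields):
        raise ValueError(
            f"expected {len(fields)} path segments, got {len(segments)}"
        )
    return {
        name: convert(segment)
        for (name, convert), segment in zip(fields, segments)
    }


parsed = parse_directory_path("/2009/11", [("year", int), ("month", int)])
print(parsed)  # {'year': 2009, 'month': 11}
```

The real `parse` method returns an expression such as the one printed above rather than a dictionary; the sketch only shows how segments line up with fields.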

__init__(*args, **kwargs): Initialize self. See help(type(self)) for accurate signature.

Methods

`__init__`(*args, **kwargs)	Initialize self.
`discover`	Discover a DirectoryPartitioning.
`parse`

Attributes

schema

The Arrow Schema attached to the partitioning.

static discover()

Discover a DirectoryPartitioning.

Parameters

field_names (list of str) – The names to associate with the values from the subdirectory names.
max_partition_dictionary_size (int or None, default 0) – The maximum number of unique values to consider for dictionary encoding. By default, no field will be inferred as dictionary encoded. If None is provided, dictionary encoding will be used for every string field.

Returns

DirectoryPartitioningFactory – To be used in the FileSystemFactoryOptions.

parse()

schema: The Arrow Schema attached to the partitioning.