pyarrow.array

pyarrow.array(obj, type=None, mask=None, size=None, from_pandas=None, bool safe=True, MemoryPool memory_pool=None)

Create pyarrow.Array instance from a Python object.

Parameters
  • obj (sequence, iterable, ndarray or Series) – If both type and size are specified, obj may be a single-use iterable. If obj is not strongly typed, the Arrow type of the resulting array is inferred.

  • type (pyarrow.DataType) – Explicit type to attempt to coerce to. If omitted, the type is inferred from the data.

  • mask (array[bool], optional) – Indicate which values are null (True) or not null (False).

  • size (int64, optional) – Number of elements to convert. If the input is longer than size, conversion stops at this length. For iterators, a size larger than the input is treated as a “max size”: memory for size elements is allocated up front and then resized to the actual length, so specifying the exact size, when known, gives better performance.

  • from_pandas (bool, default None) – Use pandas’s semantics for inferring nulls from values in ndarray-like data. If a mask is passed, it takes precedence, but a value that is unmasked (not null) yet null according to pandas semantics is still treated as null. Defaults to False if not passed explicitly by the user, or True if a pandas object is passed in.

  • safe (bool, default True) – Check for overflows or other unsafe conversions.

  • memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool.

Returns

array (pyarrow.Array or pyarrow.ChunkedArray) – A ChunkedArray instead of an Array is returned if:
  • the object data overflowed binary storage.
  • the object’s __arrow_array__ protocol method returned a chunked array.

Notes

Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.

Examples

>>> import pandas as pd
>>> import pyarrow as pa
>>> pa.array(pd.Series([1, 2]))
<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
[
  1,
  2
]
>>> import numpy as np
>>> pa.array(pd.Series([1, 2]), mask=np.array([0, 1],
... dtype=bool))
<pyarrow.array.Int64Array object at 0x7f9019e11208>
[
  1,
  null
]