pyarrow.array
pyarrow.array(obj, type=None, mask=None, size=None, from_pandas=None, bool safe=True, MemoryPool memory_pool=None)
Create a pyarrow.Array instance from a Python object.
Parameters
obj (sequence, iterable, ndarray or Series) – If both type and size are specified, this may be a single-use iterable. If the data is not strongly typed, the Arrow type is inferred for the resulting array.
type (pyarrow.DataType) – Explicit type to attempt to coerce to; otherwise the type is inferred from the data.
mask (array[bool], optional) – Indicate which values are null (True) or not null (False).
size (int64, optional) – Size of the resulting array (number of elements). If the input is larger than size, stop at this length. For iterators, if size is larger than the input iterator, it is treated as a “max size”, but this involves an initial allocation of size followed by a resize to the actual size (so if you know the exact size, specifying it correctly gives better performance).
from_pandas (bool, default None) – Use pandas’s semantics for inferring nulls from values in ndarray-like data. If passed, the mask takes precedence, but if a value is unmasked (not null) yet still null according to pandas semantics, then it is null. Defaults to False if not passed explicitly by the user, or True if a pandas object is passed in.
safe (bool, default True) – Check for overflows or other unsafe conversions.
memory_pool (pyarrow.MemoryPool, optional) – If not passed, will allocate memory from the currently-set default memory pool.
Returns
array (pyarrow.Array or pyarrow.ChunkedArray) – A ChunkedArray instead of an Array is returned if:
- the object data overflowed binary storage, or
- the object’s __arrow_array__ protocol method returned a ChunkedArray.
Notes
Localized timestamps will currently be returned as UTC (pandas’s native representation). Timezone-naive data will be implicitly interpreted as UTC.
Examples
>>> import pandas as pd
>>> import pyarrow as pa
>>> pa.array(pd.Series([1, 2]))
<pyarrow.array.Int64Array object at 0x7f674e4c0e10>
[
  1,
  2
]
>>> import numpy as np
>>> pa.array(pd.Series([1, 2]), mask=np.array([0, 1], dtype=bool))
<pyarrow.array.Int64Array object at 0x7f9019e11208>
[
  1,
  null
]