Filesystem Interface (legacy)
Note
This section documents the deprecated filesystem layer. We strongly recommend using the new filesystem layer (pyarrow.fs) instead.
Hadoop File System (HDFS)
PyArrow comes with bindings to a C++-based interface to the Hadoop File System. You connect like so:
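import pyarrow as pa

# host, port, user, ticket_cache_path and path are placeholders for your cluster
fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
with fs.open(path, 'rb') as f:
    # Do something with f, e.g. read the whole file into memory
    data = f.read()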
By default, pyarrow.hdfs.HadoopFileSystem uses libhdfs, a JNI-based interface to the Java Hadoop client. This library is loaded at runtime (rather than at link or library load time, since the library may not be in your LD_LIBRARY_PATH) and relies on the following environment variables:
- HADOOP_HOME: the root of your installed Hadoop distribution. It often contains lib/native/libhdfs.so.
- JAVA_HOME: the location of your Java SDK installation.
- ARROW_LIBHDFS_DIR (optional): the explicit location of libhdfs.so, if it is installed somewhere other than $HADOOP_HOME/lib/native.
- CLASSPATH: must contain the Hadoop jars. You can set it using:
export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`
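As a rough illustration of how HADOOP_HOME and ARROW_LIBHDFS_DIR determine where libhdfs.so is expected, here is a minimal sketch. The helpers `libhdfs_candidates` and `find_libhdfs` are hypothetical names, not part of pyarrow, and the priority order (ARROW_LIBHDFS_DIR first, then $HADOOP_HOME/lib/native) is an assumption based on the descriptions above; pyarrow's own search may consult other locations.

```python
import os
from typing import List, Mapping, Optional


def libhdfs_candidates(env: Mapping[str, str]) -> List[str]:
    """List paths where libhdfs.so might live, in priority order.

    Hypothetical helper (not pyarrow API). Assumed order: an explicit
    ARROW_LIBHDFS_DIR first, then $HADOOP_HOME/lib/native.
    """
    candidates = []
    if env.get("ARROW_LIBHDFS_DIR"):
        candidates.append(os.path.join(env["ARROW_LIBHDFS_DIR"], "libhdfs.so"))
    if env.get("HADOOP_HOME"):
        candidates.append(
            os.path.join(env["HADOOP_HOME"], "lib", "native", "libhdfs.so"))
    return candidates


def find_libhdfs(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first candidate that exists as a file, or None."""
    for path in libhdfs_candidates(env):
        if os.path.isfile(path):
            return path
    return None
```

Calling `find_libhdfs()` before connecting can turn an opaque runtime load failure into an early, explicit check that the variables point at a real library file.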
If CLASSPATH is not set, then it will be set automatically if the hadoop executable is in your system path, or if HADOOP_HOME is set.
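The automatic CLASSPATH rule above can be approximated with a short sketch. It is not pyarrow's actual implementation: `classpath_command` and `ensure_classpath` are hypothetical helpers, and checking HADOOP_HOME before searching PATH for `hadoop` is an assumed order. Both commands use the `classpath --glob` subcommand, matching the manual `export` shown earlier.

```python
import os
import shutil
import subprocess
from typing import List, Mapping, Optional


def classpath_command(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return a command whose output could populate CLASSPATH, or None.

    Hypothetical helper (not pyarrow API). Assumed order: an existing
    CLASSPATH is left alone, then HADOOP_HOME, then `hadoop` on PATH.
    """
    if env.get("CLASSPATH"):
        return None  # already set: nothing to derive
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        # Same command as the manual export of CLASSPATH
        return [os.path.join(hadoop_home, "bin", "hdfs"), "classpath", "--glob"]
    # Fall back to a `hadoop` executable on PATH (an empty PATH finds nothing)
    hadoop = shutil.which("hadoop", path=env.get("PATH"))
    if hadoop:
        return [hadoop, "classpath", "--glob"]
    return None


def ensure_classpath() -> None:
    """Set CLASSPATH for this process if it is missing and derivable."""
    cmd = classpath_command(os.environ)
    if cmd is not None:
        result = subprocess.run(cmd, check=True, capture_output=True, text=True)
        os.environ["CLASSPATH"] = result.stdout.strip()
```

Separating the pure `classpath_command` from the side-effecting `ensure_classpath` keeps the decision logic easy to inspect without a Hadoop installation.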
You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:
fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path,
                     driver='libhdfs3')
HDFS API
- Connect to an HDFS cluster.
- Return the contents of a file as a bytes object.
- Change file permissions.
- Change file permissions.
- Delete the indicated file or directory.
- Return free space on disk, like the UNIX df command.
- Compute bytes used by all contents under the indicated path in the file tree.
- Return True if the path exists.
- Get the reported total capacity of the file system.
- Get the space used on the file system.
- Return detailed HDFS information for a path.
- Retrieve directory contents and metadata, if requested.
- Create a directory in HDFS.
- Open an HDFS file for reading or writing.
- Rename a file, like the UNIX mv command.
- Alias for FileSystem.delete.
- Upload a file-like object to an HDFS path.