lancedb/lance-spark


Apache Spark Connector for Lance

The Apache Spark Connector for Lance allows Apache Spark to efficiently read and write datasets stored in the Lance format. With it, you can apply Spark's data processing, SQL querying, and machine learning training capabilities to the AI data lake powered by Lance.

Features

The connector is built using the Spark DatasourceV2 (DSv2) API. Please check this presentation to learn more about DSv2 features. Specifically, you can use the Apache Spark Connector for Lance to:

  • Read & Write Lance Datasets: Seamlessly read and write datasets stored in the Lance format using Spark.
  • Distributed, Parallel Scans: Leverage Spark's distributed computing capabilities to perform parallel scans on Lance datasets.
  • Column and Filter Pushdown: Optimize query performance by pushing down column selections and filters to the data source.
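As an illustration of distributed scans with pushdown, the following sketch reads a Lance dataset through the connector. The `lance` format name matches the connector, but the dataset path is a placeholder and the exact options may differ — consult the connector's documentation for the precise configuration.

```scala
// Hedged sketch: reading a Lance dataset with Spark, assuming the
// connector is on the classpath and registers the "lance" format.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("lance-read-example")
  .getOrCreate()

// The dataset path below is a placeholder; point it at a real Lance dataset.
val df = spark.read
  .format("lance")
  .load("/path/to/my_dataset.lance")

// The column selection and filter below are candidates for pushdown
// through the DSv2 API, so the data source can scan only the needed
// columns and skip non-matching rows.
df.select("id", "vector")
  .filter("id > 100")
  .show()
```

Because the connector implements DSv2, each Spark task scans its own slice of the dataset, which is how the parallel scans listed above are realized.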

Quick Start

The project contains a Docker image in the docker folder that you can build and run to try a simple example notebook. To do so, clone the repo and run:

make docker-build
make docker-up

And then open the notebook at http://localhost:8888.
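Inside the notebook you could try a simple round trip, for example writing a DataFrame out in Lance format. As above, this is a sketch: the `lance` format name follows the connector, while the output path is a placeholder and supported save modes should be checked against the connector's documentation.

```scala
// Hedged sketch: writing a DataFrame in Lance format, assuming the
// connector supports Spark's DataFrameWriter API for "lance".
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("lance-write-example")
  .getOrCreate()

// A small example DataFrame with a single "id" column.
val data = spark.range(0, 1000).toDF("id")

// The output path is a placeholder; replace it with a real location.
data.write
  .format("lance")
  .save("/path/to/output_dataset.lance")
```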

Contributing

See contributing for the detailed contribution guidelines and local development setup.
