Input Sources is an abstraction for loading Spark data via configuration files. It currently supports:
- file path sources
- table sources
- SQL sources
- BigQuery sources
This library aims to be easily extended to other sources: each source is a case class extending a sealed trait, so adding a new source means adding a new case class, as sketched below.
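As a rough illustration of the pattern (not the library's actual code): `TableSource`'s fields below follow the config example later in this README, while `SqlSource` and its `query` field are purely hypothetical, showing how a new source would slot in.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch of the extension pattern: each source is a case class that
// extends the sealed trait and implements loadData.
sealed trait InputSources {
  def loadData: DataFrame
}

// Fields mirror the table-source config example in this README;
// whether filter is optional in the real library is an assumption.
final case class TableSource(tableName: String, filter: Option[String]) extends InputSources {
  override def loadData: DataFrame = {
    val spark = SparkSession.builder.getOrCreate()
    val df = spark.read.table(tableName)
    filter match {
      case Some(condition) => df.filter(condition)
      case None            => df
    }
  }
}

// Hypothetical new source: adding it is just another case class.
final case class SqlSource(query: String) extends InputSources {
  override def loadData: DataFrame =
    SparkSession.builder.getOrCreate().sql(query)
}
```

Because the trait is sealed, PureConfig can derive a reader for the whole hierarchy, and the compiler flags any match that misses a source type.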
```scala
// https://central.sonatype.com/artifact/com.growingintech/spark-input-sources_2.12/1.0.1
libraryDependencies += "com.growingintech" %% "spark-input-sources" % "1.0.1"
```

Feel free to submit a PR for any new sources you would like to add. I don't plan on creating cloud accounts for every provider, so help with Amazon and Azure sources would be especially welcome.
In this simple example, we have a HOCON pipeline configuration string, which can carry as many parameters as the user's use case needs. For the data definition, I am using a TableSource.
```scala
/*
 * Copyright 2023 GrowingInTech.com. All Rights Reserved.
 *
 * Licensed under the Apache License, Version 2.0 (the "License"). You may not
 * use this file except in compliance with the License. A copy of the License
 * is located at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * or in the "license" file accompanying this file. This file is distributed on
 * an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either
 * express or implied. See the License for the specific language governing
 * permissions and limitations under the License.
 */
import com.growingintech.datasources.InputSources

import com.typesafe.config.ConfigFactory
import pureconfig._
import pureconfig.generic.auto._

import org.apache.spark.sql.DataFrame

val strConfig: String =
  """
    |{
    |  pipeline-name: Data Runner
    |  date: 20230216
    |  data: {
    |    type: table-source
    |    table-name: default.test_data
    |    filter: "date = 20230101 AND x > 2"
    |  }
    |}
    |""".stripMargin

case class Params(
  pipelineName: String,
  date: Int,
  data: InputSources
)

val config: Params =
  ConfigSource.fromConfig(ConfigFactory.parseString(strConfig)).loadOrThrow[Params]

// Load from the parsed config instance (not the Params type).
val df: DataFrame = config.data.loadData
```
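One detail worth calling out: PureConfig's default sealed-trait encoding reads the `type` field and matches it against the kebab-case class name, which is how `table-source` resolves to the `TableSource` case class. Swapping loaders is then a pure config change. A minimal sketch, reusing the hypothetical `SqlSource(query)` from earlier (the field name is an assumption, not the library's API):

```scala
// Hypothetical: assumes the SqlSource(query) sketch above.
// Only the `data` block changes; Params stays the same.
val sqlConfig: String =
  """
    |{
    |  pipeline-name: Data Runner
    |  date: 20230216
    |  data: {
    |    type: sql-source
    |    query: "SELECT * FROM default.test_data WHERE date = 20230101"
    |  }
    |}
    |""".stripMargin

val sqlDf: DataFrame =
  ConfigSource.fromConfig(ConfigFactory.parseString(sqlConfig)).loadOrThrow[Params].data.loadData
```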