This library helps you read and write data from most common data sources. It accelerates ML and ETL workflows by handling the multiple data connectors for you.
```shell
pip install -U dataligo
```

Install from sources
Alternatively, you can also clone the latest version from the repository and install it directly from the source code:
```shell
pip install -e .
```

```python
>>> from dataligo import Ligo
>>> from transformers import pipeline
>>> ligo = Ligo('./ligo_config.yaml') # Check the sample_ligo_config.yaml for reference
>>> print(ligo.get_supported_data_sources_list())
['s3', 'gcs', 'azureblob', 'bigquery', 'snowflake', 'redshift', 'starrocks', 'postgresql', 'mysql', 'oracle', 'mssql', 'mariadb', 'sqlite', 'elasticsearch', 'mongodb', 'dynamodb', 'redis']
>>> mongodb = ligo.connect('mongodb')
>>> df = mongodb.read_as_dataframe(database='reviewdb', collection='reviews', return_type='pandas') # Default return_type is pandas
>>> df.head()
                        _id                                             Review
0  64272bb06a14f52787e0a09e                               good and interesting
1  64272bb06a14f52787e0a09f  This class is very helpful to me. Currently, I...
2  64272bb06a14f52787e0a0a0  like!Prof and TAs are helpful and the discussi...
3  64272bb06a14f52787e0a0a1  Easy to follow and includes a lot basic and im...
4  64272bb06a14f52787e0a0a2  Really nice teacher!I could got the point eazl...
>>> classifier = pipeline("sentiment-analysis")
>>> reviews = df.Review.tolist()
>>> results = classifier(reviews, truncation=True)
>>> for result in results:
...     print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9997
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.999
label: POSITIVE, with score: 0.9967
>>> df['predicted_label'] = [result['label'] for result in results]
>>> df['predicted_score'] = [round(result['score'], 4) for result in results]
>>> # Write the results to MongoDB
>>> mongodb.write_dataframe(df, 'reviewdb', 'review_sentiments')
```

| Data Sources | Type | pandas | polars | dask |
|---|---|---|---|---|
| S3 | datalake | ✅ | ✅ | ✅ |
| GCS | datalake | ✅ | ✅ | ✅ |
| Azure Blob Storage | datalake | ✅ | ✅ | ✅ |
| Snowflake | datawarehouse | ✅ | ✅ | ✅ |
| BigQuery | datawarehouse | ✅ | ✅ | ✅ |
| StarRocks | datawarehouse | ✅ | ✅ | ✅ |
| Redshift | datawarehouse | ✅ | ✅ | ✅ |
| PostgreSQL | database | ✅ | ✅ | ✅ |
| MySQL | database | ✅ | ✅ | ✅ |
| MariaDB | database | ✅ | ✅ | ✅ |
| MsSQL | database | ✅ | ✅ | ✅ |
| Oracle | database | ✅ | ✅ | ✅ |
| SQLite | database | ✅ | ✅ | ✅ |
| MongoDB | nosql | ✅ | ✅ | ✅ |
| ElasticSearch | nosql | ✅ | ✅ | ✅ |
| DynamoDB | nosql | ✅ | ✅ | ✅ |
| Redis(beta) | nosql | ✅ | ✅ | ✅ |
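The `Ligo` object in the example above is constructed from `./ligo_config.yaml`, which holds the credentials for each connector. The actual schema is defined by `sample_ligo_config.yaml` in the repository; the fragment below is only a hypothetical illustration of the idea (all keys are assumptions, not the real schema):

```yaml
# Hypothetical sketch -- consult sample_ligo_config.yaml for the
# real keys that Ligo expects for each data source.
mongodb:
  host: localhost
  port: 27017
  username: <user>
  password: <password>
```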
Some functionalities of DataLigo are inspired by the following packages:

- DataLigo uses ConnectorX to read data from most RDBMS databases for its performance benefits; the `return_type` parameter is also inspired by it.
- DataLigo uses dynamo-pandas to read and write data from DynamoDB.
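The `return_type` idea borrowed from ConnectorX can be sketched in isolation. The snippet below is not dataligo's actual implementation — `read_as_dataframe` here is a hypothetical stand-in backed by the stdlib `sqlite3` module and pandas, and the `"records"` return type is illustrative:

```python
import sqlite3

import pandas as pd


def read_as_dataframe(query, conn, return_type="pandas"):
    """Run a query and return it in the caller's preferred shape.

    Minimal sketch of a ConnectorX-style return_type switch; real
    connectors would dispatch to polars, dask, etc. as well.
    """
    df = pd.read_sql_query(query, conn)
    if return_type == "pandas":
        return df
    if return_type == "records":
        return df.to_dict(orient="records")
    raise ValueError(f"unsupported return_type: {return_type}")


# Tiny in-memory database to exercise the sketch
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (id INTEGER, review TEXT)")
conn.execute("INSERT INTO reviews VALUES (1, 'good and interesting')")
conn.commit()

df = read_as_dataframe("SELECT * FROM reviews", conn)
records = read_as_dataframe("SELECT * FROM reviews", conn, return_type="records")
```

Keeping the conversion at the edge of the read path is what lets a single connector serve pandas, polars, and dask callers without duplicating query logic.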

