
I am new to Airflow, and I am trying to automate the scheduling of a Python pipeline. My project youtubecollection01 uses custom modules, so when I run the DAG it fails with ModuleNotFoundError: No module named 'Authentication'.

This is how my project is structured:

Directory Structure

This is my dag file:

# This is to initialize the file as a DAG file
from airflow import DAG
from datetime import datetime, timedelta
from airflow.operators.python import PythonOperator
# from airflow.utils.dates import days_ago

from youtubecollectiontier01.src.__main__ import main

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    # 'start_date': days_ago(1),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}

# curate dag
with DAG('collect_layer_01',
         start_date=datetime(2022, 7, 25),
         schedule_interval='@daily',
         catchup=False,
         default_args=default_args) as dag:

    curate = PythonOperator(
        task_id='collect_tier_01',  # name for the task you would like to execute
        python_callable=main,       # the name of your python function
        provide_context=True,
        dag=dag)

I am importing the main function from __main__.py; however, inside main I import other classes such as Authentication.py, ChannelClass.py, and Common.py, and that is where Airflow fails to resolve the imports.

Airflow Failure Log

Why is it failing on the imports? Is it a directory issue or an Airflow issue? I tried moving the project under plugins and running it, but it did not work. Any feedback would be highly appreciated!

Thank you!

1 Answer


Up until the last part, you had everything set up according to the tutorials! Also, thank you for a well-documented question.

If you have not changed the PYTHONPATH for Airflow, you can check the defaults with:

$ airflow info 

In the Paths info section, you get "airflow_home", "system_path", "python_path" and "airflow_on_path".

Within "python_path", you'll see that Airflow is set up to check everything inside the /dags, /plugins and /config folders.

More on this topic is in the docs under "Module Management".
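To see exactly which directories the DAG parser can import from, one quick debugging sketch (an assumption on my part, not something from the question) is to print sys.path at the top of any DAG file and check the scheduler logs:

```python
# Debug sketch: print the interpreter's import search path at DAG parse time.
# The dags/, plugins/ and config/ folders reported by `airflow info` should
# all appear in this list.
import sys

for path in sys.path:
    print(path)
```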


Now, I think the problem with your code can be fixed with a small change.

In your main code you import:

from Authentication import Authentication 

In a default setup, Airflow doesn't know where that is!

If you import it this way:

from youtubecollectiontier01.src.Authentication import Authentication 

just like you did in the DAG file, I believe it will work. The same goes for the other classes you have: ChannelClass, Common, etc.
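Here is a minimal, self-contained sketch of why the absolute import works. It rebuilds the question's layout (package and class names taken from the question) in a temporary folder and puts the package root on sys.path, which is effectively what Airflow does for the dags/ folder:

```python
# Sketch: reproduce the project layout in a temp dir, then import it the
# "absolute" way. The layout and names mirror the question; the temp-dir
# setup is just scaffolding so the example runs anywhere.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "youtubecollectiontier01", "src")
os.makedirs(pkg)

# Mark both directories as packages.
open(os.path.join(root, "youtubecollectiontier01", "__init__.py"), "w").close()
open(os.path.join(pkg, "__init__.py"), "w").close()

# A stand-in for the real Authentication.py module.
with open(os.path.join(pkg, "Authentication.py"), "w") as f:
    f.write("class Authentication:\n    pass\n")

# Airflow puts the dags/ folder on sys.path just like this.
sys.path.insert(0, root)

from youtubecollectiontier01.src.Authentication import Authentication
print(Authentication.__name__)
```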

Waiting to hear from you!


7 Comments

Thank you so much for the help! This worked for me perfectly, though I have a question regarding imports of other Python modules such as AWS boto3. I still get an error when I write import boto3, even though I have installed requirements.txt. How can I fix that?
You are welcome. Are you using a venv or a conda environment to store your code? If you are using a conda environment, for example, you can check all the installed libraries with $ pip list or $ conda list. If boto3 is not there, you can install it with $ pip install boto3.
No, I am running on Docker as of now (Airflow via Docker Compose). When I install it with pip, I can see it under this path: /home/maryam-airflow-01/.local/lib/python2.7/site-packages/boto3, but when I import it, it is not recognized. Any idea why? Or is there a best practice I should follow regarding installed modules?
Okay, then you need to install all the libraries from requirements.txt inside your Airflow container. This is not the same as the local installation on your PC. For example, you can take a look at this document. You'll see that all the Docker containers use an image. For your specific needs, you can build a container with the extra packages installed as well, using a Dockerfile.
Thank you so much! I tried doing it with a Dockerfile but it didn't work for me, so I added all the pip installs I need to my docker-compose file under `_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-pandas numpy google-api-python-client beautifulsoup4 mysql-connector-python}`, and then it worked. Thanks a lot for the help!
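To verify what the interpreter inside the container actually sees (for example by running it via docker compose exec on the worker service), a small hypothetical helper can report where a package would be imported from. The where_is name is my own; "boto3" is the package from the comments, and "json" is a stdlib stand-in so the example runs anywhere:

```python
# Sketch: report where a top-level package would be imported from, without
# actually importing it. Useful for spotting packages installed on the host
# but missing inside the container.
import importlib.util

def where_is(pkg: str) -> str:
    """Return the file a package loads from, or a not-found message."""
    spec = importlib.util.find_spec(pkg)
    if spec is None or spec.origin is None:
        return f"{pkg} is not importable"
    return spec.origin

print(where_is("json"))   # a path inside the standard library
print(where_is("boto3"))  # a site-packages path if installed in this interpreter
```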
