When a data application fails, the first question is usually:
What happened?
Without proper logging, finding the answer can be difficult. A data pipeline may fail overnight, an API request may return unexpected results, or a machine learning model may produce incorrect predictions. If no logs exist, troubleshooting becomes a frustrating guessing game.
This is why logging is one of the most important practices in professional data projects.
Good logging helps data teams monitor applications, identify failures, debug issues faster, and understand how systems behave in production.
In this guide, you’ll learn what logging is, why it matters, and the best practices for implementing logging in Python data applications.
What Is Logging?
Logging is the practice of recording information about an application’s execution: events, errors, warnings, and operational details. Effective logging improves monitoring, debugging, auditing, and production reliability in data pipelines, analytics systems, and machine learning applications.
Instead of:
print("Data loaded")
use:
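import logging
logging.basicConfig(level=logging.INFO)
logging.info("Data loaded")
Each call creates a log record with a severity level, and a configured handler can add a timestamp and write the record to the console or a file for later review. The basicConfig call matters here: by default the root logger only emits WARNING and above, so an unconfigured logging.info call produces no output.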
Why Logging Matters in Data Applications
Data systems often run without direct user interaction.
Examples include:
- ETL pipelines
- Scheduled jobs
- Machine learning workflows
- Data ingestion systems
- Analytics platforms
When something fails at 2 AM, logs become the primary source of information.
Good logs help answer questions such as:
- Did the pipeline start?
- Which file failed?
- How many records were processed?
- Which API request timed out?
- When did the error occur?
The Problem with Print Statements
Many beginners rely on:
print("Pipeline started")
Although useful during development, print statements have limitations.
They:
- Lack timestamps
- Cannot be filtered easily
- Are difficult to centralize
- Do not support severity levels
Professional applications should use the logging module instead.
Using Python’s Logging Module
Basic example:
import logging
logging.basicConfig(level=logging.INFO)
logging.info("Pipeline started")
Output:
INFO:root:Pipeline started
This provides a foundation for production logging.
Understanding Log Levels
Python’s logging module defines five standard log levels, listed here from least to most severe.
DEBUG
Detailed information for troubleshooting.
logging.debug("Reading CSV file")
INFO
Normal operational messages.
logging.info("Pipeline completed")
WARNING
Unexpected but non-critical situations.
logging.warning("Missing values detected")
ERROR
An operation failed.
logging.error("Database connection failed")
CRITICAL
Severe failures requiring immediate attention.
logging.critical("Pipeline terminated")
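To see how the levels interact with a configured threshold, here is a minimal sketch. The logger name `levels_demo` and the `ListHandler` helper are illustrative assumptions; the helper collects messages in memory so the result can be inspected directly.

```python
import logging


class ListHandler(logging.Handler):
    """Collects messages in a list (illustrative helper, not part of the logging module)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}:{record.getMessage()}")


logger = logging.getLogger("levels_demo")  # hypothetical logger name
logger.setLevel(logging.WARNING)           # threshold: WARNING and above
logger.propagate = False                   # keep the demo away from the root logger
handler = ListHandler()
logger.addHandler(handler)

logger.debug("Reading CSV file")            # below the threshold, dropped
logger.info("Pipeline completed")           # below the threshold, dropped
logger.warning("Missing values detected")   # kept
logger.error("Database connection failed")  # kept

print(handler.lines)
```

Lowering the threshold to `logging.DEBUG` during troubleshooting would let all four messages through without changing any of the logging calls.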
Recommended Logging Configuration
A simple configuration:
import logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
Example output:
2026-06-08 08:00:00,000 INFO Pipeline started
Timestamps are essential for troubleshooting.
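As a possible refinement, the format can also include the logger name and a custom date format. The sketch below is one option, not the only sensible layout; the logger name `sales_pipeline` is made up, and `force=True` (available since Python 3.8) is used so the configuration applies even if logging was configured earlier.

```python
import logging
import sys

LOG_FORMAT = "%(asctime)s %(levelname)-8s %(name)s %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"  # drops the default milliseconds suffix

logging.basicConfig(
    level=logging.INFO,
    format=LOG_FORMAT,
    datefmt=DATE_FORMAT,
    stream=sys.stdout,
    force=True,  # replace any handlers already attached to the root logger
)

# "sales_pipeline" is a hypothetical logger name used for illustration.
logging.getLogger("sales_pipeline").info("Pipeline started")
```

The `-8s` pads the level name to a fixed width, which keeps messages aligned when levels of different lengths appear in the same file.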
Log Important Pipeline Events
Every pipeline should log major stages.
Example:
logging.info("Extract phase started")
logging.info("Transform phase started")
logging.info("Load phase started")
This makes it easier to identify where failures occur.
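One way to avoid repeating start and end messages by hand is a small context manager. This is a sketch under assumptions: `log_stage` is a hypothetical helper name, and the list operations stand in for real extract and transform work.

```python
import logging
import time
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("etl_pipeline")


@contextmanager
def log_stage(name):
    """Log the start, the end, and the duration of one pipeline stage."""
    logger.info("%s phase started", name)
    start = time.perf_counter()
    try:
        yield
    except Exception:
        # Record the failing stage with its traceback, then let the error propagate.
        logger.exception("%s phase failed", name)
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info("%s phase finished in %.2fs", name, elapsed)


with log_stage("Extract"):
    rows = list(range(3))  # placeholder for real extraction work

with log_stage("Transform"):
    rows = [value * 2 for value in rows]  # placeholder for real transformation
```

Because the failure message names the stage, the last "started" line without a matching "finished" line points straight at the step that broke.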
Log Record Counts
Data engineers often track processing volume.
Example:
logging.info(f"Processed {rows:,} rows")  # the :, format spec adds thousands separators
Output:
Processed 1,250,000 rows
This helps validate pipeline execution.
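Counts are most useful when they show what went in and what came out of a step. A minimal sketch, assuming records are dictionaries with an `amount` field (the function name and the sample data are made up for illustration):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl_pipeline")


def drop_incomplete(records):
    """Remove records with a missing amount and log how many were kept and dropped."""
    kept = [record for record in records if record.get("amount") is not None]
    dropped = len(records) - len(kept)
    logger.info("Rows in: %d, kept: %d, dropped: %d", len(records), len(kept), dropped)
    return kept


sample = [{"amount": 10.0}, {"amount": None}, {"amount": 7.5}]  # made-up data
clean = drop_incomplete(sample)
```

Passing the values as arguments instead of building the string up front lets the logging module skip the formatting work when the level is disabled.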
Log File Information
When processing files:
logging.info(f"Reading {filename}")
Example output:
Reading sales_2026.csv
This simplifies troubleshooting.
Log Exceptions Properly
Avoid:
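try:
    run_pipeline()  # placeholder for your own pipeline entry point
except:
    print("Error")
Instead:
try:
    run_pipeline()  # placeholder for your own pipeline entry point
except Exception:
    logging.exception("Pipeline failed")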
Output includes:
- Error message
- Stack trace
- Failure location
This is far more useful for debugging.
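A fuller sketch of the same idea follows. The function names and the sample input are illustrative assumptions; the point is that `logger.exception` logs at ERROR level and attaches the active traceback.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("etl_pipeline")


def parse_amount(raw):
    """Convert a raw string to a float; raises ValueError on bad input."""
    return float(raw)


def run_pipeline(values):
    """Return True on success, False if any value fails to parse."""
    try:
        total = sum(parse_amount(value) for value in values)
    except ValueError:
        # Must be called from inside an except block so the traceback is available.
        logger.exception("Pipeline failed while parsing amounts")
        return False
    logger.info("Pipeline completed, total=%.2f", total)
    return True


run_pipeline(["10.5", "oops"])  # made-up input that triggers the failure path
```

Catching `ValueError` rather than a bare `except` keeps unrelated problems, such as a keyboard interrupt, from being swallowed by the handler.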
Use Structured Logging
Instead of vague messages:
logging.info("Done")
Use descriptive logs:
logging.info("Loaded 50,000 records from customer table")
Detailed logs provide valuable context.
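Descriptive messages can be taken a step further by emitting each record as machine-readable JSON, which is what log platforms usually mean by structured logging. The sketch below uses only the standard library; `JsonFormatter` and the `context` key are assumptions made for this example, not part of the logging module.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (illustrative, not a stdlib class)."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Values passed through extra= become attributes on the record.
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload)


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())

logger = logging.getLogger("structured_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

logger.info("Load completed", extra={"context": {"table": "customer", "rows": 50000}})
```

With fields such as `table` and `rows` stored separately from the message, a log platform can filter or aggregate on them without parsing free text.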
Avoid Logging Sensitive Information
Never log:
- Passwords
- API keys
- Authentication tokens
- Personally identifiable information
- Credit card details
Bad example:
logging.info(api_key)
Protect sensitive data at all times.
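As a safety net, a logging filter can mask obvious secrets before they reach a handler. This is a sketch only: the `RedactSecrets` class and its `key=value` pattern are assumptions for illustration, and a filter like this does not replace the habit of keeping secrets out of log calls in the first place.

```python
import logging
import re


class RedactSecrets(logging.Filter):
    """Mask values that follow keys such as api_key= or password= (pattern is an assumption)."""

    PATTERN = re.compile(r"(?i)\b(api_key|password|token)=\S+")

    def filter(self, record):
        message = record.getMessage()
        record.msg = self.PATTERN.sub(r"\1=***", message)
        record.args = None  # the message is already fully formatted
        return True  # keep the record; only its text changes


logger = logging.getLogger("redaction_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
handler = logging.StreamHandler()
handler.addFilter(RedactSecrets())
logger.addHandler(handler)
logger.propagate = False

logger.info("Calling API with api_key=%s", "abc123")  # made-up key for the demo
```

A pattern-based filter only catches secrets in the shapes it knows about, so it should be treated as a backstop rather than a guarantee.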
Write Logs to Files
Store logs persistently.
Example:
logging.basicConfig(
    filename="pipeline.log",
    level=logging.INFO,
)
Logs remain available even after the application exits.
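If you want logs on disk and on the console at the same time, one option is to attach two handlers to the same logger. In this sketch the temporary-directory location and the logger name `file_demo` are assumptions chosen so the example runs anywhere.

```python
import logging
import os
import tempfile

# The location is an assumption for the demo; a real pipeline would use its own log directory.
log_path = os.path.join(tempfile.gettempdir(), "pipeline.log")

formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

file_handler = logging.FileHandler(log_path, encoding="utf-8")
file_handler.setFormatter(formatter)

console_handler = logging.StreamHandler()
console_handler.setFormatter(formatter)

logger = logging.getLogger("file_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(file_handler)     # persistent copy
logger.addHandler(console_handler)  # live view while the job runs

logger.info("Pipeline started")
```

`FileHandler` opens the file in append mode by default, so earlier runs are preserved rather than overwritten.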
Rotate Log Files
Long-running systems generate large log files.
Use rotating logs:
from logging.handlers import RotatingFileHandler
Benefits:
- Prevents oversized log files
- Conserves disk space
- Improves maintainability
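A minimal sketch of size-based rotation is shown below. The tiny `maxBytes` value, the temporary directory, and the logger name are assumptions chosen so rotation is visible in a short run; production settings are usually measured in megabytes.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

log_dir = tempfile.mkdtemp()  # throwaway directory for the demo
log_path = os.path.join(log_dir, "pipeline.log")

# Roll over before the file exceeds 1,000 bytes and keep at most 3 old files.
handler = RotatingFileHandler(log_path, maxBytes=1_000, backupCount=3, encoding="utf-8")
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("rotation_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False

for batch in range(200):
    logger.info("Processed batch %d", batch)

print(sorted(os.listdir(log_dir)))
```

The newest entries are always in `pipeline.log`; older files are renamed with numeric suffixes and the oldest one is deleted once `backupCount` is reached. For time-based rotation, the same module provides `TimedRotatingFileHandler`.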
Use Separate Loggers
Large applications should avoid relying solely on the root logger.
Example:
logger = logging.getLogger("etl_pipeline")
Benefits include:
- Better organization
- Module-specific logging
- Easier troubleshooting
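Logger names are hierarchical: dots separate parent from child, so verbosity can be tuned for one part of an application. The names below are illustrative assumptions; a common convention is to call `logging.getLogger(__name__)` in each module so the hierarchy mirrors the package layout.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)

extract_log = logging.getLogger("etl_pipeline.extract")
load_log = logging.getLogger("etl_pipeline.load")

# Raise verbosity for the extract step only; other loggers keep the inherited level.
extract_log.setLevel(logging.DEBUG)

extract_log.debug("Reading sales_2026.csv")     # emitted: extract is set to DEBUG
load_log.debug("Opening database connection")   # suppressed when the root level is INFO
load_log.info("Load started")                   # emitted
```

Including `%(name)s` in the format shows which component wrote each line, which is most of the benefit of using separate loggers.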
Logging in ETL Pipelines
Example workflow:
Extract → Transform → Load
Recommended logs:
logger.info("Extract started")
logger.info("Rows extracted: 500000")
logger.info("Load completed")
These logs create a clear execution history.
Logging API Requests
Data applications often rely on APIs.
Example:
logger.info(f"Calling {endpoint}")
Error example:
logger.error(f"API returned {status}")
This helps identify integration issues.
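The two calls above can be combined into a small wrapper that also records latency. This is a sketch under assumptions: `call_with_logging` and its `send` parameter are hypothetical, and `send` stands in for whatever HTTP client the project uses, so the example runs without network access.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("api_client")


def call_with_logging(endpoint, send):
    """Call send(endpoint) and log the status code and latency.

    send is a hypothetical callable that returns an HTTP status code.
    """
    logger.info("Calling %s", endpoint)
    start = time.perf_counter()
    try:
        status = send(endpoint)
    except TimeoutError:
        logger.error("Request to %s timed out", endpoint)
        raise
    elapsed_ms = (time.perf_counter() - start) * 1000
    if status >= 400:
        logger.error("API returned %s for %s after %.0f ms", status, endpoint, elapsed_ms)
    else:
        logger.info("API returned %s for %s after %.0f ms", status, endpoint, elapsed_ms)
    return status


# Stand-ins for real requests; the endpoint path is made up.
call_with_logging("/v1/orders", lambda endpoint: 200)
call_with_logging("/v1/orders", lambda endpoint: 503)
```

Logging the endpoint together with the status and latency answers the "which API request timed out?" question without needing to reproduce the call.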
Logging Machine Learning Workflows
Machine learning projects benefit from logging:
- Training duration
- Dataset size
- Evaluation metrics
- Model versions
Example:
logger.info(f"Accuracy: {score}")
Logs provide a useful audit trail.
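The items in that list can be captured in a single helper. The sketch below is framework-agnostic and built on assumptions: `train_and_log` is a hypothetical name, and `train_fn` stands in for a real training routine that returns an accuracy score.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("training")


def train_and_log(train_fn, dataset, model_version):
    """Run train_fn(dataset) and log dataset size, duration, and the returned metric.

    train_fn is a hypothetical callable that returns an accuracy score between 0 and 1.
    """
    logger.info("Training model %s on %d samples", model_version, len(dataset))
    start = time.perf_counter()
    score = train_fn(dataset)
    duration = time.perf_counter() - start
    logger.info("Model %s trained in %.2fs, accuracy=%.4f", model_version, duration, score)
    return score


# A constant stands in for real training; the version label and data are made up.
train_and_log(lambda data: 0.93, dataset=[0] * 1000, model_version="v1")
```

Putting the model version in every message makes it possible to match a metric in the logs to the exact artifact that produced it.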
Centralized Logging
Large organizations often aggregate logs using platforms such as:
- Elastic Stack
- Splunk
- Datadog
- Grafana (typically with Loki as the log store)
Centralized logging enables monitoring across many applications.
Monitoring Data Pipelines
Logs support operational monitoring.
Examples include:
- Pipeline failures
- Missing files
- Slow jobs
- Unexpected row counts
- API outages
Well-designed logs improve observability significantly.
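Some of these conditions can be detected by the pipeline itself and logged at WARNING so that alerting tools can pick them up. A minimal sketch for the row-count case, where the function name and the 10% default tolerance are assumptions:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("pipeline_monitor")


def check_row_count(actual, expected, tolerance=0.1):
    """Warn when actual deviates from expected by more than tolerance (a fraction)."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    deviation = abs(actual - expected) / expected
    if deviation > tolerance:
        logger.warning(
            "Unexpected row count: got %d, expected about %d (%.0f%% off)",
            actual, expected, deviation * 100,
        )
        return False
    logger.info("Row count OK: %d", actual)
    return True


check_row_count(actual=480_000, expected=500_000)  # within tolerance, logs INFO
check_row_count(actual=120_000, expected=500_000)  # far below, logs WARNING
```

Using WARNING here, rather than INFO, is what lets a monitoring rule separate routine runs from ones that need a second look.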
Common Logging Mistakes
Logging Too Little
Missing logs make troubleshooting difficult.
Logging Too Much
Excessive logs create noise.
Using Only Print Statements
Print statements are not suitable for production systems.
Logging Sensitive Data
This creates security and compliance risks.
Ignoring Log Levels
Not everything should be logged as INFO; reserve WARNING and above for events that need attention.
Recommended Logging Checklist
For every data application:
- Log pipeline start and end times
- Log record counts
- Log file names processed
- Log API interactions
- Log exceptions with stack traces
- Use timestamps
- Use appropriate log levels
- Store logs in files or centralized systems
- Avoid sensitive information
- Monitor logs regularly
Real-World Example
Imagine a nightly ETL pipeline processing customer transactions.
Without logging:
Pipeline Failed
No further information exists.
With logging:
01:00 Extract Started
01:02 File Loaded
01:05 Transformation Started
01:08 Database Connection Failed
The issue can be identified immediately.
This demonstrates why logging is essential in production systems.
Logging is one of the most valuable practices in professional data applications. It provides visibility into how systems operate, simplifies debugging, supports monitoring, and helps teams respond quickly when issues occur.
By using Python’s logging module, choosing appropriate log levels, recording meaningful operational events, and avoiding sensitive information, data professionals can build more reliable and maintainable systems.
Whether you’re developing ETL pipelines, analytics workflows, or machine learning applications, strong logging practices are a critical part of production-ready software.
FAQ
What is logging in Python?
Logging is the process of recording application events, errors, warnings, and operational information.
Why is logging important in data engineering?
Logging helps monitor pipelines, troubleshoot failures, and understand system behavior.
Should I use print statements instead of logging?
Print statements are useful during development, but production applications should use the logging module.
What log level should I use?
Use DEBUG for troubleshooting, INFO for normal events, WARNING for unusual situations, ERROR for failures, and CRITICAL for severe issues.
What should never be logged?
Passwords, API keys, authentication tokens, and other sensitive information should never be written to logs.