Build ETL Pipelines for Data Science Workflows in About 30 Lines of Python
Image by Author | Ideogram
You know that feeling when you have data scattered across different formats and sources, and you need to make sense of it all? That’s exactly what we’re solving today. Let’s build an ETL pipeline that takes messy data and turns it into something actually useful.
In this article, I’ll walk you through creating a pipeline that processes e-commerce transactions. Nothing fancy, just practical code that gets the job done.
We’ll grab data from a CSV file (like you’d download from an e-commerce platform), clean it up, and store it in a proper database for analysis.
What Is an Extract, Transform, Load (ETL) Pipeline?
Every ETL pipeline follows the same pattern. You grab data from somewhere (Extract), clean it up and make it better (Transform), then put it somewhere useful (Load).
ETL Pipeline | Image by Author | diagrams.net (draw.io)
The process begins with the extract phase, where data is retrieved from various source systems such as databases, APIs, files, or streaming platforms. During this phase, the pipeline identifies and pulls relevant data while maintaining connections to disparate systems that may operate on different schedules and formats.
Next, the transform phase represents the core processing stage, where extracted data undergoes cleaning, validation, and restructuring. This step addresses data quality issues, applies business rules, performs calculations, and converts data into the required format and structure. Common transformations include data type conversions, field mapping, aggregations, and the removal of duplicates or invalid records.
Finally, the load phase transfers the now transformed data into the target system. This step can occur through full loads, where entire datasets are replaced, or incremental loads, where only new or changed data is added. The loading strategy depends on factors such as data volume, system performance requirements, and business needs.
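Stripped to its essentials, the pattern described above fits in a few lines of Python. This is only a toy sketch; every function name is illustrative, and plain lists stand in for real sources and targets:

```python
# Toy end-to-end sketch of the three ETL phases (all names are illustrative)
def extract(rows):
    # Extract: in a real pipeline this reads from files, APIs, or databases
    return list(rows)

def transform(rows):
    # Transform: drop invalid records and derive a total field
    return [{**r, "total": r["price"] * r["qty"]} for r in rows if r.get("email")]

def load(rows, target):
    # Load: a plain list stands in for the target database table
    target.extend(rows)
    return len(rows)

raw = [
    {"email": "a@example.com", "price": 10.0, "qty": 2},
    {"email": None, "price": 5.0, "qty": 1},  # invalid record, dropped
]
warehouse = []
loaded = load(transform(extract(raw)), warehouse)
print(f"Loaded {loaded} record(s)")  # Loaded 1 record(s)
```

The pipeline we build next follows this exact shape, with pandas and SQLite doing the heavy lifting.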
Step 1: Extract
The “extract” step is where we get our hands on data. In the real world, you might be downloading this CSV from your e-commerce platform’s reporting dashboard, pulling it from an FTP server, or getting it via API. Here, we’re reading from a local CSV file.
import pandas as pd

def extract_data_from_csv(csv_file_path):
    try:
        print(f"Extracting data from {csv_file_path}...")
        df = pd.read_csv(csv_file_path)
        print(f"Successfully extracted {len(df)} records")
        return df
    except FileNotFoundError:
        print(f"Error: {csv_file_path} not found. Creating sample data...")
        # create_sample_csv_data() is defined in the complete code on GitHub
        csv_file = create_sample_csv_data()
        return pd.read_csv(csv_file)
Now that we have the raw data from its source (raw_transactions.csv), we need to transform it into something usable.
Step 2: Transform
This is where we make the data actually useful.
def transform_data(df):
    print("Transforming data...")
    df_clean = df.copy()

    # Remove records with missing emails
    initial_count = len(df_clean)
    df_clean = df_clean.dropna(subset=['customer_email'])
    removed_count = initial_count - len(df_clean)
    print(f"Removed {removed_count} records with missing emails")

    # Calculate derived fields
    df_clean['total_amount'] = df_clean['price'] * df_clean['quantity']

    # Extract date components
    df_clean['transaction_date'] = pd.to_datetime(df_clean['transaction_date'])
    df_clean['year'] = df_clean['transaction_date'].dt.year
    df_clean['month'] = df_clean['transaction_date'].dt.month
    df_clean['day_of_week'] = df_clean['transaction_date'].dt.day_name()

    # Create customer segments
    df_clean['customer_segment'] = pd.cut(df_clean['total_amount'],
                                          bins=[0, 50, 200, float('inf')],
                                          labels=['Low', 'Medium', 'High'])
    return df_clean
First, we’re dropping rows with missing emails because incomplete customer data isn’t helpful for most analyses.
Then we calculate total_amount by multiplying price and quantity. This seems obvious, but you’d be surprised how often derived fields like this are missing from raw data.
The date extraction is really handy. Instead of just having a timestamp, now we have separate year, month, and day-of-week columns. This makes it easy to analyze patterns like “do we sell more on weekends?”
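With the derived day-of-week column in place, the weekend question becomes a one-line groupby. Here’s a sketch on a hypothetical four-row dataset (the dates and amounts are made up):

```python
import pandas as pd

# Hypothetical mini-dataset standing in for the transformed transactions
df = pd.DataFrame({
    "transaction_date": pd.to_datetime(
        ["2024-01-05", "2024-01-06", "2024-01-07", "2024-01-08"]),
    "total_amount": [120.0, 80.0, 95.0, 40.0],
})
df["day_of_week"] = df["transaction_date"].dt.day_name()

# Flag weekends and compare revenue across the two groups
is_weekend = df["day_of_week"].isin(["Saturday", "Sunday"])
revenue = df.groupby(is_weekend)["total_amount"].sum()
print(revenue)  # one total for weekdays (False), one for weekends (True)
```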
The customer segmentation using pd.cut() can be particularly useful. It automatically buckets customers into spending categories. Now instead of just having transaction amounts, we have meaningful business segments.
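As a quick illustration with made-up amounts, pd.cut assigns each value to the bin it falls into. Note that with these bins each interval is open on the left and closed on the right, so a value of exactly 50 lands in ‘Low’:

```python
import pandas as pd

amounts = pd.Series([25.0, 50.0, 150.0, 500.0])
# Same bins and labels as the pipeline's transform step
segments = pd.cut(amounts,
                  bins=[0, 50, 200, float("inf")],
                  labels=["Low", "Medium", "High"])
print(list(segments))  # ['Low', 'Low', 'Medium', 'High']
```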
Step 3: Load
In a real project, you might be loading into a database, sending to an API, or pushing to cloud storage.
Here, we’re loading our clean data into a proper SQLite database.
import sqlite3

def load_data_to_sqlite(df, db_name="ecommerce_data.db", table_name="transactions"):
    print(f"Loading data to SQLite database '{db_name}'...")
    conn = sqlite3.connect(db_name)
    try:
        df.to_sql(table_name, conn, if_exists="replace", index=False)

        # Verify the load by counting the rows that landed in the table
        cursor = conn.cursor()
        cursor.execute(f"SELECT COUNT(*) FROM {table_name}")
        record_count = cursor.fetchone()[0]
        print(f"Successfully loaded {record_count} records to '{table_name}' table")
        return f"Data successfully loaded to {db_name}"
    finally:
        conn.close()
Now analysts can run SQL queries, connect BI tools, and actually use this data for decision-making.
SQLite works well for this because it’s lightweight, requires no setup, and creates a single file you can easily share or back up. The if_exists="replace" parameter means you can run this pipeline multiple times without worrying about duplicate data.
We’ve added verification steps so you know the load was successful. There’s nothing worse than thinking your data is safely stored only to find an empty table later.
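To see what the loaded table buys you, here’s a sketch of the kind of query an analyst might run against it. It uses an in-memory SQLite database with made-up rows so it runs standalone; against the real pipeline output you’d connect to ecommerce_data.db instead:

```python
import sqlite3
import pandas as pd

# In-memory stand-in for ecommerce_data.db, seeded with a few made-up rows
conn = sqlite3.connect(":memory:")
pd.DataFrame({
    "customer_segment": ["Low", "High", "High"],
    "total_amount": [30.0, 250.0, 400.0],
}).to_sql("transactions", conn, index=False)

# Revenue per customer segment, straight from SQL
summary = pd.read_sql(
    "SELECT customer_segment, SUM(total_amount) AS revenue "
    "FROM transactions GROUP BY customer_segment",
    conn,
)
print(summary)
conn.close()
```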
Running the ETL Pipeline
This orchestrates the entire extract, transform, load workflow.
def run_etl_pipeline():
    print("Starting ETL Pipeline...")

    # Extract
    raw_data = extract_data_from_csv('raw_transactions.csv')

    # Transform
    transformed_data = transform_data(raw_data)

    # Load
    load_result = load_data_to_sqlite(transformed_data)

    print("ETL Pipeline completed successfully!")
    return transformed_data
Notice how this ties everything together. Extract, transform, load, done. You can run this and immediately see your processed data.
You can find the complete code on GitHub.
Wrapping Up
This pipeline takes raw transaction data and turns it into something an analyst or data scientist can actually work with. You’ve got clean records, calculated fields, and meaningful segments.
Each function does one thing well, and you can easily modify or extend any part without breaking the rest.
Now try running it yourself. Also try to modify it for another use case. Happy coding!
Bala Priya C is a developer and technical writer from India. She likes working at the intersection of math, programming, data science, and content creation. Her areas of interest and expertise include DevOps, data science, and natural language processing. She enjoys reading, writing, coding, and coffee! Currently, she’s working on learning and sharing her knowledge with the developer community by authoring tutorials, how-to guides, opinion pieces, and more. Bala also creates engaging resource overviews and coding tutorials.
India-France Partnership to Build Drones for Defence and Global Exports
RRP Defence (RRP Group), through its entity Vimananu, has entered into a strategic partnership with the Franco-American firm CYGR to establish an advanced drone manufacturing facility in India.
The project, based in Navi Mumbai, aims to support India’s ‘Make in India’ initiative by developing unmanned aerial vehicles (UAVs) for defence, surveillance, and industrial applications.
The collaboration will manufacture three categories of drones: hand-launched fixed-wing drones for field operations, compact nano drones for close-range use, and ISR drones tailored for intelligence, surveillance, and reconnaissance missions.
Production is expected to start with hundreds of units annually, with an initial contract valued at over $20 million.
Rajendra Chodankar, chairman of RRP Defence, said, “This collaboration is a defining moment for India’s UAV ecosystem. By combining our local manufacturing strength and field understanding with CYGR’s world-class drone technologies, we’re building systems that meet India’s unique operational needs.”
The facility will also contribute to high-skill employment and export-focused manufacturing, further positioning India as a key player in the global UAV supply chain.
The initiative strengthens India’s self-reliance in aerospace and defence technology. Zaynah, the collaboration’s global advisor for ‘Make in India’ defence exports, confirmed that an immediate Letter of Intent (LoI) is being released as part of the $20 million contract for global defence exports.
George El Aily, director of CYGR France, added, “India is a key strategic partner for us. Through this collaboration with RRP Defence Ltd, we are not only transferring technology but also co-developing future-ready solutions that support India’s defence and surveillance landscape.”
The project will focus on sectors including defence, homeland security, and industrial monitoring. The companies aim to deliver solutions customised for India’s operational environments while expanding global market access through technology localisation.
This joint venture marks a step forward in India’s ambition to become a global drone hub through international cooperation and indigenous expertise.
Figure AI Unveils In-House Battery for F03 Humanoid with 5-Hour Runtime
Figure AI, a California-based robotics company, introduced its third-generation battery for its latest F03 humanoid robot, marking a significant step in its in-house hardware development.
The new battery offers 2.3 kWh of energy, supports five hours of peak performance runtime, and includes built-in safety systems to prevent thermal runaway. The battery is manufactured entirely in-house at Figure’s BotQ facility, which is now ramping up production capabilities.
“Today we’re introducing our next generation humanoid battery for F.03. Like our actuators, vertically integrating our battery system is critical to Figure’s success. Engineered in-house and manufactured at BotQ,” the company posted on X (@Figure_robot) on July 17, 2025.
Founder Brett Adcock said in a LinkedIn post that Figure “succeeds when we own the full stack,” with every core system, including actuators and batteries, engineered and built internally.
The battery is also the first in the humanoid robotics space to pursue both UN38.3 and UL2271 safety certifications.
The F03 battery is directly integrated into the robot’s torso, unlike the earlier external battery packs of the F01. It utilises structural components such as stamped steel and die-cast aluminium, allowing it to act as a load-bearing part of the robot and saving space and weight.
The company claims a 94% increase in energy density over its first-generation battery and a 78% cost reduction compared to the previous F.02 model. The battery’s structural and thermal design features active cooling, a flame arrestor vent, and a custom Battery Management System (BMS) that helps prevent fault conditions, such as overheating or short circuits.
The battery is manufactured using mass production techniques, including stamping, injection moulding, and die casting, which enables Figure to target production volumes of up to 12,000 humanoid units per year.
To support manufacturing scale-up at its BotQ facility, Figure has opened roles across several verticals, including software test engineering, electrical testing, manufacturing engineering, and equipment handling.
Recently, Adcock had also said that the company has tripled the team to 293 people to support manufacturing, supply chain, and fleet operations.
According to the company’s blog, Figure has collaborated with OSHA-accredited testing labs to establish safety standards for humanoid robots, as these standards did not previously exist. “We specified that the battery system of F.03 must not emit flames should a catastrophic failure of a single cell occur,” Figure said.
7 Python Web Development Frameworks for Data Scientists
Image by Author | Canva
Python is widely known for its popularity among engineers and data scientists, but it’s also a favorite choice for web developers. In fact, many developers prefer Python over JavaScript for building web applications because of its simple syntax, readability, and the vast ecosystem of powerful frameworks and tools available.
Whether you are a beginner or an experienced developer, Python offers frameworks that cater to every need, from lightweight micro-frameworks that require just a few lines of code, to robust full-stack solutions packed with built-in features. Some frameworks are designed for rapid prototyping, while others focus on security, scalability, or lightning-fast performance.
In this article, we will review seven of the most popular Python web frameworks. You will discover which ones are best suited for building anything from simple websites to complex, high-traffic web applications. No matter your experience level, there is a Python framework that can help you bring your web project to life efficiently and effectively.
Python Web Development Frameworks
1. Django: The Full-Stack Powerhouse for Scalable Web Apps
Django is a robust, open-source Python framework designed for rapid development of secure and scalable web applications. With its built-in ORM, admin interface, authentication, and a vast ecosystem of reusable components, Django is ideal for building everything from simple websites to complex enterprise solutions.
Learn more: https://www.djangoproject.com/
2. Flask: The Lightweight and Flexible Microframework
Flask is a minimalist Python web framework that gives you the essentials to get started, while letting you add only what you need. It’s perfect for small to medium-sized applications, APIs, and rapid prototyping. Flask’s simplicity, flexibility, and extensive documentation make it a top choice for developers who want full control over their project’s architecture.
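As a taste of that minimalism, a complete Flask app fits in a handful of lines. The route and port below are illustrative, not part of any particular project:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/health")
def health():
    # A tiny JSON endpoint; real apps register more routes the same way
    return jsonify(status="ok")

if __name__ == "__main__":
    app.run(port=5000)  # illustrative port for local development
```

Running the script starts a development server; hitting /health returns {"status": "ok"}.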
Learn more: https://flask.palletsprojects.com/
3. FastAPI: Modern, High-Performance APIs with Ease
FastAPI is best known for building high-performance APIs, but with Jinja2 templates you can also create fully featured websites that combine backend and frontend functionality within the same framework. Built on top of Starlette and Pydantic, FastAPI offers asynchronous support, automatic interactive documentation, and exceptional speed, making it one of the fastest Python web frameworks available.
Learn more: https://fastapi.tiangolo.com/
4. Gradio: Effortless Web Interfaces for Machine Learning
Gradio is an open-source Python framework that allows you to rapidly build and share web-based interfaces for machine learning models. It is highly popular among the machine learning community, as you can build, test, and deploy your ML web demos on Hugging Face for free in just minutes. You don’t need front-end or back-end experience; just basic Python knowledge is enough to create high-performance web demos and APIs.
Learn more: https://www.gradio.app/
5. Streamlit: Instantly Build Data Web Apps
Streamlit is designed for data scientists and engineers who want to create beautiful, interactive web apps directly from Python scripts. With its intuitive API, you can build dashboards, data visualizations, and ML model demos in minutes. No need for HTML, CSS, or JavaScript. Streamlit is perfect for rapid prototyping and sharing insights with stakeholders.
Learn more: https://streamlit.io/
6. Tornado: Scalable, Non-Blocking Web Server and Framework
Tornado is a powerful Python web framework and asynchronous networking library, designed for building scalable, high-performance web applications. Unlike traditional frameworks, Tornado uses non-blocking network I/O, which makes it ideal for handling thousands of simultaneous connections, perfect for real-time web services like chat applications, live updates, and long polling.
Learn more: https://www.tornadoweb.org/en/stable/guide.html
7. Reflex: Pure Python Web Apps, Simplified
Reflex (formerly Pynecone) lets you build full-stack web applications using only Python, no JavaScript required. It compiles your Python code into modern web apps, handling both the frontend and backend seamlessly. Reflex is perfect for Python developers who want to create interactive, production-ready web apps without switching languages.
Learn more: https://reflex.dev/
Conclusion
FastAPI is my go-to framework for creating REST API endpoints for machine learning applications, thanks to its speed, simplicity, and production-ready features.
For sharing machine learning demos with non-technical stakeholders, Gradio is incredibly useful, allowing you to build interactive web interfaces with minimal effort.
Django stands out as a robust, full-featured framework that lets you build any web-related application with complete control and scalability.
If you need something lightweight and quick to set up, Flask is an excellent choice for simple web apps and prototypes.
Streamlit shines when it comes to building interactive user interfaces for data apps in just minutes, making it perfect for rapid prototyping and visualization.
For real-time web applications that require handling thousands of simultaneous connections, Tornado is a strong option due to its non-blocking, asynchronous architecture.
Finally, Reflex is a modern framework designed for building production-ready applications that are both simple to develop and easy to deploy.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.