ChatGBD: AI Agent Platform for Global Burden of Disease Analysis

An agentic system for analyzing and visualizing GBD-style epidemiological data through natural-language interaction, powered by LLMs and the Python data science stack (pandas, NumPy, Plotly).

Python 3.11+ · FastAPI · Streamlit · LangChain
chat_gbd.py
# Natural language query
user_query = "What's the trend of cardiovascular DALYs in MENA from 1990–2020?"

# Agent processes and answers
agent.answer_question(user_query)
# → Returns analysis + visualization
# → "Cardiovascular DALYs increased from X to Y (+Z%)"
# → Interactive trend plot generated

Powerful Features

Everything you need for intelligent epidemiological data analysis

🤖

Autonomous Agents

Multi-agent architecture with specialized agents for data cleaning, exploration, visualization, and insights.
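As a rough illustration of how a request might be handed to one of the specialized agents, here is a minimal routing sketch. The agent functions and the keyword rules are hypothetical stand-ins; the platform presumably lets an LLM decide rather than matching keywords.

```python
from typing import Callable, Dict

# Hypothetical stand-ins for the specialized agents; each just labels the request.
def cleaning_agent(request: str) -> str:
    return f"[cleaning] {request}"

def exploration_agent(request: str) -> str:
    return f"[exploration] {request}"

def visualization_agent(request: str) -> str:
    return f"[visualization] {request}"

def insight_agent(request: str) -> str:
    return f"[insight] {request}"

# Keyword rules are an assumption; they are checked in order, first match wins.
ROUTES: Dict[str, Callable[[str], str]] = {
    "clean": cleaning_agent,
    "explore": exploration_agent,
    "plot": visualization_agent,
    "chart": visualization_agent,
}

def route(request: str) -> str:
    """Send the request to the first agent whose keyword appears in it."""
    lowered = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            return agent(request)
    return insight_agent(request)  # default: treat the request as a question
```

Calling route("Plot DALYs by year") would reach the visualization stand-in, while a plain question falls through to the insight stand-in.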

💬

Natural Language Interface

Ask questions in plain English, with no SQL or query syntax required. The agent interprets the question and replies with an analysis and, where useful, a chart.

📊

Automated EDA

Intelligent exploratory data analysis with automatic detection of patterns, anomalies, and insights.
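One simple way anomaly detection could work is a modified z-score based on the median absolute deviation (MAD), which stays reliable on short series. This is a sketch using only the standard library; the function name and the 3.5 threshold are assumptions, not the platform's actual method.

```python
from statistics import median
from typing import List

def flag_anomalies(values: List[float], threshold: float = 3.5) -> List[int]:
    """Return indices of values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # MAD of zero: the score is undefined, so flag nothing
    return [
        i for i, v in enumerate(values)
        if abs(0.6745 * (v - med) / mad) > threshold
    ]
```

For a series such as [45, 46, 47, 48, 49, 120], only the last value is flagged.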

🧹

Data Harmonization

Automatic detection and correction of data inconsistencies, missing values, and format variations.
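To make the idea concrete, the sketch below harmonizes a single record: it normalizes column names, maps location spelling variants to one canonical name, and parses numbers written with thousands separators. The alias table and the column names "location" and "value" are hypothetical; the real cleaning rules are not described here.

```python
from typing import Dict, Optional

# Hypothetical alias table mapping spelling variants to one canonical name.
LOCATION_ALIASES = {
    "iran (islamic republic of)": "Iran",
    "islamic republic of iran": "Iran",
    "syrian arab republic": "Syria",
}

def harmonize_row(row: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Normalize keys, canonicalize the location, and parse the value."""
    # " Location " -> "location", "Age Group" -> "age_group"
    clean: Dict[str, object] = {
        k.strip().lower().replace(" ", "_"): v for k, v in row.items()
    }
    loc = str(clean.get("location") or "").strip()
    clean["location"] = LOCATION_ALIASES.get(loc.lower(), loc) or None
    raw = str(clean.get("value") or "").replace(",", "").strip()
    clean["value"] = float(raw) if raw else None  # missing values stay None
    return clean
```

Keeping missing values as None, rather than imputing them at this stage, leaves the choice of fill strategy to a later step.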

📈

Interactive Visualizations

Publication-quality charts: trends, choropleths, treemaps, and regional comparisons using Plotly.

🔄

Agentic Autonomy

Agents reason, plan, and execute Python code autonomously to answer complex analytical questions.

See It In Action

Real examples of what ChatGBD can do

User Query:
"What are the top 5 causes of death in Eastern Mediterranean Region in 2019?"
Agent Response:

Analysis: Based on the 2019 data, the top 5 causes of death in the Eastern Mediterranean Region are:

  1. Cardiovascular diseases: 31,000 deaths
  2. Ischemic heart disease: 28,500 deaths
  3. Neoplasms: 18,500 deaths
  4. Respiratory infections: 12,000 deaths
  5. Lower respiratory infections: 9,200 deaths

The agent generated a horizontal bar chart visualizing these results and can provide detailed breakdowns by country if requested.
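A ranking query like this one boils down to filtering by year, summing deaths per cause, and sorting in descending order. The sketch below shows that logic with the standard library; the record fields "cause", "year", and "deaths" are assumed names, and the code an agent actually generates may look quite different.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def top_causes(records: Iterable[Dict], year: int, n: int = 5) -> List[Tuple[str, float]]:
    """Sum deaths per cause for one year and return the n largest totals."""
    totals: Dict[str, float] = defaultdict(float)
    for rec in records:
        if rec["year"] == year:
            totals[rec["cause"]] += rec["deaths"]
    # Sorting by the summed count keeps the ranking in descending order.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]
```

Sorting on the aggregated totals, rather than on individual rows, is what keeps a cause that appears in several rows (for example, one per country) ranked by its combined burden.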

📊 Interactive Bar Chart (Top 5 Causes)
User Query:
"Show me the trend of cardiovascular DALYs from 1990 to 2020"
Agent Response:

Trend Analysis: Cardiovascular DALYs have shown the following pattern from 1990 to 2020:

  • 1990: 45,000 DALYs
  • 2000: 49,000 DALYs (+8.9% vs. 1990)
  • 2010: 51,000 DALYs (+13.3% vs. 1990)
  • 2020: 53,000 DALYs (+17.8% vs. 1990)

The agent automatically generated a time-series plot showing the upward trend, with the option to break down by country or compare with other regions.
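The percentages in this example are all measured against the 1990 baseline. A short sketch of that arithmetic, using the figures from the response above (the function name is illustrative):

```python
def pct_change_from_baseline(series: dict) -> dict:
    """Percent change of each year relative to the earliest year, to 1 decimal."""
    base = series[min(series)]
    return {
        year: round((value - base) / base * 100, 1)
        for year, value in sorted(series.items())
    }

dalys = {1990: 45_000, 2000: 49_000, 2010: 51_000, 2020: 53_000}
changes = pct_change_from_baseline(dalys)
print(changes)  # {1990: 0.0, 2000: 8.9, 2010: 13.3, 2020: 17.8}
```

For instance, the 2020 figure is (53,000 - 45,000) / 45,000 × 100 ≈ 17.8%.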

📈 Interactive Trend Line Chart (1990–2020)
User Query:
"Compare cardiovascular disease DALYs between Iran and Syria from 2000-2020"
Agent Response:

Comparative Analysis:

Year   Iran DALYs   Syria DALYs   Difference
2000   49,000       29,000        +20,000 (Iran)
2010   51,000       30,000        +21,000 (Iran)
2020   53,000       31,000        +22,000 (Iran)

The agent generated a comparative visualization showing both countries' trends side-by-side, with Iran consistently showing higher cardiovascular DALY burden throughout the period.
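The Difference column is a year-by-year subtraction over the years both countries share. A minimal sketch with the figures from this example (the function name is hypothetical):

```python
iran = {2000: 49_000, 2010: 51_000, 2020: 53_000}
syria = {2000: 29_000, 2010: 30_000, 2020: 31_000}

def compare(a: dict, b: dict) -> list:
    """Return (year, a, b, a - b) for every year present in both series."""
    return [(y, a[y], b[y], a[y] - b[y]) for y in sorted(a.keys() & b.keys())]

rows = compare(iran, syria)
for year, a_val, b_val, diff in rows:
    print(f"{year}: {a_val:,} vs {b_val:,} ({diff:+,})")
```

Restricting the comparison to shared years avoids reporting a difference where one country has no data point.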

📊 Comparative Line Chart (Iran vs Syria)

System Architecture

Built with modern, extensible technologies

Frontend: Streamlit UI
↓
Agent Layer: Data Agent, Insight Agent, Viz Agent
↓
LLM Framework: LangChain
↓
Data Processing: pandas, numpy, plotly
↓
API Backend: FastAPI

Core Modules

  • Data Ingestion: CSV/Excel upload with schema validation
  • Data Cleaning Agent: Autonomous preprocessing and harmonization
  • Insight Agent: Natural language question answering with code generation
  • Visualization Agent: Publication-quality chart generation
  • Exploration Agent: Automated EDA and pattern detection
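As an illustration of the schema validation step in Data Ingestion, the sketch below checks an uploaded CSV for required columns and basic types. The column set is an assumption about what a GBD-style file contains; the actual schema is not specified here.

```python
import csv
import io
from typing import List

# Assumed column set for a GBD-style file; the real schema may differ.
REQUIRED_COLUMNS = {"location", "year", "cause", "measure", "value"}

def validate_csv(text: str) -> List[str]:
    """Return a list of schema problems; an empty list means the file passed."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not (row["year"] or "").isdigit():
            problems.append(f"line {line_no}: year is not an integer")
        try:
            float(row["value"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: value is not numeric")
    return problems
```

Returning a list of problems instead of raising on the first one lets the interface show every issue in the upload at once.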

Get Started in Minutes

Quick setup and you're ready to analyze GBD data

1. Install Dependencies

   pip install -r requirements.txt

2. Generate Example Data

   cd data && python generate_example_data.py

3. Launch Application

   streamlit run app.py