Haibo Ding
Senior Applied Scientist & Science Manager · Agentic AI · LLMs · Evaluation
I am a Senior Applied Scientist and Science Manager at AWS AI Labs, where I conduct research and build products around agentic AI systems and large language models (LLMs), with a focus on tool-using agents and LLM/agent evaluation.
My research asks: How can we build agents that use tools effectively, while remaining measurable, reliable, and robust? This motivates work on tool selection and tool-use optimization, and on offline/online evaluation methods that measure LLM outputs, tool calls, and agent trajectories.
Current research topics include LLM/agent evaluation — training evaluator/judge models for trajectory- and tool-level assessment; designing offline (build-time) evaluation with metric selection, evaluation datasets, and automated evaluation pipelines; and developing online evaluation for efficient performance monitoring and continuous quality measurement. I also work on tool selection and optimization, including improving tool-selection accuracy, tool-retrieval modeling, and tool-description optimization.
Background: Previously, I was a Senior Research Scientist at Bosch Research, where I built ML/NLP solutions for dialogue sentiment analysis, customer service understanding, document understanding, and knowledge extraction. I received my Ph.D. in Computer Science from the University of Utah, where I worked on semi-supervised learning and natural language processing.
News
| Date | News |
|---|---|
| Dec 03, 2025 | Open-sourced Agent-EvalKit — an AI assistant toolkit for build-time agent evaluation. |
| Dec 02, 2025 | Launched Amazon Bedrock AgentCore Evaluations (preview) for agent performance monitoring. |
| Aug 04, 2025 | Organized the KDD Workshop on Automatic Prompt Optimization. |
Selected Publications