ABDULHAMID A. ADEBAYO

Senior Research Scientist, LLM & Cybersecurity Specialist
New York, US

About

Senior Research Scientist and technical leader with expertise in LLM customization, data integrity, and cybersecurity research. Translates AI research into practical applications in the code and compliance domains, and leads teams across multiple time zones.

Work

IBM T.J. Watson Research Lab

Lead, Data Model Customization for IT Automation and Code

New York, United States

Summary

Leading advanced large language model customization for IT automation, driving innovation in agentic use cases and domain-specific code generation.

Highlights

Designed and implemented reinforcement learning-guided supervised fine-tuning to customize large language models for complex agentic IT automation use cases.

Developed and deployed continuous pre-training and post-training pipelines, integrating supervised fine-tuning and reinforcement learning, to customize base models for domain-specific applications like unit test generation.

Engineered and maintained robust automation tools, optimizing model training workflows within high-performance GPU environments.

Led data curation and pretraining of the Granite 165M model, the base model for granite-docling-258M, which achieved superior multilingual document parsing performance against smol-LM benchmarks.

IBM T.J. Watson Research Lab

Lead, Data Quality, Processing and Operations

New York, United States

Summary

Directed data quality, processing, and operational initiatives, significantly enhancing engineering efficiency and data pipeline automation for LLM training.

Highlights

Developed novel data processing and filtering methods, validated through ablation studies, that improved engineering efficiency and yielded high-quality tokens for LLM training.

Streamlined the end-to-end data pipeline from acquisition to token delivery, reducing human intervention by up to 80% and accelerating LLM development cycles.

Developed and curated critical pre-training datasets, including GneissWeb, specifically designed for advanced language and code models.

Managed and optimized large-scale CPU clusters, ensuring efficient and reliable data processing operations for LLM development.

IBM T.J. Watson Research Lab

Technical Assistant to the Vice President of AI Research

New York, United States

Summary

Provided critical strategic and technical support to the VP of AI Research, facilitating cross-functional alignment and executive communication.

Highlights

Contributed to the planning and execution of key initiatives, quarterly roadmap development, and cross-functional project alignment.

Served as a central liaison, enhancing communication and collaboration between senior management, research scientists, and product teams to ensure timely achievement of project milestones.

Authored comprehensive technical summaries, progress reports, and executive presentations, effectively translating complex AI research into actionable insights for leadership.

IBM T.J. Watson Research Lab

Researcher - AI and Security

New York, United States

Summary

Conducted advanced research at the intersection of AI and cybersecurity, developing tools and methodologies for compliance, risk assessment, and zero-trust security.

Highlights

Developed an AI-powered tool that automated the mapping of security requirements to industry regulations and standards, including NIST 800-53, HIPAA, and PCI.

Automated risk assessments for Cloud and Edge computing resources, leveraging frameworks such as NIST 800-37, FAIR, and CIS RAM to enhance security posture.

Engineered advanced tools and proofs of concept for proactive threat hunting, integrating technologies like Kestrel, Stix-Shifter, and Stix-Extensions.

Architected and implemented a trusted service mesh, significantly advancing Zero-Trust Security principles and infrastructure.

Wilson Consulting Group

Security Consultant

Maryland, United States

Summary

Managed and executed information security projects, including audits, assessments, and training, ensuring compliance with industry standards.

Highlights

Led and managed multiple information security management system (ISMS) projects, ensuring strict adherence to ISO 27001 standards.

Conducted comprehensive security audits, including penetration tests and security control assessments, alongside privacy compliance evaluations for GDPR and CCPA.

Developed and delivered engaging security awareness and training programs, enhancing organizational security posture and employee compliance.

Jethro Limited

Software Consultant

Ibadan, Nigeria

Summary

Developed and optimized software solutions for financial services, focusing on API development, data management, and analytical tools.

Highlights

Developed robust APIs and automated routines to programmatically manage accounting transactions, data backups, and report generation for a core banking application, enhancing operational efficiency.

Created interactive dashboards and analytical tools, providing critical insights to optimize financial services operations.

Education

Howard University
Washington, DC, United States of America

PhD

Computer Science

University of Ibadan
Ibadan, Nigeria

M.Sc.

Computer Science

Fountain University
Osogbo, Nigeria

B.Sc.

Computer Science

Awards

IBM Research Outstanding Accomplishment Award for Granite 3.0 release

Awarded By

IBM Research

Recognized for outstanding contributions to the Granite 3.0 release, a significant achievement in AI research and development.

IBM Research Accomplishment Award for Data-prep-kit: One-Stop Solution for Data Preparation

Awarded By

IBM Research

Awarded for developing Data-prep-kit, a comprehensive solution for data preparation, demonstrating significant impact on research efficiency and quality.

Publications

GneissWeb: Preparing high quality data for LLMs at scale.

Published by

arXiv

Summary

Co-authored a preprint detailing GneissWeb, a methodology for preparing high-quality data for large-scale LLM training.

Towards Continuous Integrity Attestation and Its Challenges in Practice: A Case Study of Keylime

Published by

IEEE/IFIP International Conference on Dependable Systems and Networks (DSN)

Summary

Co-authored a paper discussing continuous integrity attestation challenges and solutions, exemplified by a Keylime case study, presented at a major international conference.

Data-Prep-Kit: getting your data ready for LLM application development.

Published by

IEEE International Conference on Big Data (BigData)

Summary

Co-authored a publication on Data-Prep-Kit, a comprehensive solution for streamlining data readiness for LLM application development, presented at a leading IEEE conference.

Partially Trusting the Service Mesh Control Plane.

Published by

arXiv

Summary

Co-authored a preprint exploring the concept of partially trusting the service mesh control plane, contributing to advancements in secure distributed systems.

Vulnerability prioritization: An offensive security approach.

Published by

arXiv

Summary

Co-authored a preprint proposing an offensive security approach to vulnerability prioritization, enhancing proactive defense strategies.

Automated compliance blueprint optimization with artificial intelligence.

Published by

arXiv

Summary

Co-authored a preprint on leveraging artificial intelligence to optimize compliance blueprints, demonstrating innovation in automated regulatory adherence.

Skills

LLM Customization & Development

Reinforcement Learning, Supervised Fine-tuning, Continuous Pre-training, Post-training, Agentic Use Cases, Domain-Specific Models, Granite 165M, granite-docling-258M, smol-LM, Unit Test Generation.

Data Science & Engineering

Data Processing, Data Filtering, Data Curation, Pre-training Datasets, GneissWeb, Token Delivery, GPU Environments, CPU Clusters, Ablation Studies, Data Pipeline Optimization.

Cybersecurity & Compliance

AI for Security, NIST 800-53, HIPAA, PCI, NIST 800-37, FAIR, CIS RAM, Risk Assessment, Cloud Security, Edge Computing Security, Threat Hunting, Kestrel, Stix-Shifter, Stix-Extensions, Zero-Trust Security, Service Mesh, ISO 27001, Penetration Testing, Security Control Audits, GDPR, CCPA, Security Awareness Training, Vulnerability Prioritization, Compliance Automation.

Software Development & Automation

APIs, Python, Software Development, Automation Tools, Core Banking Application Software, Dashboard Development, Analytical Tools.

Leadership & Project Management

Strategic Planning, Research Initiatives Management, Roadmap Development, Cross-functional Project Alignment, Stakeholder Management, Technical Communication, Executive Reporting, Team Leadership.