Arnav Das

AI/ML Research Lead | Data Scientist
Kolkata, India

About

AI/ML Research Lead and Data Scientist with an M.Sc. in Computer Science (specialization in Data Science) and extensive experience developing AI/ML systems, particularly in LLMs, network security, and distributed systems. Leads research and development initiatives, translates complex data into actionable insights, and delivers solutions that improve system performance, security, and compliance. Expertise spans large-scale data analysis, quantum computing applications, and open-source contributions.

Work

India Internet Foundation

Research Fellow - Research & Development

Kolkata, West Bengal, India

Summary

Led the development of OLAP pipelines for large-scale DNS data analysis and a nation-scale distributed DNS system, enhancing performance and security.

Highlights

Spearheaded OLAP pipeline development, enabling analysis of ~12 TB of raw DNS data across 700 million domains in collaboration with the Internet Society and ICANN.

Directed the development of a nation-scale, distributed DNS-in-a-Box project, significantly improving performance and security for critical internet infrastructure.

India Internet Foundation

Research Associate - Research & Development

Kolkata, West Bengal, India

Summary

Implemented IPv6 protocol performance metrics and maintained a nationwide distributed measurement network to ensure continuous operation.

Highlights

Implemented IPv6 protocol performance and diagnostic metrics in line with RFC 8250 and integrated them into the existing national measurement network.

Ensured continuous operation and reliability of a nationwide distributed measurement network, supporting critical infrastructure monitoring.

Zeron (TeamCognito Solutions Pvt. Ltd.)

Research Lead - Research & Development

Kolkata, West Bengal, India

Summary

Led data collection, analysis, and system development for threat analytics, enhancing visibility and actionable insights for network security.

Highlights

Collected and analyzed log data from 150+ network devices, using OLAP pipelines to provide full system visibility and generate actionable threat insights.

Directed the end-to-end development of an External Attack Surface Management system, encompassing scanning, data ingestion, ETL, and dashboarding functionalities.

Contributed significantly to the "risk analysis" (QBER) system, a core component of the Zeron Platform, enhancing its predictive capabilities.

Zeron (TeamCognito Solutions Pvt. Ltd.)

Backend Lead - Product Engineering

Kolkata, West Bengal, India

Summary

Co-designed and implemented the internal architecture for the core product "Zeron", optimizing backend performance and data access patterns.

Highlights

Co-designed and implemented the internal architecture for the core "Zeron" product, optimizing backend performance, data warehousing, and data access patterns.

Managed the integration of Python-based microservices and AWS cloud infrastructure to support scalable product operations.

Enhanced data access patterns and system design for improved efficiency in data retrieval and processing.

TeamCognito Solutions Pvt. Ltd.

Machine Learning Lead - Technical Team

Kolkata, West Bengal, India

Summary

Delivered more than three machine learning-based analytical systems, maintaining reliable operation and high data integrity.

Highlights

Delivered more than three machine learning-based analytical systems, each running reliably in production with high data integrity.

Applied Python, data modeling, and data visualization techniques to develop robust analytical solutions.

Utilized ETL processes, Big Data technologies, and AWS infrastructure to support scalable ML deployments.

Education

St. Xavier's University, Kolkata
Kolkata, West Bengal, India

M.Sc.

Computer Science (Specialization in Data Science)

Grade: 7.69/10 CGPA

Kazi Nazrul University
Kolkata, West Bengal, India

B.Sc.

Computer Science

Grade: 8.3/10 CGPA

Publications

Fast classical simulation of qubit-qudit hybrid systems

Published by

IEEE Software

Summary

Presented a fast classical simulation method for qubit-qudit hybrid systems, advancing the understanding of quantum computing architectures.

Proof of Fair-Chance: An Unbiased Approach to Internet Voting using Fair Chance in Proof of Stake

Published by

2025 International Conference on Artificial Intelligence & Sustainable Computing (AISC)

Summary

Presented a novel approach to internet voting that aims for fair and unbiased outcomes through a fair-chance mechanism in Proof of Stake.

A Layered Blockchain Architecture for Unbiased and Trustworthy Internet Voting Systems

Published by

2025 IEEE 6th India Council International Subsections Conference (INDISCON)

Summary

Proposed a layered blockchain architecture designed to enhance the trustworthiness and impartiality of internet voting systems.

Quantum Circuit Synthesis of an Approximate Hybrid Kolmogorov-Arnold Network

Published by

IEEE Computer Society Annual Symposium on VLSI (ISVLSI)

Summary

Explored the synthesis of quantum circuits for an approximate hybrid Kolmogorov-Arnold Network, contributing to quantum computing advancements.

A near-term quantum simulation of the transverse field Ising model hints at glassy dynamics

Published by

The European Physical Journal Special Topics

Summary

Investigated near-term quantum simulation of the transverse field Ising model, revealing insights into glassy dynamics.

Approximated Approaches to the Vehicle Routing Problem by Graph Reduction

Published by

IEEE Access

Summary

Developed approximate approaches to the Vehicle Routing Problem, using graph reduction techniques to improve efficiency.

Code Generation from Natural Language Statement using Self-Repairing based T5 Model

Published by

St. Xavier's University, Kolkata

Summary

Developed a T5 model incorporating self-repair mechanisms for accurate code generation from natural language statements.

Pre-Quantum to Post-Quantum Cryptography Transition: A Journey Connecting the Security and Challenges Eras

Published by

Integration of AI, Quantum Computing, and Semiconductor Technology (IGI Global)

Summary

Explored the transition from pre-quantum to post-quantum cryptography, addressing security challenges and future directions.

FragQC: An efficient quantum error reduction technique using quantum circuit fragmentation

Published by

Journal of Systems and Software, 214, 112085

Summary

Introduced FragQC, an efficient quantum error reduction technique based on quantum circuit fragmentation.

QuDiet: A Qubit-Qudit Hybrid Quantum Simulator with Benchmark Circuits

Published by

IET Quantum Communications

Summary

Developed QuDiet, a qubit-qudit hybrid quantum simulator, and evaluated its performance with benchmark circuits.

Qurzon: A Prototype for a Divide and Conquer-Based Quantum Compiler for Distributed Quantum Systems

Published by

SN Computer Science, 3(4), pp.1-14

Summary

Presented Qurzon, a prototype for a divide-and-conquer-based quantum compiler designed for distributed quantum systems.

Languages

English
Hindi
Bengali

Skills

Artificial Intelligence & Machine Learning

LLMs, Natural Language Processing (NLP), Machine Learning, Deep Learning, Anomaly Detection, Genetic Algorithms, Quantum Computing, Statistical Analysis, Data Modeling, Data Visualization.

Programming & Development

Python, SQL, Linux Kernel Drivers, Event-Driven Architecture, Microservices.

Cloud & DevOps

AWS, ETL, Airflow, Prefect, Tool Automation.

Data Engineering & Big Data

OLAP, DNS, DNSSEC, ClickHouse, Big Data, Data Warehouse, Data Ingestion.

System Design & Architecture

Distributed System Design, System Design, Optimization, IPv6.

Security & Threat Analytics

Network Security, Attack Surface Management, Threat Analytics, Risk Analysis.

Projects

PyMeasureDNS

Summary

Built a developer-friendly yet fast DNS resolution library for Python.

Arxiv Summarizer

Summary

Co-developed a Python library for automated fetching, parsing, and summarizing scientific documents from arXiv, streamlining research workflows.

Code Generation from Natural Language Statement using Self-Repairing based T5 Model

Summary

Designed and implemented an LLM for code generation, integrating a 'self-repair' mechanism to improve accuracy and reliability on complex coding tasks. This project was a major component of a Master's thesis.

Natural Language to SQL

Summary

Engineered an NLP-based system to translate natural language queries into structured SQL, integrating with a data warehouse and supporting over 20 distinct features using open-source LLMs.

Compliance Control Mapping LLM

Summary

Led the development of a compliance auditing system that uses LLMs and a parallel rule engine to automatically map compliance controls from complex PDF documents to specific regulations and standards, improving audit efficiency.

HPC Intern Project: Quantum Simulator Optimization

Summary

Analyzed and optimized an existing Quantum Simulator by leveraging distributed and high-performance computing techniques, improving simulation efficiency.

Research Intern Project: Genetic Algorithm for Circuit Fragmentation

Summary

Developed a Genetic Algorithm-based circuit fragmentation algorithm, enhancing circuit cutting strategies for complex systems.

Research Intern Project: Vehicle Routing Problem on NISQ Devices

Summary

Investigated and applied approximation approaches to solve the Vehicle Routing Problem on NISQ Devices, utilizing classical preprocessing techniques.

PyAttck Contribution

Summary

Contributed a fix to PyAttck for compatibility with the updated MITRE data format.

Network Log Anomaly Detection

Summary

Developed and analyzed a real-time anomaly detection system for network and system logs, utilizing Granger causality to identify causal relations for advanced threat analytics.

opensearch-py Contribution

Summary

Contributed support for the OpenSearch alerting plugin to opensearch-py, the low-level Python client for OpenSearch.

Insurance Claim Authenticity Classification

Summary

Directed the development and deployment of a full end-to-end ML system, employing a hybrid Decision Tree and Deep Learning approach to accurately classify insurance claim authenticity.

TensorFlow.NET Core Routines Contribution

Summary

Contributed to the development of core routines for TensorFlow.NET.