Allan Hoang

Senior Data Engineer
Pembroke Pines, FL, US

About

Results-driven Senior Data Engineer with 5+ years of experience architecting, optimizing, and deploying scalable data pipelines and real-time solutions on AWS and Azure. Proven leader in building robust ETL workflows and advanced data architectures with AWS, Databricks, Apache Spark, and Python. Specializes in transforming complex data into actionable insights using dbt, Power BI, and Tableau, while ensuring HIPAA compliance and driving operational efficiencies, including a 15% annual reduction in cloud operational costs.

Work

Condé Nast

Senior Data Engineer

New York, NY, US

Summary

Designed, optimized, and led data engineering initiatives for large-scale data platforms, delivering real-time solutions and enhancing operational efficiency.

Highlights

Designed and maintained robust data pipelines using Databricks, PySpark, and Python, ingesting and transforming large-scale data from marketing, subscriptions, commerce, and social media platforms to ensure high data accuracy and availability for downstream analytics.

Led end-to-end deployment and optimization of complex data workflows for critical business units (marketing, science, privacy, legal), leveraging Airflow for orchestration, Kafka for real-time data streaming, and AWS services (S3, EC2, Lambda, Glue).

Engineered scalable AWS cloud infrastructure using Terraform for Infrastructure as Code (IaC), automating resource provisioning to significantly improve team productivity and reduce manual configuration overhead.

Achieved a 15% annual reduction in cloud operational costs by fine-tuning AWS and Databricks resource usage, optimizing Spark jobs, and enhancing storage efficiency for improved scalability.

Mentored junior data engineers on best practices in pipeline development, debugging, and Databricks usage, fostering team skill growth and accelerating overall development velocity.

Developed and deployed dbt Cloud and PostgreSQL data models, integrating them into Power BI for self-service analytics using DAX and Power Query, enhancing data accessibility and visualization for marketing, product, and legal stakeholders.

Clearsense, LLC

Data Engineer

Jacksonville, FL, US

Summary

Designed and implemented scalable ETL pipelines for healthcare data, leveraging AWS and Apache Iceberg to enable real-time analytics and reporting.

Highlights

Designed and implemented scalable ETL pipelines using AWS (S3, RDS, DynamoDB, Lambda) and Java to ingest, transform, and load large-scale patient, clinical, and operational data from diverse healthcare systems into an Apache Iceberg-powered data lakehouse, enabling real-time analytics.

Engineered end-to-end data pipelines for patient health records, claims, and medical imaging data, utilizing Apache Spark, Python, and Java for data normalization, enrichment, and cleansing, ensuring HIPAA/HITRUST compliance and high data quality.

Optimized AWS cloud infrastructure using Terraform (IaC) to automate secure, scalable data storage provisioning, ensuring HIPAA compliance for sensitive healthcare data.

Developed predictive analytics models to forecast healthcare trends (e.g., patient admission rates, disease progression, medication adherence), enhancing clinical outcomes and operational efficiency for healthcare providers.

Implemented near-real-time data feeds into the RevealCS healthcare data lakehouse, using Apache Kafka to stream data from IoT medical devices, patient monitoring systems, and electronic prescriptions and ensuring low-latency access for clinical decision support.

Florida Blue

IT Developer

Jacksonville, FL, US

Summary

Designed interactive data visualizations and developed robust data analysis workflows for claims processing and member data to support data-driven decision-making.

Highlights

Designed and implemented interactive data visualizations using Power BI and SSRS to display key metrics for claims processing and member data, enabling real-time, data-driven decision-making and KPI tracking for leadership.

Developed data analysis workflows using SQL Server and Azure Data Factory for ETL of large datasets from internal systems, ensuring high data quality for reporting and predictive analytics.

Optimized and automated reporting processes by integrating Power BI dashboards with Azure SQL Database, providing customizable, real-time insights into product performance and user behavior.

Applied statistical analysis techniques within SQL and Power BI to identify trends and anomalies in large datasets, predict customer behavior, assess claims processing efficiency, and deliver actionable insights for continuous improvement.

Collaborated with business analysts and product owners to ensure data integrity and visualization accuracy, building intuitive dashboards that visualized member engagement, claim status, and operational bottlenecks, leading to improved business processes.

Ernst & Young

Data Analytics Associate

New York, NY, US

Summary

Developed and optimized end-to-end data pipelines and built analytical reports for clients across multiple sectors, driving enhanced decision-making.

Highlights

Developed and optimized end-to-end data pipelines using SQL, Informatica, and Teradata, turning raw business data into actionable insights for diverse clients through aggregation, normalization, and transformation.

Built and maintained dynamic dashboards and analytical reports using Tableau and Power BI, applying descriptive and trend analysis to provide real-time data visualizations, achieving a 30% reduction in time to insights and reporting.

Applied advanced data analytics techniques, including data profiling, anomaly detection, and statistical validation with SQL, to identify and rectify data discrepancies, resulting in a 20% increase in data accuracy and reliability for key business metrics.

Education

Colorado State University
Fort Collins, CO, US

Master's degree

Data Analytics

University of Central Florida
Orlando, FL, US

Bachelor's degree

Information Technology

Certificates

Astronomer Certification for Apache Airflow 2 Fundamentals

Issued By

Astronomer

CSU Global Academic Specialization in Applied Data Analytics

Issued By

SAS

CSU Global Academic Specialization in Business Intelligence and Performance Management

Issued By

SAS

Skills

Version Control & CI/CD

Git, GitHub Actions, Jenkins (via Databricks).

Programming Languages

Python, Java, SQL.

Cloud Platforms

AWS (EC2, S3, Lambda, Glue, RDS), Azure (Azure SQL Database, Azure Data Factory).

Data Engineering

Databricks, Apache Spark, PySpark, Apache Airflow, Apache Kafka.

Databases

PostgreSQL, Microsoft SQL Server, Amazon Redshift.

Infrastructure as Code (IaC)

Terraform.

Data Pipelines & ETL

dbt, AWS Glue, Azure Data Factory, Apache Kafka.

Data Warehousing & Lakes

Snowflake, Delta Lake, Apache Iceberg.

Data Visualization

Power BI, Tableau, Mode.