Allan Hoang
Senior Data Engineer
Pembroke Pines, US
About
Senior Data Engineer with 5+ years of experience architecting, optimizing, and deploying scalable data pipelines and real-time solutions across cloud environments. Builds robust ETL workflows and data architectures with AWS, Databricks, Apache Spark, and Python. Specializes in turning complex data into actionable insights using dbt, Power BI, and Tableau while maintaining HIPAA compliance and improving operational efficiency, including a 15% reduction in cloud costs.
Work
Condé Nast
Senior Data Engineer
New York, NY, US
Summary
Designed, optimized, and led data engineering initiatives for large-scale data platforms, delivering real-time solutions and enhancing operational efficiency.
Highlights
Designed and maintained robust data pipelines using Databricks, PySpark, and Python, ingesting and transforming large-scale data from marketing, subscriptions, commerce, and social media platforms to ensure high data accuracy and availability for downstream analytics.
Led end-to-end deployment and optimization of complex data workflows for critical business units (marketing, science, privacy, legal), using Airflow for orchestration, AWS services (S3, EC2, Lambda, Glue), and Kafka for real-time data streaming.
Engineered scalable AWS cloud infrastructure using Terraform for Infrastructure as Code (IaC), automating resource provisioning to significantly improve team productivity and reduce manual configuration overhead.
Achieved a 15% annual reduction in cloud operational costs by fine-tuning AWS and Databricks resource usage, optimizing Spark jobs, and enhancing storage efficiency for improved scalability.
Mentored junior data engineers on best practices in pipeline development, debugging, and Databricks usage, fostering team skill growth and accelerating overall development velocity.
Developed and deployed data models in dbt Cloud and PostgreSQL and integrated them into Power BI for self-service analytics using DAX and Power Query, improving data accessibility and visualization for marketing, product, and legal stakeholders.
Clearsense, LLC
Data Engineer
Jacksonville, FL, US
Summary
Designed and implemented scalable ETL pipelines for healthcare data, leveraging AWS and Apache Iceberg to enable real-time analytics and reporting.
Highlights
Designed and implemented scalable ETL pipelines using AWS (S3, RDS, DynamoDB, Lambda) and Java to ingest, transform, and load large-scale patient, clinical, and operational data from diverse healthcare systems into an Apache Iceberg-powered data lakehouse, enabling real-time analytics.
Engineered end-to-end data pipelines for patient health records, claims, and medical imaging data, utilizing Apache Spark, Python, and Java for data normalization, enrichment, and cleansing, ensuring HIPAA/HITRUST compliance and high data quality.
Optimized AWS cloud infrastructure using Terraform (IaC) to automate secure, scalable data storage provisioning, ensuring HIPAA compliance for sensitive healthcare data.
Developed predictive analytics models to forecast healthcare trends (e.g., patient admission rates, disease progression, medication adherence), enhancing clinical outcomes and operational efficiency for healthcare providers.
Implemented near-real-time data feeds into the RevealCS healthcare data lakehouse using Apache Kafka to stream data from IoT medical devices, patient monitoring systems, and electronic prescriptions, ensuring low-latency access for clinical decision support.
Florida Blue
IT Developer
Jacksonville, FL, US
Summary
Designed interactive data visualizations and developed robust data analysis workflows for claims processing and member data to support data-driven decisions.
Highlights
Designed and implemented interactive data visualizations using Power BI and SSRS to display key metrics for claims processing and member data, enabling real-time, data-driven decision-making and KPI tracking for leadership.
Developed data analysis workflows using SQL Server and Azure Data Factory for ETL of large datasets from internal systems, ensuring high data quality for reporting and predictive analytics.
Optimized and automated reporting processes by integrating Power BI dashboards with Azure SQL Database, providing customizable, real-time insights into product performance and user behavior.
Applied statistical analysis techniques within SQL and Power BI to identify trends and anomalies in large datasets, predicting customer behavior, assessing claims processing efficiency, and providing actionable insights for continuous improvement.
Collaborated with business analysts and product owners to ensure data integrity and visualization accuracy, building intuitive dashboards that visualized member engagement, claim status, and operational bottlenecks, leading to improved business processes.
Ernst & Young
Data Analytics Associate
New York, NY, US
Summary
Developed and optimized end-to-end data pipelines and built analytical reports for clients across multiple sectors, driving enhanced decision-making.
Highlights
Developed and optimized end-to-end data pipelines using SQL, Informatica, and Teradata, transforming raw business data into actionable insights through data aggregation, normalization, and transformation for diverse clients.
Built and maintained dynamic dashboards and analytical reports using Tableau and Power BI, applying descriptive and trend analysis to provide real-time data visualizations and reducing time to insight and reporting by 30%.
Applied advanced data analytics techniques, including data profiling, anomaly detection, and statistical validation with SQL, to identify and rectify data discrepancies, resulting in a 20% increase in data accuracy and reliability for key business metrics.
Education
Colorado State University
Master's degree
Data Analytics
The University of Central Florida
Bachelor's degree
Information Technology
Certificates
Astronomer Certification for Apache Airflow 2 Fundamentals
Issued By
Astronomer
CSU Global Academic Specialization in Applied Data Analytics
Issued By
SAS
CSU Global Academic Specialization in Business Intelligence and Performance Management
Issued By
SAS
Skills
Version Control & CI/CD
Git, GitHub Actions, Jenkins (via Databricks).
Programming Languages
Python, Java, SQL.
Cloud Platforms
AWS (EC2, S3, Lambda, Glue, RDS), Azure (Azure SQL Database, Azure Data Factory).
Data Engineering
Databricks, Apache Spark, PySpark, Apache Airflow, Apache Kafka.
Databases
PostgreSQL, Microsoft SQL Server, Amazon Redshift.
Infrastructure as Code (IaC)
Terraform.
Data Pipelines & ETL
dbt, AWS Glue, Azure Data Factory, Apache Kafka.
Data Warehousing & Lakes
Snowflake, Delta Lake, Iceberg.
Data Visualization
Power BI, Tableau, Mode.