GUNDA SAI VIGNESH

Data Engineer | Cloud Security & IAM Specialist
Denton, TX, US

About

Data Engineer and Identity Governance Specialist with 4+ years of experience delivering scalable, secure enterprise solutions across Business Analytics, CRM, and Data Engineering domains. Proven expertise in designing and implementing data pipelines, ETL/ELT workflows, and advanced analytics solutions on Google Cloud Platform (GCP) and AWS using BigQuery, Dataflow, and PySpark. Adept at strengthening identity governance through SailPoint IdentityIQ/IdentityNow, RBAC, and API-driven automation to support data-driven decision-making and business growth.

Work

Cyberone-Solutions LLC.

Data Security Analyst – IAM & Cloud Security Analytics

McKinney, TX, US

Summary

Designed and implemented cloud-native security analytics platforms for centralized identity governance monitoring, processing millions of daily identity lifecycle events and policy violations.

Highlights

Designed and implemented a cloud-native security analytics platform, integrating SailPoint IdentityIQ, Active Directory, and application access logs into GCP BigQuery for centralized identity governance monitoring.

Developed real-time and batch pipelines using PySpark, Dataflow, and Airflow to process millions of daily identity lifecycle events, access requests, and policy violations.

Engineered robust data classification, masking, and encryption workflows in GCP, enhancing protection for PII and sensitive corporate datasets.

Developed ML-driven anomaly detection models in Python to flag security incidents such as abnormal login patterns and data exfiltration, reducing incident response time by 45%.

Automated data quality checks and compliance report generation, improving audit readiness, raising accuracy to 99.9%, and ensuring alignment with SOC 2, GDPR, HIPAA, and ISO 27001.

Kom info Solutions Pvt Ltd

Data Engineer

India

Summary

Led end-to-end data pipeline development and cloud migration initiatives, focusing on scalable ETL/ELT solutions and big data processing across diverse environments.

Highlights

Designed, developed, and maintained scalable ETL/ELT data pipelines on Google Cloud Platform (GCP) using Dataflow, Dataproc, BigQuery, and Cloud Functions.

Built robust data integration workflows with PySpark and Airflow, migrating legacy ETL processes from Informatica PowerCenter to PySpark frameworks, significantly improving performance and cost efficiency.

Implemented star schema data models in BigQuery, optimizing query performance through partitioning, clustering, and indexing for large-scale datasets.

Automated data archival, quality checks, and metadata management with Python scripts, integrated with GCS and Google Data Catalog.

Deployed pipeline features in an Agile environment using CI/CD workflows (Jenkins, GitHub), improving runtime performance and resource utilization through Stackdriver monitoring and optimization.

Developed Spark programs in Scala and PySpark for large-scale batch processing, integrating data from SQL, Excel, and Oracle sources into Power BI for analytics.

Installed, configured, and maintained Apache Hadoop clusters, and built ETL workflows with Pig and Hive to process and clean raw data for the Hadoop data lake.

Enforced role-based data security via BigQuery authorized views and IAM policies.

Education

University of North Texas
Denton, TX, United States of America

M.S.

Business Analytics

Amity University
Gurugram, HR, India

B.A. LL.B. (Hons.)

Arts and Law

Certificates

CompTIA Security+ (SY0-701)

Issued By

CompTIA

Microsoft Identity and Access Administrator Associate (SC-300)

Issued By

Microsoft

Udemy Certified AI Engineer

Issued By

Udemy

NASSCOM Certified Data Engineer & Data Scientist

Issued By

NASSCOM

Certified SailPoint Implementation Engineer and Architect

Issued By

SailPoint

Skills

Cloud Platforms

GCP (BigQuery, GCS, Dataflow, Pub/Sub, Cloud Functions, Dataproc, Data Catalog, Data Studio, BigQuery ML, Dataprep), AWS (S3, Glue, Redshift, Lambda, EMR).

Programming & Scripting

Python, SQL, Scala, PySpark, Java, J2EE, JavaScript, PowerShell, T-SQL, PL/SQL, Shell Scripts.

Big Data Technologies

Apache Spark, PySpark, Hadoop (Hive, Sqoop, HDFS, MapReduce), Presto, Zeppelin, Jupyter, Pig, HBase.

Databases

PostgreSQL, MySQL, Snowflake, BigQuery, SQL Server, Oracle, MongoDB.

ETL & Orchestration

Apache Airflow, SSIS, Informatica PowerCenter, Azure Data Factory, DataStage, DTS, Stitch ETL.

DevOps & CI/CD

Docker, Git, Terraform, Jenkins, GitHub.

Data Visualization & BI

Power BI, Tableau, Data Studio, Informatica.

Operating Systems

Linux, Unix, Windows.

Identity & Access Management (IAM)

SailPoint IdentityNow, SailPoint IdentityIQ (IIQ), RBAC, Active Directory, JDBC, ServiceNow, Salesforce, Snowflake, Okta, CyberArk, PAM onboarding, Credential Cycling, Just-In-Time Access, IAM Policies, Lifecycle Events (Joiner/Mover/Leaver), Certifications, Provisioning, SSO Integrations, REST APIs.

Machine Learning & Analytics

Pandas, NumPy, Seaborn, Scikit-learn, XGBoost, Linear Regression, Decision Trees, Random Forest, Data Cleaning, EDA (Exploratory Data Analysis), Feature Engineering, Model Evaluation, SAS.

Methodologies & Practices

Agile, Scrum, Troubleshooting, Performance Optimization, Cross-functional Collaboration, Technical Documentation, Test Plans, Standard Operating Procedures (SOPs), Data Modeling (Star, Snowflake Schemas, OLAP/OLTP), Data Warehousing.

Compliance & Security Standards

SOC 2, GDPR, HIPAA, ISO 27001, Zero Trust Identity.

Projects

Taxi Fare Prediction Using Machine Learning and EDA

Summary

Led a capstone project as part of the Business Analytics program at the University of North Texas, focused on predicting taxi fares using December 2023 trip data.