GUNDA SAI VIGNESH
Data Engineer | Cloud Security & IAM Specialist
Denton, US
About
Data Engineer and Identity Governance Specialist with 4+ years of experience delivering scalable, secure enterprise solutions across Business Analytics, CRM, and Data Engineering domains. Experienced in designing and implementing data pipelines, ETL/ELT workflows, and analytics solutions on Google Cloud Platform (GCP) and AWS using BigQuery, Dataflow, and PySpark. Adept at strengthening identity governance through SailPoint IdentityIQ/IdentityNow, RBAC, and API-driven automation in support of data-driven decision-making and business growth.
Work
McKinney, TX, US
Summary
Designed and implemented a cloud-native security analytics platform for centralized identity governance monitoring, processing millions of daily identity lifecycle events and policy violations.
Highlights
Designed and implemented a cloud-native security analytics platform, integrating SailPoint IdentityIQ, Active Directory, and application access logs into GCP BigQuery for centralized identity governance monitoring.
Developed real-time and batch pipelines using PySpark, Dataflow, and Airflow to process millions of daily identity lifecycle events, access requests, and policy violations.
Engineered robust data classification, masking, and encryption workflows in GCP, enhancing protection for PII and sensitive corporate datasets.
Developed ML-driven anomaly detection models in Python to flag abnormal login patterns and data exfiltration, reducing incident response time by 45%.
Automated data quality checks and compliance report generation, raising accuracy to 99.9% and improving audit readiness in line with SOC 2, GDPR, HIPAA, and ISO 27001.
India
Summary
Led end-to-end data pipeline development and cloud migration initiatives, focusing on scalable ETL/ELT solutions and big data processing across diverse environments.
Highlights
Designed, developed, and maintained scalable ETL/ELT data pipelines on Google Cloud Platform (GCP) using Dataflow, Dataproc, BigQuery, and Cloud Functions.
Built data integration workflows with PySpark and Airflow and migrated legacy ETL processes from Informatica PowerCenter to PySpark, improving performance and cost efficiency.
Implemented star schema data models in BigQuery, optimizing query performance through partitioning, clustering, and indexing for large-scale datasets.
Automated data archival, quality checks, and metadata management with Python scripts, integrated with GCS and Google Data Catalog.
Deployed pipeline features in an Agile environment using CI/CD workflows (Jenkins, GitHub), improving runtime performance and resource utilization through Stackdriver monitoring and optimization.
Developed Spark programs in Scala and PySpark for large-scale batch data processing, integrating data from SQL, Excel, and Oracle sources into Power BI for analytics.
Installed, configured, and maintained Apache Hadoop clusters, and built ETL workflows with Pig and Hive to process and clean raw data for a Hadoop data lake.
Enforced role-based data security via BigQuery authorized views and IAM policies.
Skills
Cloud Platforms
GCP (BigQuery, GCS, Dataflow, Pub/Sub, Cloud Functions, Dataproc, Data Catalog, Data Studio, BigQuery ML, Dataprep), AWS (S3, Glue, Redshift, Lambda, EMR).
Programming & Scripting
Python, SQL, Scala, PySpark, Java, J2EE, JavaScript, PowerShell, T-SQL, PL/SQL, Shell Scripting.
Big Data Technologies
Apache Spark, PySpark, Hadoop (Hive, Sqoop, HDFS, MapReduce), Presto, Zeppelin, Jupyter, Pig, HBase.
Databases
PostgreSQL, MySQL, Snowflake, BigQuery, SQL Server, Oracle, MongoDB.
ETL & Orchestration
Apache Airflow, SSIS, Informatica PowerCenter, Azure Data Factory, DataStage, DTS, Stitch ETL.
DevOps & CI/CD
Docker, Git, Terraform, Jenkins, GitHub.
Data Visualization & BI
Power BI, Tableau, Data Studio, Informatica.
Operating Systems
Linux, Unix, Windows.
Identity & Access Management (IAM)
SailPoint IdentityNow, SailPoint IdentityIQ (IIQ), RBAC, Active Directory, JDBC, ServiceNow, Salesforce, Snowflake, Okta, CyberArk, PAM onboarding, Credential Cycling, Just-In-Time Access, IAM Policies, Lifecycle Events (Joiner/Mover/Leaver), Certifications, Provisioning, SSO Integrations, REST APIs.
Machine Learning & Analytics
Pandas, NumPy, Seaborn, Scikit-learn, XGBoost, Linear Regression, Decision Trees, Random Forest, Data Cleaning, EDA (Exploratory Data Analysis), Feature Engineering, Model Evaluation, SAS.
Methodologies & Practices
Agile, Scrum, Troubleshooting, Performance Optimization, Cross-functional Collaboration, Technical Documentation, Test Plans, Standard Operating Procedures (SOPs), Data Modeling (Star and Snowflake Schemas, OLAP/OLTP), Data Warehousing.
Compliance & Security Standards
SOC 2, GDPR, HIPAA, ISO 27001, Zero Trust Identity.