Surya - Data Engineer
[email protected]
Location: Charlotte, North Carolina, USA
Relocation: Yes
Visa: Green Card
Surya Sudarshan
Email: [email protected] | Phone: +1 (980) 990-6478 | LinkedIn: www.linkedin.com/in/suryar123

Professional Summary:
Seasoned Data Engineer with 10 years of hands-on experience designing, implementing, and optimizing data pipelines and analytics solutions across multiple cloud platforms, including GCP, AWS, and Azure.
Expertise in Google Cloud Platform (GCP) services such as Google Dataflow, BigQuery, Cloud Pub/Sub, and Cloud Composer, used to build scalable, cost-effective data processing workflows.
Proficient in Python and shell scripting for automating data workflows, orchestrating cloud resources, and implementing data transformation logic.
Skilled in federated queries and VPC configuration, integrating diverse data sources and ensuring secure, compliant data access within GCP environments.
Experienced in managing cloud infrastructure with Terraform, Ansible, and AWS CloudFormation, streamlining deployments and maintaining infrastructure as code.
Familiar with data governance practices and tools such as GCP Data Catalog and AWS Glue for data discovery, lineage tracking, and metadata management.
Strong background in SQL, querying and optimizing databases on platforms such as Azure SQL, SQL Server, Teradata, and Google BigQuery.
Proficient in Apache Kafka and Spark for real-time data processing and analytics, turning streaming data into actionable insights.
Skilled in building and optimizing data warehouses and data lakes using AWS Redshift, Azure Synapse Analytics, and Google BigQuery for scalability and performance.
Experienced in microservices architecture and containerization with Docker and Kubernetes, enabling modular, scalable deployment of data applications.
Knowledgeable in traditional ETL tools such as SSIS, DataStage, and Informatica, as well as cloud-native ETL services such as AWS Glue and GCP Dataflow.
Proficient with Git for version control and Jira for collaborative development, supporting efficient code management within development teams.
Experienced with data visualization and reporting tools such as Tableau, Power BI, and Google Data Studio.
Skilled in performance tuning and optimization of data processing workflows, databases, and distributed systems for high availability, reliability, and efficiency.
Strong analytical and problem-solving skills, with a proven track record of resolving complex data engineering challenges in diverse environments.
Effective communicator and team player, collaborating with cross-functional teams and stakeholders to deliver impactful data solutions.
Committed to continuous learning and staying current with the latest trends in data engineering, cloud computing, and analytics.
Proven ability to lead and mentor junior team members, fostering a culture of knowledge sharing, innovation, and excellence in data engineering practices.
Technical Skills:
Google Cloud Platform (GCP): Google Dataflow, GCS (Google Cloud Storage), BigQuery, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, VPC configuration, Data Catalog, GCP Databricks, Cloud Spanner, Cloud SQL, federated queries, VPN Google-Client
AWS (Amazon Web Services): EC2, S3, EBS, ELB (Elastic Load Balancing), RDS (Relational Database Service), SNS (Simple Notification Service), SQS (Simple Queue Service), VPC, CloudFormation, CloudWatch, ELK Stack, DynamoDB, Kinesis, AWS Redshift, AWS Data Pipeline, AWS Glue, CodeDeploy, CodePipeline, CodeBuild, CodeCommit
DevOps & Version Control: Bitbucket, Ansible, Git, Jira, PowerShell, Terraform, Maven
Programming & Scripting: Python, Shell Scripting, Java 1.7, Spark, Pig, Spark SQL, Spark Streaming, JDBC
Big Data & Data Processing: Hadoop, Hive, MapReduce, Spark, HBase, Sqoop, Apache Kafka
Microsoft Azure: Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks (with GitHub integration), Azure Service Bus, Azure SQL, SQL Server 2017
Database & BI Tools: SQL Server, Teradata, Tableau, Power BI, SSIS (SQL Server Integration Services), SSAS (SQL Server Analysis Services), SSRS (SQL Server Reporting Services), DataStage, QualityStage
Microservices & Agile: Microservices, Agile
Others: Unix/Linux, WebSphere, Splunk, SonarQube, SFDC (Salesforce.com), JSON

Professional Experience

Ascena Retail Group, Patskala, Ohio
GCP Data Engineer, November 2021 to Present
Responsibilities:
Developed and maintained scalable data pipelines using GCP Dataflow and Cloud Pub/Sub for real-time data processing, enhancing data availability for financial services (see the sketch after this section).
Implemented data warehousing solutions using BigQuery, improving data analytics capabilities for Truist Bank.
Managed and optimized data storage with GCS (Google Cloud Storage), ensuring secure and efficient data access.
Automated data integration using Sqoop and Cloud Storage Transfer Service, streamlining data transfers between Hadoop and GCP.
Configured and deployed GCP Dataproc clusters for processing large datasets, optimizing computational resources.
Developed ETL processes with PySpark on GCP Dataproc, enhancing data preparation and analysis for banking operations.
Utilized Python and SQL for data manipulation and querying, supporting diverse data analysis projects.
Designed and implemented Cloud Spanner and Cloud SQL databases for high availability and global scalability, meeting the bank's operational demands.
Leveraged Databricks on GCP for advanced analytics and machine learning projects, driving insights into customer behavior.
Integrated Power BI for data visualization and reporting, providing actionable insights to decision-makers.
Employed GCP Dataprep for data cleansing and preparation, ensuring high-quality data for analysis.
Utilized SAS and Hive for data analysis and processing, supporting complex data analytics needs.
Implemented data security measures and compliance protocols using GCP security tools, protecting sensitive financial data.
Optimized data storage costs and performance using Snowflake on GCP, balancing efficiency and scalability.
Managed Teradata databases on GCP, ensuring robust data warehousing capabilities for the bank.
Automated financial report generation using SQL databases and BigQuery, reducing manual effort and improving report accuracy.
Developed predictive models using PySpark and GCP machine learning tools, enhancing customer credit scoring and fraud detection.
Collaborated with IT and business teams to understand data needs and deliver tailored data solutions.
Ensured data governance and metadata management with Data Catalog, improving data discoverability and compliance.
Leveraged Cloud Composer for workflow orchestration, automating data pipeline workflows across GCP services.
Conducted data migration projects using Cloud Storage Transfer Service, ensuring a seamless transition to GCP.
Utilized Hadoop and Hive on GCP for processing and analyzing large datasets, supporting big data initiatives.
Monitored and optimized data processing jobs with GCP Dataproc and BigQuery, ensuring efficient data operations.
Documented technical processes and data architecture designs, facilitating knowledge sharing and operational continuity.
Engaged in continuous learning and adoption of new GCP technologies, contributing to the bank's innovation in digital finance solutions.
Designed and executed data processing pipelines using Google Dataflow and GCP Dataproc, enhancing data analysis and reporting capabilities for retail operations.
Environment: Google Dataflow, GCS, BigQuery, GCP Dataproc, Cloud Composer, Cloud Pub/Sub, Python, shell scripts, federated queries, VPC configuration, Data Catalog, VPN Google-Client, SSIS, SSAS, SSRS, DataStage, QualityStage, Cloud Storage Transfer Service, Cloud Spanner, Cloud SQL, GCP Databricks.
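Illustrative sketch of the Pub/Sub-to-BigQuery pattern referenced above, written as a minimal Apache Beam streaming pipeline for Dataflow in Python; the project, subscription, bucket, table, and schema names are hypothetical placeholders, not details from the actual engagement.

import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

# All resource names below are placeholders for illustration only.
options = PipelineOptions(
    project="example-project",
    region="us-east1",
    runner="DataflowRunner",
    temp_location="gs://example-bucket/tmp",
)
options.view_as(StandardOptions).streaming = True

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
            subscription="projects/example-project/subscriptions/orders-sub")
        | "ParseJson" >> beam.Map(lambda message: json.loads(message.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            table="example-project:analytics.orders",
            schema="order_id:STRING,amount:FLOAT,event_ts:TIMESTAMP",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
    )

A production pipeline would typically route records that fail to parse to a dead-letter table; that handling is omitted here to keep the sketch short.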
Broadridge, Lake Success, NY
AWS Data Engineer, May 2019 to October 2021
Responsibilities:
Deployed and managed AWS services such as EC2, S3, EBS, ELB, RDS, SNS, SQS, and VPC, leveraging CloudFormation for infrastructure as code to ensure seamless scalability and reliability.
Developed and maintained data pipelines on AWS using services such as AWS Data Pipeline and AWS Glue, enabling efficient data ingestion, transformation, and loading (see the sketch after this section).
Implemented monitoring and logging with CloudWatch and the ELK Stack for real-time visibility into system performance and data flows.
Utilized Terraform to automate infrastructure provisioning and deployment, enabling consistent, repeatable environments across development, testing, and production.
Designed and optimized data storage solutions on AWS, including Redshift and S3, for high availability, durability, and performance of large-scale datasets.
Collaborated with cross-functional teams to gather requirements and design scalable data architectures that meet business needs and adhere to best practices.
Developed custom data processing solutions using Spark, Hive, Pig, and Spark SQL to perform complex analytics on large datasets stored in Hadoop and AWS S3.
Implemented real-time data processing pipelines using Kinesis, Spark Streaming, and Kafka, enabling timely insights and decision-making from streaming data sources.
Automated routine tasks and workflows using Python, shell scripting, and PowerShell, improving operational efficiency and reducing manual errors.
Managed source code repositories and version control using Git, facilitating collaboration and code review among development teams.
Configured and administered application servers such as JBoss and WebSphere, ensuring optimal performance and reliability of deployed applications.
Integrated CI/CD pipelines using CodeDeploy, CodePipeline, CodeBuild, and CodeCommit to automate build, test, and deployment processes for data applications.
Implemented security best practices and compliance standards on AWS resources, including IAM policies, encryption mechanisms, and network access controls.
Conducted performance tuning and optimization of AWS Redshift, Spark, and Hadoop clusters to improve query performance and resource utilization.
Implemented data governance and lineage tracking mechanisms to ensure data quality and integrity across data sources and transformations.
Worked closely with data scientists to operationalize machine learning models for scalable, real-time inference on AWS infrastructure.
Provided technical guidance and mentorship to junior team members, fostering a culture of learning and continuous improvement.
Conducted regular health checks and audits of AWS environments to identify and remediate security vulnerabilities, performance bottlenecks, and cost inefficiencies.
Integrated third-party tools such as Splunk and SonarQube for advanced monitoring, log analysis, and code quality assessment.
Implemented disaster recovery and high availability strategies for critical data systems and applications running on AWS, ensuring business continuity and resilience.
Designed and implemented data archiving and retention policies to manage the data lifecycle and optimize storage costs on AWS.
Collaborated with stakeholders to define data governance policies, data retention guidelines, and compliance requirements for regulatory standards such as GDPR and CCPA.
Conducted proof-of-concept evaluations and performance benchmarking of new AWS services and technologies to assess their suitability for specific use cases.
Participated in on-call rotations to provide timely support and resolution for production incidents and outages affecting data services on AWS.
Documented architectural designs, deployment processes, and operational procedures to keep system documentation current and support knowledge sharing.
Environment: AWS (EC2, S3, EBS, ELB, RDS, SNS, SQS, VPC, CloudFormation, CloudWatch, ELK Stack), Bitbucket, Ansible, Python, Shell Scripting, PowerShell, Git, Jira, JBoss, Terraform, Maven, WebSphere, Unix/Linux, DynamoDB, Kinesis, AWS Redshift, AWS Data Pipeline, AWS Glue, CodeDeploy, CodePipeline, CodeBuild, CodeCommit, Splunk, SonarQube, Spark, Hive, Pig, Spark SQL, Spark Streaming, HBase, Sqoop, Kafka, Cloudera.
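Illustrative sketch of an AWS Glue ETL job of the kind referenced above, written as a minimal PySpark Glue script; the catalog database, table, column mappings, and S3 path are hypothetical placeholders rather than actual Broadridge resources.

import sys
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions

# Standard Glue job boilerplate: resolve the job name and initialize contexts.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read source data registered in the Glue Data Catalog (placeholder names).
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw_db", table_name="trades_raw")

# Rename and cast a couple of columns before loading (illustrative mappings).
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[
        ("trade_id", "string", "trade_id", "string"),
        ("amount", "double", "trade_amount", "double"),
    ])

# Write the curated output to S3 as Parquet (placeholder bucket).
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/trades/"},
    format="parquet")

job.commit()

A production job would typically take additional job parameters and run on a schedule or trigger; those details are omitted to keep the sketch short.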
CBRE, Dallas, TX
Azure Data Engineer, September 2017 to April 2019
Responsibilities:
Developed and maintained Azure Data Factory pipelines for data ingestion, transformation, and loading (ETL), enhancing data workflow efficiency for real estate listings.
Implemented Azure Databricks for data analytics and processing, using Python for scripting to enable advanced analytics on property data.
Designed and managed Azure SQL databases, ensuring high availability and security for real estate transaction data.
Utilized Apache Kafka and Azure Event Hubs for real-time data streaming, improving the timeliness and accuracy of property listings and market reports (see the sketch after this section).
Configured and administered Azure Synapse Analytics, enabling scalable analysis over large datasets, including MLS information and real estate market trends.
Created Power BI dashboards and reports, providing insights into market dynamics, property valuations, and consumer trends to support decision-making.
Automated data integration between Azure platforms and SFDC, streamlining the flow of sales and customer data to enhance CRM strategies.
Managed Unix-based scripts for system administration and data processing tasks, ensuring the smooth operation of data pipelines on Azure.
Designed and executed SQL Server Integration Services (SSIS) packages for data migration, integrating legacy systems into Azure SQL databases.
Optimized data storage and retrieval by implementing Azure Data Lake storage solutions, supporting scalable big data analytics.
Deployed Azure Service Bus for cross-application communication, enabling reliable data exchange and workflow automation across real estate services.
Maintained GitHub repositories for version control of Azure Databricks scripts and notebooks, promoting collaboration and code quality.
Conducted data modeling and warehousing using Azure Synapse, supporting structured storage of MLS data for efficient querying and reporting.
Implemented T-SQL procedures for complex data manipulation and reporting tasks, improving the accuracy and relevance of real estate analytics.
Utilized Hadoop ecosystem technologies such as Hive and MapReduce for processing and analyzing large datasets, improving insight into market trends.
Integrated Azure SQL with SQL Server 2017, ensuring seamless data synchronization and backup, and improving data integrity and availability.
Developed custom data visualizations in Tableau and Power BI, offering dynamic reporting for real estate market analysis.
Managed Teradata database solutions for high-volume data warehousing, enabling efficient data management and analytics for CBRE's MLS system.
Automated data quality checks using Azure Databricks, ensuring the reliability of property listings and market data.
Orchestrated data migrations to Azure cloud environments, minimizing downtime and maintaining data integrity across real estate databases.
Designed and enforced data governance policies using Azure's security and compliance features, safeguarding sensitive real estate information.
Collaborated with real estate analysts and stakeholders to define data requirements, translating business needs into technical specifications for Azure data solutions.
Monitored and optimized Azure resource usage, reducing costs while maintaining performance for data operations.
Provided training and support to CBRE staff on Azure analytics tools and Power BI, enhancing the organization's data-driven decision-making capabilities.
Evaluated and integrated new Azure features and services into CBRE's data architecture, staying ahead of technological advancements in real estate data management.
Environment: Apache Kafka, Azure, Python, Power BI, Unix, SQL Server, Hadoop, Hive, MapReduce, Teradata, SQL, Azure Event Hubs, Azure Synapse, Azure Data Factory, Azure Databricks, GitHub, Azure Service Bus, Azure SQL, SQL Server 2017, Tableau, SFDC, T-SQL.
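Illustrative sketch of publishing listing events to Azure Event Hubs with the azure-eventhub Python SDK, as referenced in the streaming bullet above; the connection string, hub name, and payload fields are hypothetical placeholders, not actual CBRE resources.

import json
from azure.eventhub import EventData, EventHubProducerClient

# Placeholder connection details for illustration only.
CONNECTION_STR = "Endpoint=sb://example-namespace.servicebus.windows.net/;SharedAccessKeyName=send;SharedAccessKey=placeholder"
EVENTHUB_NAME = "property-listings"

def publish_listings(listings):
    # Send a batch of listing records to Event Hubs as JSON-encoded events.
    producer = EventHubProducerClient.from_connection_string(
        conn_str=CONNECTION_STR, eventhub_name=EVENTHUB_NAME)
    with producer:
        batch = producer.create_batch()
        for listing in listings:
            batch.add(EventData(json.dumps(listing)))
        producer.send_batch(batch)

publish_listings([{"listing_id": "A-1001", "price": 425000, "status": "active"}])

Only the producer side is sketched here; downstream consumption (for example in Azure Databricks) is not shown.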
Netenrich Technologies Pvt. Ltd.
Hadoop Data Engineer, June 2014 to May 2017
Responsibilities:
Implemented and maintained Hadoop ecosystems, including HDFS (Hadoop Distributed File System) and YARN (Yet Another Resource Negotiator), to store and process large volumes of structured and unstructured data efficiently.
Developed MapReduce programs in Java 1.7 for distributed data processing tasks such as aggregation, filtering, and transformation across Hadoop clusters.
Designed and optimized data ingestion pipelines using Apache Kafka for real-time streaming and Hive for batch processing, ensuring seamless data integration from diverse sources.
Built and managed HBase databases for storing and retrieving semi-structured data, enabling fast, scalable access to key-value data within the Hadoop ecosystem.
Implemented microservices architecture principles to modularize and containerize Hadoop applications, simplifying deployment, scaling, and management in AWS cloud environments.
Collaborated with cross-functional Agile teams to gather requirements, design data models, and deliver scalable solutions that meet business objectives and user needs.
Developed custom data processing workflows using Apache Spark for advanced analytics, machine learning, and graph processing on large in-memory datasets (see the sketch after this section).
Utilized JDBC (Java Database Connectivity) to connect Hadoop applications to relational databases, enabling seamless data exchange and integration.
Implemented data serialization and deserialization using JSON (JavaScript Object Notation) and other formats to optimize data transfer and storage efficiency in Hadoop environments.
Performed data wrangling and transformation using Apache Pig to clean, preprocess, and prepare raw data for downstream applications and users.
Developed and maintained documentation for data engineering processes, including data flows, data lineage, and data dictionaries, to ensure transparency and knowledge sharing within the team.
Provided technical support and troubleshooting for production Hadoop clusters, promptly diagnosing and resolving performance issues, data inconsistencies, and job failures.
Environment: Hadoop, AWS, Microservices, Java 1.7, MapReduce, Agile, HBase, JSON, Spark, Kafka, JDBC, Hive, Pig.
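Illustrative sketch of the Spark batch processing described above, written as a small PySpark aggregation over JSON event data; the HDFS path, column names, and Hive table are hypothetical placeholders rather than details of the actual Netenrich workloads.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumes a Hive metastore is available; all names below are placeholders.
spark = (SparkSession.builder
         .appName("event-aggregation-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read raw JSON events from HDFS.
events = spark.read.json("hdfs:///data/raw/events/")

# Aggregate event counts and total amount per device per day.
daily = (events
         .withColumn("event_date", F.to_date("event_ts"))
         .groupBy("device_id", "event_date")
         .agg(F.count("*").alias("event_count"),
              F.sum("amount").alias("total_amount")))

# Persist the result as a Hive table for downstream reporting.
daily.write.mode("overwrite").saveAsTable("analytics.daily_device_metrics")

spark.stop()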