Sagarika K - Data Engineer |
[email protected] |
Location: Ypsilanti, Michigan, USA |
Relocation: yes |
Visa: OPT |
Summary:
Over 5 years of IT experience focused on Big Data technologies, the Hadoop ecosystem, and SQL-based solutions across the Manufacturing, Financial, and Communication sectors, including 4 years dedicated to Big Data Analytics using tools within the Hadoop ecosystem and the Spark framework. Currently working extensively with Spark and Spark Streaming, primarily in Scala. Proficient in BI application design using Tableau and Power BI.

Core Competencies:
- Highly skilled in working with Hive, Oracle, SQL Server, SQL, PL/SQL, T-SQL, and managing large-scale databases.
- Proficient in programming languages such as Java and Scala.
- Extensive experience developing custom UNIX shell scripts for Hadoop and Big Data environments.
- Developed data pipelines using Pig, Sqoop, and Flume to extract and store data in HDFS, and wrote Pig Latin scripts and HiveQL for data analysis.
- Strong background in Spark Streaming and Apache Kafka for managing real-time data streams.
- Expertise in integrating Amazon Web Services (AWS) with various application infrastructures.
- Played a key role in writing Java APIs for AWS Lambda to manage several AWS services.
- Experienced in ensuring high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
- Supported production environments and troubleshot JavaScript (JS) applications in AWS.
- Used AWS S3 for static content storage and retrieval, and AWS CloudFront to improve latency.
- Configured AWS Security Groups for deploying and managing AWS EC2 instances.
- Designed and implemented AWS VPC networks, configured security groups, and managed Elastic IPs.
- Developed automated database tasks using UNIX shell scripting.
- Created AWS CloudFormation templates to provision custom-sized VPCs, subnets, EC2 instances, ELBs, and security groups.
- Designed and implemented data models and schemas on SQL Azure.
- Proficient in Azure Microservices, Azure Functions, and other Azure solutions.
- Experienced with big data in Azure, including connecting HDInsight to Azure and using Big Data technologies.
- Hands-on experience with Azure backup and restore services.
- Involved in setting up Azure Virtual Machines and other Azure services.
- Extensive experience building data processing applications using Teradata, Oracle, SQL Server, and MySQL.
- Worked with data warehousing and ETL tools, including Informatica, Tableau, and Qlik Replicate.
- Familiar with implementing security protocols in Hadoop, including integration with Kerberos for authentication and authorization.
- Well-versed in Agile and Waterfall methodologies, with strong communication skills for client-facing engagements.

Education:
Master's in Information Systems, Eastern Michigan University, Michigan, USA
Bachelor's in Electrical and Electronics Engineering, SR Engineering College, Hyderabad, India

Technical Skills:
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Zookeeper, Kafka, Cassandra, Apache Spark, Spark Streaming, HBase, Impala
Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS
Languages: Java, SQL, PL/SQL, Python, Pig Latin, HiveQL, Scala, Regular Expressions
Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful, SOAP
Operating Systems: Windows (XP/7/8/10), UNIX, Linux, Ubuntu, CentOS
Portals/Application Servers: WebLogic, WebSphere Application Server, WebSphere Portal Server, JBoss, Tomcat
Build Automation Tools: SBT, Ant, Maven
Version Control: Git
IDE & Build Tools: Eclipse, Visual Studio, JUnit, IntelliJ, PyCharm
Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB), Teradata, Neo4j

PROFESSIONAL EXPERIENCE:

Client: Wells Fargo, Des Moines, Iowa, USA (January 2023 - January 2024)
Role: Data Engineer / Spark Developer
Responsibilities:
- Led the data pipeline process, from initial ingestion into HDFS through advanced processing and analysis, ensuring data consistency and performance.
- Engineered Spark APIs to migrate data from Teradata into HDFS and structured the data in Hive tables for optimized querying.
- Created and executed Sqoop jobs to transfer data from Oracle databases into Avro format, then built Hive tables for further analysis.
- Enhanced data processing workflows by running and optimizing scripts across Hive, Impala, Hive on Spark, and Spark SQL environments.
- Used Kafka in conjunction with Logstash to manage and monitor log data streams, ensuring high availability and performance.
- Tuned Hive performance by refining table design, storage options, and queries, leading to significant processing improvements.
- Designed, built, and deployed Docker containers on AWS ECS, streamlining the deployment pipeline with CI/CD automation.
- Configured AWS ALB to route traffic to specific targets within AWS ECS, improving application performance and reliability.
- Automated deployment processes by writing and implementing YAML scripts in AWS CodeDeploy.
- Led the migration strategy for applications and databases to Azure, focusing on scalability and cost efficiency.
- Managed server provisioning and deployment using Azure Resource Manager templates and the Azure Portal, ensuring smooth operations.
- Oversaw the migration of on-premises virtual machines to Azure using Azure Site Recovery, minimizing downtime and ensuring data integrity.
- Developed and optimized Spark scripts for importing large datasets from Amazon S3, significantly improving data processing times (see the illustrative sketch after this section).
- Used Scala to build Spark Core and Spark SQL scripts, enabling faster and more efficient data processing.
- Implemented Kafka consumer APIs in Scala to manage and consume data streams from Kafka topics, ensuring real-time data availability.
- Defined program specifications, conducted rigorous testing, and implemented necessary modifications to ensure reliable software performance.
- Applied complex RDD/Dataset/DataFrame transformations in Scala using Spark Context and Hive Context to enhance data processing capabilities.
Environment: HDFS, YARN, MapReduce, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark SQL, Spark Streaming, Eclipse, Informatica, Oracle, Teradata, PL/SQL, UNIX Shell Scripting, Cloudera.
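A minimal sketch in Scala of the kind of batch Spark SQL job described above (reading raw files from S3, applying DataFrame transformations, and persisting a partitioned Hive table). This is illustrative only, not project code; the bucket, paths, columns, and table names are hypothetical placeholders.

// Illustrative sketch only: batch Spark SQL job in Scala.
// All names below (bucket, columns, table) are hypothetical.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object S3ToHiveSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-to-hive-sketch")
      .enableHiveSupport()                 // lets Spark SQL read and write Hive tables
      .getOrCreate()

    // Load raw Parquet files landed in S3 (hypothetical bucket and prefix).
    val raw = spark.read.parquet("s3a://example-bucket/raw/transactions/")

    // Typical DataFrame transformations: filter bad records, derive a
    // partition column, and keep only the columns downstream queries need.
    val cleaned = raw
      .filter(col("amount").isNotNull && col("amount") > 0)
      .withColumn("txn_date", to_date(col("event_ts")))
      .select("account_id", "amount", "currency", "txn_date")

    // Persist as a partitioned Hive table for optimized querying.
    cleaned.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("analytics.transactions_cleaned")

    spark.stop()
  }
}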
Client: Cyient Ltd, India (August 2018 - July 2022)
Role: Data Engineer
Responsibilities:
- Designed and developed proofs of concept (POCs) using Spark and Scala to compare performance against traditional MapReduce and Hive processes, informing strategic decisions on technology adoption.
- Created and managed Hive tables, loading and analyzing data with complex Hive queries for deep insights.
- Developed custom Hive UDFs to extend Hive functionality, enabling more advanced data processing and analysis.
- Leveraged JSON and XML SerDes for efficient serialization and deserialization, ensuring seamless loading of structured data into Hive tables.
- Implemented solutions to handle message reprocessing in Kafka using offset IDs, improving data accuracy and reliability (see the illustrative sketch after this section).
- Validated Azure resources post-deployment using Pester, ensuring infrastructure met all specified requirements.
- Designed and implemented robust backup and recovery solutions in Azure, strengthening data protection strategies.
- Installed, configured, and administered Azure IaaS and Azure AD environments, optimizing cloud infrastructure for performance and security.
- Managed and monitored Azure Virtual Networks and virtual machines through the Azure Portal, ensuring smooth operations.
- Developed and executed Sqoop jobs to transfer data from RDBMS to HDFS and Hive, enabling efficient data storage and retrieval.
- Transformed dynamic XML data for ingestion into HDFS, ensuring compatibility with downstream processing.
- Implemented Spark scripts using Scala and Spark SQL to access Hive tables, accelerating data processing.
- Managed data loading from UNIX file systems into HDFS, ensuring data availability for processing.
- Worked on serverless deployments via the AWS CLI, streamlining the deployment of Elastic Beanstalk applications across environments.
- Designed, built, and deployed Docker images on AWS ECS, integrating CI/CD pipelines for automated deployments.
- Optimized Hive metastore integration with Spark SQL through Hive Context and SQL Context, improving query performance.
- Scheduled and validated daily jobs using Control-M, ensuring reliable job execution and monitoring.
Environment: Spark SQL, HDFS, Hive, Pig, Apache Sqoop, Java (JDK SE 6, 7), Scala, Shell scripting, Linux, MySQL, Oracle Enterprise DB, PostgreSQL, IntelliJ, Oracle, Subversion, Control-M, Teradata, Agile Methodologies.
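A minimal sketch in Scala of one possible approach to the offset-based Kafka reprocessing mentioned above: assign a partition with the plain Kafka consumer API, seek back to a known offset, and re-read from there. This is illustrative only, not project code; the broker, topic, group, and offset values are hypothetical placeholders.

// Illustrative sketch only: replaying a Kafka partition from a known offset.
// All names and values below (broker, topic, group, offset) are hypothetical.
import java.time.Duration
import java.util.{Collections, Properties}
import org.apache.kafka.clients.consumer.KafkaConsumer
import org.apache.kafka.common.TopicPartition

object OffsetReplaySketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")    // hypothetical broker
    props.put("group.id", "replay-group")              // hypothetical consumer group
    props.put("key.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("value.deserializer",
      "org.apache.kafka.common.serialization.StringDeserializer")
    props.put("enable.auto.commit", "false")           // commit only after reprocessing succeeds

    val consumer = new KafkaConsumer[String, String](props)
    val partition = new TopicPartition("orders", 0)    // hypothetical topic and partition
    consumer.assign(Collections.singletonList(partition))

    // Seek back to the offset from which records should be reprocessed.
    val replayFrom = 42000L                             // hypothetical offset ID
    consumer.seek(partition, replayFrom)

    // Re-read and reprocess the records from that offset onward.
    val records = consumer.poll(Duration.ofSeconds(5)).iterator()
    while (records.hasNext) {
      val record = records.next()
      println(s"reprocessing offset=${record.offset()} value=${record.value()}")
    }

    consumer.commitSync()   // record the new position once reprocessing is done
    consumer.close()
  }
}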