Swathi - Sr. Data Engineer / Cloud Data Engineer
Email: [email protected]
Location: Connecticut, USA
Relocation:
Visa: GC
Name: Swathi Sakinala (C2C/CTH)
Email Id: [email protected]
Visa: GC

PROFESSIONAL SUMMARY:
- 10+ years of professional experience in project development, implementation, deployment, and maintenance using Big Data technologies, designing and implementing complete end-to-end Hadoop-based data analytical solutions using HDFS, MapReduce, Spark, Scala, Yarn, Kafka, Pig, Hive, Sqoop, Oozie, and HBase.
- Hands-on expertise with AWS databases such as RDS (Aurora), Redshift, DynamoDB, and ElastiCache (Memcached & Redis).
- Involved in designing and deploying multi-tier applications using AWS services (EC2, Route 53, S3, RDS, DynamoDB, SNS, SQS, IAM), focusing on high availability, fault tolerance, and auto-scaling with AWS CloudFormation.
- Extensive experience with cloud platforms such as AWS, Azure, and Google Cloud Platform.
- Experience using SDLC methodologies like Waterfall and Agile Scrum for design and development.
- Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
- Experience implementing Azure data solutions: provisioning storage accounts, Azure Data Factory, SQL Server, SQL Databases, SQL Data Warehouse, Azure Databricks, and Azure Cosmos DB.
- Experience developing iterative algorithms using Spark Streaming in Scala and Python to build near real-time dashboards.
- Experience managing multiple databases across multiple servers.
- Experience migrating data between RDBMS/unstructured sources and HDFS using Sqoop.
- Excellent programming skills at a higher level of abstraction using Scala, AWS, and Python.
- Experience with job workflow scheduling and monitoring tools like Oozie, with good knowledge of Zookeeper.
- Proficient in designing, developing, and maintaining scalable and secure cloud-based data solutions.
- Experienced in DevOps/Agile operations processes (CI/CD) and tooling (code review, unit test automation, build & release automation, incident and change management), including tools such as Git and Jenkins.
- Understanding of developing custom reports using the Workday Report Writer tool and deploying them into a Workday tenant.
- Good understanding and knowledge of NoSQL databases like MongoDB, HBase, and Cassandra.
- Experience exporting and importing data between different databases/files and SQL Server 2016/2014/2008/2005 using DTS and SSIS.
- Experience creating triggers, stored procedures, views, CTEs, tables, and cursors using T-SQL.
- Experience in database programming and development with SQL Server, Oracle, Sybase, and MySQL.
- Proficient in Python scripting; worked with statistical functions in NumPy, visualization with Matplotlib, and Pandas for organizing data.
- Involved in loading structured and semi-structured data into Spark clusters using the Spark SQL and DataFrames APIs.
- Wrote complex HiveQL queries for data extraction from Hive tables and developed Hive user-defined functions (UDFs) as required.
- Excellent knowledge of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive to optimize performance.
- Proficient in working with major cloud platforms like AWS, Azure, and Google Cloud, leveraging their services for optimal data storage, processing, and analysis.
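As an illustration of the Hive partitioning/bucketing and Spark SQL work summarized above, a minimal PySpark sketch (database, table, column, and path names are hypothetical assumptions, not taken from any specific project):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partitioning-sketch")
         .enableHiveSupport()          # requires a Hive metastore
         .getOrCreate())

# External, partitioned, and bucketed Hive table defined through Spark SQL.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.raw_events (
        event_id STRING,
        user_id  STRING,
        amount   DOUBLE
    )
    PARTITIONED BY (event_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC
    LOCATION '/data/warehouse/raw_events'
""")

# Partition pruning: only the requested event_date partition is scanned.
daily_totals = spark.sql("""
    SELECT user_id, SUM(amount) AS total_amount
    FROM sales_db.raw_events
    WHERE event_date = '2021-02-01'
    GROUP BY user_id
""")
daily_totals.show()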
- Proficient in converting Hive/SQL queries into Spark transformations using DataFrames and Datasets, and in using Scala and Java for data cleansing, filtering, and aggregation; detailed knowledge of the MapReduce framework.
- Established secure and efficient data pipelines through Azure Logic Apps, facilitating seamless data movement across different systems.
- Implemented and managed Azure SQL Database instances, optimizing performance and ensuring high availability for critical data assets.
- Experience working with NoSQL databases like Cassandra and HBase; developed real-time read/write access to very large datasets via HBase.
- Developed Spark applications that handle data from various RDBMS (MySQL, Oracle Database) and streaming sources.
- Proficient SQL experience in querying, data extraction/transformation, and developing queries for a wide range of applications.
- Developed serverless data processing solutions using AWS Lambda functions, minimizing infrastructure overhead and improving the agility of data processing workflows.
- Capable of processing large sets (gigabytes) of structured, semi-structured, or unstructured data.
- Extensive shell/Python scripting experience for scheduling and process automation.
- Good exposure to development, testing, implementation, documentation, and production support.

Technical Skills:
Big Data Ecosystem: HDFS, MapReduce, Hive, Yarn, Pig, Sqoop, HBase, Kafka Connect, Impala, StreamSets, Spark, Zookeeper, NiFi, Amazon Web Services
Programming Languages: C#, ASP.NET, .NET Core, T-SQL
Software Methodologies: Agile, SDLC Waterfall
Databases: MySQL, MS SQL Server, Oracle, DB2, SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), Azure SQL
ETL/BI: SSIS, SSRS, SSAS, Azure ADT, Informatica
Version Control: VSTS, Git, SVN
Web Development: JavaScript, Node.js, HTML, CSS, .NET Core MVC, Aurelia
Operating Systems: Windows (XP/7/8/10), Linux (Unix, Ubuntu), Mac OS
Cloud Technologies: Azure, Snowflake, Amazon Web Services, Azure Databricks

Professional Experience:

Client: Windstream, Hartford, CT    Feb 2021 - Present
Role: Sr. Data Engineer / Cloud Data Engineer
Responsibilities:
- Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity.
- Heavily involved in testing Snowflake to understand the best possible way to use cloud resources.
- Played a key role in migrating SQL database objects into the Snowflake environment.
- Created sessions, extracted data from various sources, transformed data according to requirements, and loaded it into the data warehouse.
- Implemented token-based authentication to secure the ASP.NET Core Web API and provide authorization to different users.
- Implemented and optimized data pipelines on Google Cloud Platform using services such as Cloud Storage, BigQuery, Dataflow, and Pub/Sub.
- Used Informatica PowerCenter for ETL: extracting, transforming, and loading data from heterogeneous source systems into the target database.
- Developed and optimized data processing workflows using Azure Data Factory, ensuring timely and accurate ETL operations.
- Demonstrated expertise in leveraging core AWS services such as Amazon S3, EC2, Lambda, Glue, and EMR for scalable and efficient data processing.
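For example, a minimal sketch of the Cloud Storage-to-BigQuery batch load used in such GCP pipelines (project, dataset, table, and bucket names are hypothetical):

from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load the day's files from Cloud Storage into a raw BigQuery table.
load_job = client.load_table_from_uri(
    "gs://my-landing-bucket/orders/2021-02-01/*.csv",
    "my-analytics-project.sales.orders_raw",
    job_config=job_config,
)
load_job.result()  # block until the load job finishes

table = client.get_table("my-analytics-project.sales.orders_raw")
print(f"orders_raw now has {table.num_rows} rows")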
- Designed and implemented robust, scalable data pipelines in the cloud, utilizing tools such as Apache Airflow and AWS Glue, to efficiently process, transform, and move data between various systems.
- Analyzed business requirements and worked closely with various application and business teams to develop ETL procedures that are consistent across all applications and systems.
- Performed ETL (extract, transform, and load) duties including requirements validation, code development, source control, unit testing, and version deployment scripting.
- Converted various Informatica mappings into SSIS packages.
- Hands-on experience with Continuous Integration and Deployment (CI/CD) using Jenkins and Docker.
- Queried multiple databases, including Snowflake, UDB, and MySQL, for data processing.
- Developed ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL; wrote SQL queries against Snowflake.
- Designed and deployed Azure Data Lake Storage to manage and organize large volumes of structured and unstructured data.
- Built real-time data processing pipelines using Cloud Dataflow and Apache Beam, ensuring timely and accurate data updates.
- Orchestrated the setup and management of data warehouses on Amazon Redshift, optimizing query performance and ensuring reliable access to critical business data.
- Extensively used DATA _NULL_ and SAS procedures such as PRINT, REPORT, TABULATE, FREQ, MEANS, SUMMARY, and TRANSPOSE for producing ad-hoc and customized reports and external files.
- Troubleshot and resolved data processing issues and proactively engaged in data modelling discussions.
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming, and loading data into the data warehouse.
- Translated business requirements into workable functional and non-functional requirements at a detailed production level using workflow diagrams, sequence diagrams, activity diagrams, and use case modelling with the help of Erwin.
- Implemented Azure Data Factory security features, including Azure Managed Identity and Role-Based Access Control (RBAC), to ensure data integrity and compliance.
- Worked on developing Kafka producers and Kafka consumers for streaming millions of events per second of streaming data.
- Established comprehensive monitoring solutions using AWS CloudWatch, ensuring proactive identification and resolution of performance bottlenecks in data pipelines.
- Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
- Good understanding of machine learning concepts and the integration of ML models into data pipelines.
- Integrated on-premises and cloud-based data sources using Azure Data Factory, ensuring a unified and real-time view of organizational data.
- Prepared dashboards using Tableau for summarizing configuration, quotes, orders, and other e-commerce data.
- Configured EC2 instances, configured IAM users and roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
- Experienced in handling large-scale data using technologies such as Apache Spark, Hadoop, and similar frameworks, ensuring high-performance data processing and analytics.
- Hands-on experience with Alteryx for ETL and data preparation for EDA, performing spatial and predictive analytics, as well as with big data technologies like Apache Spark, Hadoop, and Apache Flink.
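A minimal sketch of the Python-plus-SnowSQL ETL pattern mentioned above, using the Snowflake Python connector (account, credentials, stage, and table names are placeholders):

import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="ETL_SVC",
    password="********",
    warehouse="ETL_WH",
    database="ANALYTICS",
    schema="STAGING",
)
try:
    cur = conn.cursor()
    # Bulk-load staged files into a staging table...
    cur.execute("""
        COPY INTO STAGING.ORDERS_RAW
        FROM @STAGING.S3_ORDERS_STAGE/2021/02/01/
        FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # ...then merge the new rows into the warehouse target table.
    cur.execute("""
        MERGE INTO ANALYTICS.CORE.ORDERS t
        USING STAGING.ORDERS_RAW s
          ON t.ORDER_ID = s.ORDER_ID
        WHEN MATCHED THEN UPDATE SET t.AMOUNT = s.AMOUNT
        WHEN NOT MATCHED THEN INSERT (ORDER_ID, AMOUNT)
             VALUES (s.ORDER_ID, s.AMOUNT)
    """)
finally:
    conn.close()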
- Involved in data modeling sessions; developed technical design documents and used the ETL DataStage Designer to develop processes for extracting, cleansing, transforming, integrating, and loading data into the data warehouse database.
- Designed and implemented Azure Synapse Analytics (formerly SQL Data Warehouse) to support complex analytical queries and reporting needs.
- Created Spark clusters and configured high-concurrency clusters using Databricks to speed up the preparation of high-quality data; created Databricks notebooks using SQL and Python and automated notebooks using jobs.
- Involved in creating Hive tables and loading and analyzing data using Hive queries; developed Hive queries to process the data and generate data cubes for visualization.
- Selected the appropriate AWS service based on data, compute, database, or security requirements, and defined and deployed monitoring, metrics, and logging systems on AWS.
- Implemented real-time data streaming solutions using AWS Kinesis, ensuring timely and accurate insights for decision-making processes.
- Automated data pipeline workflows and orchestration using Cloud Composer, ensuring reliable and scheduled execution of ETL processes.
- Familiarity with data warehousing solutions like Amazon Redshift, Google BigQuery, and Azure Synapse Analytics.
- In-depth knowledge of cloud-based data storage solutions, including Amazon S3, Azure Data Lake Storage, and Google Cloud Storage.

Client: Brown & Brown Insurance, Atlanta, GA    Aug 2019 - Jan 2021
Role: Sr. Data Engineer
Responsibilities:
- Managed logical and physical data models in the ER/Studio repository based on the different subject area requests for the integrated model.
- Developed data mapping, data governance, transformation, and cleansing rules involving OLTP and ODS.
- Optimized and tuned the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
- Understood business requirements and designed the ETL flow in DataStage per the mapping sheet, including unit testing and review activities.
- Wrote Python scripts and mappers to run on the Hadoop Distributed File System (HDFS); troubleshot, fixed, and deployed many Python bug fixes for the two main applications that were a main source of data for both customers and the internal customer service team.
- Implemented solutions for ingesting data from various sources and processing data at rest utilizing big data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Exposure to the full lifecycle (SDLC) of data warehouse projects, including dimensional data modeling.
- Worked on building data warehouse structures and creating facts, dimensions, and aggregate tables through dimensional modeling with Star and Snowflake schemas.
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, Spark SQL, and U-SQL (Azure Data Lake Analytics).
- Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
- Developed complex SQL queries, stored procedures, and SSIS packages.
- Developed a data pipeline using Spark, Hive, and HBase to ingest customer behavioural data and financial histories into the Hadoop cluster for analysis.
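A minimal PySpark sketch of the RDBMS-to-Azure-Data-Lake ingestion pattern described above (JDBC connection details, storage account, and container names are hypothetical; a suitable JDBC driver and ADLS credentials are assumed to be configured on the cluster):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("rdbms-to-adls-sketch").getOrCreate()

# Pull a source table over JDBC.
activity = (spark.read.format("jdbc")
            .option("url", "jdbc:sqlserver://src-db:1433;databaseName=policy")
            .option("dbtable", "dbo.customer_activity")
            .option("user", "etl_user")
            .option("password", "********")
            .load())

# Light cleansing with Spark SQL functions before landing the data.
cleansed = (activity
            .dropDuplicates(["customer_id", "activity_ts"])
            .withColumn("load_date", F.current_date()))

# Write partitioned Parquet into Azure Data Lake Storage Gen2.
(cleansed.write
         .mode("append")
         .partitionBy("load_date")
         .parquet("abfss://curated@companylake.dfs.core.windows.net/customer_activity"))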
- Developed shell scripts for adding dynamic partitions to the Hive stage table, verifying JSON schema changes in source files, and verifying duplicate files in the source location.
- Worked on CI/CD automation using tools like Jenkins, SaltStack, Git, Vagrant, Docker, Elasticsearch, and Grafana.
- Identified and tracked slowly changing dimensions (SCD Type I, II, III, and Hybrid/Type 6) and determined the hierarchies in dimensions.
- Worked on data integration and workflow applications on the SSIS platform and was responsible for testing all new and existing ETL data warehouse components.
- Designed Star schemas and Snowflake schemas on dimension and fact tables, worked with the Data Vault methodology, and developed normalized logical and physical database models.
- Transformed the logical data model into a physical data model, ensuring primary key and foreign key relationships in the PDM, consistency of data attribute definitions, and primary index considerations.
- Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
- Strong background in data warehousing, business intelligence, and ETL processes (Informatica, AWS Glue), with expertise working on and analyzing large data sets.
- Built a Scala- and Spark-based configurable framework to connect to common data sources like MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load the data into BigQuery.
- Extensive knowledge and hands-on experience implementing PaaS, IaaS, and SaaS delivery models inside the enterprise (data centre) and in public clouds using AWS, Google Cloud, Kubernetes, etc.
- Provided best practice documents for Docker, Jenkins, Puppet, and Git.
- Expertise in implementing a DevOps culture through CI/CD tools like Repos, CodeDeploy, CodePipeline, and GitHub.
- Installed and configured the Splunk Enterprise environment on Linux; configured Universal and Heavy Forwarders.
- Developed various shell scripts for scheduling data cleansing scripts and loading processes, and maintained the batch processes using Unix shell scripts.

Client: GM Financial, Naperville, IL    Jun 2017 - Jul 2019
Role: Data Analyst / Data Engineer
Responsibilities:
- Worked on Google Cloud Platform (GCP) services like Compute Engine, Cloud Load Balancing, Cloud Storage, and Cloud SQL.
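A minimal Apache Beam sketch in the spirit of the Cloud Dataflow data-validation program described above (project, bucket, and table names are hypothetical; Dataflow runner, project, and temp-location options are omitted for brevity):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

with beam.Pipeline(options=PipelineOptions()) as p:
    # Row count of the raw source file on Cloud Storage.
    file_rows = (p
                 | "ReadRawFile" >> beam.io.ReadFromText(
                       "gs://landing-bucket/events/2020-06-01.csv",
                       skip_header_lines=1)
                 | "CountFileRows" >> beam.combiners.Count.Globally())

    # Row count of the corresponding BigQuery partition.
    bq_rows = (p
               | "ReadBigQuery" >> beam.io.ReadFromBigQuery(
                     query="SELECT event_id FROM `proj.dataset.events` "
                           "WHERE event_date = '2020-06-01'",
                     use_standard_sql=True)
               | "CountBqRows" >> beam.combiners.Count.Globally())

    # Compare the two counts and write the validation result.
    (file_rows
     | "Compare" >> beam.Map(
           lambda file_count, bq_count:
               f"file={file_count} bigquery={bq_count} match={file_count == bq_count}",
           bq_count=beam.pvalue.AsSingleton(bq_rows))
     | "WriteResult" >> beam.io.WriteToText(
           "gs://landing-bucket/validation/2020-06-01"))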
- Created database objects like tables, views, procedures, and functions using SQL to provide definition and structure and to maintain data efficiently.
- Created dashboards and interactive charts using Tableau to provide insights for managers and stakeholders and enable decision-making for market development.
- Designed ETL pipelines to retrieve datasets from MySQL and MongoDB into an AWS S3 bucket; managed bucket and object access permissions.
- Performed data cleaning and wrangling using Python with a cluster computing framework (Spark).
- Managed multiple project tasks with changing priorities and tight deadlines in an Agile environment.
- Employed statistical analysis with R to examine hypothesis assumptions and choose features for machine learning.
- Worked with a cross-functional team to design, develop, and implement a BI solution for marketing strategies.
- Implemented feature engineering in Spark to tokenize text data and transform features with scaling, normalization, and imputation.
- Involved in building machine learning pipelines for customer segmentation with Spark, clustering with PCA and K-means, and assisted the data science team in implementing association rule mining.
- Developed presentations using MS PowerPoint for internal and external audiences.
- Collaborated with the engineering team to design and maintain MySQL databases for storing and retrieving customer review data.
- Employed SQL to build ETL pipelines that filter, aggregate, and join various tables to retrieve the desired data from MySQL databases.
- Ingested, explored, cleaned, and integrated data from MySQL and MongoDB databases on AWS EC2 using Python and Hadoop to perform initial investigation, discover patterns, and check assumptions.
- Provided BI analysis for the marketing team to review the impact on key metrics in relation to the project.
- Used R to query data, run statistical analysis, and create reports and dashboards.
- Prepared project progress reports and status reports and submitted them to the management team on an ongoing basis.
- Built compelling visualizations and dashboards using Tableau to deliver actionable insights.
- Employed feature engineering pipelines with Python to perform normalization and scaling for numerical features and tokenization for categorical features; implemented PCA to reduce dimensionality.
- Contributed to building machine learning models with the scikit-learn library in Python, including Logistic Regression, SVM, Random Forest, and Naive Bayes models.
- Worked on Spark SQL to fetch the non-null data from two different tables and load it into a lookup table.

Client: KLDiscovery, Eden Prairie, MN    Jul 2013 - May 2017
Role: Data Analyst
Responsibilities:
- Designed ETL jobs for extracting data from heterogeneous source systems, transforming it, and finally loading it into the data marts.
- Performed in-depth analysis of systems and business processes of Medicare Part D per CMS rules and procedures.
- Performed and demonstrated sample static report builds and showcased the visual capabilities of the SSRS 2005 product.
- Involved in the development of SSIS packages and SSRS reports using BIDS.
- Created and executed SQL Server Integration Services (SSIS) packages to populate data from various data sources; created packages for different data loading operations for different applications.
- Created system DBA stored procedures, triggers, and queries.
- Participated on the claims testing team to review benefit changes for accuracy and check various non-participating fee schedules.
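A minimal pyodbc sketch of invoking the kind of data-loading stored procedure described above (server, database, procedure, table, and column names are hypothetical; the original work was built with SSIS and T-SQL):

import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlprod01;DATABASE=ClaimsDW;Trusted_Connection=yes;"
)
try:
    cursor = conn.cursor()
    # Run the data-loading stored procedure for one processing period.
    cursor.execute("{CALL dbo.usp_LoadClaimsStaging (?)}", "2016-01-31")
    conn.commit()

    # Quick sanity check on the rows just loaded.
    cursor.execute(
        "SELECT COUNT(*) FROM dbo.ClaimsStaging WHERE PeriodEnd = ?",
        "2016-01-31",
    )
    print(cursor.fetchone()[0])
finally:
    conn.close()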
- Built business requirements into the Medicare Advantage (MA) requirements database and created the Project Requirements Document for the three functional areas.
- Participated in SSIS and SSRS requirements gathering and analysis.
- Applied the Teradata FastLoad utility to load large volumes of data into empty Teradata tables in the data mart for the initial load, and the Teradata FastExport utility to export large volumes of data from Teradata tables and views for processing and reporting needs.
- Developed dozens of complex SSIS packages and SSRS reports in SQL Server 2005 and 2008.
- Involved in defining source-to-target ETL data mappings, business rules, and data definitions.
- Led business intelligence report development efforts by working closely with the MicroStrategy, Teradata, and ETL teams.
- Reported and analyzed all application defects, user issues, and resolution status to upper management using Mercury TestDirector.
- Prepared test plans including an introduction, test strategies, test schedules, the QA team's role, test deliverables, etc.
- Provided technical and functional consulting to corporate and divisional users for Cognos.
- Performed initial manual testing of the application as part of sanity testing.
- Analyzed business requirements and segregated them into high-level and low-level use cases, activity diagrams, and state chart diagrams using Rational Rose according to UML methodology, thereby defining the data process models.
- Designed and developed use cases, activity diagrams, and sequence diagrams using UML.
- Conducted functional walkthroughs and User Acceptance Testing (UAT) and supervised the development of user manuals for customers.
- Established a traceability matrix using Rational RequisitePro to trace the completeness of requirements across different SDLC stages.
- Performed gap analysis for the modules in production, conducted feasibility studies, and performed impact analysis for proposed enhancements; troubleshot the designed jobs using the DataStage Debugger.
- Conducted weekly status report meetings with the business and the IT team, as the most important aspect of the project was data mapping; used Query Analyzer and execution plans to optimize SQL queries.
- Designed the physical data model using Erwin 4.1 for the projection and actual databases; worked with object modelers and business analysts.
- Performed extensive requirements analysis, including data analysis and gap analysis.
- Worked on project life cycle and SDLC methodologies including RUP, RAD, Waterfall, and Agile.
- Created error files and log tables containing data with discrepancies to analyze and re-process the data.
- Developed business process models in RUP to document existing and future business processes.

Education: Bachelor's in Computer Science from JNTUH, 2012