Prajakta - Data Engineer | [email protected] | Location: Plano, Texas, USA | Relocation: Any | Visa: H1B
SUMMARY
- Passionate Data Engineer with a strong statistics background and 10 years of professional experience across Power BI, ETL, SQL, Azure, Scala, Spark, and Python; proficient in predictive modelling, data processing, Databricks, and data mining, helping companies gain valuable insights through statistical methods.
- 10+ years of IT experience in data engineering, analysis, modelling, development, and project management.
- Skilled in Python, with proven ability to adopt new tools and technical developments.
- Solid experience with Azure SQL DB, Data Factory, Databricks (PySpark and Spark SQL), Data Lake, Analysis Services, and PowerShell.
- Strong skills in analysing user requirements and translating them into effective business solutions through BI applications.
- Proven expertise in Microsoft Azure and BI tools, including Power BI, Azure SQL Database, and Azure Synapse Analytics (SQL Data Warehouse).
- Highly experienced in creating Power BI visualizations (slicers, line charts, pie charts, maps, bar charts, gauges, donut charts, tree maps, KPIs, and scorecards) based on business requirements.
- Worked with transformations such as Normalizer, Expression, Rank, Filter, Aggregator, Lookup, Joiner, Sequence Generator, Sorter, SQL transformation, Stored Procedure, Update Strategy, Source Qualifier, Transaction Control, Union, and CDC.
- Strong experience in business and data analysis, data profiling, data migration, data conversion, data quality, data integration, metadata management services, and configuration management.
- Extensively worked with Spark (Scala) on clusters for analytics; installed Spark on top of Hadoop and built advanced analytical applications using Spark with Hive and SQL/Oracle/Snowflake.
- Converted Hive/SQL queries into Spark transformations using Spark RDD and PySpark concepts (illustrated in the sketch after this summary).
- Developed Spark projects in Scala and executed them using spark-submit.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization, data extraction, data cleansing, and data manipulation.
- Managed end-to-end Salesforce implementation, including customization, configuration, and integration to meet business needs.
- Good experience writing Python Lambda functions and calling APIs.
- Administered user profiles, roles, security settings, and data access in Salesforce.
- Designed and implemented workflows, approval processes, and automation to streamline business processes.
- Experienced in data management and migration.
- Strong experience interacting with stakeholders and customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, and identifying and analyzing risks using appropriate templates and analysis tools.
- Expertise in creating complex SSRS reports against OLTP and OLAP databases.
- Experience with web services, SoapUI, XML, validating request and response XML, and SOAP and RESTful web service calls.
- Real-time data processing using Azure Event Hubs.
- Extensively worked on various GCP infrastructure design and implementation strategies; experienced in designing, architecting, and implementing data warehouse solutions on GCP.
- Experience with REST APIs and with data modelling (dimensional and relational) concepts such as star schema modelling, snowflake schema modelling, and fact and dimension tables.
- Generated periodic reports based on statistical analysis of data from various time frames and divisions using SQL Server Reporting Services (SSRS).
- Able to adapt effectively to rapidly changing technology and apply it to business needs; self-motivated and quick to learn new tools.
- Excellent problem-solving skills with a strong technical background and good interpersonal skills; excellent team player, able to meet deadlines and work under pressure.
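Illustrative sketch of the Hive-to-Spark conversion work mentioned above: a minimal PySpark example that expresses a Hive aggregate query as a DataFrame transformation chain. The table, columns, and filter value are hypothetical placeholders, not taken from any specific project.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Original Hive query (hypothetical table and columns):
#   SELECT region, SUM(amount) AS total_sales
#   FROM sales.transactions
#   WHERE txn_date >= '2023-01-01'
#   GROUP BY region;

# Equivalent Spark DataFrame transformations.
totals = (
    spark.table("sales.transactions")
    .where(F.col("txn_date") >= "2023-01-01")
    .groupBy("region")
    .agg(F.sum("amount").alias("total_sales"))
)
totals.show()
```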
TECHNICAL SKILLS

Cloud Technologies: Azure Data Factory, Azure Synapse, Azure SQL DB, Azure Data Lake Storage, Azure Databricks, REST API, Azure DevOps, AWS, AWS Kubernetes, GCP migration
Programming Languages: T-SQL, MySQL, Java, Python, C#, DAX
Web Technologies: HTML, XML, CSS
Big Data Technologies: HDFS, Hive, Oozie, Sqoop, PySpark, Scala, Azure, ADF, Databricks, Hadoop ecosystem, Spark, Airflow, ETL, Kafka
Operating Systems: Windows 10/8/7, Linux
Databases: MS SQL Server 2008 R2/2012/2014/2016, Oracle 11g, MySQL, PostgreSQL, MongoDB
SQL Server Tools: SSMS, Visual Studio, MSBI (SSIS, SSRS, SSAS) 2010/2017
Source Control Tools: Git
Reporting Tools: SSRS, Microsoft Power BI, Tableau, Excel, Adobe Analytics
Project Management Tools: Jira, Confluence
Others: Dataiku, SharePoint, MATLAB Simulink, QTP, HP Quality Center (ALM), ArcGIS, SPSS, JMP

PROFESSIONAL EXPERIENCE

HCL Technology, California | June 2023 - Till Date
Role: AWS Data Engineer
Responsibilities:
- Designed Azure Data Factory pipelines to pull data from legacy systems.
- Created Databricks notebooks to implement business rules and load the results into Azure Delta tables.
- Parsed log data into a structured format based on user requirements.
- Involved in development using Spark SQL with Python.
- Generated efficient Spark scripts to handle data at granularities from 15 minutes to monthly.
- Developed several complex SQL scripts in SQL Server and T-SQL (DDL and DML), constructing tables and applying normalization/de-normalization techniques on database tables.
- Created and updated clustered and non-clustered indexes to maintain SQL Server performance.
- Migrated data pipelines from the Hadoop/MapR platform to AWS/Kubernetes.
- Developed ETL packages utilizing SSIS, REST APIs, and stored procedures for data loading; scheduled and monitored the ETL packages.
- Used AWS Glue for transformations and AWS Lambda to automate the process (see the Lambda/Glue sketch after this role).
- Designed infrastructure for AWS applications and workflows using Terraform, and implemented continuous delivery of AWS infrastructure with Terraform.
- Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
- Experience with GCP services including GCS, Cloud Functions, and BigQuery.
- Migrated on-premises database structures to the Confidential Redshift data warehouse.
- Used AWS EMR to transform and move large amounts of data into and out of other AWS data stores and databases, such as Amazon Simple Storage Service (Amazon S3) and Amazon DynamoDB.
- Experience moving data between GCP and Azure using Azure Data Factory.
- Experience building Power BI reports on Azure Analysis Services for better performance.
- Used T-SQL statements to write complex queries.
- Gained experience in database backup, recovery, and disaster recovery procedures.
- Designed and developed ETL processes in AWS Glue to migrate data from external sources such as S3 (Parquet/text files) into AWS Redshift.
- Created development and test environments for different applications by provisioning Kubernetes clusters on AWS using Docker, Ansible, and Terraform.
- Created Sqoop jobs to fetch data daily from the Oracle system into Hive tables.
- Analyzed table indexes and partition selections for data storage and access.
- Experienced in managing and reviewing Spark log files for troubleshooting and debugging.
- Used stored procedures extensively; developed new ones and adjusted and tuned existing ones for optimal performance.
- Experience with Snowflake multi-cluster warehouses and understanding of Snowflake cloud technology.
- Experience with the Splunk reporting system.
- Responsible for creating on-demand tables on S3 files using Lambda functions and AWS Glue with Python and PySpark.
- Created monitors, alarms, notifications, and logs for Lambda functions and Glue jobs using CloudWatch.
- Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON data into Snowflake tables.
- Developed and deployed solutions using Spark and Scala code on a Hadoop cluster running on GCP.
- Led data migration efforts, ensuring a seamless transition of data from legacy systems to Salesforce.
- De-duplicated and cleansed data to maintain data integrity and improve system performance.
- Utilized data loader tools to import/export data between Salesforce and external sources.
- Hands-on experience with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
- Created Lambda functions to run AWS Glue jobs based on AWS S3 events.
- Customized Salesforce objects, fields, page layouts, and record types to align with business requirements.
- Developed custom Visualforce pages, Lightning components, and Apex triggers to extend platform functionality.
- Communicated and negotiated directly with business analysts and product owners to define mapping specifications based on data analysis.
- Performed ETL (extract, transform, and load) data imports, exports, and modifications from the legacy system to the new system.
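Illustrative sketch of the S3-event-driven Lambda/Glue pattern described in this role: a Lambda handler that starts a Glue job run for each object landing in a bucket. The Glue job name and argument keys are hypothetical placeholders; only standard boto3 calls are used.

```python
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    """Start a Glue job for each object dropped into the landing bucket."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = glue.start_job_run(
            JobName="curate-landing-data",          # hypothetical Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        runs.append(response["JobRunId"])
    return {"started_job_runs": runs}
```

In practice the function would be wired to the bucket through an S3 event notification, with CloudWatch alarms on failed job runs as noted above.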
Infosys, Connecticut | Dec 2022 - Jan 2023
Role: ETL Developer / Data Engineer
Responsibilities:
- Integrated third-party applications using REST/SOAP APIs, enhancing overall system capabilities.
- Created and customized reports and dashboards to provide real-time insights into sales, marketing, and customer service performance.
- Designed and developed ETL jobs to extract data from a Salesforce replica and load it into a data mart in Redshift.
- Utilized Salesforce's reporting features to track key metrics, analyse trends, and inform strategic decision-making.
- Conducted training sessions to onboard new users and improve adoption of Salesforce within the organization.
- Provided ongoing user support, troubleshooting issues and assisting with user-related inquiries.
- Implemented and configured Sales Cloud to optimize lead management, opportunity tracking, and sales forecasting.
- Implemented AWS Lambda to run code without managing servers, triggered by S3 and SNS events.
- Used Lambda functions and Step Functions to trigger Glue jobs and orchestrate the data pipeline.
- Created data ingestion modules using AWS Glue for loading data into various layers in S3, with reporting through Athena and QuickSight (see the Glue sketch after this role).
- Worked on implementing data warehouse solutions in AWS Redshift and on various projects to migrate data from other databases to AWS Redshift, RDS, ELB, EMR, DynamoDB, and S3.
- In-depth knowledge of Snowflake database, schema, and table structures; defined virtual warehouse sizing in Snowflake for different types of workloads.
- Configured Service Cloud to enhance case management, support ticket routing, and customer communication.
- Implemented database stored procedures, joined local and remote tables, and updated data in transactions.
- Designed, developed, and deployed reports in the MS SQL Server environment using SSRS 2008 R2.
- Generated SSRS reports such as cascade sort reports, tabular reports, and drill-down reports.
- Performed data validation and cleansing of staged input records before loading into the data warehouse.
- Customized and created reports using Microsoft SQL Server Reporting Services (SSRS).
- Developed all the data visualizations in Power BI requested by business users.
- Designed and developed the Power BI solution to create reports, visualizations, and dashboards per business requirements and published them for enterprise use.
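Illustrative sketch of the Glue ingestion modules described in this role: a Glue PySpark job that reads a raw CSV layer from S3, applies light cleanup, and writes partition-friendly Parquet to a curated layer that Athena can query. The bucket paths, job parameters, and the "id" column are hypothetical placeholders.

```python
import sys
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Job parameters passed at run time; values here are illustrative only.
args = getResolvedOptions(sys.argv, ["JOB_NAME", "raw_path", "curated_path"])

sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw CSV layer from S3.
raw_df = spark.read.option("header", "true").csv(args["raw_path"])

# Light cleanup before promoting to the curated layer ("id" is a placeholder key).
curated_df = raw_df.dropDuplicates().na.drop(subset=["id"])

# Write Parquet that Athena/QuickSight can query directly.
curated_df.write.mode("overwrite").parquet(args["curated_path"])

job.commit()
```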
Mount Sinai, New York | Feb 2022 - Sept 2022
Role: Data Engineer
Responsibilities:
- Designed Azure Data Factory pipelines to pull data from legacy systems.
- Created Databricks notebooks to implement business rules and load the results into Azure Delta tables (see the sketch after this role).
- Parsed log data into a structured format based on user requirements.
- Involved in development using Spark SQL with Python.
- Generated efficient Spark scripts to handle data at granularities from 15 minutes to monthly.
- Involved in tuning Spark jobs using Spark configurations and RDDs.
- Played a significant role in upgrading the system to Spark 2.0 with DataFrames and optimizing jobs to make the best use of the Tungsten engine.
- Integrated Lambda with SQS and DynamoDB using Step Functions to iterate through lists of messages and update statuses in a DynamoDB table.
- Created ETL pipelines for migrating existing data from SQL Server/Oracle to AWS S3 buckets.
- Transformed unstructured XML data into standardized CSV formats for integration into downstream systems.
- Validated data integrity of large datasets in Hive and Athena, and collaborated with the development team on ETL tasks to maintain data integrity and verify pipeline stability.
- Prepared documentation and analytic reports, effectively summarizing results, analysis, and conclusions for stakeholders.
- Created AWS Lambda functions and API Gateways so data submitted via API Gateway is processed by the Lambda functions.
- Orchestrated and migrated CI/CD processes using CloudFormation, Terraform, and Packer templates, and containerized the infrastructure using Docker, set up in OpenShift, AWS, and VPCs.
- Developed automation scripts in UNIX shell and Python incorporating the business processes for data processing.
- Created Sqoop jobs to fetch data daily from the Oracle system into Hive tables.
- Analyzed table indexes and partition selections for data storage and access.
- Integrated Salesforce with other business systems, such as ERP and marketing automation platforms, to ensure data consistency and flow.
- Responsible for building CloudFormation templates for SNS, SQS, Elasticsearch, DynamoDB, Lambda, EC2, VPC, RDS, S3, IAM, and CloudWatch services, integrated with Service Catalog.
- Evaluated and implemented AppExchange apps to extend Salesforce's capabilities, addressing specific business needs.
- Wrote scripts and an indexing strategy for a migration to Confidential Redshift from SQL Server and MySQL databases.
- Implemented security best practices, including role hierarchy, sharing rules, and data encryption, to maintain data security and compliance.
- Supported compliance efforts by configuring Salesforce to adhere to industry-specific regulations (e.g., GDPR, HIPAA).
- Experienced in managing and reviewing Spark log files for troubleshooting and debugging.
- Designed and developed both managed and external tables using Spark SQL.
- Worked closely on increasing system performance by reducing I/O, identifying process gaps, and tuning queries.
- Created shell/Python scripts to fetch log files from NFS servers on an hourly basis.
- Attended daily status calls, working closely with business users.
- Created analytical views for the ARCADIA dashboard to fetch results quickly.
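Illustrative sketch of the Databricks log-parsing work described in this role: raw log lines are split into structured columns and appended to a Delta table that downstream business rules read. The lake path, log layout, and table name are hypothetical placeholders; this assumes a Spark 3.x / Databricks runtime.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

# Raw log lines landed in the data lake (path is illustrative).
raw_logs = spark.read.text("/mnt/datalake/raw/app_logs/")

# Split "timestamp level message" lines into structured columns.
parsed = (
    raw_logs
    .withColumn("parts", F.split(F.col("value"), " ", 3))
    .select(
        F.to_timestamp(F.col("parts")[0]).alias("event_time"),
        F.col("parts")[1].alias("level"),
        F.col("parts")[2].alias("message"),
    )
    .where(F.col("event_time").isNotNull())
)

# Append into a Delta table (placeholder name) for downstream business rules.
parsed.write.format("delta").mode("append").saveAsTable("bronze.app_logs")
```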
DXC Technology, India | Jan 2017 - Jan 2020
Role: Data Engineer
Responsibilities:
- Worked as a data analyst on requirements gathering, business analysis, and project coordination.
- Worked with other data analysis teams to gather data profiling information.
- Responsible for analysis of business requirements and for design and implementation of the business solution.
- Using SQL, performed data analysis and data validation across multiple data sources from T-Mobile and Orange customer databases to build a single optimized process for the combined customer base.
- Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and test processes.
- Worked with multiple development teams to integrate, upgrade, and monitor multiple T-Mobile and Orange tools to streamline the porting process.
- Analysed, improved, and created documentation for multiple customer account-change processes, improving the customer loss rate by 7%.
- Hands-on experience with AWS services such as Lambda, Athena, DynamoDB, Step Functions, SNS, SQS, S3, and IAM.
- Increased customer insights and conversations by 36% by developing root-cause analysis reports from previous customer conversations and associated CRM data.
- Performed A/B testing on recently developed and integrated tools using training and testing datasets.
- Experience with Agile and Waterfall methodologies.
- Set up and ran effective recurring status meetings with product, portfolio, and delivery managers.
- Identified and provided remediation for vulnerabilities in specific applications used by various teams.

HTC Global Services, India | Jun 2013 - Dec 2016
Role: Data Analyst
Responsibilities:
- Collected and analysed data; presented and communicated data insights.
- Created impactful dashboards in Tableau and Power BI for data reporting, surfacing insights that supported a 22% increase in sales.
- Provided daily reports to overcome backlogs, reviewed KPIs with leaders, detected gaps in the process flow, and built business knowledge on the paper side.
- Built reports and dashboards in Looker, Salesforce, and Power BI.
- Automated daily workable lists for various teams using SQL queries and Python (see the sketch below).
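Illustrative sketch of the daily work-list automation mentioned above: a small Python script that runs a SQL query and writes the results to a CSV handed to each team. The connection string, table, and column names are hypothetical placeholders.

```python
import csv
import pyodbc

# Connection and query details are illustrative placeholders only.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=reporting-db;DATABASE=ops;Trusted_Connection=yes;"
)

QUERY = """
    SELECT case_id, owner_team, due_date, status
    FROM dbo.open_cases
    WHERE status = 'PENDING'
    ORDER BY due_date
"""

def export_daily_worklist(out_path: str = "daily_worklist.csv") -> int:
    """Pull pending items and write them to the daily work-list CSV."""
    with pyodbc.connect(CONN_STR) as conn:
        cursor = conn.cursor()
        cursor.execute(QUERY)
        columns = [col[0] for col in cursor.description]
        rows = cursor.fetchall()

    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        writer.writerows(rows)
    return len(rows)

if __name__ == "__main__":
    print(f"Exported {export_daily_worklist()} rows")
```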