Krishna Chaitanya S.
Sr. Data Engineer
Dallas, Texas, USA

Contact: Andy
Office email: andy@msitus.com
Gmail: andyssharath@gmail.com
Office #: 515-517-0120
Direct #: 940-440-7328

OBJECTIVE
Cloud data engineer with 8+ years of experience and a proven track record of leveraging Azure, AWS, and Google Cloud Platform to drive business agility and efficiency. Collaborative team player with excellent communication and problem-solving skills, passionate about harnessing the power of cloud platforms to help organizations achieve their goals. Adept at working closely with stakeholders to understand requirements and implement cloud solutions that deliver measurable results. Committed to staying current with the latest cloud technologies and best practices to provide innovative solutions.

Professional Summary
Assisted in implementing highly available and scalable cloud solutions on the Google Cloud Platform, Microsoft Azure, and AWS (EC2, S3, VPC, IAM, Lambda, etc.).
Experienced in writing automation scripts with Ansible, Terraform, PowerShell, Git, and Python.
Supported the creation of CI/CD pipelines using Azure DevOps, Jenkins, and AWS CodePipeline to deploy applications.
Competent in overseeing Git-based collaborative development and version control, guaranteeing project traceability and code integrity.
Worked with cross-functional teams to gather requirements and build customized data integration solutions aligned with business objectives and demands.
Skilled in a variety of data visualization tools, including Tableau and Power BI, to produce actionable insights that support well-informed decision-making.
Expertise in Infrastructure as Code (IaC) tools like Terraform and CloudFormation, with a strong understanding of containerization technologies (Docker, Kubernetes).
Proven experience with monitoring and logging tools such as CloudWatch, Dynatrace, Splunk, and Solarwinds.

Certifications:

Microsoft Certified: Azure Data Engineer Associate
Certification number: I303 - 9328

TECHNICAL SKILLS:

Big Data Ecosystem: Hadoop, MapReduce, Sqoop, Kafka, Impala
CI/CD Tools: Azure DevOps, Jenkins, AWS CodePipeline, GitHub Actions
Programming Languages: Python, PowerShell, Bash, C, HTML, Java, JSON
Cloud Technologies: AWS (EC2, S3, VPC, IAM, Lambda, RDS, CloudFormation), Google Cloud Platform (Cloud Storage, BigQuery, Cloud SQL, Cloud Functions, Cloud Pub/Sub), Microsoft Azure (Azure Storage, Azure Database, Azure Data Factory)
Automation Tools: Ansible, Terraform, CloudFormation
NoSQL Databases: MongoDB, DynamoDB, CouchDB, Couchbase, Cassandra, HBase
ETL/BI: DataStage, SSIS/SSRS/SSAS packages, Tableau, Power BI
Operating Systems: Windows 7/8/10, UNIX, Linux, macOS
Tools: PyCharm, Eclipse, Visual Studio, SQL*Plus, SQL Developer, TOAD, SQL Navigator, Query Analyzer, SQL Server Management Studio, SQL Assistant
Database Tools: MySQL, PostgreSQL, SQL, MongoDB

Work Experience:

Client: Apollo Medical, St. Louis, Missouri          Jan 2022 - Present
Role: Cloud Data Engineer

Key Responsibilities:
Designed and developed ETL pipelines using Google Cloud Storage, BigQuery, Cloud Functions, Dataflow, and Dataproc.
Worked on Docker-based containers for utilizing Apache Airflow on Google Cloud Platform.
Created Python scripts using Google Cloud client libraries to automate infrastructure provisioning tasks such as creating Cloud Storage buckets, creating Dataproc clusters, and submitting Spark jobs to Dataproc clusters (see the sketch at the end of this section).
Worked with Google Cloud Platform and created Dataproc clusters with Spark to process and analyze raw data and to access data held in Cloud Storage buckets.
Developed and maintained ETL workflows using Informatica PowerCenter to extract, transform, and load data from various sources.
Worked on developing ETL streams using Databricks. Collaborated with stakeholders to gather requirements and design data integration solutions tailored to business needs.
Conducted performance tuning and optimization of Informatica mappings and workflows for improved efficiency and scalability.
Provided production support and troubleshooting for ETL processes to ensure data integrity and reliability.
Developed Python code for task definitions, dependencies, SLA watchers, and time sensors for each job, enabling workflow management and automation in Airflow (see the Airflow sketch at the end of this section).
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, PySpark, and Python.
Designed and implemented highly performant data ingestion pipelines from multiple sources using Apache Spark
Developed PySpark scripts to set up the data pipeline.
Key contributor in building the complete workflows, triggers, PySpark jobs, crawlers, and Lambda functions.
Worked as a Big Data/Hadoop developer with Hadoop ecosystem components such as HBase, Sqoop, Zookeeper, Oozie, Hive, and Pig on the Cloudera Hadoop distribution.
Developed and optimized SQL queries for data extraction, transformation, and analysis in relational databases.
Implemented and maintained NoSQL databases such as MongoDB and Cassandra for efficient storage and retrieval of unstructured data.
Managed PostgreSQL databases, including schema design, performance tuning, and query optimization.
Configured and maintained MySQL databases, ensuring data integrity, security, and high availability for critical applications.
Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python.
Built ETL pipelines into and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake.
Created interactive and insightful data visualizations using Tableau for business stakeholders to make informed decisions.
Developed dynamic dashboards and reports in Power BI to provide actionable insights into key performance metrics.
Utilized Git for version control and collaborative development of data engineering projects, ensuring code integrity and team productivity.
Managed project tasks and tracked progress using Jira, facilitating effective project management and team coordination.
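
The following is a minimal Python sketch of the Airflow pattern described above (task dependencies, an SLA watcher, and a time sensor), written against the Airflow 2.x API. The DAG name, schedule, and callables are hypothetical placeholders, not details of the actual project.

    from datetime import datetime, time, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator
    from airflow.sensors.time_sensor import TimeSensor

    def extract():
        pass  # stub: pull data from the source system

    def load():
        pass  # stub: load transformed data into the warehouse

    with DAG(
        dag_id="daily_claims_ingest",              # hypothetical DAG name
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        default_args={"sla": timedelta(hours=2)},  # SLA watcher: flag any task that runs past 2 hours
        catchup=False,
    ) as dag:
        wait_for_window = TimeSensor(task_id="wait_until_6am", target_time=time(6, 0))
        extract_task = PythonOperator(task_id="extract", python_callable=extract)
        load_task = PythonOperator(task_id="load", python_callable=load)

        # Task dependencies: the time sensor gates extraction, which gates the load.
        wait_for_window >> extract_task >> load_task
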
ENVIRONMENT: Google Cloud Storage, BigQuery, Airflow, Boto3, Informatica PowerCenter, Databricks, Spark RDD, Scala, Python, PySpark, Hadoop Ecosystem, HBase, Sqoop, Zookeeper, Oozie, Hive, Pig, Cloudera, SQL, NoSQL, MongoDB, Cassandra, PostgreSQL, MySQL, Snowflake, SnowSQL, Snowpipe, Big Data, Tableau, Power BI, Git, Jira.
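
As a hedged illustration of the Google Cloud provisioning scripts mentioned above, the sketch below uses the google-cloud-storage and google-cloud-dataproc client libraries to create a bucket, spin up a Dataproc cluster, and submit a PySpark job. The project ID, region, bucket, cluster, and job path are placeholder values.

    from google.cloud import dataproc_v1, storage

    PROJECT = "example-project"      # placeholder project ID
    REGION = "us-central1"
    BUCKET = "example-etl-bucket"    # placeholder bucket name

    # Create a Cloud Storage bucket for raw and curated data.
    storage.Client(project=PROJECT).create_bucket(BUCKET, location=REGION)

    # Create a small Dataproc cluster for Spark processing.
    endpoint = {"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
    clusters = dataproc_v1.ClusterControllerClient(client_options=endpoint)
    clusters.create_cluster(request={
        "project_id": PROJECT,
        "region": REGION,
        "cluster": {
            "project_id": PROJECT,
            "cluster_name": "etl-cluster",
            "config": {
                "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
                "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
            },
        },
    }).result()

    # Submit a PySpark job stored in the bucket to the new cluster.
    jobs = dataproc_v1.JobControllerClient(client_options=endpoint)
    jobs.submit_job_as_operation(request={
        "project_id": PROJECT,
        "region": REGION,
        "job": {
            "placement": {"cluster_name": "etl-cluster"},
            "pyspark_job": {"main_python_file_uri": f"gs://{BUCKET}/jobs/transform.py"},
        },
    }).result()
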


Client: Fifth Third Bank, Cincinnati, Ohio          Apr 2018 - Dec 2021
Role: Sr. Data Engineer

Key Responsibilities:
Helped in setting up storage accounts, Azure Data Factory, SQL servers, SQL databases, and SQL data warehouses for the implementation of Azure data solutions.
Set up and maintained highly available container-based clusters on Azure and AWS using Kubernetes and ECS to handle workloads and secure storage systems in a multi-cloud setting.
Created SQL Server stored procedures to standardize DML operations such as inserting, updating, and deleting data in the database.
Using Dynatrace and CloudWatch, monitored and optimized Azure and AWS resources for connection and performance, leading to an increase in application performance and a decrease in resource waste.
Created technical requirements, runbooks, and operations manuals for team members, and revised presentations and formal documents.
Closely worked with developers, product analysts, stakeholders, and end users to understand features and technical implementations.
Implemented Azure Functions and AWS Lambda to execute REST API activities and orchestrate data transfers to Azure Blob Storage and AWS S3 (see the Lambda sketch at the end of this section).
Managed multiple cloud accounts with Terabytes of data and adhered to company standards of confidentiality, security, and privacy. Utilized Azure Data Factory to schedule and monitor job performance, ensuring timely execution and efficient data processing.
Leveraged Azure Monitor to track and analyze log activities across various Azure services, maintaining visibility and ensuring compliance.
Employed Azure Synapse Analytics (formerly SQL Data Warehouse) as a high-performance database and data warehouse solution for advanced analytics.
Designed and implemented complex ETL workflows using Informatica PowerCenter to extract, transform, and load data from heterogeneous sources.
Developed data mappings and transformations to ensure accurate and efficient data processing in Informatica.
Optimized ETL processes for performance and scalability, enhancing data processing efficiency and reducing load times.
Conducted data quality checks and error handling mechanisms in ETL pipelines to ensure data integrity and reliability.
Developed tools using Python, shell scripting, and XML to automate routine tasks.
Involved in converting Hive queries into Spark transformations using Spark RDDs, Python and Scala.
Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
Developed analytical components using Kafka and Spark Streaming (see the streaming sketch at the end of this section).
Developed a POC in Scala, deployed it on the YARN cluster, and compared the performance of Spark with Hive and SQL.
Installed and Configured Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
Created SSIS package to load data from Flat files, Excel and Access to SQL server using connection manager.
Developed all required stored procedures, user-defined functions, and triggers using T-SQL.
Produced various types of reports using SQL Server Reporting Services (SSRS).
Implemented data warehousing solutions using Snowflake to store and analyze large volumes of structured and semi-structured data.
Conducted data modeling and schema design in Snowflake to optimize data storage and query performance for analytics purposes.
Developed interactive dashboards and visualizations in Tableau and Power BI to provide insights into data trends and performance metrics.
Managed version control and collaborated on code repositories using Git to ensure consistency and traceability in data engineering projects.
Utilized Jira for project management and issue tracking, facilitating effective communication and task prioritization within the team.
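
Below is a minimal sketch of the Lambda-based REST-to-S3 transfer pattern noted earlier in this section; the API endpoint, bucket name, and key prefix are placeholders, and the Azure Functions counterpart writes to Blob Storage in the same spirit.

    import datetime
    import json

    import boto3
    import requests  # bundled with the deployment package; not in the Lambda base runtime

    s3 = boto3.client("s3")

    BUCKET = "example-landing-bucket"             # placeholder bucket
    API_URL = "https://api.example.com/v1/rates"  # placeholder REST endpoint

    def handler(event, context):
        """Pull a REST payload and land it in S3 as a dated JSON object."""
        response = requests.get(API_URL, timeout=30)
        response.raise_for_status()

        key = f"raw/rates/{datetime.date.today():%Y/%m/%d}/rates.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(response.json()))
        return {"statusCode": 200, "body": key}
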

ENVIRONMENT: Azure, PowerShell, Windows, Kubernetes, Dynatrace, SQL, Globalscape, Azure DevOps, Azure Data Factory, Azure Monitor, Azure Synapse Analytics, Azure Functions, Informatica PowerCenter, Python, shell scripting, XML, Spark RDDs, Scala, Spark SQL/Streaming, Kafka, Apache Hadoop, Hive, Pig, HBase, Zookeeper, Sqoop, SQL Server, SSIS, T-SQL, SSRS, Snowflake, Tableau, Power BI, Git, Jira.
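
A minimal PySpark sketch of the Kafka-plus-Spark-streaming pattern referenced above (the production components were developed largely in Scala); the broker, topic, and output paths are placeholders, and the spark-sql-kafka connector package is assumed to be available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("txn_stream").getOrCreate()

    # Read a stream of events from a Kafka topic.
    raw = (spark.readStream.format("kafka")
           .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
           .option("subscribe", "transactions")                # placeholder topic
           .load())

    events = raw.select(F.col("value").cast("string").alias("payload"),
                        F.col("timestamp"))

    # Land the stream as Parquet with checkpointing for reliable file output.
    query = (events.writeStream
             .format("parquet")
             .option("path", "/data/streams/transactions")            # placeholder path
             .option("checkpointLocation", "/data/checkpoints/transactions")
             .trigger(processingTime="1 minute")
             .start())
    query.awaitTermination()
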




Client: Brivo, Bethesda, Maryland          Jan 2017 - Mar 2018
Role: Azure Data Engineer

Brivo is the global leader in mobile, cloud-based security access control for commercial real estate, multifamily residential, and large distributed enterprises. Its comprehensive product ecosystem and open API provide businesses with powerful digital tools to increase security automation, elevate employee and tenant experience, and improve the safety of all people and assets in the built environment. Having created the category over twenty years ago, Brivo's building access platform is now the digital foundation for the largest collection of customer facilities in the world.



Key Responsibilities:

Analyzed, designed, and built modern data solutions that enable data visualization using Azure PaaS services. Assessed the impact of new implementations on existing business processes by understanding the current state of the application in production.
Extracted, transformed, and loaded data from source systems into Azure data storage services using a mix of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into Azure Data Lake, Azure Storage, Azure SQL, and Azure DW, and processed it in Azure Databricks.
Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data from a variety of sources, including Azure SQL, Blob Storage, and Azure SQL Data Warehouse, with write-back tooling and backwards compatibility.
Developed Spark applications for data extraction, transformation, and aggregation from numerous file formats using PySpark and Spark SQL, then analyzed and transformed the data to reveal insights about consumer usage trends.
Responsible for sizing, monitoring, and troubleshooting the Spark Databricks cluster. Experienced in Spark application performance tuning, including setting the optimal batch interval, parallelism level, and memory allocation.
Wrote UDFs in Scala and PySpark to address unique business requirements (see the PySpark UDF sketch at the end of this section). Developed JSON definitions for deploying pipelines in Azure Data Factory (ADF) that use the SQL activity to process data.
Wrote SQL scripts for automation. Created Build and Release definitions for numerous projects (modules) in a production environment using Visual Studio Team Services (VSTS).
Created SSIS packages using Business Intelligence Development Studio to transfer data from flat files, Excel, and SQL Server, frequently using transformations and tasks such as Lookup, Derived Column, Data Conversion, Aggregate, Conditional Split, SQL Task, Script Task, and Send Mail Task.
Performed data cleansing, enrichment, mapping, and automated data validation to guarantee that useful and correct data was reported effectively.
Skilled in high-level design of ETL DTS and SSIS packages for integrating data over OLE DB connections from heterogeneous sources (Excel, CSV, Oracle, flat files, text-format data) using SSIS transformations such as Data Conversion, Conditional Split, Bulk Insert, Merge, and Union All. Experienced with SSAS, deploying Analysis Services projects and creating cubes, named queries, data source views, and dimensions.
Provided BI reporting solutions in Power BI and SQL Server Reporting Services (SSRS).
Connected to several data sources, including Excel and SQL Server, to create reports and charts, and used the Power BI tool to build drill-down capability.
Created dashboards for Fundraising, Biomed, and Organization 360 using Power BI. Built sophisticated calculated metrics using the Data Analysis Expressions (DAX) language.
Participated in data analysis for source and target systems; solid grasp of data warehousing principles, staging tables, dimensions, facts, and star and snowflake schemas.
Reengineered business processes to maximize the use of IT resources. Integrated several data sources, including COBOL files, delimited flat files, SQL Server, Oracle 10g, and XML files.
Transformed data from various sources, such as Excel and text files, into a reporting database to build the analytical reporting system. Initiated data modeling sessions to design and extend data mart models supporting the reporting needs of applications. Extracted data from Oracle and flat files using SQL*Loader, and designed and developed mappings using Informatica.
Created Hive tables, loaded them with data, and wrote Hive queries that run as MapReduce jobs internally. Implemented partitioning, dynamic partitions, and bucketing in Hive (see the sketch at the end of this section).
Developed Sqoop code for importing and exporting data between HDFS/Hive and DB2 database systems, and for loading data into HDInsight. Wrote Hive and Pig scripts according to the requirements.
Worked with Node.js: the team set up the application and developed most of the forms and dynamic pages using Node, and I supported that development.
Designed and developed managed and external tables in Hive according to the requirements, and was involved in creating Hive UDFs. Handled code turnover and promotion to QA, including creating CRs and CRQs for releases. Implemented a proof of concept in Python to migrate MapReduce workloads to Spark RDD transformations.
Evaluated several technologies to build proof-of-concept streaming and batch applications (Kafka) that bring in data from multiple sources, transform it, and load it into target systems; these were successfully implemented in production. Extensive experience building applications on Cloudera/Hortonworks Hadoop distributions.
Created complex SQL queries and used JDBC connections to access the database, and used SQL queries to create reports for presales and secondary sales predictions.
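
As a hedged illustration of the Hive partitioning and bucketing mentioned above, the PySpark/HiveQL sketch below creates a partitioned, bucketed table and performs a dynamic-partition insert; the database, table, and column names are illustrative only.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = (SparkSession.builder.appName("hive_ddl")
             .enableHiveSupport().getOrCreate())

    # Partitioned and bucketed managed table (illustrative schema).
    spark.sql("""
        CREATE TABLE IF NOT EXISTS access_logs (
            device_id STRING,
            event_type STRING,
            event_ts TIMESTAMP
        )
        PARTITIONED BY (event_date STRING)
        CLUSTERED BY (device_id) INTO 8 BUCKETS
        STORED AS ORC
    """)

    # Illustrative staging source; in practice this is an existing raw/staging table.
    staging = (spark.createDataFrame(
                   [("dev-1", "door_open", "2018-01-15 08:30:00")],
                   ["device_id", "event_type", "event_ts"])
               .withColumn("event_ts", F.to_timestamp("event_ts")))
    staging.createOrReplaceTempView("staging_access_logs")

    # Dynamic-partition insert: the last selected column feeds the partition key.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    spark.sql("""
        INSERT OVERWRITE TABLE access_logs PARTITION (event_date)
        SELECT device_id, event_type, event_ts,
               date_format(event_ts, 'yyyy-MM-dd') AS event_date
        FROM staging_access_logs
    """)
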
ENVIRONMENT: Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW), Azure Databricks, Azure Data Factory, PySpark, Spark SQL, Scala, Power BI, Hive, Sqoop, Pig, Hadoop, Kafka.
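
The following is a minimal sketch of the PySpark UDF pattern referenced in this section; the cleanup rule, function, and column names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf_demo").getOrCreate()

    @F.udf(returnType=StringType())
    def normalize_site_code(raw):
        """Business-specific cleanup: trim, upper-case, strip a legacy 'OLD-' prefix."""
        if raw is None:
            return None
        code = raw.strip().upper()
        return code[4:] if code.startswith("OLD-") else code

    # Usage example on a toy DataFrame.
    df = spark.createDataFrame([("old-101 ",), ("B202",)], ["site_code"])
    df.withColumn("site_code_clean", normalize_site_code("site_code")).show()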