Bhavana - Data Engineer |
[email protected] |
Location: Anderson, Indiana, USA |
Relocation: Yes |
Visa: GC EAD |
BHAVANA SHETTY
https://www.linkedin.com/in/bhavana-shetty-646b12283/

Professional Summary:
- 9+ years of IT industry experience in the analysis, design, testing, and maintenance of database, ETL, Big Data, cloud, and data warehouse applications.
- Experience in designing, developing, executing, and maintaining data extraction, transformation, and loading for multiple corporate Operational Data Store, data warehousing, and data mart systems.
- Ability to collaborate with peers in both business and technical areas to deliver optimal business process solutions in line with corporate priorities.
- Strong experience in interacting with stakeholders/customers, gathering requirements through interviews, workshops, and existing system documentation or procedures, defining business processes, and identifying and analyzing risks using appropriate templates and analysis tools.
- Strong in ETL, data warehousing, Operational Data Store concepts, data marts, and OLAP technologies.
- Worked with data science and machine learning libraries such as Pandas, NumPy, SciPy, Matplotlib, Seaborn, Bokeh, NLTK, Scikit-learn, OpenCV, TensorFlow, Theano, and Keras.
- Skilled in handling Big Data ecosystems including Apache Hadoop, MapReduce, Spark, HDFS architecture, Cassandra, HBase, Sqoop, Hive, Pig, MLlib, and ELT.
- Experience with AWS services including RDS, Networking, Route 53, IAM, S3, EC2, EBS, and VPC, and with administering AWS resources using the Console and CLI.
- Hands-on experience building the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using NoSQL and SQL on AWS and Big Data technologies (DynamoDB, Kinesis, S3, Hive/Spark).
- Experience in data modeling, with expertise in creating Star and Snowflake schemas, fact and dimension tables, and physical and logical data models using Erwin and Embarcadero.
- Supported integrated testing of interfaces and validation of data mapping, data migration, and data conversion activities conducted during pre-go-live and post-go-live phases.
- Experience building and optimizing big data pipelines, architectures, and data sets (HiveMQ, Kafka, Cassandra, S3, Redshift).
- Strong technical proficiency in designing and data modeling online applications; solution lead for architecting data warehouse/business intelligence applications.
- Solid understanding of AWS (Amazon Web Services) processes and concepts, including S3, Amazon RDS, and Apache Spark RDDs.
- Developed logical data architecture with adherence to enterprise architecture.
- Extensive experience in designing, developing, and publishing visually rich and intuitively interactive Tableau workbooks and dashboards for executive decision making.
- Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL.
- Involved in all phases of the software development life cycle in Agile, Scrum, and Waterfall processes.
- Practical experience migrating on-premise ETLs to Google Cloud Platform (GCP) using cloud-native tools such as BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
- Worked in multi-cloud environments across GCP, Azure, and AWS.
- A self-motivated, enthusiastic learner, comfortable with challenging projects and ambiguity, able to solve complex problems independently or in a collaborative team.
Technical Skills:
Languages: Python, SQL
Databases: MS SQL, AWS Redshift, MongoDB, HBase, PostgreSQL, Teradata, Snowflake
Big Data Ecosystem: Hadoop, PySpark, Kafka, Sqoop, MapReduce, Yarn, Spark Streaming, Spark SQL, Hive, Pig
Data Warehouse, ETL: SQL Server, Hive, Informatica
NoSQL Databases: Cassandra, MongoDB, HBase
Data Visualization: Tableau, Spotfire, Excel, Power BI
Cloud Technologies: AWS (Glue, RDS, Kinesis, DynamoDB, Redshift Cluster, EMR), Azure
Python Libraries: Pandas, NumPy, Matplotlib, Scikit-learn
Source Control: Visual SourceSafe, Team Foundation Server (TFS), GitHub
Defect Tracking: ALM, Jira
SDLC: Agile, Design Patterns
Scripting Languages: Shell Scripting, Python
Others: MS Excel, MS Word, WinMerge, 7-Zip, Notepad++, Active Directory
Operating Systems: Linux, Windows, macOS

Professional Experience:

Client: Truist, NC.                                                      Sep 2021 - till date
Role: Sr. Data Engineer
Responsibilities:
- Maintained and developed complex SQL queries, views, functions, and reports on Snowflake that satisfy customer requirements.
- Performed analysis, auditing, forecasting, programming, research, report generation, and software integration to gain an expert understanding of the current end-to-end BI platform architecture and support the deployed solution.
- Hands-on experience resolving incident tickets related to Hadoop components such as HBase, Yarn, Hive, and Kafka, and identifying root causes.
- Developed JSON scripts for deploying the Azure Data Factory (ADF) pipeline, which uses the SQL Activity to process the data.
- Developed and advanced test plans to ensure the successful delivery of projects.
- Employed performance analytics based on high-quality data to develop reports and dashboards with actionable insights.
- Worked with the ETL team to document the transformation rules for data migration from OLTP to the warehouse environment for reporting purposes.
- Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
- Developed and implemented several types of sub-reports, drill-down reports, summary reports, parameterized reports, and ad-hoc reports using Tableau.
- Prepared parameterized sales performance reports every month and distributed them to the respective departments/clients using Tableau.
- Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL (see the PySpark sketch following this role's Technical Environment).
- Worked on Snowflake schemas, data modeling and elements, source-to-destination mappings, interface matrices, and design elements.
- Built data pipelines using Apache NiFi to analyze structured data by pulling it from Splunk, and created Hive tables.
- Analyzed data quality issues using SnowSQL by building an analytics warehouse on top of Snowflake.
- Helped individual teams set up repositories on Bitbucket to maintain their code and set up jobs that could leverage the CI/CD environment.
- Implemented a batch process for heavy-volume data loading using an Apache dataflow framework built on NiFi, following an Agile development methodology.
- Worked on ingestion of applications/files from one commercial VPC to OneLake.
- Performed data wrangling to clean, transform, and reshape the data using the pandas library.
- Analyzed data using SQL, Scala, Python, and Apache Spark, and presented analytical reports to management and technical teams.
- Implemented Kafka security features using SSL, initially without Kerberos.
- Later, for finer-grained security, set up Kerberos with users and groups to enable more advanced security features.
- Created pipelines in ADF using Linked Services, Datasets, and Pipelines to extract, transform, and load data between a variety of sources, including Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool.
- Created high-level and low-level design documents per the business requirements and worked with the offshore team to guide them on design and development.
- Continuously monitored processes that were taking longer than expected to execute and tuned them.
- Imported data into Power BI from a variety of sources, including SQL Server, Excel, Oracle, and SQL Azure.
- Optimized existing pivot table reports using Tableau and proposed an expanded set of views in the form of interactive dashboards using line graphs, bar charts, heat maps, tree maps, trend analysis, Pareto charts, and bubble charts to enhance data analysis.
- Monitored system life cycle deliverables and activities to ensure that procedures and methodologies were followed and that complete documentation was captured.
Technical Environment: SQL, Snowflake, Python (Scikit-learn/Keras/SciPy/NumPy/Pandas/Matplotlib/NLTK/Seaborn), Tableau, Hive, Databricks, Airflow, PostgreSQL, Azure, JIRA, GitHub, NiFi, Kafka.
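To illustrate the Spark SQL work referenced above, the following is a minimal PySpark sketch of loading JSON data and persisting it to a Hive table. The application name, file path, table, and column names are illustrative assumptions, not details from the project.

```python
from pyspark.sql import SparkSession

# Hive support is required so saveAsTable() writes through the Hive metastore.
spark = (
    SparkSession.builder
    .appName("json-to-hive-sketch")      # hypothetical application name
    .enableHiveSupport()
    .getOrCreate()
)

# Load semi-structured JSON; Spark infers the schema from the records.
events = spark.read.json("/landing/events/*.json")   # hypothetical landing path

# Handle the structured data with Spark SQL before persisting it.
events.createOrReplaceTempView("events_raw")
cleaned = spark.sql("""
    SELECT event_id,
           event_type,
           CAST(event_ts AS timestamp) AS event_ts
    FROM events_raw
    WHERE event_id IS NOT NULL
""")

# Persist the result as a Hive table for downstream reporting.
cleaned.write.mode("overwrite").saveAsTable("analytics.events_cleaned")
```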
Client: Harbourvest, MA.                                                 Sep 2018 - Aug 2021
Role: Data Engineer
Responsibilities:
- Worked in the data transformation team for an internal application called NEXUS, responsible for all the data development, data analysis, data quality, and data maintenance efforts that the team was focused on.
- Worked in a production environment that involved building a CI/CD pipeline using Jenkins, with stages ranging from code checkout from GitHub to deploying code into a specific environment.
- Created various Boto scripts to maintain the application in the cloud environment and automated them.
- Analyzed incident, change, and job data from Snowflake and created a dependency-tree-based model of incident occurrence for every application service present internally.
- Helped business users minimize their manual work by creating Python scripts (LDA sourcing, OneLake, SDP, S3, Databricks, Data Bench, Snowflake) to gather cloud metrics and make their work easier.
- Merged with the D4 Rise Analytics team and helped them eliminate the manual work they were doing on incident data by generating metrics from various sources such as ServiceNow, Snowflake, ARROW job data, and other API calls, and created an incident dashboard with extensive intelligence built into it.
- Designed and implemented Kafka topics by configuring them in the new Kafka cluster in all environments.
- Used Spark SQL to migrate data from Hive to Python using the PySpark library (a brief sketch follows this role's Technical Environment).
- Utilized AWS EMR to transform and transfer huge volumes of data between Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and other AWS data stores and databases.
- Designed and developed a security framework that provides fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
- Used Hadoop technologies such as Spark and Hive, including the PySpark library, to create Spark DataFrames and convert them to regular pandas DataFrames for analysis.
- Created complex SQL stored procedures, triggers, Hive queries, and user-defined functions to manage incoming data and support existing applications.
- Queried and analyzed large amounts of data on Hadoop HDFS using Hive and Spark.
- Led the interns' project by getting them started on AWS, sharing best practices, suggesting simplified methods, and monitoring their work weekly.
- Created the AWS cloud environment for the application, designed the network model for the application, and worked intensively on setting up the application with various components in the AWS Cloud, from the user interface to the back end.
- Built a prototype for a project containing various kinds of dashboard metrics for all applications, to serve as a central metric tracker.
- Built a process to report OneLake usage for all LOBs to senior VPs, using a Kibana data source, EFK, and Kinesis streams, and created AWS Lambda functions to automate it on a daily basis.
- Built interactive dashboards and stories using Tableau Desktop for accuracy in report generation, applying advanced Tableau functionality: parameters, actions, and tooltip changes.
- Worked on testing an internal log builder application that was to be implemented across the company.
Technical Environment: Python 3.x, Tableau (9.x/10.x), Hadoop, HDFS, PySpark, Teradata, PostgreSQL, AWS, Jenkins, SQL, Snowflake, JIRA, GitHub, Agile/SCRUM, Kafka.
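A minimal sketch of the Hive-to-pandas migration pattern mentioned in this role, assuming a reachable Hive metastore; the table and column names are hypothetical.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-to-pandas-sketch")    # hypothetical application name
    .enableHiveSupport()
    .getOrCreate()
)

# Pull the Hive table into a Spark DataFrame with Spark SQL.
incidents = spark.sql(
    "SELECT incident_id, service, opened_at FROM ops.incidents"  # hypothetical table
)

# Filter on the cluster first; toPandas() collects everything to the driver,
# so it is only safe for results of manageable size.
recent = incidents.filter("opened_at >= date_sub(current_date(), 30)")

# Convert to a regular pandas DataFrame for local analysis.
recent_pd = recent.toPandas()
print(recent_pd.groupby("service").size())
```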
Client: Merck, Rahway, NJ                                                Mar 2016 - Aug 2018
Role: Data Engineer
Responsibilities:
- Developed pipelines using Hive (HQL) to retrieve data from the Hadoop cluster and SQL to retrieve data from the Oracle database, and used ETL for data transformation.
- Analyzed and gathered business requirements from clients, conceptualized solutions with technical architects, verified the approach with appropriate stakeholders, and developed end-to-end scenarios for building the application.
- Derived data from relational databases to perform complex data manipulations and conducted extensive data checks to ensure data quality.
- Performed data wrangling to clean, transform, and reshape the data using the NumPy and pandas libraries.
- Worked with datasets of varying size and complexity, including both structured and unstructured data, and participated in all phases of data mining, data cleaning, data collection, variable selection, feature engineering, model development, validation, and visualization; also performed gap analysis.
- Optimized numerous SQL statements and PL/SQL blocks by analyzing the execution plans of SQL statements, and created and modified triggers, SQL queries, and stored procedures for performance improvement.
- Implemented predictive analytics and machine learning algorithms in Databricks to forecast key metrics in the form of designed dashboards on AWS (S3/EC2) and the Django platform for the company's core business.
- Participated in feature engineering such as feature generation, PCA, feature normalization, and label encoding with Scikit-learn preprocessing (see the preprocessing sketch following this role's Technical Environment).
- Performed data imputation using various methods in the Scikit-learn package in Python.
- Used Sqoop to move data from the Oracle database into Hive by creating delimiter-separated files, using those files at an external location as an external Hive table, and then moving the data into refined tables in Parquet format using Hive queries.
- Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for data migration/ETL tasks from OLTP source systems to OLAP target systems.
- Developed Spark programs using Scala APIs to compare the performance of Spark with Hive and SQL.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets.
- Evaluated the performance of the Databricks environment by converting complex Redshift scripts to Spark SQL as part of a new technology adoption project.
- Led engagement planning: developed and managed Tableau implementation plans for stakeholders, ensuring timely completion and successful delivery according to stakeholder expectations.
- Managed workload and utilization of the team; coordinated resources and processes to achieve Tableau implementation plans.
Technical Environment: R, Python, ETL, Agile, Data Quality, RStudio, Tableau, Data Governance, Supervised & Unsupervised Learning, Java, NumPy, SciPy, Hadoop, Sqoop, HDFS, Spark SQL, Pandas, PostgreSQL, AWS (EC2, RDS, S3), Matplotlib, Scikit-learn, Shiny.
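A minimal scikit-learn sketch of the preprocessing steps listed for this role (label encoding, imputation, normalization, and PCA); the input file and column names are hypothetical, for illustration only.

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical input file and columns.
df = pd.read_csv("samples.csv")

# Label-encode a categorical column into integer codes.
df["site_code"] = LabelEncoder().fit_transform(df["site"])

# Impute missing numeric values with the column mean.
numeric_cols = ["dose", "response", "site_code"]
imputed = SimpleImputer(strategy="mean").fit_transform(df[numeric_cols])

# Normalize features to zero mean and unit variance before PCA.
scaled = StandardScaler().fit_transform(imputed)

# Reduce to two principal components for modeling or visualization.
components = PCA(n_components=2).fit_transform(scaled)
print(components[:5])
```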
Client: Edurun Virtuoso Services Pvt Ltd, India.                         Dec 2012 - July 2014
Role: SQL Developer
Responsibilities:
- Imported data in various formats such as JSON, Sequence files, Text, CSV, Avro, and Parquet into the HDFS cluster with compression for optimization.
- Worked on ingesting data from RDBMS sources such as Oracle, SQL Server, and Teradata into HDFS using Sqoop.
- Loaded all datasets into Hive and Cassandra from source CSV files using Spark.
- Created an environment to access the loaded data via Spark SQL through JDBC/ODBC (via the Spark Thrift Server).
- Developed real-time data ingestion/analysis using Kafka and Spark Streaming (a brief sketch follows this role's Technical Environment).
- Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing as required.
- Worked on writing Scala programs using Spark on Yarn to analyze data.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Created Hive external tables, loaded the data into the tables, and queried the data using HQL.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Developed Oozie workflows for scheduling and orchestrating the ETL process and worked with the Oozie workflow engine for job scheduling.
- Managed and reviewed Hadoop log files using shell scripts.
- Migrated ETL jobs to Pig scripts to perform transformations, joins, and some pre-aggregations before storing the data in HDFS.
- Used Hive join queries to join multiple tables of a source system and load them into Elasticsearch tables.
- Performed real-time streaming and transformations on the data using Kafka and Kafka Streams.
- Built a NiFi dataflow to consume data from Kafka, perform transformations on the data, place it in HDFS, and expose a port to run a Spark Streaming job.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, perform transformations on the data, and insert it into HBase.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Experience in managing and reviewing huge Hadoop log files.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Designed and created various analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in cluster maintenance, cluster monitoring, and troubleshooting.
- Worked with the Avro data serialization system to work with JSON data formats.
- Used Amazon Web Services (AWS) S3 to store large amounts of data in a single repository.
Technical Environment: Spark, Spark SQL, Spark Streaming, Scala, Kafka, Hadoop, HDFS, Hive, Oozie, Pig, NiFi, Sqoop, AWS (EC2, S3, EMR), Shell Scripting, HBase, Jenkins, Tableau, Oracle, MySQL, and Teradata.
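The Kafka/Spark Streaming ingestion in this role was written in Scala; as an illustration only, the sketch below shows a PySpark Structured Streaming equivalent. The broker address, topic, and HDFS paths are assumptions, the spark-sql-kafka connector package must be available, and the sink here is Parquet on HDFS rather than the HBase sink used in the original jobs.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Subscribe to a Kafka topic (hypothetical broker and topic names).
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "web_logs")
    .load()
)

# Kafka delivers key/value as binary; cast to strings before transforming.
parsed = raw.select(
    col("key").cast("string").alias("key"),
    col("value").cast("string").alias("value"),
    col("timestamp"),
)

# Write the transformed stream to HDFS as Parquet (hypothetical paths);
# the checkpoint location is required for fault-tolerant progress tracking.
query = (
    parsed.writeStream.format("parquet")
    .option("path", "hdfs:///data/web_logs_parsed")
    .option("checkpointLocation", "hdfs:///checkpoints/web_logs")
    .start()
)
query.awaitTermination()
```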