Manvitha Kancharla - Big Data/Hadoop Developer
[email protected]
Location: Dallas, Texas, USA
Relocation: Yes
Visa: H1B
PROFESSIONAL SUMMARY:
- 8+ years of professional experience in the IT industry, developing applications with Big Data tools across the Apache Hadoop/Spark ecosystems.
- Hands-on experience installing, configuring, and architecting Hadoop and Hortonworks clusters and services: HDFS, MapReduce, YARN, Pig, Hive, Oozie, Flume, HBase, Spark, and Sqoop.
- Experience writing Spark applications in Python and Scala.
- Experience developing Java UDFs for Hive and Pig.
- Experience with NoSQL databases such as HBase, MongoDB, and Cassandra, including writing advanced queries and sub-queries.
- Developed Scala UDFs to process data for analysis.
- Experience developing and executing manual and automated tests on different platforms using Python, Pytest/unittest/Robot Framework, and the Selenium library.
- Extensive working experience with Confluent Kafka, Kafka components, and real-time messaging systems.
- Experience in real-time analytics with Apache Spark RDDs, DataFrames, and the Streaming API (see the sketch following this summary).
- Responsible for writing MapReduce programs.
- In-depth knowledge of MapReduce, Sqoop, Hive, Impala, Oozie, Kudu, Pig, and Spark/Scala.
- Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
- Experience loading data into Hive partitions, creating buckets in Hive, and developing MapReduce jobs to automate data transfer from HBase.
- Set up clusters on Amazon EC2 and S3, including automating cluster setup and scaling in AWS.
- Enabled data sharing of production data into lower environments in Snowflake.
- Implemented Spark Streaming jobs in Scala by developing RDDs (Resilient Distributed Datasets) and used PySpark and spark-shell as needed.
- Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data.
- Designed and implemented a process to migrate all existing databases from one version of PostgreSQL to another without downtime.
- Experience integrating Hadoop with Kafka; experienced in uploading clickstream data to HDFS.
- Hands-on experience with unified data analytics on Databricks: the Databricks workspace user interface, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
- Expert in utilizing Kafka for messaging and publish-subscribe systems.
- Published a weekly report for stakeholders showing Snowflake credit utilization and cost by warehouse and by user/group.
- Experience with Docker and Kubernetes on multiple cloud providers, from helping developers build and containerize their applications (CI/CD) to deploying on public or private clouds.
- Experience utilizing the DataProc and Dataflow services provided by GCP (Google Cloud Platform) for streaming and batch applications.
- Established backups using zero-copy clones in Snowflake for all databases (RAW and DW layers).
- Applied DevOps practices for microservices using Kubernetes as the orchestrator.
- Created templates and wrote shell (Bash), Ruby, Python, and PowerShell scripts to automate tasks.
- Good knowledge of and hands-on experience with monitoring tools such as Splunk and Nagios.
- Hands-on experience integrating REST APIs with cloud environments to access resources.
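To make the Spark streaming and Kafka experience above concrete, the following is a minimal, hypothetical PySpark sketch using the Structured Streaming API (rather than the RDD-based DStream API). The broker address, topic name, and event schema are illustrative placeholders, not details from any project listed below, and the spark-sql-kafka connector package is assumed to be available on the classpath.

# Hypothetical sketch: consume JSON events from a Kafka topic with PySpark
# Structured Streaming and print the parsed rows to the console.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

# Placeholder schema for the incoming JSON payloads.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder brokers
       .option("subscribe", "clickstream-events")           # placeholder topic
       .load())

events = (raw.selectExpr("CAST(value AS STRING) AS json_str")
          .select(from_json(col("json_str"), event_schema).alias("event"))
          .select("event.*"))

query = events.writeStream.outputMode("append").format("console").start()
query.awaitTermination()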
TECHNICAL SKILLS:
Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
Hadoop Ecosystem: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, NiFi, BigQuery, Flume, Spark, Kafka, ZooKeeper, Oozie
Data Visualization: Tableau (Desktop, Online, Server), Power BI, QlikView, Databricks, Seaborn, Plotly, ggplot2
NoSQL Databases: HBase, MongoDB, Cassandra
Programming Languages: Python, SQL, Java, Scala
Web Technologies: HTML, J2EE, CSS, JavaScript, AJAX, Servlet, JSP, DOM, XML
Application Servers: WebLogic, WebSphere, JBoss
Cloud Computing Tools: Amazon AWS
Build Tools: Jenkins, Maven, ANT, JIRA
Databases: MySQL, Oracle, DB2, MS SQL Server, Snowflake, Spark SQL, MS Access
Business Intelligence Tools: Splunk, Talend
Development Methodologies: Agile/Scrum, Waterfall
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Operating Systems: Windows, macOS, UNIX, Linux

WORK EXPERIENCE:

OtterBox, Colorado | Oct 2020 - Present
Big Data Hadoop Developer
Responsibilities:
- Developed microservices using Python scripts with the Spark DataFrame API for the semantic layer.
- Developed Spark scripts in Scala as per requirements.
- Developed predictive analytics using Apache Spark Scala APIs.
- Created real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume.
- Used Oozie to orchestrate the MapReduce jobs that extract data on a scheduled basis.
- Involved in designing a multi-data-center Kafka cluster and monitoring it.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed Python Spark streaming scripts to load raw files and their corresponding data.
- Implemented PySpark logic to transform and process data in various formats such as XLS, JSON, and TXT.
- Developed Python scripts to fetch S3 files using the Boto3 module (see the sketch following this role).
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive code.
- Used NiFi to transfer data from source to destination and was responsible for handling both batch and real-time Spark jobs through NiFi.
- Designed and built multi-terabyte, end-to-end data warehouse infrastructure on Snowflake, handling millions of records every day.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Assisted with design resolution by working with the Couchbase engineering support team and assisted application teams during stress testing against Couchbase.
- Involved in the end-to-end migration of 1000+ objects from SQL Server/Netezza to Snowflake.
- Managed and supported a large PostgreSQL database with over 100 TB of data, including backups, restores, monitoring, and performance tuning.
- Created roles and access-level privileges and handled Snowflake admin activity end to end.
- Built the complete data ingestion pipeline using NiFi, which POSTs flow files through the InvokeHTTP processor to microservices hosted inside Docker containers.
- Performed DBA tasks on the Snowflake data warehouse, including table design and creation, data loading, and query performance tuning.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
- Processed input from multiple data sources in the same reducer using GenericWritable and MultipleInputs.
- Performed data profiling and transformation on raw data using Pig and Python.
- Visualized HDFS data for customers in a BI tool with the help of the Hive ODBC driver.
- Created Hive generic UDFs to process business logic that varies based on policy.
- Moved relational database data into Hive dynamic partition tables using Sqoop and staging tables.
- Monitored the cluster using Cloudera Manager.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Built data governance processes, procedures, and controls for the data platform using NiFi.
- Used the Cloud Shell SDK in GCP to configure the DataProc, Google Cloud Storage, and BigQuery services.
Environment: Spark, Kafka, Hive, Flume, Scala, Python, Java, MapReduce, HDFS, NiFi, Couchbase, PostgreSQL, Snowflake, Pig, Cloudera, PySpark.
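The Boto3 bullet above refers to scripted retrieval of raw files from S3; the following is a minimal, hypothetical sketch of that pattern. The bucket, prefix, and destination directory are illustrative placeholders, not values from the actual pipeline.

# Hypothetical sketch: download all objects under an S3 prefix with boto3.
import os
import boto3

def download_s3_prefix(bucket, prefix, dest_dir):
    """Download every object under `prefix` from `bucket` into `dest_dir`."""
    s3 = boto3.client("s3")
    os.makedirs(dest_dir, exist_ok=True)
    downloaded = []
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if key.endswith("/"):  # skip folder placeholder keys
                continue
            local_path = os.path.join(dest_dir, os.path.basename(key))
            s3.download_file(bucket, key, local_path)
            downloaded.append(local_path)
    return downloaded

if __name__ == "__main__":
    files = download_s3_prefix("example-raw-bucket", "landing/2020/10/", "/tmp/raw")
    print("Fetched %d files" % len(files))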
Walmart Labs, Sunnyvale, CA | Oct 2019 - Sep 2020
Hadoop Developer
Responsibilities:
- Developed a Spark streaming application to pull data from the cloud into a Hive table.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Used Spark SQL to process large volumes of structured data.
- Developed various big data workflows using custom MapReduce, Pig, Hive, and Sqoop.
- Developed Python code to gather data from HBase (Cornerstone) and designed the solution for implementation in PySpark.
- Used Snowflake utilities such as SnowSQL, Snowpipe, Python, Tasks, Streams, Time Travel, the query optimizer, Metadata Manager, data sharing, and stored procedures.
- Involved in designing a multi-data-center Kafka cluster and monitoring it.
- Created Oozie workflows and coordinator jobs for recurrent triggering of Hadoop jobs such as Java MapReduce, Pig, Hive, and Sqoop, as well as system-specific jobs (such as Java programs and shell scripts), by time (frequency) and data availability.
- Experience with Snowflake Snowpipe, streams, and tasks.
- Loaded data from multiple data sources (SQL Server, DB2, and Oracle) into HDFS using Sqoop and loaded it into Hive tables.
- Assisted with designing Couchbase high availability and failover capabilities to meet application requirements.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform it, and insert it into HBase.
- Set up Active Directory integration and single sign-on for Snowflake.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Introduced Cascade jobs to make data analysis more efficient per requirements.
- Built reusable Hive UDF libraries that enabled business analysts to use these UDFs in Hive queries.
- Created admin accounts and users, granted group-level permissions, and created test accounts on the Snowflake platform.
- Developed Python scripts to clean raw data.
- Opened firewall rules for EKS, S3, EC2, Talend, Informatica, EDC, Tableau, Looker, and OKTA to Snowflake.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Worked on NoSQL databases including HBase, MongoDB, and Cassandra.
- Performed data analysis in Hive by creating tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Leveraged cloud and GPU computing technologies such as AWS and GCP for automated machine learning and analytics pipelines.
- Performed data migration to GCP.
- Worked on analyzing the Hadoop cluster and different big data analytics tools, including Pig, HBase, NoSQL databases, and Sqoop.
- Enabled AWS PrivateLink between an AWS VPC and Snowflake.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on the RDDs.
- Designed a data analysis pipeline in Python using Amazon Web Services such as S3, EC2, and Elastic MapReduce.
- Identified the databases, layers, and overall architecture for Snowflake and implemented them in Snowflake.
- Worked with Google Cloud Functions in Python to load data into BigQuery for CSV files arriving in GCS buckets.
- Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python (see the sketch following this role).
- Maintained a Hadoop cluster on AWS EMR; used AWS services such as EC2 and S3 for processing and storing small data sets.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Used Flume to export application server logs into HDFS.
Environment: Hadoop, HDFS, Sqoop, Hive, Pig, MapReduce, Spark, Scala, Snowflake, Kafka, AWS, HBase, Couchbase, PostgreSQL, MongoDB, Cassandra, Python, NoSQL, Flume, Oozie.
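The Dataflow bullet above describes streaming Pub/Sub messages into BigQuery; below is a minimal, hypothetical Apache Beam (Python SDK) sketch of that pattern. The project, topic, and table names are placeholders, and a production pipeline would also set Dataflow runner options and handle malformed messages.

# Hypothetical sketch: stream JSON messages from a Pub/Sub topic into an
# existing BigQuery table using the Apache Beam Python SDK.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def run():
    options = PipelineOptions(streaming=True)  # unbounded (streaming) pipeline
    with beam.Pipeline(options=options) as p:
        (
            p
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/example-topic")
            | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.events",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER)
        )

if __name__ == "__main__":
    run()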
Florida Blue, Jacksonville, FL | July 2016 - Aug 2019
Hadoop Developer
Responsibilities:
- Developed microservices using Python scripts with the Spark DataFrame API for the semantic layer.
- Involved in the complete implementation lifecycle; specialized in writing custom MapReduce, Pig, and Hive code.
- Used NiFi to transfer data from source to destination and was responsible for handling both batch and real-time Spark jobs through NiFi.
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Designed and implemented a process to migrate all existing databases from one version of PostgreSQL to another without downtime.
- Created an archive schema in Snowflake for storing deleted/overwritten data for BI analytical layers.
- Developed a data pipeline using Kafka, Spark, and Hive to ingest, transform, and analyze data.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark.
- Built the complete data ingestion pipeline using NiFi, which POSTs flow files through the InvokeHTTP processor to microservices hosted inside Docker containers.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Designed and implemented a MapReduce-based, large-scale parallel relation-learning system.
- Monitored the cluster using Cloudera Manager.
- Developed predictive analytics using Apache Spark Scala APIs.
- Implemented MapReduce counters to gather metrics on good and bad records.
- Built data governance processes, procedures, and controls for the data platform using NiFi.
- Created real-time data streaming solutions and batch-style, large-scale distributed computing applications using Apache Spark, Spark Streaming, Kafka, and Flume.
- Finalized the schema in Snowflake with appropriate roles and permissions.
- Used Oozie to orchestrate the MapReduce jobs that extract data on a scheduled basis.
- Responsible for importing real-time data from source systems into Kafka clusters.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala.
- Developed Python Spark streaming scripts to load raw files and their corresponding data.
- Built scripts to load PySpark-processed files into Redshift and applied diverse PySpark logic.
- Developed MapReduce programs to cleanse data in HDFS obtained from heterogeneous data sources.
- Processed metadata files into AWS S3 and an Elasticsearch cluster.
- Created a process to automate external table creation and schema generation in Snowflake (see the sketch following this role).
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs and used Oozie operational services for batch processing and dynamic workflow scheduling.
- Created multi-node Hadoop and Spark clusters on AWS instances to generate terabytes of data and stored it in HDFS.
- Included migration of existing applications and development of new applications using AWS cloud services.
- Developed Python scripts to get the most recent S3 keys from Elasticsearch.
- Uploaded clickstream data from Kafka to HDFS, HBase, and Hive by integrating with Storm.
- Extracted data from SQL Server to create automated visualization reports and dashboards in Tableau.
Environment: Java, HDP, PySpark, Scala, Jenkins, NiFi, Spark, MapReduce, Python, PostgreSQL, Talend, Hive, Pig, ZooKeeper, Kafka, Snowflake, HBase, Flume, Sqoop, Oozie, AWS.
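The external-table automation bullet above is the kind of task that can be scripted with the Snowflake Python connector; the following is a minimal, hypothetical sketch. The connection parameters, stage, table, and column list are placeholders, and a production version would pull column definitions from a metadata source rather than a hard-coded dictionary.

# Hypothetical sketch: generate and run CREATE EXTERNAL TABLE DDL in Snowflake
# over a named stage using snowflake-connector-python.
import snowflake.connector

def create_external_table(conn, table, stage, columns):
    """Build and execute external-table DDL for Parquet files in `stage`."""
    col_defs = ", ".join(
        "{0} {1} AS (value:{0}::{1})".format(name, dtype)
        for name, dtype in columns.items()
    )
    ddl = (
        "CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        "LOCATION = @{stage} FILE_FORMAT = (TYPE = PARQUET)"
    ).format(table=table, cols=col_defs, stage=stage)
    conn.cursor().execute(ddl)

if __name__ == "__main__":
    # Placeholder account, credentials, and object names.
    conn = snowflake.connector.connect(
        account="example_account", user="example_user", password="***",
        warehouse="EXAMPLE_WH", database="RAW", schema="EXT")
    create_external_table(
        conn, "claims_ext", "raw_stage/claims",
        {"claim_id": "STRING", "claim_amount": "NUMBER"})
    conn.close()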
HomeBridge Financial Services, Edison, NJ | Aug 2015 - May 2016
Data Analyst
Responsibilities:
- Worked on the HomeBridge Secondary Marketing project with the mortgage loan processing system.
- Analyzed existing requirements with end users and documented them for Tableau development.
- Performed metadata updates as part of metadata governance to keep datasets current with required changes.
- Designed Tableau reports using different files and a SQL Server database.
- Worked with Tableau Prep to combine, clean, and shape data by building flows with steps such as join, cleaning, pivoting, and aggregation.
- Worked on defining relationships and prepared data based on the business scenario.
- Created filters (on system channels such as Wholesale, Retail, and Correspondent) on Tableau reports.
- Created and modified existing reports (payment frequency such as Monthly and Bi-Weekly) and dashboards.
- Worked on data calculations for payment types (Fixed Rate, ARM (Adjustable-Rate Mortgage), and temporary buydown ARM (3/1, 5/1, 7/1, etc.)).
- Worked with the Tableau forecasting option to project future numbers from historical data.
- Created sets, parameters, and calculated sets for preparing views in Tableau.
- Created Tableau dashboards using bar charts, scatter plots, geographic maps, and Gantt charts via the Show Me functionality.
- Created detailed summary reports using trend lines, statistics, groups, and hierarchies.
- Combined individual views into interactive dashboards in Tableau Desktop and presented them to business users.
- Created customized dashboard layouts with Device Designer.
- Performed data validation using SQL queries on tables and views based on requirements (see the sketch following this role).
- Optimized SQL queries for effective data retrieval to meet the analytical requirements of the business.
Environment: MS SQL Server, MySQL, Microsoft Excel, Windows, Tableau Desktop, Tableau Server, Tableau Prep, PowerPoint, JIRA.
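The SQL data-validation bullet above can be illustrated with a small, hypothetical Python sketch that compares per-channel row counts between a source table and its reporting view. The connection string, table, and view names are placeholders, not objects from the actual environment.

# Hypothetical sketch: reconcile row counts between a table and a view.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:***@example-server/LoansDB"
    "?driver=ODBC+Driver+17+for+SQL+Server")

QUERY = "SELECT channel, COUNT(*) AS row_count FROM {obj} GROUP BY channel"

source = pd.read_sql(QUERY.format(obj="dbo.loan_payments"), engine)
view = pd.read_sql(QUERY.format(obj="dbo.vw_loan_payments"), engine)

check = source.merge(view, on="channel", suffixes=("_table", "_view"))
check["match"] = check["row_count_table"] == check["row_count_view"]
print(check)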
Tekway Inc, Hyderabad, India | Mar 2014 - Aug 2014
Data Analyst
Responsibilities:
- Created custom reports and dashboards for analyzing information provided by users.
- Worked as a team member and performed the role of data analyst in the organization.
- Interacted with various business team members to gather and document requirements.
- Maintained and enhanced backend business intelligence data to generate business unit key analytics and metrics for the leadership team.
- Evaluated company data entry systems and prepared recommendations for system-wide efficiency improvements.
- Executed analytics projects involving data collection, filtering, and cleaning to interpret and analyze data and deliver reports.
- Performed data queries using SQL and prepared reports on a timely basis for payroll and project costs.
- Compiled and validated data to reinforce and maintain compliance with corporate standards.
- Developed optimized data collection and qualifying procedures.
- Created many parameters, quick filters, sets, and shapes for better navigation in Tableau dashboards.
- Scheduled full/incremental refreshes of dashboards for data sources on Tableau Server, depending on business requirements.
Environment: PowerPoint, Jupyter Notebook, Tableau, Microsoft Excel, JSON, Python, SQL.