Sri Juvvadi - Senior Data Engineer
Email: [email protected] | Ph. No: (219) 448-1841
Location: Hampton, Virginia, USA
Relocation: Yes
Visa: H1B
PROFESSIONAL SUMMARY:
- Data engineer focused on ETL and machine learning, with around 10 years of experience.
- Hands-on experience with SQL, Python, PySpark, and R.
- Worked with containerized platforms: Docker and Kubernetes.
- Hands-on experience with the Hadoop and AWS stacks.
- Experience with streaming and processing tools and frameworks, including Spark Streaming, Apache NiFi, Kafka, and Kinesis.
- Deployed most applications and data pipelines using GitLab CI/CD; also good exposure to Jenkins and Terraform.
- Hands-on experience implementing and developing scripted ETL pipelines in StreamSets.
- Extensive experience building batch and streaming data pipelines with Docker, Kubernetes, Hadoop, and AWS.
- Developed predictive models using topic extraction, decision trees, and random forests; good understanding of logistic regression, social network analysis, cluster analysis, and neural networks.
- Architected and built pipeline solutions to integrate data from multiple heterogeneous systems using StreamSets data collectors and Azure.
- Experienced in building monitoring solutions to validate machine learning models.
- Excellent communication skills; works successfully in fast-paced, multitasking environments both independently and in collaborative teams; a self-motivated, enthusiastic learner.

TOOLS AND TECHNOLOGIES:
Languages: Shell, SQL, Python, PySpark, Hive, and R
Packages: NumPy, Pandas, PyOD, spaCy, Matplotlib, Seaborn, Bokeh, Beautiful Soup, Selenium, Scikit-learn, TensorFlow, PyTorch
Container technologies: AWS EKS, Docker, and Kubernetes
Cloud platforms: AWS and Google Cloud

PROFESSIONAL EXPERIENCE:

BCBST - Sep 2021 - Till date
Role: Data Engineer
Description: Worked closely with the entire data pipeline build process, participating in brainstorming sessions and exchanging inputs with the associated teams.
Roles & Responsibilities:
- Created tables on demand on S3 using Lambda with simple Python functions and AWS Glue jobs written in Python and Spark (see the illustrative sketch after this section).
- Coordinated with the team to develop a framework that generates daily ad hoc reports and extracts from enterprise data, automated with Step Functions.
- Worked with business analysts and data modelers to understand requirements; analyzed system and user needs to document requirements, workflows, and process flows.
- Dealt with large amounts of data stored on S3, querying it with Athena.
- Worked closely with users to troubleshoot and fix issues, documenting them for future reference.
- Created technical documentation for project needs and delivered it on demand.
- Implemented REST services and APIs in Python with data security; provided leadership and supervision on all data exchanges.
- Created and monitored alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
- Implemented batch job services in GCP per stakeholder needs; scheduled and automated pipelines in GCP and transformed data per business requirements.
- Worked with various teams to gather the information needed to implement the logic.
- Created and loaded data into Google BigQuery for data visualization.
- Extracted data from various competitor websites, loaded it into GCS, and organized it systematically.
- Ensured appropriate compliance was maintained, including standards such as minimum necessary and appropriate file formats.
- Participated in multiple phases of the project life cycle, including requirements gathering, software design, development, and testing of applications.
- Provided technical evaluation estimates on technology initiatives; helped formulate cutting-edge product features, defining architecture and product functionality.
- Participated in all phases of the development lifecycle and post-implementation.
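A minimal sketch of the on-demand table creation described above, assuming hypothetical bucket, database, and table names (the actual Glue and Spark jobs are not shown). The Lambda handler simply registers an external table in Athena over files already landed in S3:

    import boto3

    athena = boto3.client("athena")

    # Hypothetical names used only for illustration.
    DATABASE = "analytics_db"
    RESULTS = "s3://example-athena-results/"

    DDL = """
    CREATE EXTERNAL TABLE IF NOT EXISTS daily_extract (
        record_id string,
        amount double,
        load_date string
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
    WITH SERDEPROPERTIES ('field.delim' = ',')
    LOCATION 's3://example-data-bucket/daily_extract/'
    """

    def lambda_handler(event, context):
        # Register the table over files already in S3 so the data
        # becomes queryable from Athena on demand.
        response = athena.start_query_execution(
            QueryString=DDL,
            QueryExecutionContext={"Database": DATABASE},
            ResultConfiguration={"OutputLocation": RESULTS},
        )
        return {"query_execution_id": response["QueryExecutionId"]}

Such a handler can be invoked from Step Functions as part of the daily report and extract automation mentioned above.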
Anthem, Inc. - Dec 2018 - Oct 2020
Role: Data Engineer
Description: As a team, we developed a machine learning platform and data pipelines to validate machine learning models, and built an ETL pipeline to track model performance on a monthly, daily, and hourly basis.
Roles & Responsibilities:
- Designed and developed data pipelines for a machine learning platform across RDBMS, NoSQL, and cloud environments.
- Designed and developed batch and streaming pipelines for models and applications developed in SAS, Python, and R.
- Designed and developed data pipelines using Spark, PySpark, Docker, Kubernetes, Logstash, Hadoop, and the AWS stack.
- Integrated data from the Cloudera big data stack: Hadoop, Hive, HBase, and MongoDB.
- Built StreamSets pipelines to accommodate change; managed RESTful APIs and integrated them with StreamSets to move data.
- Worked with Kafka to integrate data from multiple topics into databases (see the illustrative streaming sketch after this section).
- Automated the daily ETL process using Apache Airflow and crontab.
- Responsible for sending quality data through a secure channel to downstream systems using role-based access control and StreamSets.
- Worked on a disruptive product still in its early stages in Redshift; proposed solutions to challenging problems in cloud database computing and helped build a product that leverages the scale of cloud resources.
- Understanding of structured data sets, data pipelines, ETL tools, and data reduction, transformation, and aggregation techniques; knowledge of tools such as dbt and DataStage.
- Set up GCP firewall rules to allow or deny traffic to and from VM instances based on specified configurations, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, markedly improving user experience and latency.
- Experience providing highly available and fault-tolerant applications using orchestration technologies such as Kubernetes and Apache Mesos on Google Cloud Platform.
- Worked for a recognized leader in the cloud computing space, involved in the fast-growing big data space.
- Optimized databases and data warehouses, including Postgres, MySQL, HBase, HDFS, Cassandra, Snowflake, DynamoDB, and additional cloud environments.
- Created Python packages that execute SQL scripts to automate reads and writes against databases.
- Built batch and streaming pipelines using Apache YARN, Apache NiFi, Apache Kafka, Apache Flume, Apache Hive, HDFS, and PySpark.
- Created YAML files for Docker and Kubernetes and configured them in the GitLab CI/CD pipeline.
- Created a data pipeline to collect data from applications running in Docker and Kubernetes and store it in Hadoop and AWS.
- Used QlikView, Kibana, Tableau, and Apache Superset for visualization.
- Involved in design and feature planning with stakeholders, managers, project directors, product owners, and scrum masters.
- Completed proofs of concept on new technologies; suggested and implemented new technologies; followed agile methodology.
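A minimal sketch of the Kafka-to-HDFS streaming pattern referenced above, using PySpark Structured Streaming; the broker address, topic, and paths are placeholders rather than the actual configuration, and the Kafka connector package is assumed to be on the Spark classpath:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka_to_hdfs_sketch").getOrCreate()

    # Read messages from a Kafka topic (placeholder broker/topic names).
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "broker-1:9092")
        .option("subscribe", "model_scores")
        .load()
    )

    # Kafka delivers key/value as binary; cast the payload to string
    # before any downstream parsing or transformation.
    events = raw.select(col("value").cast("string").alias("payload"))

    # Write micro-batches to HDFS as Parquet with checkpointing,
    # so the stream can recover after restarts.
    query = (
        events.writeStream
        .format("parquet")
        .option("path", "hdfs:///data/model_scores/")
        .option("checkpointLocation", "hdfs:///checkpoints/model_scores/")
        .outputMode("append")
        .start()
    )

    query.awaitTermination()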
S&P Global - May 2018 - Nov 2018
Role: Data Engineer
Description: As a team, we designed and developed a search engine, built data pipelines that extract and consume data from applications, and built a complex mechanism to generate related data using metadata from the source.
Roles & Responsibilities:
- Designed the database architecture and developed the data platform.
- Designed, developed, and automated data pipelines based on requirements.
- Designed and developed real-time ETL data pipelines connected to models developed in SAS and Python for analytics and monitoring.
- Developed complex data pipelines using AWS resources: S3, EC2, Lambda, Glue, EMR, Step Functions, and RDS.
- Developed a topic extraction (topic modelling) model using LDA and other customized NLP techniques (illustrative sketch below).
- Developed and automated a data-normalizing model using Python libraries.
- Developed a process to generate missing data from source files; designed and developed a data-cooking model to normalize and transform raw data.
- Gathered requirements from SMEs, the project manager, and business analysts.
- Established an AWS bastion host for secure data transactions between the application and database teams.
- Represented the team in a few summits.
- Created a POC on the MariaDB Audit Plugin, identified a few performance issues, and escalated them to the MariaDB research and development team.

Cloud Big Data Technologies, Dallas, Texas - May 2017 - April 2018
Role: Data Engineer (Machine Learning)
Description: Involved in a prestigious 360-degree profile-assembly project; analyzed multi-dimensional data and conducted network optimization, predictive analytics, and social analytics.
Roles & Responsibilities:
- Designed and developed streaming data pipelines using Apache Kafka, Flume, Hive, and Pig.
- Built a complex ETL process and stored processed data in Hive and HDFS.
- Transformed raw text data from blogs and social networking sites, provided by third-party vendors, for sentiment analysis, information extraction, and information retrieval.
- Performed data profiling to learn the behaviour of features such as traffic pattern, location, date, and time.
- Good understanding of supervised and unsupervised machine learning techniques.

Vanguard, PA - Dec 2016 - May 2017
Role: Data Scientist / Data Engineer
Description: The company's mission is to understand and deliver 100% on its commitments to clients, helping achieve growth, innovation, and excellence. It offers staff augmentation, IT consulting, software, and mobile development services to clients at overseas locations, with highly skilled consultants experienced in their fields.
Roles & Responsibilities:
- Designed and developed an e-commerce website using a CMS, HTML, and CSS.
- Analyzed several years of sales data using data mining techniques and created dashboards.
- Developed a database containing tables, stored procedures, functions, views, triggers, and indexes in SQL Server and connected it to the existing CMS system.
- Created a POC on an image classification model.
- Analyzed sales from past years using machine learning techniques.
- Extracted data from a third-party application and conducted data preprocessing and data mining using R and Python.
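A rough illustration of the LDA-based topic extraction mentioned under the S&P Global role, sketched with scikit-learn; the documents and parameter values below are made up and only show the shape of the pipeline, not the customized NLP techniques actually used:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Tiny, made-up corpus used only to illustrate the pipeline shape.
    documents = [
        "credit ratings downgrade outlook negative",
        "equity markets rally on earnings growth",
        "bond yields rise as inflation accelerates",
    ]

    # Convert raw text into a term-count matrix.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(documents)

    # Fit an LDA model; the number of topics is a tuning choice.
    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    lda.fit(counts)

    # Print the top terms per topic to inspect what each topic captures.
    terms = vectorizer.get_feature_names_out()
    for topic_idx, weights in enumerate(lda.components_):
        top_terms = [terms[i] for i in weights.argsort()[-5:][::-1]]
        print(f"Topic {topic_idx}: {', '.join(top_terms)}")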
Cloud Big Data Technologies, Hyderabad, India - Mar 2014 - Nov 2015
Role: SQL Developer / DBA
Description: Cloud Big Data provides data-specific IT services across multiple domains, industries, and verticals. The organization has expanded its resources and knowledge base by focusing on delivering the latest technologies and methodologies that successfully meet customers' IT challenges.
Roles & Responsibilities:
- Participated in the analysis, design, development, testing, and implementation of various financial systems using Oracle, Oracle Developer, and PL/SQL.
- Defined database structure, mapping, and transformation logic.
- Created external table scripts to load data from the source for ETL (extract, transform, and load) jobs.
- Wrote UNIX shell scripts to run database jobs on the server side.
- Developed new packages and modified existing packages, database triggers, stored procedures, and other code modules using PL/SQL in support of business requirements.

EDUCATION:
Master of Science in Engineering Technology, Trine University, IN, USA - Jan 2016 - Dec 2017
Bachelor's in Computer Science and Engineering, JNTU, Hyderabad, India - Jun 2010 - Aug 2014