
Satish - Sr. Data Engineer
[email protected] | Ph: 8042151668
Location: Chicago, Illinois, USA
Relocation: Anywhere
Visa: H1B

Data Engineer with over 5 years of experience in data warehousing, data engineering, feature engineering, big data, ETL/ELT, and business intelligence. Big data architect and engineer specializing in AWS and Azure frameworks, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, DBT, and Presto/Athena, and data DevOps frameworks and pipelines, with strong programming and scripting skills in Python. Expertise in designing and developing big data analytics platforms for the retail, logistics, healthcare, and banking industries using big data, Spark, real-time streaming, Kafka, data science, machine learning, NLP, and cloud technologies.
Professional Summary
Experience in the end-end process from requirements gathering to implementation using software development methodologies such as Agile Software Development, Scrum, Test Driven Development (TDD), Data Pipeline Design, development, and implementation using Continuous Integration and Continuous Deployment (CI/CD).
Good understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
Experience writing Machine Learning algorithms (Regression Models, Decision Trees, Naive Bayes, Neural Networks, Random Forest, Gradient Boosting, SVM, KNN, Clustering).
Good knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL, and Spark Streaming, with expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
Experience in batch processing and in writing Apache Spark programs for real-time analytics and streaming of data.
Good understanding of Zookeeper and Kafka for monitoring and managing Hadoop jobs; used Cloudera CDH4 and CDH5 for monitoring and managing Hadoop clusters.
Experience in analysis, design, and development of big data solutions in Scala, Spark, Hadoop, Pig, and HDFS environments.
Streamed data from various cloud (AWS, Azure) and on-premises sources using tools such as Spark and Flume.
Experience in Amazon AWS, Google Cloud Platform and Microsoft Azure cloud services.
Used Azure Data Lake, Azure Data Factory, Azure Machine Learning, and Azure Databricks.
AWS cloud experience using EC2, S3, EMR, RDS, Redshift, AWS Sagemaker, Glue.
Built machine learning solutions using PySpark for large sets of data on the Hadoop ecosystem (see the sketch after this list).
Adept in statistical programming languages like Python and R including Big-Data technologies like Hadoop, HDFS, Spark and Hive.
Good knowledge of tools like Snowflake, SSIS, SSAS, and SSRS for designing data warehousing applications.
Experience in data mining, including predictive behavior analysis, Optimization and Customer Segmentation analysis using SAS and SQL.
Experience in Applied Statistics, Exploratory Data Analysis and Visualization using matplotlib, Tableau, Power BI, Google Analytics.
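For illustration, a minimal sketch of the kind of PySpark machine learning pipeline mentioned in the bullet above; the input path, column names, and random forest model choice are illustrative assumptions, not details from a specific project:

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import RandomForestClassifier

    spark = SparkSession.builder.appName("pyspark-ml-sketch").getOrCreate()

    # Load a feature table from HDFS (path and columns are placeholders).
    df = spark.read.parquet("hdfs:///data/features/")

    # Assemble numeric columns into the single vector column Spark ML expects.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train = assembler.transform(df)

    # Fit a random forest on the labeled data and persist the model.
    model = RandomForestClassifier(labelCol="label", featuresCol="features").fit(train)
    model.write().overwrite().save("hdfs:///models/rf_model")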
Technical Skills
Hadoop Distributions: Cloudera, AWS EMR, Azure Data Factory
Languages: Scala, Python, SQL, HiveQL, KSQL
IDE Tools: Eclipse, IntelliJ, PyCharm
Cloud Platforms: AWS, Azure
AWS Services: VPC, IAM, S3, Elastic Beanstalk, CloudFront, Redshift, Lambda, Kinesis, DynamoDB, Direct Connect, Storage Gateway, EKS, DMS, SMS, SNS, SWF
Reporting and ETL Tools: Tableau, Power BI, Talend, AWS Glue
Databases: Oracle, SQL Server, MySQL, MS Access; NoSQL: HBase, Cassandra, MongoDB
Big Data Technologies: Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, Spark, Machine Learning, Pandas, NumPy, Seaborn, Impala, Zookeeper, Flume, Airflow, Informatica, Snowflake, Databricks, Kafka, Cloudera
Machine Learning and Statistics: Regression, Random Forest, Clustering, Time-Series Forecasting, Hypothesis Testing, Exploratory Data Analysis
Containerization: Docker, Kubernetes
CI/CD Tools: Jenkins, Bamboo, GitLab CI, uDeploy, Travis CI, Octopus
Operating Systems: UNIX, Linux, Ubuntu, CentOS
Other Software: Control-M, Eclipse, PyCharm, Jupyter, Apache, Jira, PuTTY, Advanced Excel, TOAD, Oracle SQL Developer, MS Office, FTP, SQL Assistant, Rally, GitHub, JSON
Frameworks: Django, Flask, WebApp2

Professional Experience
________________________________________
Role: Data Engineer
Client: Change Healthcare, Nashville, TN, Jan 2022 to Present
Project Description:
The purpose of this project is to provide information on the Center for Drug Evaluation and Research's (CDER) strategic informatics capabilities and guidance on how to apply them in addressing CDER user and system needs.
CDER's Strategic Capabilities Map depicts the capabilities that OBI is bringing to CDER users through the implementation of their use cases to support their business needs. The map captures three strategic capabilities: (1) Electronic Submissions, (2) Integrated Data Management, and (3) Business Intelligence Publishing.
Key Contributions
Built data pipelines for gathering, cleaning, and optimizing data using Hive and Spark.
Created and sustained an optimal data pipeline architecture.
Responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets (see the Snowflake sketch after this list).
Created the infrastructure needed for optimal data extraction, transformation, and loading from a wide range of data sources.
Developed Spark/Scala and Python code for a regular expression (regex) project in a Linux-based Hadoop/Hive big data environment.
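A minimal sketch of the Snowflake-to-S3 load described above, assuming the snowflake-connector-python package; the account, credentials, table, storage integration, and bucket names are hypothetical placeholders:

    import snowflake.connector

    # Connect to Snowflake (all credentials below are placeholders).
    conn = snowflake.connector.connect(
        user="ETL_USER",
        password="<password>",
        account="xy12345.us-east-1",
        warehouse="ETL_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )

    # COPY INTO an external S3 location unloads table data as gzipped CSV.
    # The storage integration "S3_INT" is assumed to be configured already.
    conn.cursor().execute("""
        COPY INTO 's3://example-bucket/exports/orders/'
        FROM ANALYTICS.PUBLIC.ORDERS
        STORAGE_INTEGRATION = S3_INT
        FILE_FORMAT = (TYPE = CSV COMPRESSION = GZIP)
        OVERWRITE = TRUE
    """)
    conn.close()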
________________________________________
Responsibilities:
Extracted, transformed, and loaded data from source systems to generate CSV data files using Python and SQL queries. Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
Developed Spark workflows using Scala to pull data from AWS and apply transformations to it.
Developed MapReduce/Spark jobs in Python for machine learning and predictive analytics on Hadoop in AWS.
Created, debugged, scheduled, and monitored jobs using Airflow and Oozie (see the DAG sketch after this list).
Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
Extracted data from HDFS using Hive and performed data analysis using Spark with Scala, PySpark, and Redshift for feature selection, and created nonparametric models in Spark.
Worked on RDS databases like MySQL Server and NoSQL databases like MongoDB and HBase.
Designed and implemented end-to-end systems for Data Analytics and Automation, integrating custom visualization tools using R, Tableau.
Developed Tableau visualizations and dashboards using Tableau Desktop.
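A minimal Airflow DAG sketch of the job scheduling described in this list; the DAG id, schedule, and spark-submit command are hypothetical, not taken from the project:

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    # Daily DAG that submits one Spark job; names and paths are placeholders.
    with DAG(
        dag_id="daily_spark_etl",
        start_date=datetime(2022, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        run_spark_job = BashOperator(
            task_id="run_spark_job",
            bash_command="spark-submit --master yarn /opt/jobs/etl_job.py",
        )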
Environment: Python, R, SQL, Hive, Spark, AWS, Hadoop, NoSQL, Cassandra, SQL Server, HDFS, PySpark, Tableau, MongoDB, PostgreSQL, Redshift, HBase, Sqoop, Airflow, Oozie.
________________________________________
Data Engineer/Python Developer
Avon Technologies Pvt Ltd, Hyderabad, India, Jan 2019 to Aug 2021
Key Contributions:
Built data pipelines for gathering, cleaning, and optimizing data using Hive and Spark.
Created and sustained an optimal data pipeline architecture.
Responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets.
Created the infrastructure needed for optimal data extraction, transformation, and loading from a wide range of data sources.
Developed Spark/Scala and Python code for a regular expression (regex) project in a Linux-based Hadoop/Hive big data environment.
Built an application for numerical equations and performed 2D finite element analysis using Python.
Used Django database APIs to access database objects (see the sketch below).
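A minimal sketch of the Django database API usage mentioned above; the Document model and its fields are hypothetical, purely for illustration:

    from django.db import models

    class Document(models.Model):
        # Hypothetical model illustrating Django's ORM.
        title = models.CharField(max_length=200)
        label = models.CharField(max_length=50, blank=True)
        created = models.DateTimeField(auto_now_add=True)

    # Typical ORM access: filter, iterate, and update database objects.
    def label_recent_documents():
        recent = Document.objects.filter(label="").order_by("-created")[:100]
        for doc in recent:
            doc.label = "unclassified"
            doc.save()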
________________________________________
Responsibilities:
Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
Created and wrote aggregation logic on Snowflake data warehouse tables.
Built and architected multiple data pipelines with end-to-end ETL and ELT processes for data ingestion and transformation in AWS, and coordinated tasks among the team.
Recreated and maintained existing Access Database artifacts in Snowflake.
Used AWS Athena extensively to ingest structured data from S3 into systems such as Redshift and to generate reports.
Consumed Kafka messages, curated them using Python, and sent the data to multiple targets: Redshift, Athena, and S3 buckets (see the consumer sketch after this list). Used AWS QuickSight for visualization.
Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
Worked with datasets stored in AWS S3 buckets and used Spark DataFrames to perform preprocessing in Glue.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
Used Python and the Pandas library to build data analysis graphs for documentation and records.
Worked with Blender scripting to build add-ons and customize the Blender interface.
Used SPSS for statistical programming and computational techniques on large data sets, quantitative analysis, and visualization techniques for summarizing complex data analysis alongside Python.
Generated graphical reports using the Python packages NumPy and Matplotlib.
Represented the system hierarchically by defining components and subcomponents in Python, and developed a set of library functions over the system based on user needs.
Developed Python APIs to dump the array structures in the processor at the failure point for debugging. Extracted actual data from HTML, interpreted the predicted raw data, and stored the predictions in well-organized JSON files. Wrote programs to parse Excel files with data validations.
Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.
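A minimal sketch of the Kafka-to-S3 flow described in this list, assuming the kafka-python and boto3 packages; the topic, brokers, bucket, and batch size are hypothetical placeholders:

    import json

    import boto3
    from kafka import KafkaConsumer

    # Consume JSON messages from a Kafka topic (names are placeholders).
    consumer = KafkaConsumer(
        "events-topic",
        bootstrap_servers=["broker1:9092"],
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
        auto_offset_reset="earliest",
    )

    s3 = boto3.client("s3")

    # Buffer curated messages and flush them to S3 in small batches.
    batch = []
    for message in consumer:
        batch.append(message.value)
        if len(batch) >= 100:
            key = f"curated/events-{message.offset}.json"
            s3.put_object(
                Bucket="example-curated-bucket",
                Key=key,
                Body=json.dumps(batch).encode("utf-8"),
            )
            batch = []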
Environment: Python 2.7, C#, Macros, Oracle DB, Debian, Apache Server, Pandas, Django, MySQL, Linux, HTML, Git, CSS, JavaScript, R, Machine Learning, SQL, SQL Server, Tableau, Hive, Teradata, Unit, AWS.

EDUCATION:
Master's in Computer Science, Lewis University, Aug 2021 to May 2023.
Bachelor's in Computer Science, Osmania University, Aug 2014 to Dec 2018.

Master's Coursework
Analysis Tools: Advanced Excel (VBA, VLOOKUP, Macros), Power BI, Tableau, Adobe Analytics (Omniture), Google Analytics
Programming/Database: Python, R, Advanced SQL, MySQL, PostgreSQL, Snowflake, C, C++, HTML, Java
Core Skills: Data Warehousing, Data Mining, Data Visualization, Requirement Gathering, Project Management, ETL, Python
Software Skills: NetBeans, Android Studio, RStudio, Hadoop, STATA
Cloud: AWS, Azure