Satish - Sr. Data Engineer
Email: [email protected]
Phone: 8042151668
Location: Chicago, Illinois, USA
Relocation: Anywhere
Visa: H1B
Data Engineer with over 5+ years of experience in data warehousing, data engineering, feature engineering, big data, ETL/ELT, and business intelligence. Big data architect and engineer specializing in AWS and Azure frameworks, Cloudera, the Hadoop ecosystem, Spark/PySpark/Scala, Databricks, Hive, Redshift, Snowflake, relational databases, tools such as Tableau, Airflow, dbt, and Presto/Athena, and Data DevOps frameworks/pipelines, with strong programming/scripting skills in Python. Expertise in designing and developing big data analytics platforms for the retail, logistics, healthcare, and banking industries using big data, Spark, real-time streaming, Kafka, data science, machine learning, NLP, and cloud technologies.

Professional Summary
- Experience in the end-to-end process from requirements gathering to implementation using software development methodologies such as Agile, Scrum, and Test-Driven Development (TDD), and in data pipeline design, development, and implementation using Continuous Integration and Continuous Deployment (CI/CD).
- Good understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
- Experience writing machine learning algorithms (regression models, decision trees, Naive Bayes, neural networks, random forest, gradient boosting, SVM, KNN, clustering).
- Good knowledge of the architecture and components of Spark; efficient in working with Spark Core, Spark SQL, and Spark Streaming, with expertise in building PySpark and Spark-Scala applications for interactive analysis, batch processing, and stream processing.
- Experience in batch processing and writing Apache Spark programs for real-time analytics and streaming of data.
- Good understanding of ZooKeeper and Kafka for monitoring and managing Hadoop jobs; used Cloudera CDH4 and CDH5 for monitoring and managing Hadoop clusters.
- Experience in analysis, design, development, and big data in Scala, Spark, Hadoop, Pig, and HDFS environments.
- Data streaming from various cloud (AWS, Azure) and on-premises sources using tools such as Spark and Flume.
- Experience with Amazon AWS, Google Cloud Platform, and Microsoft Azure cloud services; used Azure Data Lake, Azure Data Factory, Azure Machine Learning, and Azure Databricks, along with AWS experience using EC2, S3, EMR, RDS, Redshift, AWS SageMaker, and Glue.
- Built machine learning solutions using PySpark for large datasets on the Hadoop ecosystem.
- Adept in statistical programming languages such as Python and R, including big data technologies like Hadoop, HDFS, Spark, and Hive.
- Good knowledge of tools such as Snowflake, SSIS, SSAS, and SSRS for designing warehousing applications.
- Experience in data mining, including predictive behavior analysis, optimization, and customer segmentation analysis using SAS and SQL.
- Experience in applied statistics, exploratory data analysis, and visualization using Matplotlib, Tableau, Power BI, and Google Analytics.

Technical Skills
- Hadoop Distributions: Cloudera, AWS EMR, and Azure Data Factory
- Languages: Scala, Python, SQL, HiveQL, KSQL
- IDE Tools: Eclipse, IntelliJ, PyCharm
- Cloud Platforms: AWS, Azure
- AWS Services: VPC, IAM, S3, Elastic Beanstalk, CloudFront, Redshift, Lambda, Kinesis, DynamoDB, Direct Connect, Storage Gateway, EKS, DMS, SMS, SNS, and SWF
- Reporting and ETL Tools: Tableau, Power BI, Talend, AWS Glue
- Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB)
- Big Data Technologies: Hadoop, HDFS, Hive, Pig, Oozie, Sqoop, Spark, Machine Learning, Pandas, NumPy, Seaborn, Impala, ZooKeeper, Flume, Airflow, Informatica, Snowflake, Databricks, Kafka, Cloudera
- Machine Learning and Statistics: Regression, Random Forest, Clustering, Time-Series Forecasting, Hypothesis Testing, Exploratory Data Analysis
- Containerization: Docker, Kubernetes
- CI/CD Tools: Jenkins, Bamboo, GitLab CI, uDeploy, Travis CI, Octopus
- Operating Systems: UNIX, Linux, Ubuntu, CentOS
- Other Software: Control-M, Eclipse, PyCharm, Jupyter, Apache, Jira, PuTTY, Advanced Excel, TOAD, Oracle SQL Developer, MS Office, FTP, SQL Assistant, Rally, GitHub, JSON
- Frameworks: Django, Flask, WebApp2

Professional Experience
________________________________________
Role: Data Engineer
Client: Change Health, Nashville, TN
Duration: Jan 2022 to Present

Project Description: The purpose of this project is to provide information on the Center for Drug Evaluation and Research's (CDER) strategic informatics capabilities and guidance on how to apply them in addressing CDER user and system needs. CDER's Strategic Capabilities Map depicts the capabilities that OBI is bringing to CDER users through the implementation of their use cases in order to support their business needs. The map captures three strategic capabilities: (1) Electronic Submissions, (2) Integrated Data Management, and (3) Business Intelligence Publishing.

Key Contributions:
- Worked on creating data pipelines for gathering, cleaning, and optimizing data using Hive and Spark.
- Created and sustained an optimal data pipeline architecture; responsible for loading data from the internal server and the Snowflake data warehouse into S3 buckets.
- Created the infrastructure needed for optimal data extraction, transformation, and loading from a wide range of data sources.
- In a Hadoop/Hive environment with Linux for big data resources, developed Spark/Scala and Python code for a regular expression (regex) project.
________________________________________
Responsibilities:
- Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
- Involved in developing Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS.
- Developed Spark workflows using Scala to pull data from AWS and apply transformations to it.
- Developed MapReduce/Spark Python code for machine learning and predictive analytics on Hadoop on AWS.
- Worked on creating, debugging, scheduling, and monitoring jobs using Airflow and Oozie.
- Developed Spark applications using Python and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Extracted data from HDFS using Hive and performed data analysis using Spark with Scala, PySpark, and Redshift for feature selection, and created nonparametric models in Spark.
- Worked on RDS databases like MySQL Server and NoSQL databases like MongoDB and HBase.
- Designed and implemented end-to-end systems for data analytics and automation, integrating custom visualization tools using R and Tableau.
- Developed Tableau visualizations and dashboards using Tableau Desktop.

Environment: Python, R, SQL, Hive, Spark, AWS, Hadoop, NoSQL, Cassandra, SQL Server, HDFS, PySpark, Tableau, MongoDB, PostgreSQL, Redshift, HBase, Sqoop, Airflow, Oozie.
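A minimal, illustrative PySpark sketch of the kind of Hive-to-S3 extract-transform-load work described in this role. It is a sketch only: the job name, table name, S3 bucket, partition date, and column names (claims_raw, claim_id, amount, claim_status, load_date) are hypothetical placeholders, not details of the actual project.

    # Illustrative PySpark job patterned on the pipeline work described above.
    # All table names, S3 paths, and columns are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F

    spark = (
        SparkSession.builder
        .appName("claims_daily_etl")   # hypothetical job name
        .enableHiveSupport()           # read source tables via the Hive metastore
        .getOrCreate()
    )

    # Extract: pull the newly arrived partition from a Hive-managed source table.
    claims = spark.table("staging.claims_raw").where(F.col("load_date") == "2022-01-15")

    # Transform: deduplicate and clean, the kind of delta-record preparation
    # described in the responsibilities above.
    cleaned = (
        claims.dropDuplicates(["claim_id"])
              .withColumn("amount", F.col("amount").cast("decimal(12,2)"))
              .filter(F.col("claim_status").isNotNull())
    )

    # Load: write curated output to S3 as partitioned Parquet for downstream
    # query engines such as Redshift Spectrum or Athena.
    (
        cleaned.write.mode("overwrite")
               .partitionBy("load_date")
               .parquet("s3://example-curated-bucket/claims/")   # hypothetical bucket
    )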
________________________________________
Role: Data Engineer / Python Developer
Client: Avon Technologies Pvt Ltd, Hyderabad, India
Duration: Jan 2019 to Aug 2021
________________________________________
Responsibilities:
- Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
- Created and wrote aggregation logic on Snowflake data warehouse tables.
- Built and architected multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in AWS, and coordinated tasks among the team.
- Recreated and maintained existing Access Database artifacts in Snowflake.
- Used AWS Athena extensively to ingest structured data from S3 into various systems such as Redshift and to generate reports.
- Consumed Kafka messages, curated them using Python, and sent the data to multiple targets: Redshift, Athena, and S3 buckets.
- Used AWS QuickSight for visualization.
- Implemented a serverless architecture using API Gateway, Lambda, and DynamoDB, and deployed AWS Lambda code from Amazon S3 buckets.
- Worked with datasets stored in AWS S3 buckets; used Spark DataFrames to perform preprocessing in Glue.
- Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
- Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that each document could be assigned a response label for further classification.
- Built data analysis graphs for documentation and records using Python and the Pandas library.
- Worked with Blender scripting to build add-ons and customize the Blender interface.
- Used SPSS for statistical programming and computational techniques on large datasets and quantitative analysis, and for visualization techniques summarizing complex data analysis with Python.
- Built an application to perform 2D finite element analysis of numerical equations using Python.
- Used Django database APIs to access database objects.
- Generated graphical reports using the Python packages NumPy and Matplotlib.
- Represented the system in hierarchical form by defining components and subcomponents using Python, and developed a set of library functions over the system based on user needs.
- Developed Python APIs to dump the array structures in the processor at the failure point for debugging.
- Extracted data in HTML format, predicted raw data, and interpreted and stored predictions in well-organized JSON files.
- Wrote programs to parse Excel files with data validations.
- Used Python and Django to interface with the jQuery UI and manage the storage and deletion of content.

Environment: Python 2.7, C#, Macros, Oracle DB, Debian, Apache Server, Pandas, Django, MySQL, Linux, HTML, Git, CSS, JavaScript, R, Machine Learning, SQL, SQL Server, Tableau, Hive, Teradata, Unix, AWS.
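A minimal sketch of the API Gateway -> Lambda -> DynamoDB serverless pattern mentioned above, assuming a hypothetical campaign_events table keyed by event_id; the payload fields are placeholders, not the actual schema.

    # Minimal sketch of an API Gateway -> Lambda -> DynamoDB handler.
    # Table name, key schema, and payload fields are hypothetical.
    import json
    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("campaign_events")  # hypothetical table name


    def lambda_handler(event, context):
        """Persist a JSON payload posted through API Gateway into DynamoDB."""
        body = json.loads(event.get("body") or "{}")

        # Write one item; "event_id" is assumed to be the table's partition key.
        table.put_item(
            Item={
                "event_id": body["event_id"],
                "campaign": body.get("campaign", "unknown"),
                "payload": json.dumps(body),
            }
        )

        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"stored": body["event_id"]}),
        }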
EDUCATION:
Master's in Computer Science, Lewis University, Aug 2021 - May 2023.
Bachelor's in Computer Science, Osmania University, Aug 2014 - Dec 2018.

Master's Course Work:
- Analysis Tools: Advanced Excel (VBA, VLOOKUP, Macros), Power BI, Tableau, Adobe Analytics (Omniture), Google Analytics
- Programming/Database: Python, R, Advanced SQL, MySQL, PostgreSQL, Snowflake, C, C++, HTML, Java
- Core Skills: Data Warehousing, Data Mining, Data Visualization, Requirement Gathering, Project Management, ETL, Python
- Software Skills: NetBeans, Android Studio, RStudio, Hadoop, STATA
- Cloud: AWS, Azure