
Ram Ganesh
Senior Cloud Data Engineer
Email: [email protected] | [email protected]
Phone: 832-779-0997
Location: Frisco, Texas, USA
Relocation: Yes
Visa: GC

AWS Data Engineering professional with solid foundational skills and a proven implementation track record across various data platforms. Self-motivated, with strong personal accountability in both individual and team settings, and 10 years of experience in AWS data engineering and data pipeline design, development, and implementation as a Sr. Data Engineer/Data Developer and Data Modeler.

SUMMARY:
Implemented simple to complex transformations on streaming data and datasets. Worked on analyzing Hadoop clusters and various big data analytic tools, including Hive, Spark, Python, Sqoop, Flume, and Oozie.
Developed Spark Streaming applications that consume static and streaming data from different sources.
Used Spark Streaming to stream data from external sources via Kafka; was responsible for migrating the code base from the Cloudera platform to Amazon EMR and evaluated Amazon ecosystem components such as Redshift and DynamoDB. Good knowledge of NoSQL databases such as DynamoDB, MongoDB, and Cassandra. Set up and administered DNS in the AWS cloud using Route 53.
Utilized Apache Spark with Python (PySpark) to develop big data analytics and machine learning programs, and executed machine learning use cases with Spark ML and MLlib.
Used Scala and sbt to develop Spark projects and executed them using spark-submit.
Collaborated with architects to design Spark replacements for the existing MapReduce models and migrated them to Spark using Scala. Wrote Scala programs using Spark SQL to perform aggregations.
Developed web services in the Play Framework using Scala to build a streaming data platform.
Worked with Apache Spark, which provides a fast engine for large-scale data processing integrated with Scala.
Experience working with Spark SQL and creating RDDs using PySpark. Extensive experience with ETL of large datasets using PySpark on HDFS, including extraction of Adobe data within AWS Glue using PySpark.
Developed and deployed a Spark application using PySpark to compute a popularity score for all content using an algorithm and load the results into Elasticsearch for the app content management team to consume. Worked on a real-time analytics solution using Cloud Pub/Sub, Dataflow, BigQuery, and Looker.
Worked with libraries such as SAS, Psycopg, embedPy, NumPy, and Beautiful Soup.
Experience in Designing, Architecting, and implementing scalable cloud-based web applications using AWS
Experience in several AWS Services including EC2, VPC, IAM, S3, RDS, ELB, Route 53, Cloud Watch, Cloud Formation Templates, Cloud Front, Cloud Trail, AWS CDK, ALB/NLB
Possess a solid working knowledge of PyArrow, a Python library for interacting with the in-memory columnar data format Arrow.
Used PyArrow for high-performance, efficient data manipulation and exchange in Python.
Knowledge of using PyArrow to read and write data in a variety of formats, including Parquet, CSV, JSON, and Apache Arrow files.
Migrating big data workloads (HDFS, Hive, Spark etc) to Cloud DataProc for processing, while storing the data in Cloud Storage.
Used Apache Arrow (PyArrow) to transfer Spark Datasets/DataFrames between the JVM and Python processes.
Created mock data using pandas, PyArrow, and other external Python libraries for testing in sandbox and non-prod systems.
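A minimal sketch of this PyArrow usage, assuming Spark 3.x and illustrative local paths rather than actual project files: enable Arrow-backed exchange between Spark and pandas, then read and write Parquet/CSV with pyarrow.

    import pyarrow as pa
    import pyarrow.parquet as pq
    import pyarrow.csv as pacsv
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arrow-demo").getOrCreate()
    # Spark 3.x config key; lets toPandas()/createDataFrame() move data as Arrow batches
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

    sdf = spark.range(1000)                     # sample Spark DataFrame
    pdf = sdf.toPandas()                        # JVM -> Python via Arrow

    table = pa.Table.from_pandas(pdf)           # pandas -> Arrow table
    pq.write_table(table, "/tmp/demo.parquet")  # Parquet out
    pacsv.write_csv(table, "/tmp/demo.csv")     # CSV out
    round_trip = pq.read_table("/tmp/demo.parquet")  # Parquet back in as an Arrow table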
Using Flume, Kafka, and Spark streaming to ingest real-time or near real-time data in HDFS.
Analyzed data and provided insights with R programming and Python pandas.
Educational Details:
Bachelor's in Computer Science and Engineering, 2014
Jawaharlal Nehru Technological University
PROFESSIONAL EXPERIENCE:
________________________________________
Client: Wolters Kluwer, Indianapolis, IN Aug 2021 - Present
Role: Senior AWS Data Engineer
Project Description:
Wolters Kluwer is a global provider of professional information, software solutions, and services for clinicians, nurses, accountants, lawyers, and tax professionals.
This client offers online banking, insurance, investments, tax, credit cards, and loans, as well as mortgage and wealth management services. This project aims to design, develop, and maintain fully-fledged, functioning platforms with databases and servers, develop and customize the user interface and applications, maintain the cloud infrastructure, and schedule projects on an event-driven basis. Involved in providing continuous integration, built stories, and performed development using agile methodology.
________________________________________
Responsibilities:
Designing and developing complex data pipelines using AWS, Sqoop, Spark, and databases for data ingestion, data analysis, and transformation.
Performing data cleansing, data transformation, and data manipulation using Python. Importing data from various databases to S3 buckets using Python.
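A hedged sketch of this kind of Python load to S3, with placeholder connection details, table, and bucket names: pull a table with pandas, apply light cleansing, and stage the extract in S3 with boto3.

    import boto3
    import pandas as pd
    from sqlalchemy import create_engine

    # Placeholder source database and query
    engine = create_engine("postgresql+psycopg2://etl_user:***@example-host:5432/sales")
    df = pd.read_sql("SELECT * FROM public.daily_orders", engine)

    # Basic cleansing before staging the extract in S3
    df = df.dropna(subset=["order_id"]).drop_duplicates()

    df.to_csv("/tmp/daily_orders.csv", index=False)
    boto3.client("s3").upload_file("/tmp/daily_orders.csv",
                                   "example-landing-bucket", "orders/daily_orders.csv")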
Working in AWS environment- S3, AWS Glue, EC2, RDS, EMR, Lambda, Oracle, PostgreSQL, Mongo DB, Dynamo DB, and Snowflake.
Using CLI commands to move files from the local server to the S3 bucket.
Designed and developed a data migration plan to process all the third-party data to AWS and executed it successfully.
Experience with the Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
Used a JSON schema to define the table and column mapping from S3 data to Redshift.
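A minimal sketch of such an S3-to-Redshift load driven by a JSONPaths mapping file, issued through psycopg2; the bucket, table, and IAM role names are placeholders, not the project's actual values.

    import psycopg2

    COPY_SQL = """
        COPY analytics.campaign_events
        FROM 's3://example-bucket/raw/campaign/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
        FORMAT AS JSON 's3://example-bucket/config/campaign_jsonpaths.json'
        TIMEFORMAT 'auto';
    """

    conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                            port=5439, dbname="dev", user="etl_user", password="***")
    with conn, conn.cursor() as cur:
        cur.execute(COPY_SQL)   # the JSONPaths file maps JSON attributes to table columns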
Designed and implemented a fully operational, production-grade, large-scale data solution on Snowflake.
Experience in using Snowflake Cloning and Time Travel.
Participated in the development, improvement, and maintenance of Snowflake database applications.
Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake's SnowSQL, and wrote SQL queries against Snowflake.
Developed PySpark code for AWS Glue jobs and integrated them with AWS data lake and other cloud services.
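A hedged outline of a Glue PySpark job of this kind; the catalog database, table, and S3 paths are placeholders rather than the actual project values.

    import sys
    from awsglue.utils import getResolvedOptions
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())
    job = Job(glue_context)
    job.init(args["JOB_NAME"], args)

    # Read from the Glue Data Catalog, drop an unwanted field, write curated Parquet to the lake
    dyf = glue_context.create_dynamic_frame.from_catalog(database="raw_db", table_name="orders")
    cleaned = dyf.drop_fields(["_corrupt_record"])
    glue_context.write_dynamic_frame.from_options(
        frame=cleaned,
        connection_type="s3",
        connection_options={"path": "s3://example-datalake/curated/orders/"},
        format="parquet",
    )
    job.commit()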
Developed end-to-end data integration solutions using Talend, enabling seamless data movement and transformation across various sources and destinations.
Designed and implemented ETL workflows in Talend to extract, transform, and load data from diverse data sources into target systems.
Utilized Talend's extensive library of connectors to integrate data from databases, cloud services, APIs, and flat files.
Created complex data transformation jobs using Talend's graphical interface, enhancing data quality and ensuring consistency.
Created Python scripts to load data from Hive tables into an Oracle database; Tableau connects to these tables to create visualizations. Applied different HDFS formats and structures to speed up analytics.
Created AWS Glue jobs to move data from S3 buckets to RedShift DB and queried structured files in S3 buckets using Athena.
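The Athena side of this can be driven from Python as well; a brief boto3 sketch, with placeholder database, table, and result locations.

    import boto3

    athena = boto3.client("athena", region_name="us-east-1")
    resp = athena.start_query_execution(
        QueryString="SELECT event_date, count(*) FROM raw_db.clickstream GROUP BY event_date",
        QueryExecutionContext={"Database": "raw_db"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )
    # Poll get_query_execution(QueryExecutionId=...) until the state is SUCCEEDED, then fetch results
    print(resp["QueryExecutionId"])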
Created Hive partitioning and bucket tables in HDFS and performed data validation between ETL and Hive tables.
Created tables to hold metadata information of all flat files and developed pass-through mappings to extract data from various sources and load the information to staging tables.
Proficient in designing, developing, and deploying microservices-based architectures to create scalable and modular software solutions.
Extensive experience in breaking down monolithic applications into microservices, enhancing agility and enabling independent development and deployment.
Developed RESTful APIs for microservices, enabling seamless communication between services and facilitating integration with external systems.
Leveraged containerization technologies like Docker to package microservices and their dependencies, ensuring consistency across development, testing, and production environments.
Implemented data validation and error handling mechanisms in Talend jobs, minimizing data integrity issues and improving data accuracy.
Optimized Talend jobs for performance and scalability, utilizing parallel processing and data partitioning techniques.
Demonstrated strong problem-solving skills, investigating and resolving complex issues in a timely manner to restore services and meet SLA requirements.
Migrating an entire Oracle database to BigQuery and using Power BI for reporting.
Implemented version control and documentation practices within Talend projects, ensuring traceability and knowledge sharing.
Utilized Spark SQL API in PySpark to extract and load data and perform SQL queries.
Worked on developing a PySpark script to encrypt the raw data by applying hashing algorithms to client-specified columns.
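A minimal sketch of that column-hashing step, assuming SHA-256 and illustrative column and path names.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import sha2, col

    spark = SparkSession.builder.appName("mask-columns").getOrCreate()
    raw_df = spark.read.parquet("s3://example-raw-bucket/customers/")   # placeholder input

    SENSITIVE_COLUMNS = ["ssn", "email", "phone_number"]                # client-specified columns
    masked_df = raw_df
    for column in SENSITIVE_COLUMNS:
        # Hash each sensitive column in place; cast to string so sha2() accepts any source type
        masked_df = masked_df.withColumn(column, sha2(col(column).cast("string"), 256))

    masked_df.write.mode("overwrite").parquet("s3://example-curated-bucket/customers_masked/")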
Utilized GitHub and Docker for the runtime environment for the CI/CD system to build, test, and deploy.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Proficient in conceptual, logical, and physical data modeling techniques, ensuring effective representation of data structures.
Designed and maintained data models for various database systems, including relational, NoSQL, and data warehousing platforms.
Collaborated with stakeholders to gather requirements and translate them into comprehensive data models that meet business needs.
Developed ER diagrams, entity-relationship models, and schema diagrams to visualize data relationships and structures.
Utilized data modeling tools such as ERwin, ER/Studio, or PowerDesigner to design, document, and maintain data models.
Worked on scalable distributed data systems using the Hadoop ecosystem on AWS EMR and MapR (MapR Data Platform).
Writing code that optimizes the performance of AWS services used by application teams and provides Code-level application security for clients (IAM roles, credentials, encryption, etc.).
Environment:
AWS (Lambda, S3, EC2, Redshift, EMR), Teradata 15, Python 3.7, PyCharm, Jupyter Notebooks, Big Data, PySpark, Hadoop, Hive, HDFS, Kafka, Airflow, Snowflake, MongoDB, PostgreSQL, SQL, Tableau, Docker, GitHub, Git.
________________________________________

E-Trade Financial Corp - San Francisco, CA Jan 2020 to Jul 2021
GCP Data Engineer
Description: E-Trade is a leading financial services company and a pioneer in the online brokerage industry. Having executed the first-ever electronic trade by an individual investor, E-Trade has long been at the forefront of the digital revolution, focused on delivering complete and easy-to-use solutions for traders, investors, and stock plan participants, and aims to enhance their financial independence through a powerful digital offering and professional guidance.
________________________________________
Responsibilities:
Used Google Cloud Functions with Python to load data into BigQuery for on-arrival CSV files in the GCS bucket.
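A hedged sketch of such a GCS-triggered Cloud Function; the dataset and table names are placeholders. The function fires when a CSV object lands in the bucket and runs a BigQuery load job.

    from google.cloud import bigquery

    def load_csv_to_bq(event, context):
        """Background Cloud Function triggered by object finalize in the landing bucket."""
        client = bigquery.Client()
        uri = f"gs://{event['bucket']}/{event['name']}"
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        load_job = client.load_table_from_uri(uri, "analytics_ds.landing_events",
                                              job_config=job_config)
        load_job.result()   # block so failures surface in the function's logs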
Wrote a program to download a SQL dump from the client's equipment maintenance site and load it into the GCS bucket; from there, loaded the SQL dump into MySQL (hosted in Google Cloud SQL) and moved the data from MySQL into BigQuery using Python, Scala, Spark, and Dataproc.
Processed and loaded bounded and unbounded data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python.
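A minimal Apache Beam sketch of that streaming path; the project, topic, table, and schema below are illustrative only.

    import json
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True, project="example-project", runner="DataflowRunner",
                              region="us-central1", temp_location="gs://example-temp/bq")

    with beam.Pipeline(options=options) as p:
        (p
         | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/example-project/topics/events")
         | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
         | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics_ds.events",
                schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED))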
Used Apache Airflow in the GCP Composer environment to build data pipelines, using various Airflow operators such as the bash operator, Hadoop operators, Python callables, and branching operators.
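A brief Composer DAG sketch using those operator types, assuming Airflow 2.x imports; the task names and routing logic are illustrative.

    from datetime import datetime
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.python import PythonOperator, BranchPythonOperator

    def choose_path():
        # Illustrative routing rule: full load on the first of the month, else incremental
        return "full_load" if datetime.utcnow().day == 1 else "incremental_load"

    with DAG("daily_ingest", start_date=datetime(2021, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        land_files = BashOperator(
            task_id="land_files",
            bash_command="gsutil cp gs://example-landing/*.csv /home/airflow/gcs/data/")
        branch = BranchPythonOperator(task_id="branch", python_callable=choose_path)
        full_load = PythonOperator(task_id="full_load", python_callable=lambda: print("full load"))
        incremental_load = PythonOperator(task_id="incremental_load",
                                          python_callable=lambda: print("incremental load"))

        land_files >> branch >> [full_load, incremental_load]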
Developed and deployed the outcome using Spark and Scala code on the Hadoop cluster running on GCP.
Implemented automation and monitoring of production pipelines and performed routine issue resolutions, root cause analysis, and troubleshooting of production pipelines.
Engaged with Google Support to fix Dataflow and Pub/Sub pipeline scalability issues.
Set up application monitoring using a Twitter monitoring infrastructure (MonViz), which integrates data from Dataflow, Airflow, Pub/Sub, and Flume into a central monitoring dashboard.
Troubleshooting and performance optimization of Apache Beam / Google DataFlow-based pipelines.
Deployed Airflow (Python executable, PED-based deployment) on Kubernetes pods with a Cloud SQL-based database. Performed performance and scalability tests on log pipelines that leverage many Hadoop components, including Flume, HDFS, and custom sinks (Google Pub/Sub).
Utilized orchestration tools such as Kubernetes to manage and deploy microservices containers, ensuring high availability, scalability, and efficient resource utilization.
Developed microservices using languages like Java, Python, or Node.js, choosing the most suitable language for specific use cases and business requirements.
Implemented API gateways to manage authentication, routing, and load balancing for microservices, enhancing security and performance.
Designed normalized and denormalized schemas based on performance, scalability, and reporting requirements.
Developed data models for data warehousing, including star and snowflake schemas, supporting efficient analytics.
Collaborated with database administrators, developers, and analysts to ensure data models align with technical and business requirements.
Created data dictionaries and metadata repositories to provide comprehensive documentation for data models.
Developed expertise in translating complex business rules into data model designs, ensuring data integrity and accuracy.
Developed tools using Python, Pyspark, Shell scripting, and XML to automate some of the menial tasks. Interfacing with supervisors, artists, systems administrators, and production to ensure production deadlines are met.
Designing and building multi-terabyte, full end-to-end data warehouse infrastructure from the ground up on Confidential Redshift for large-scale data, handling millions of records every day.
Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
Experience in Dataproc, GCS, Cloud functions, and BigQuery.
Created ETL workflows in Talend to extract, transform, and load data from various sources into Google Cloud services such as BigQuery and Google Cloud Storage.
Utilized Talend's connectors to integrate data from on-premises sources and cloud-based applications with Google Cloud Platform resources.
Developed complex data transformation jobs within Talend, enhancing data quality and consistency while adhering to GCP best practices.
Experience in moving data between GCP and Azure using Azure Data Factory.
Experience in building Power BI reports on Azure Analysis Services for better performance.
Used the Cloud Shell SDK in GCP to configure services such as Dataproc, Storage, and BigQuery. Coordinated with the team and developed a framework to generate daily ad hoc reports and extracts from enterprise data in BigQuery.
Led and managed complex database projects involving Amazon RDS PostgreSQL, with years of hands-on experience as a subject matter expert (SME) in the field.
Managed PostgreSQL databases in cloud environments, particularly Amazon RDS, by effectively configuring instances, monitoring performance metrics, and scaling resources as needed.
Optimized Talend jobs for GCP by leveraging parallel processing, partitioning, and other performance-enhancing techniques.
Led the implementation of backup and disaster recovery strategies for PostgreSQL databases, ensuring data integrity and business continuity in case of unforeseen events.
Demonstrated expertise in advanced PostgreSQL features such as partitioning, replication, and data distribution, optimizing database structures for efficient data storage and retrieval.
Hands-on experience in using all the big data-related services in the Google Cloud Platform.
Built reports for monitoring data loads into GCP and driving reliability at the site level.
Built data pipelines in Airflow on GCP for ETL-related jobs using both older and newer Airflow operators.
Implementing and Managing ETL solutions and automating operational processes.
Optimizing and tuning the Redshift environment, enabling queries to perform up to 100x faster for Tableau and SAS Visual Analytics.
Partitioned tables in Parquet file format.
Developed Spark jobs with partitioned RDDs (hash, range, and custom partitioners) for faster processing.
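A small PySpark sketch of explicit RDD partitioning, with an illustrative custom partitioning rule.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("rdd-partitioning").getOrCreate()
    sc = spark.sparkContext

    pairs = sc.parallelize([("TX", 1), ("CA", 2), ("NY", 3), ("TX", 4)])

    hash_partitioned = pairs.partitionBy(8)        # default portable hash partitioner

    def region_partitioner(key):
        # Illustrative custom rule: pin a high-volume key to its own partition
        return 0 if key == "TX" else hash(key) % 7 + 1

    custom_partitioned = pairs.partitionBy(8, region_partitioner)
    print(hash_partitioned.getNumPartitions(), custom_partitioned.getNumPartitions())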
Developed a near real-time data pipeline using Flume, Kafka, and Spark Streaming to ingest client data from their weblog server and apply transformations.
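A hedged Structured Streaming sketch of the Kafka leg of such a pipeline; the brokers, topic, schema, and paths are placeholders, and the spark-sql-kafka package is assumed to be on the classpath.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StructType, StringType, TimestampType

    spark = SparkSession.builder.appName("weblog-stream").getOrCreate()

    schema = (StructType()
              .add("client_ip", StringType())
              .add("url", StringType())
              .add("event_ts", TimestampType()))

    logs = (spark.readStream.format("kafka")
            .option("kafka.bootstrap.servers", "broker1:9092")
            .option("subscribe", "weblogs")
            .load()
            .select(from_json(col("value").cast("string"), schema).alias("log"))
            .select("log.*"))

    query = (logs.writeStream.format("parquet")
             .option("path", "hdfs:///data/weblogs/curated")
             .option("checkpointLocation", "hdfs:///checkpoints/weblogs")
             .start())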
Wrote Scala programs for Spark transformations in Dataproc.
Environment:
GCP, BigQuery, GCS buckets, Cloud Functions, Apache Beam, Cloud Dataflow, Cloud Shell, gsutil, Dataproc, VM Instances, Cloud SQL, MySQL, PostgreSQL, SQL Server, Python, Scala, Spark, Hive, Spark-SQL.
________________________________________
Molina Healthcare, CA May 2018 - Dec 2019
Role: Data Engineer
________________________________________
Responsibilities:
Experience in working with Azure cloud platforms (HDInsight, Databricks, Data Lake, Blob, Data Factory, Synapse, SQL DB, SQL DWH).
Experience analyzing data from Azure data storage using Databricks for deriving insights.
Performed data cleansing and applied transformations using Databricks and Spark data analysis.
Designed and automated Custom-built input adapters using Spark, Sqoop, and Oozie to ingest and analyze data from RDBMS to Azure Data Lake.
Monitored Spark cluster using Log Analytics and Ambari Web UI. Transitioned log storage from MS SQL to Cosmos DB and improved the query performance.
Involved in creating database objects such as tables, views, stored procedures, triggers, packages, and functions using T-SQL to provide structure and maintain data efficiently.
Extensive experience in working with SQL, with strong knowledge of T-SQL (MS SQL Server).
Created Automated ETL jobs in Talend and pushed the data to Azure SQL data warehouse.
Used Azure Synapse to manage processing workloads and served data for BI and prediction needs.
Built ETLs to load the data from Presto, PostgreSQL, Hive, SQL Server to Snowflake using Apache Airflow, Python, and Spark.
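One leg of such a load, sketched in PySpark with placeholder connection details; the PostgreSQL JDBC driver and the Snowflake Spark connector are assumed to be available on the cluster.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pg-to-snowflake").getOrCreate()

    source_df = (spark.read.format("jdbc")
                 .option("url", "jdbc:postgresql://example-host:5432/claims")
                 .option("dbtable", "public.member_claims")
                 .option("user", "etl_user").option("password", "***")
                 .load())

    sf_options = {
        "sfURL": "example_account.snowflakecomputing.com",
        "sfUser": "ETL_USER", "sfPassword": "***",
        "sfDatabase": "ANALYTICS", "sfSchema": "STAGING", "sfWarehouse": "LOAD_WH",
    }
    (source_df.write.format("net.snowflake.spark.snowflake")
     .options(**sf_options)
     .option("dbtable", "MEMBER_CLAIMS")
     .mode("append")
     .save())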
Developed a data warehouse model in Snowflake for over 100 datasets using WhereScape.
Created reports in Looker based on Snowflake connections.
Hands-on experience with Snowflake utilities, SnowSQL, Snowpipe, and big data modeling techniques using Python/Java.
Built ETL pipelines in and out of the data warehouse using a combination of Python and Snowflake SnowSQL, and wrote SQL queries against Snowflake.
Developed Spark Scala scripts for mining data and performed transformations on large datasets to provide real-time insights and reports.
Managed resources and scheduling across the cluster using Azure Kubernetes Service.
Involved in building an enterprise Data Lake using Data Factory and Blob Storage, enabling other teams to work with more complex scenarios and ML solutions.
Used Azure Data Factory with the SQL API and Mongo API and integrated data from MongoDB, MS SQL, and cloud storage (Blob, Azure SQL DB).
Created and provisioned multiple Databricks clusters needed for batch and continuous streaming data processing, and installed the required libraries for the clusters.
Experience working on healthcare data, developing data pre-processing pipelines for data such as DICOM and non-DICOM images (X-rays, CT scans, etc.).
Worked on SnowSQL and Snowpipe.
Converted Talend Joblets to support the Snowflake functionality.
Created Snowpipe for continuous data load.
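A minimal sketch of a Snowpipe definition of that kind, issued through the Snowflake Python connector; the stage, pipe, and table names are placeholders, and AUTO_INGEST assumes an external stage wired to cloud event notifications.

    import snowflake.connector

    conn = snowflake.connector.connect(account="example_account", user="ETL_USER",
                                       password="***", warehouse="LOAD_WH",
                                       database="ANALYTICS", schema="STAGING")
    with conn.cursor() as cur:
        cur.execute("""
            CREATE PIPE IF NOT EXISTS claims_pipe
              AUTO_INGEST = TRUE
              AS COPY INTO staging.member_claims
                 FROM @claims_ext_stage
                 FILE_FORMAT = (TYPE = 'JSON')
        """)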
Created data sharing between two Snowflake accounts.
Used Apache Airflow with Python and Unix to submit the Spark batch jobs in EMR Cluster.
Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
Migrate data from traditional database systems to Azure databases.
Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, and other third-party tools).
Design and architect various layers of the Data Lake.
________________________________________
Client: Creator Technologies Pvt Ltd, Hyderabad, India June 2014 - Nov 2017
Role: Data & Analytics Engineer (Internship: May 2013 - June 2014)
Project Description:
Creator Technologies is an IT consulting organization specializing in outsourced product engineering services.
________________________________________
Responsibilities:
Working on processing large volumes of data using different big data analytic tools, including Spark, Hive, Sqoop, Pig, Flume, Apache Kafka, PySpark, Oozie, HBase, Python, and Scala.
Developed ETL Applications using Hive, Spark, and Impala & Sqoop for Automation using Oozie. Used Pig as an ETL tool to do transformations, event joins, and some pre-aggregations before storing the data onto HDFS.
Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
Created Physical Data Model from the Logical Data Model using Compare and Merge Utility in ER/Studio and worked with the naming standards utility.
Developed workflow in Oozie & NiFi to automate the tasks of loading the data into HDFS.
Created Hive tables, dynamic partitions, and buckets for sampling, and worked on them using HiveQL.
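A short sketch of such partitioned and bucketed Hive tables, issued as HiveQL through a Hive-enabled SparkSession; the database, table, and column names are illustrative.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hive-ddl").enableHiveSupport().getOrCreate()

    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    spark.sql("""
        CREATE TABLE IF NOT EXISTS curated.orders (
            order_id STRING, customer_id STRING, amount DOUBLE)
        PARTITIONED BY (order_date STRING)
        CLUSTERED BY (customer_id) INTO 16 BUCKETS
        STORED AS ORC
    """)

    # Dynamic-partition insert: the partition column comes last in the SELECT list
    spark.sql("""
        INSERT OVERWRITE TABLE curated.orders PARTITION (order_date)
        SELECT order_id, customer_id, amount, order_date FROM staging.orders_raw
    """)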
Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL.
Migrate data from traditional database systems to Azure databases.
Design and implement migration strategies for traditional systems on Azure (lift and shift, Azure Migrate, and other third-party tools).
Experience in DWH/BI project implementation using Azure Data Factory.
Processed and loaded bounded and unbounded data from a Google Pub/Sub topic to BigQuery using Cloud Dataflow with Python.
Created firewall rules to access Google Dataproc from other machines.
Wrote Scala programs for Spark transformations in Dataproc.
Worked on publishing interactive data visualization dashboards, reports, and workbooks on Tableau and SAS Visual Analytics.
Responsible for Designing Logical and Physical data modeling for various data sources on Confidential Redshift.
Experience in transferring streaming data and data from different data sources into HDFS and NoSQL databases using Apache Flume; cluster coordination services through ZooKeeper.
Built clusters in the AWS environment using EMR with S3, EC2, and Redshift.
Involved in the complete SDLC of the project, including requirements gathering, design documents, development, testing, and production environments. Packaged the Spark development environment into a custom Vagrant box. Designed data flows using Spark and Spark SQL.
