Data Engineer - Cloudera Data Platform (CDP) at Remote, USA |
Email: [email protected] |
From: ayush, Scalable Systems [email protected]
Reply to: [email protected]

Job Description: Data Engineer - Cloudera Data Platform (CDP)
Charlotte, NC ~ Delaware, IA

Responsibilities:
Design and develop robust data pipelines to efficiently extract, transform, and load (ETL) data from various sources into Hadoop and SQL Server environments.
Utilize PySpark to ingest data from diverse systems into Hadoop and SQL Server regions.
Perform database migrations between MS SQL Server and Hadoop platforms.
Optimize database query performance, troubleshoot issues, and ensure data integrity on OLTP and OLAP systems.
Collaborate with team members to understand project requirements, provide guidance, and develop optimal solutions.
Leverage scripting languages (Batch, Shell, Python) for automation and data processing tasks.
Implement CI/CD pipelines for database changes using GitHub, Jenkins, and Liquibase.
Contribute to a positive team environment and foster knowledge sharing.

Required Skills:
Strong proficiency in Big Data technologies including Spark, Scala, PySpark, and Hadoop (Hortonworks).
Deep understanding of RDBMS concepts and MS-SQL with expertise in query optimization and performance tuning.
Experience with Cloudera Data Platform (CDP) is a significant advantage.
Proficiency in scripting languages such as Batch, Shell, and Python.
Solid foundation in data engineering principles and best practices.
Ability to work effectively in an Agile/Scrum development environment.
Excellent problem-solving, analytical, and communication skills.

Desired Skills:
Knowledge of Scala programming language.
Experience with data migration from Hadoop to CDP.
Familiarity with version control systems (Git, SVN).
Understanding of cloud platforms (AWS, Azure, GCP).

Education and Experience:
Bachelor's degree in Computer Science, Engineering, or a related field.
4-6 years of experience in data engineering or a similar role.
Keywords: Data Engineer, Big Data, Hadoop, Spark, Scala, PySpark, MS-SQL, RDBMS, Cloudera Data Platform (CDP), ETL, data pipeline, data migration, query optimization, performance tuning, Agile, Scrum, Jira, Git, SVN, Liquibase, Python, scripting, cloud platforms.
Wed Aug 14 00:47:00 UTC 2024 |