Immediate job opening: Hadoop Platform Engineer (quick closure, 10+ years' experience), Remote, USA

Email: [email protected]

From: Princy Jain, Maintec (princy@maintec.com)

Reply to: princy@maintec.com

Role: Hadoop Platform Engineer

Locations:

Plano, TX / Jersey City, NJ / New York City, NY / Atlanta, GA / Newark, DE / Charlotte, NC

Duration: Contract/Full-time (hybrid role from day 1)

Client: Bank of America

Mandatory experience: 10+ years.

Job Description:

As a Hadoop Platform Engineer, you will be responsible for designing, implementing, and managing our company's Hadoop infrastructure and data ecosystem. You will collaborate with cross-functional teams to understand data requirements, optimize data pipelines, and ensure the reliability and performance of our Hadoop clusters. You will also be responsible for administering and monitoring the Hadoop environment, troubleshooting issues, and implementing security measures.

Required Skills:

Platform Engineering:

Cluster Management:

Expertise in designing, implementing, and maintaining large-scale Hadoop clusters, including components such as HDFS, YARN, and MapReduce.

Collaborate with data engineers and data scientists to understand data requirements and optimize data pipelines.

Administration and Monitoring:

Experience in administering and monitoring Hadoop clusters to ensure high availability, reliability, and performance.

Experience in troubleshooting and resolving issues related to Hadoop infrastructure, data ingestion, data processing, and data storage.

Security Implementation:

Experience in implementing and managing security measures within Hadoop clusters, including authentication, authorization, and encryption.

Backup and Disaster Recovery:

Collaborate with cross-functional teams to define and implement backup and disaster recovery strategies for Hadoop clusters.

Performance Optimization:

Experience in optimizing Hadoop performance through fine-tuning configurations, capacity planning, and implementing performance monitoring and tuning techniques.

Automation and DevOps Collaboration:

Work with DevOps teams to automate Hadoop infrastructure provisioning, deployment, and management processes.

Technology Adoption and Recommendations:

Stay up to date with the latest developments in the Hadoop ecosystem.

Recommend and implement new technologies and tools that enhance the platform.

Documentation:

Experience in documenting Hadoop infrastructure configurations, processes, and best practices.

Technical Support and Guidance:

Provide technical guidance and support to other team members and stakeholders.

Admin:

User Interface Design:

Relevant for designing interfaces for tools within the Hadoop ecosystem that provide self-service capabilities, such as Hadoop cluster management interfaces or job scheduling dashboards.

Role-Based Access Control (RBAC):

Important for controlling access to Hadoop clusters, ensuring that users have appropriate permissions to perform self-service tasks.

Cluster Configuration Templates:

Useful for maintaining consistent configurations across Hadoop clusters, ensuring that users follow best practices and guidelines.

Resource Management:

Important for optimizing resource utilization within Hadoop clusters, allowing users to manage resources dynamically based on their needs.

Self-Service Provisioning:

Pertinent for features that enable users to provision and manage nodes within Hadoop clusters independently.

Monitoring and Alerts:

Essential for monitoring the health and performance of Hadoop clusters, providing users with insights into their cluster's status.

Automated Scaling:

Relevant for automatically adjusting the size of Hadoop clusters based on workload demands.

Job Scheduling and Prioritization:

Important for managing data processing jobs within Hadoop clusters efficiently.

Self-Service Data Ingestion:

Applicable to features that let users ingest data into Hadoop clusters independently.

Query Optimization and Tuning Assistance:

Relevant for providing users with tools or guidance to optimize and tune their queries when interacting with Hadoop-based data.

Documentation and Training:

Important for creating resources that help users understand how to use self-service features within the Hadoop ecosystem effectively.

Data Access Control:

Pertinent for controlling access to data stored within Hadoop clusters, ensuring proper data governance.

Backup and Restore Functionality:

Applicable to features that allow users to perform backup and restore operations for data stored within Hadoop clusters.

Containerization and Orchestration:

Relevant for deploying and managing applications within Hadoop clusters using containerization and orchestration tools.

User Feedback Mechanism:

Important for continuously improving self-service features based on user input and experience within the Hadoop ecosystem.

Cost Monitoring and Optimization:

Applicable to tools or features that help users monitor and optimize costs associated with their usage of Hadoop clusters.

Compliance and Auditing:

Relevant for ensuring compliance with organizational policies and auditing user activities within the Hadoop ecosystem.

Data Engineering:

ETL (Extract, Transform, Load) Processes:

Proficiency in designing and implementing ETL processes for ingesting, transforming, and loading data into Hadoop clusters.

Experience with tools like Apache NiFi.

Data Modeling and Database Design:

Understanding of data modeling principles and database design concepts.

Ability to design and implement effective data storage structures in Hadoop.

SQL and Query Optimization:

Strong SQL skills for data extraction and analysis from Hadoop-based data stores.

Experience in optimizing SQL queries for efficient data retrieval.

Streaming Data Processing:

Familiarity with real-time data processing and streaming technologies, such as Apache Kafka and Spark Streaming.

Experience in designing and implementing streaming data pipelines.

Data Quality and Governance:

Knowledge of data quality assurance and governance practices.

Implementing measures to ensure data accuracy, consistency, and integrity.

Workflow Orchestration:

Experience with workflow orchestration tools (e.g., Apache Airflow) to manage and schedule data processing workflows.

Automating and orchestrating data pipelines.

Data Warehousing Concepts:

Understanding of data warehousing concepts and best practices.

Integrating Hadoop-based solutions with traditional data warehousing systems.

Version Control:

Proficiency in version control systems (e.g., Git) for managing and tracking changes in code and configurations.

Collaboration with Data Scientists:

Collaborate effectively with data scientists to understand analytical requirements and support the deployment of machine learning models.

Data Security and Compliance:

Implementing security measures within data pipelines to protect sensitive information.

Ensuring compliance with data security and privacy regulations.

Data Catalog and Metadata Management:

Implementing data catalog solutions to manage metadata and enhance data discovery.

Enabling metadata-driven data governance.

Big Data Technologies Beyond Hadoop:

Familiarity with other big data technologies beyond Hadoop, such as Apache Flink or Apache Beam.

Data Transformation and Serialization:

Expertise in data serialization formats (e.g., Avro, Parquet) and transforming data between formats.

Data Storage Optimization:

Optimizing data storage strategies for cost-effectiveness and performance.

Desired Skills:

Problem-Solving and Analytical Thinking:

Strong analytical and problem-solving skills to troubleshoot complex issues in Hadoop clusters.

Ability to analyze data requirements and optimize data processing workflows.

Collaboration and Teamwork:

Collaborative mindset to work effectively with cross-functional teams, including data engineers, data scientists, and DevOps teams.

Ability to provide technical guidance and support to team members.

Adaptability and Continuous Learning:

Ability to adapt to changes in technology and industry trends within the Hadoop ecosystem, and willingness to continuously learn and upgrade skills to stay current.

Performance Monitoring and Tuning:

Proactive approach to performance monitoring and tuning, ensuring optimal performance of Hadoop clusters.

Ability to analyze and address performance bottlenecks.

Security Best Practices:

Knowledge of security best practices within the Hadoop ecosystem.

Capacity Planning:

Skill in capacity planning to anticipate and scale Hadoop clusters according to data processing needs.

Automation and Scripting:

Strong scripting skills for automation (e.g., Python, Ansible) beyond shell scripting.

Familiarity with configuration management tools for infrastructure automation.

Monitoring and Observability:

Experience in setting up comprehensive monitoring and observability tools for Hadoop clusters.

Ability to proactively identify and address potential issues.

Networking Skills:

Understanding of networking concepts relevant to Hadoop clusters.

Skills:

Technical Proficiency:

Experience with Hadoop and Big Data technologies, including Cloudera CDH/CDP, Databricks, HDInsight, etc.

Strong understanding of core Hadoop services such as HDFS, MapReduce, Kafka, Spark, Hive, Impala, HBase, Kudu, Sqoop, and Oozie.

Proficiency in RHEL Linux operating systems, databases, and hardware administration.

Operations and Design:

Operations, design, capacity planning, cluster setup, security, and performance tuning in large-scale enterprise Hadoop environments.

Scripting and Automation:

Proficient in shell scripting (e.g., Bash, KSH) for automation.

Security Implementation:

Experience in setting up, configuring, and managing security for Hadoop clusters using Kerberos integrated with LDAP/AD.

Problem Solving and Troubleshooting:

Expertise in system administration and programming for storage capacity management, debugging, and performance tuning.

Collaboration and Communication:

Collaborate with cross-functional teams, including data engineers, data scientists, and DevOps teams.

Provide technical guidance and support to team members and stakeholders.

Skills:

On-prem instance

Hadoop config, performance, tuning

Ability to manage very large clusters and understand scalability

Interfacing with multiple teams

Many teams have self-service capabilities, so candidates should have experience managing self-service across multiple teams on large clusters.

Hands-on experience with and strong understanding of Hadoop architecture

Experience with Hadoop ecosystem components (HDFS, YARN, MapReduce), cluster management tools like Ambari or Cloudera Manager, and related technologies.

Proficiency in scripting, Linux system administration, networking, and troubleshooting

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent work experience).

Strong experience in designing, implementing, and administering Hadoop clusters in a production environment.

Proficiency in Hadoop ecosystem components such as HDFS, YARN, MapReduce, Hive, Spark, and HBase.

Experience with cluster management tools like Apache Ambari or Cloudera Manager.

Solid understanding of Linux/Unix systems and networking concepts.

Strong scripting skills (e.g., Bash, Python) for automation and troubleshooting.

Knowledge of database concepts and SQL.

Experience with data ingestion tools like Apache Kafka or Apache NiFi.

Familiarity with data warehouse concepts and technologies.

Understanding of security principles and experience implementing security measures in Hadoop clusters.

Strong problem-solving and troubleshooting skills, with the ability to analyze and resolve complex issues.

Excellent communication and collaboration skills to work effectively with cross-functional teams.

Relevant certifications such as Cloudera Certified Administrator for Apache Hadoop (CCAH) or Hortonworks Certified Administrator (HCA) are a plus.

Thanks and Regards

princy@maintec.com

Keywords: active directory Delaware Georgia New Jersey New York North Carolina Texas


Thu Feb 22 19:57:00 UTC 2024


