Big Data Lead Software Engineer at Remote, USA
Email: [email protected]
Hello,

Greetings from Blueverse Systems, hope you are doing well. This is Venkat from Blueverse Systems. Please review the job description below and let me know your interest by replying to this email with an updated resume.

Job Title: Lead Big Data Engineer
Visa: USC, GC only
Locations: AZ, CA, MN, NC, NY, NJ, PA, VA, TX (Onsite)
Experience: 13 years

What I need to submit:
Required Items: One manager reference. Please send one reference I can speak to on your behalf. This must be a manager or supervisor you've reported to most recently or within the past 3 years. Please provide it in the following format: Manager Name, Manager Title, LinkedIn Profile, Phone Number, Work Email Address.

Top Skills' Details
1) Experience building/standing up an on-prem big data solution (Hadoop, Cloudera, Hortonworks), data lake, data warehouse, or similar solution.
2) All roles require 100% hands-on experience and strong data foundations. These skills MUST be on-prem, NOT in the cloud; having both is fine, but the skills must come from an on-prem big data environment.
3) Big Data platform (data lake) and data warehouse engineering experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors. Solid hands-on software engineer who can design and code Big Data pipeline frameworks (as a software product, ideally on Cloudera), not just a data engineer implementing Spark jobs or a team lead for data engineers. Building self-service data pipelines that automate the controls for building the pipeline, ingesting data into the ecosystem (data lake), and transforming it for different consumption, supporting GCP and Hadoop on-premise, bringing in massive volumes of cybersecurity data, and validating data and data quality.
4) Solid PySpark developer experience working with Spark Core, Spark Streaming, Spark optimizations (knowing how to optimize the code), and the PySpark API. Experience writing PySpark code, with solid Hadoop data lake foundations (a minimal sketch follows this list).
5) Airflow experience: hands-on use and development of workflows in Airflow.
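For illustration only (this sketch is not part of the posting), the snippet below shows the kind of hands-on PySpark work item 4 points at: a small batch job that reads raw events, standardizes them, and writes partitioned Parquet for downstream engines. The application name, HDFS paths, and column names are assumptions.

# Minimal PySpark batch sketch: read raw JSON, standardize, write partitioned Parquet.
# Paths, app name, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("cyber-events-batch")                      # hypothetical job name
    .config("spark.sql.shuffle.partitions", "200")      # tune for cluster size
    .getOrCreate()
)

# Read raw events from the data lake (HDFS path is an assumption).
raw = spark.read.json("hdfs:///data/raw/cyber_events/")

# Basic cleansing/standardization before loading the curated zone.
curated = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .withColumn("source_system", F.lower(F.col("source_system")))
)

# Partitioned Parquet lets Hive/Impala/Spark SQL prune by date at query time.
(
    curated.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("hdfs:///data/curated/cyber_events/")
)

spark.stop()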
Job Description
The Big Data Lead Software Engineer is responsible for owning and driving technical innovation with big data technologies. The individual is a subject matter expert technologist with strong Python experience and deep hands-on experience building data pipelines for the Hadoop platform as well as Google Cloud. This person will be part of successful Big Data implementations for large data integration initiatives. Candidates for this role must be willing to push the limits of traditional development paradigms typically found in a data-centric organization while embracing the opportunity to gain subject matter expertise in the cyber security domain.

In this role you will:
- Lead the design and development of sophisticated, resilient, and secure engineering solutions for modernizing our data ecosystem, typically involving multiple disciplines including big data architecture, data pipelines, data management, and data modeling specific to consumer use cases.
- Provide technical expertise for the design, implementation, maintenance, and control of data management services, especially end-to-end, scale-out data pipelines.
- Develop self-service, multitenant capabilities on the cyber security data lake, including custom/off-the-shelf services integrated with the Hadoop platform and Google Cloud; use APIs and messaging to communicate across services; integrate with distributed data processing frameworks and data access engines built on the cluster; integrate with enterprise services for security, data governance, and automated data controls; and implement policies to enforce fine-grained data access.
- Build, certify, and deploy highly automated services and features for data management (registering, classifying, collecting, loading, formatting, cleansing, structuring, transforming, reformatting, distributing, and archiving/purging) through the Data Ingestion, Processing, and Consumption stages of the analytical data lifecycle.
- Provide the highest level of technical leadership in the design, engineering, deployment, and maintenance of solutions through collaborative efforts with the team and third-party vendors.
- Design, code, test, debug, and document programs using Agile development practices.
- Review and analyze complex data management technologies that require in-depth evaluation of multiple factors, including intangibles or unprecedented factors.
- Assist in production deployments, including troubleshooting and problem resolution.
- Collaborate with enterprise, data platform, data delivery, and other product teams to provide strategic solutions, influencing long-range internal and enterprise-level data architecture and change management strategies.
- Provide technical leadership and recommendations on the future direction of data management technology and custom engineering designs.
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals.

Required Qualifications:
- 10+ years of Big Data platform (data lake) and data warehouse engineering experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors. Solid hands-on software engineer who can design and code Big Data pipeline frameworks (as a software product, ideally on Cloudera), not just a data engineer implementing Spark jobs or a team lead for data engineers. Building self-service data pipelines that automate the controls for building the pipeline, ingesting data into the ecosystem (data lake), and transforming it for different consumption, supporting GCP and Hadoop on-premise, bringing in massive volumes of cybersecurity data, validating data and data quality, and supporting consumption for reporting, advanced analytics, data science, and ML.
- 3+ years of hands-on experience designing and building modern, resilient, and secure data pipelines, including movement, collection, integration, and transformation of structured/unstructured data with built-in automated data controls, built-in logging/monitoring/alerting, and pipeline orchestration managed to operational SLAs. Preferably using Airflow custom operators (at least 1 year of experience customizing them), DAGs, and connector plugins (see the Airflow sketch after this list).
- Python, Spark, PySpark; working with APIs to integrate different services; Google big data services such as Cloud Dataproc, Cloud Datastore, BigQuery, and Cloud Composer. On-prem Apache Airflow is the core orchestrator; Kafka is used for streaming services to source data, which is then processed with Spark Streaming (see the streaming sketch after this list).
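As a rough, hypothetical illustration of the Airflow custom operator and DAG experience called out above (not the employer's actual pipeline), the sketch below defines a small custom operator that runs an ingestion control and wires it into a daily DAG that then submits a Spark job. The operator name, DAG id, control logic, and script path are assumptions.

# Minimal Airflow sketch: a custom operator wired into a daily DAG.
# The operator, DAG id, control logic, and script path are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import BaseOperator
from airflow.operators.bash import BashOperator


class IngestionControlOperator(BaseOperator):
    """Hypothetical operator that runs an automated control before ingestion."""

    def __init__(self, dataset: str, min_row_count: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.dataset = dataset
        self.min_row_count = min_row_count

    def execute(self, context):
        # Placeholder control: a real pipeline might query Hive/Impala or a
        # metadata service here to confirm the source landed correctly.
        self.log.info("Running ingestion control for %s", self.dataset)
        row_count = 100  # stand-in for a real check
        if row_count < self.min_row_count:
            raise ValueError(f"Control failed for {self.dataset}: {row_count} rows")
        return row_count


with DAG(
    dag_id="cyber_events_ingest",            # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    check_source = IngestionControlOperator(
        task_id="check_source",
        dataset="cyber_events",
    )

    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/cyber_events_batch.py",  # hypothetical path
    )

    check_source >> run_spark_job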
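Similarly, for the Kafka-plus-Spark-streaming pattern mentioned in the last qualification, here is a minimal Structured Streaming sketch; the broker address, topic name, and HDFS paths are assumptions, and the Kafka connector package must be available on the cluster.

# Minimal Spark Structured Streaming sketch: consume a Kafka topic and land
# the events in the data lake. Brokers, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cyber-events-stream").getOrCreate()

# Read from Kafka; requires the spark-sql-kafka connector on the classpath.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "cyber-events")                  # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string for parsing.
parsed = events.select(
    F.col("timestamp").alias("kafka_ts"),
    F.col("value").cast("string").alias("payload"),
)

# Append micro-batches to the lake, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/cyber_events/")
    .option("checkpointLocation", "hdfs:///checkpoints/cyber_events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()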
Skill sets: Python, Spark (PySpark); using APIs to integrate with various services; Google big data services (Cloud Dataproc, Cloud Datastore, BigQuery, Cloud Composer); on-prem Apache Airflow as the core orchestrator; Kafka for sourcing streaming data into Spark Streaming. Supports GCP and Hadoop on-premise, brings in massive volumes of cybersecurity data, validates data and data quality, and supports consumption for reporting, advanced analytics, data science, and ML.

Additional Skills & Qualifications
Additional skills to look for in any/all of the above candidates as a plus:
- GCP, Kafka/Kafka Connect, Hive DB development
- Experience with Google Cloud data services such as Cloud Storage, Dataproc, Dataflow, and BigQuery
- Google Cloud big data specialty: hands-on experience ideally, not just a certification
- Hands-on experience developing and managing technical and business metadata
- Experience creating/managing time-series data from full data snapshots or incremental data changes
- Hands-on experience implementing fine-grained access controls, such as attribute-based access controls using Apache Ranger
- Experience automating DQ validation in the data pipelines (see the sketch after this list)
- Experience implementing automated data change management, including code and schema versioning, QA, CI/CD, and rollback processing
- Experience automating the end-to-end data lifecycle on the big data ecosystem
- Experience managing automated schema evolution within data pipelines
- Experience implementing masking and/or other forms of data obfuscation
- Experience designing and building microservices, APIs, and MySQL
- Advanced understanding of SQL and NoSQL DB schemas
- Advanced understanding of partitioned Parquet, ORC, Avro, and various compression formats
- Developing containerized microservices and APIs
- Familiarity with key concepts implemented by Apache Hudi, Apache Iceberg, or Databricks Delta Lake (bonus)
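To give a concrete, purely illustrative flavor of the automated DQ validation mentioned in the list above, the sketch below runs a couple of rule-based checks inside a PySpark pipeline step and fails the job when a rule is breached. The column names, rules, thresholds, and input path are assumptions.

# Minimal DQ-validation sketch: rule-based checks inside a PySpark pipeline step.
# Column names, rules, thresholds, and the input path are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()


def run_dq_checks(df: DataFrame) -> None:
    """Fail fast if basic data-quality rules are violated."""
    total = df.count()
    if total == 0:
        raise ValueError("DQ failure: dataset is empty")

    # Rule 1: required identifier column must not contain nulls.
    null_ids = df.filter(F.col("event_id").isNull()).count()
    if null_ids > 0:
        raise ValueError(f"DQ failure: {null_ids} rows with null event_id")

    # Rule 2: at most 1% of rows may have an unparseable timestamp.
    bad_ts = df.filter(F.to_timestamp("event_ts").isNull()).count()
    if bad_ts / total > 0.01:
        raise ValueError(f"DQ failure: {bad_ts}/{total} rows with bad event_ts")


# Example usage against a curated dataset (path is an assumption).
curated = spark.read.parquet("hdfs:///data/curated/cyber_events/")
run_dq_checks(curated)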
Job expectations:
- Ability to occasionally work nights and/or weekends as needed for on-call/production issue resolution
- Ability to occasionally work nights and/or weekends for off-hours system maintenance

Employee Value Proposition (EVP)
Strategy: The more tactical need (2-3 years) is to implement a robust on-premise big data platform to meet Wells Fargo's Cyber Security BI/analytics/reporting and data science/ML needs. This includes building a custom data pipeline solution in Python using Spark and Airflow on top of the Hadoop platform. In parallel, we would like to start onboarding select early-adopter use cases to our target-state Google Cloud Platform (GCP) starting Q1 2023. Portability of our on-premise solutions to GCP is critical. As we learn and gain momentum on GCP, we will start to accelerate our journey to the public cloud; expect that to be around Q3/Q4 of 2023.

Work Environment
Wells Fargo core locations; the expectation is onsite 3 days a week and 2 days remote (candidates can choose the days).

External Communities Job Description
Lead complex initiatives on selected domains. Ensure systems are monitored to increase operational efficiency and managed to mitigate risk. Define opportunities to maximize resource utilization and improve processes while reducing cost. Lead, design, develop, test, and implement applications and system components, tools and utilities, models, simulation, and analytics to manage complex business functions using sophisticated technologies. Resolve coding, testing, and escalated platform issues of a

--
Regards,
Venkat
Sr IT Recruiter
Blueverse Systems
Address: 13800 Coppermine Rd, Herndon, VA 20171
Email: [email protected]
Company LinkedIn: https://www.linkedin.com/company/blue-verse-systems/mycompany/