Big Data Lead Software Engineer at Remote, USA
Email: [email protected]
Hello,

Greetings from Blueverse Systems, hope you are doing well. This is Venkat from Blueverse Systems. Please review the job description below and let me know your interest by replying to this email with an updated resume.

Job Title: Lead Big Data Engineer
Visa: USC, GC only
Locations: AZ, CA, MN, NC, NY, NJ, PA, VA, TX (Onsite)
Experience: 13 years

What I need to submit:
Required Items: One manager reference. Please send one reference I can speak to on your behalf. This must be a manager or supervisor you've reported to most recently or within the past 3 years. Please provide it in the following format: Manager Name, Manager Title, LinkedIn Profile, Phone Number, Work Email Address.

Top Skills' Details
1) Experience building/standing up an on-prem big data solution (Hadoop, Cloudera, Hortonworks), data lake, data warehouse, or similar solution.
2) All roles require 100% hands-on experience and strong data foundations. These skills MUST be on-prem, NOT in the cloud; having both is fine, but the skills must come from an on-prem big data environment.
3) Big Data platform (data lake) and data warehouse engineering experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors. Solid hands-on software engineer who can design and code Big Data pipeline frameworks (as a software product, ideally on Cloudera), not just a data engineer implementing Spark jobs or a team lead for data engineers. Building self-service data pipelines that automate the controls for building the pipeline, ingesting data into the ecosystem (data lake), and transforming it for different consumption, supporting GCP and Hadoop on-premise, bringing in massive volumes of cybersecurity data, and validating data and data quality.
4) Solid PySpark developer experience working with Spark Core, Spark Streaming, Spark optimizations (knowing how to optimize the code), and the PySpark API. Experience writing PySpark code, with solid Hadoop data lake foundations (a minimal sketch follows this list).
5) Airflow experience: hands-on use and development of workflows in Airflow.
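For illustration only (this sketch is not part of the posting), the snippet below shows the kind of hands-on PySpark work item 4 points at: a small batch job that reads raw events, standardizes them, and writes partitioned Parquet for downstream engines. The application name, HDFS paths, and column names are assumptions.

# Minimal PySpark batch sketch: read raw JSON, standardize, write partitioned Parquet.
# Paths, app name, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("cyber-events-batch")                      # hypothetical job name
    .config("spark.sql.shuffle.partitions", "200")      # tune for cluster size
    .getOrCreate()
)

# Read raw events from the data lake (HDFS path is an assumption).
raw = spark.read.json("hdfs:///data/raw/cyber_events/")

# Basic cleansing/standardization before loading the curated zone.
curated = (
    raw.filter(F.col("event_ts").isNotNull())
       .withColumn("event_date", F.to_date("event_ts"))
       .withColumn("source_system", F.lower(F.col("source_system")))
)

# Partitioned Parquet lets Hive/Impala/Spark SQL prune by date at query time.
(
    curated.write
    .mode("overwrite")
    .partitionBy("event_date")
    .parquet("hdfs:///data/curated/cyber_events/")
)

spark.stop()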
Job Description
The Big Data Lead Software Engineer is responsible for owning and driving technical innovation with big data technologies. The individual is a subject matter expert technologist with strong Python experience and deep hands-on experience building data pipelines for the Hadoop platform as well as Google Cloud. This person will be part of successful Big Data implementations for large data integration initiatives. Candidates for this role must be willing to push the limits of traditional development paradigms typically found in a data-centric organization while embracing the opportunity to gain subject matter expertise in the cyber security domain.

In this role you will:
- Lead the design and development of sophisticated, resilient, and secure engineering solutions for modernizing our data ecosystem, typically involving multiple disciplines including big data architecture, data pipelines, data management, and data modeling specific to consumer use cases.
- Provide technical expertise for the design, implementation, maintenance, and control of data management services, especially end-to-end, scale-out data pipelines.
- Develop self-service, multitenant capabilities on the cyber security data lake, including custom/off-the-shelf services integrated with the Hadoop platform and Google Cloud; use APIs and messaging to communicate across services; integrate with distributed data processing frameworks and data access engines built on the cluster; integrate with enterprise services for security, data governance, and automated data controls; and implement policies to enforce fine-grained data access.
- Build, certify, and deploy highly automated services and features for data management (registering, classifying, collecting, loading, formatting, cleansing, structuring, transforming, reformatting, distributing, and archiving/purging) through the Data Ingestion, Processing, and Consumption stages of the analytical data lifecycle.
- Provide the highest level of technical leadership in the design, engineering, deployment, and maintenance of solutions through collaborative efforts with the team and third-party vendors.
- Design, code, test, debug, and document programs using Agile development practices.
- Review and analyze complex data management technologies that require in-depth evaluation of multiple factors, including intangibles or unprecedented factors.
- Assist in production deployments, including troubleshooting and problem resolution.
- Collaborate with enterprise, data platform, data delivery, and other product teams to provide strategic solutions, influencing long-range internal and enterprise-level data architecture and change management strategies.
- Provide technical leadership and recommendations on the future direction of data management technology and custom engineering designs.
- Collaborate and consult with peers, colleagues, and managers to resolve issues and achieve goals.

Required Qualifications:
- 10+ years of Big Data platform (data lake) and data warehouse engineering experience, preferably with the Hadoop stack: HDFS, Hive, SQL, Spark, Spark Streaming, Spark SQL, HBase, Kafka, Sqoop, Atlas, Flink, Cloudera Manager, Airflow, Impala, Tez, Hue, and a variety of source data connectors. Solid hands-on software engineer who can design and code Big Data pipeline frameworks (as a software product, ideally on Cloudera), not just a data engineer implementing Spark jobs or a team lead for data engineers. Building self-service data pipelines that automate the controls for building the pipeline, ingesting data into the ecosystem (data lake), and transforming it for different consumption, supporting GCP and Hadoop on-premise, bringing in massive volumes of cybersecurity data, validating data and data quality, and supporting consumption for reporting, advanced analytics, data science, and ML.
- 3+ years of hands-on experience designing and building modern, resilient, and secure data pipelines, including movement, collection, integration, and transformation of structured/unstructured data with built-in automated data controls, built-in logging/monitoring/alerting, and pipeline orchestration managed to operational SLAs. Preferably using Airflow custom operators (at least 1 year of experience customizing them), DAGs, and connector plugins (see the Airflow sketch after this list).
- Python, Spark, PySpark; working with APIs to integrate different services; Google big data services such as Cloud Dataproc, Cloud Datastore, BigQuery, and Cloud Composer. On-prem Apache Airflow is the core orchestrator; Kafka is used for streaming services to source data, which is then processed with Spark Streaming (see the streaming sketch after this list).
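As a rough, hypothetical illustration of the Airflow custom operator and DAG experience called out above (not the employer's actual pipeline), the sketch below defines a small custom operator that runs an ingestion control and wires it into a daily DAG that then submits a Spark job. The operator name, DAG id, control logic, and script path are assumptions.

# Minimal Airflow sketch: a custom operator wired into a daily DAG.
# The operator, DAG id, control logic, and script path are hypothetical.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.models import BaseOperator
from airflow.operators.bash import BashOperator


class IngestionControlOperator(BaseOperator):
    """Hypothetical operator that runs an automated control before ingestion."""

    def __init__(self, dataset: str, min_row_count: int = 1, **kwargs):
        super().__init__(**kwargs)
        self.dataset = dataset
        self.min_row_count = min_row_count

    def execute(self, context):
        # Placeholder control: a real pipeline might query Hive/Impala or a
        # metadata service here to confirm the source landed correctly.
        self.log.info("Running ingestion control for %s", self.dataset)
        row_count = 100  # stand-in for a real check
        if row_count < self.min_row_count:
            raise ValueError(f"Control failed for {self.dataset}: {row_count} rows")
        return row_count


with DAG(
    dag_id="cyber_events_ingest",            # hypothetical DAG id
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    check_source = IngestionControlOperator(
        task_id="check_source",
        dataset="cyber_events",
    )

    run_spark_job = BashOperator(
        task_id="run_spark_job",
        bash_command="spark-submit /opt/jobs/cyber_events_batch.py",  # hypothetical path
    )

    check_source >> run_spark_job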
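Similarly, for the Kafka-plus-Spark-streaming pattern mentioned in the last qualification, here is a minimal Structured Streaming sketch; the broker address, topic name, and HDFS paths are assumptions, and the Kafka connector package must be available on the cluster.

# Minimal Spark Structured Streaming sketch: consume a Kafka topic and land
# the events in the data lake. Brokers, topic, and paths are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cyber-events-stream").getOrCreate()

# Read from Kafka; requires the spark-sql-kafka connector on the classpath.
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")   # hypothetical brokers
    .option("subscribe", "cyber-events")                  # hypothetical topic
    .option("startingOffsets", "latest")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string for parsing.
parsed = events.select(
    F.col("timestamp").alias("kafka_ts"),
    F.col("value").cast("string").alias("payload"),
)

# Append micro-batches to the lake, with checkpointing for fault tolerance.
query = (
    parsed.writeStream
    .format("parquet")
    .option("path", "hdfs:///data/streaming/cyber_events/")
    .option("checkpointLocation", "hdfs:///checkpoints/cyber_events/")
    .outputMode("append")
    .start()
)

query.awaitTermination()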
Skill sets: Python, Spark (PySpark); using APIs to integrate with various services; Google big data services (Cloud Dataproc, Cloud Datastore, BigQuery, Cloud Composer); on-prem Apache Airflow as the core orchestrator; Kafka for sourcing streaming data into Spark Streaming. Supports GCP and Hadoop on-premise, brings in massive volumes of cybersecurity data, validates data and data quality, and supports consumption for reporting, advanced analytics, data science, and ML.

Additional Skills & Qualifications
Additional skills to look for in any/all of the above candidates as a plus:
- GCP, Kafka/Kafka Connect, Hive DB development
- Experience with Google Cloud data services such as Cloud Storage, Dataproc, Dataflow, and BigQuery
- Google Cloud big data specialty: hands-on experience ideally, not just a certification
- Hands-on experience developing and managing technical and business metadata
- Experience creating/managing time-series data from full data snapshots or incremental data changes
- Hands-on experience implementing fine-grained access controls, such as attribute-based access controls using Apache Ranger
- Experience automating DQ validation in the data pipelines (see the sketch after this list)
- Experience implementing automated data change management, including code and schema versioning, QA, CI/CD, and rollback processing
- Experience automating the end-to-end data lifecycle on the big data ecosystem
- Experience managing automated schema evolution within data pipelines
- Experience implementing masking and/or other forms of data obfuscation
- Experience designing and building microservices, APIs, and MySQL
- Advanced understanding of SQL and NoSQL DB schemas
- Advanced understanding of partitioned Parquet, ORC, Avro, and various compression formats
- Developing containerized microservices and APIs
- Familiarity with key concepts implemented by Apache Hudi, Apache Iceberg, or Databricks Delta Lake (bonus)
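To give a concrete, purely illustrative flavor of the automated DQ validation mentioned in the list above, the sketch below runs a couple of rule-based checks inside a PySpark pipeline step and fails the job when a rule is breached. The column names, rules, thresholds, and input path are assumptions.

# Minimal DQ-validation sketch: rule-based checks inside a PySpark pipeline step.
# Column names, rules, thresholds, and the input path are hypothetical.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()


def run_dq_checks(df: DataFrame) -> None:
    """Fail fast if basic data-quality rules are violated."""
    total = df.count()
    if total == 0:
        raise ValueError("DQ failure: dataset is empty")

    # Rule 1: required identifier column must not contain nulls.
    null_ids = df.filter(F.col("event_id").isNull()).count()
    if null_ids > 0:
        raise ValueError(f"DQ failure: {null_ids} rows with null event_id")

    # Rule 2: at most 1% of rows may have an unparseable timestamp.
    bad_ts = df.filter(F.to_timestamp("event_ts").isNull()).count()
    if bad_ts / total > 0.01:
        raise ValueError(f"DQ failure: {bad_ts}/{total} rows with bad event_ts")


# Example usage against a curated dataset (path is an assumption).
curated = spark.read.parquet("hdfs:///data/curated/cyber_events/")
run_dq_checks(curated)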
Job expectations:
- Ability to occasionally work nights and/or weekends as needed for on-call/production issue resolution
- Ability to occasionally work nights and/or weekends for off-hours system maintenance

Employee Value Proposition (EVP)
Strategy: The more tactical need (2-3 years) is to implement a robust on-premise big data platform to meet Wells Fargo's Cyber Security BI/analytics/reporting and data science/ML needs. This includes building a custom data pipeline solution in Python using Spark and Airflow on top of the Hadoop platform. In parallel, we would like to start onboarding select early-adopter use cases to our target-state Google Cloud Platform (GCP) starting Q1 2023. Portability of our on-premise solutions to GCP is critical. As we learn and gain momentum on GCP, we will start to accelerate our journey to the public cloud; expect that to be around Q3/Q4 of 2023.

Work Environment
Wells Fargo core locations; the expectation is onsite 3 days a week and 2 days remote (candidates can choose the days).

External Communities Job Description
Lead complex initiatives on selected domains. Ensure systems are monitored to increase operational efficiency and managed to mitigate risk. Define opportunities to maximize resource utilization and improve processes while reducing cost. Lead, design, develop, test, and implement applications and system components, tools and utilities, models, simulation, and analytics to manage complex business functions using sophisticated technologies. Resolve coding, testing, and escalated platform issues of a

--
Regards,
Venkat
Sr IT Recruiter
Blueverse Systems
Address: 13800 Coppermine Rd, Herndon, VA 20171
Email: [email protected]
Company LinkedIn: https://www.linkedin.com/company/blue-verse-systems/mycompany/