Job Details

PySpark Developer with Databricks (Remote, USA)

Email: [email protected]

From: SHASHI, Cloudingest (shashi@cloudingest.com)

Reply to: shashi@cloudingest.com

PySpark Developer with Databricks

Remote

Job Description:

Develop and optimize data processing jobs using PySpark to handle complex data transformations and aggregations efficiently.

Design and implement robust data pipelines on the AWS platform, ensuring scalability and efficiency (Databricks exposure is an advantage).

Leverage AWS services such as EC2 and S3 for comprehensive data processing and storage solutions.

Expertly manage SQL database schema design, query optimization, and performance tuning to support data transformation and loading processes.

Design and maintain scalable and performant data warehouses, employing best practices in data modeling and ETL processes.

Utilize modern data platforms for collaborative data science, integrating seamlessly with various data sources and types.

Ensure high data quality and accessibility by maintaining optimal performance of Databricks clusters and Spark jobs.

Develop and implement security measures, backup procedures, and disaster recovery plans using AWS best practices.

Manage source code and automate deployment using GitHub along with CI/CD practices tailored for data operations in cloud environments.

Provide expertise in troubleshooting and optimizing PySpark scripts, Databricks notebooks, SQL queries, and Airflow DAGs.

Keep abreast of the latest developments in cloud data technologies and advocate for the adoption of new tools and practices that can benefit the team.

Use Apache Airflow to orchestrate and automate data workflows, ensuring timely and reliable execution of data jobs across various data sources and systems.

Collaborate closely with data scientists and business analysts to design data models and pipelines that support advanced analytics and machine learning projects.

Qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.

Minimum of 5 years of experience as a Data Engineer with extensive expertise in AWS and PySpark.

Deep knowledge of SQL and experience with data warehouse design and optimization.

Strong understanding of AWS services and how they integrate with Databricks and other data engineering tools.

Demonstrated ability to design, build, and maintain end-to-end data pipelines.

Excellent problem-solving abilities, with a track record of implementing complex data solutions.

Nice to Have:

Experience in managing and automating workflows using Apache Airflow.

Familiarity with Python, Snowflake, and CI/CD processes using GitHub.

Strong communication skills for effective collaboration across technical teams and stakeholders.

Keywords: continuous integration, continuous deployment, S3

Wed Jul 03 23:31:00 UTC 2024
