Data Engineer | Austin, TX (Day 1 onsite)

Data Engineer (Day 1 onsite)

Austin, TX

Must-have skills

Python

PySpark

SQL

Data Engineering

Big Data

Job Description

We're seeking a Data Engineer to lead the implementation and scaling of data collection, storage, processing, and filtering for fine-tuning large language models (LLMs) within Conversational Engineering. These data pipelines are crucial for powering our cutting-edge research, safety systems, and product development. If you're passionate about working with data and are eager to create solutions that directly impact the advancement of LLMs, we'd love to hear from you. This role provides the exciting opportunity to collaborate closely with the applied ML engineers, software engineers, and data scientists who create our AI systems today.

In this role, you will:

Design, build, and manage scalable data pipelines for collecting, storing, processing, and filtering large volumes of text data for fine-tuning LLMs.

Develop and optimize data storage architectures to handle the massive scale of data required for training state-of-the-art language models.

Implement efficient data preprocessing, cleaning, and feature extraction techniques to ensure high-quality data for model training.

Collaborate with machine learning engineers and researchers to understand their data requirements and provide tailored solutions for LLM fine-tuning.

Design and implement robust and fault-tolerant systems for data ingestion, processing, and delivery.

Optimize data pipelines for performance, scalability, and cost-efficiency, leveraging distributed computing frameworks and cloud platforms.

Ensure the security, privacy, and compliance of data according to industry best practices and regulatory requirements.

You might thrive in this role if you:

Have 7+ years of experience as a data engineer, with a strong background in designing and building large-scale data pipelines.

Possess deep expertise in distributed computing frameworks such as Apache Spark, Hadoop, or Flink, and have hands-on experience optimizing data processing at scale.

Are proficient in programming languages commonly used in data engineering, such as Python, and have a solid understanding of data structures and algorithms.

Have extensive experience with cloud platforms such as AWS, Google Cloud, or Azure for data storage, processing, and management.

Are well-versed in various data storage technologies, including distributed file systems and object stores (e.g., HDFS, S3), databases (e.g., Cassandra, HBase), and data warehouses (e.g., Redshift, BigQuery).

Have hands-on experience with ETL orchestration tools such as Apache Airflow, Dagster, or Prefect for managing complex data workflows.

Possess knowledge of natural language processing (NLP) techniques and have worked with text data preprocessing, normalization, and feature extraction.

Are passionate about staying up to date with the latest advancements in data engineering and NLP, and are eager to apply innovative techniques to solve challenging problems.

Have strong problem-solving skills, are detail-oriented, and can communicate technical concepts effectively to both technical and non-technical stakeholders.

Thanks & Regards,

Irfan Shaik

P: 972-440-0069

Cell No: 647-375-2228

Agile Enterprise Solutions Inc.

2591 Dallas Parkway, Suite 300, Frisco, TX 75034

Email: irfan_shaik@aesincus.com

Website: www.aesinc.us.com

--

Thu Jun 13 00:34:00 UTC 2024

Location: Austin, Texas