
Data Engineer | Onsite at Remote, Remote, USA
Email: [email protected]
Only candidates with 10+ years of experience

Passport number required | No GC or USC candidates

Data Engineer (Day 1 onsite)

Austin, TX

Must-have skills:

Python

PySpark

SQL

Data Engineering

Big Data

Job Description

We're seeking a Data Engineer to lead the implementation and scaling of data collection, storage, processing, and filtering for fine-tuning large language models (LLMs) within Conversational Engineering. These data pipelines are crucial for powering our cutting-edge research, safety systems, and product development. If you're passionate about working with data and eager to create solutions that directly advance LLMs, we'd love to hear from you. This role offers the opportunity to collaborate closely with the applied ML engineers, software engineers, and data scientists who create our AI systems today.

In this role, you will:

Design, build, and manage scalable data pipelines for collecting, storing, processing, and filtering large volumes of text data for fine-tuning LLMs.

Develop and optimize data storage architectures to handle the massive scale of data required for training state-of-the-art language models.

Implement efficient data preprocessing, cleaning, and feature extraction techniques to ensure high-quality data for model training.

Collaborate with machine learning engineers and researchers to understand their data requirements and provide tailored solutions for LLM fine-tuning.

Design and implement robust, fault-tolerant systems for data ingestion, processing, and delivery.

Optimize data pipelines for performance, scalability, and cost-efficiency, leveraging distributed computing frameworks and cloud platforms.

Ensure the security, privacy, and compliance of data according to industry best practices and regulatory requirements.

You might thrive in this role if you:

Have 7+ years of experience as a data engineer, with a strong background in designing and building large-scale data pipelines.

Possess deep expertise in distributed computing frameworks such as Apache Spark, Hadoop, or Flink, and have hands-on experience optimizing data processing at scale.

Are proficient in programming languages commonly used in data engineering, such as Python, and have a solid understanding of data structures and algorithms.

Have extensive experience with cloud platforms like AWS, Google Cloud, or Azure for data storage, processing, and management.

Are well-versed in various data storage technologies, including distributed file systems (e.g., HDFS, S3), databases (e.g., Cassandra, HBase), and data warehouses (e.g., Redshift, BigQuery).

Have hands-on experience with ETL orchestration tools such as Apache Airflow, Dagster, or Prefect for managing complex data workflows.

Possess knowledge of natural language processing (NLP) techniques and have worked with text data preprocessing, normalization, and feature extraction.

Are passionate about staying up to date with the latest advancements in data engineering and NLP, and are eager to apply innovative techniques to solve challenging problems.

Have strong problem-solving skills, are detail-oriented, and can effectively communicate technical concepts to both technical and non-technical stakeholders.

Thu Jun 13 00:34:00 UTC 2024
