Building for large systems and long-running background jobs.
Credit: Ilias Chebbi on Unsplash
Months ago, I took on a role that required building infrastructure for media (audio) streaming. But beyond serving audio as streamable chunks, there were long-running media processing jobs and an extensive RAG pipeline covering transcription, transcoding, embedding, and sequential media updates. Building an MVP with a production mindset meant iterating until we had a seamless system. Our approach has been to integrate features alongside the underlying stack, in order of priority.
Over the course of building, each iteration came in response to an immediate and often all-encompassing need. Our initial concern was queuing jobs, and Redis readily sufficed: we simply fired and forgot. BullMQ in the NestJS framework gave us even better control over retries, backlogs, and the dead-letter queue. Locally, and with a few payloads in production, we got the media flow right. We were soon burdened by the weight of observability:
Logs → Record of jobs (requests, responses, errors).
Metrics → How much / how often these jobs run, fail, complete, etc.
Traces → The path a job took across services (functions/methods called within the flow path).
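To make the three signals concrete, the sketch below records all of them for a toy job runner. It is a minimal in-memory illustration under stated assumptions, not BullMQ's or Inngest's API: `JobTelemetry`, `runJob`, and the metric names are hypothetical, and a real system would export these records to dedicated log, metrics, and tracing backends.

```typescript
// Minimal in-memory sketch of the three observability signals for jobs.
// All names are hypothetical; nothing here is a BullMQ or Inngest API.

type LogEntry = { jobId: string; level: "info" | "error"; message: string };
type Span = { jobId: string; step: string; durationMs: number; ok: boolean };
type Step = [string, () => void]; // [step name, work]

class JobTelemetry {
  readonly logs: LogEntry[] = [];               // Logs: what happened
  readonly metrics = new Map<string, number>(); // Metrics: how often
  readonly spans: Span[] = [];                  // Traces: the path a job took

  private count(metric: string): void {
    this.metrics.set(metric, (this.metrics.get(metric) ?? 0) + 1);
  }

  // Runs the steps in order, recording one span per executed step.
  // Stops and returns false at the first step that throws.
  runJob(jobId: string, steps: Step[]): boolean {
    this.count("jobs.started");
    this.logs.push({ jobId, level: "info", message: "job started" });
    for (const [step, work] of steps) {
      const start = Date.now();
      try {
        work();
        this.spans.push({ jobId, step, durationMs: Date.now() - start, ok: true });
      } catch (err) {
        this.spans.push({ jobId, step, durationMs: Date.now() - start, ok: false });
        this.logs.push({ jobId, level: "error", message: `${step} failed: ${String(err)}` });
        this.count("jobs.failed");
        return false;
      }
    }
    this.logs.push({ jobId, level: "info", message: "job completed" });
    this.count("jobs.completed");
    return true;
  }
}

// Usage: one job succeeds, one fails on its first step.
const telemetry = new JobTelemetry();
telemetry.runJob("job-1", [["transcode", () => {}], ["embed", () => {}]]);
telemetry.runJob("job-2", [["transcode", () => { throw new Error("boom"); }]]);
console.log(telemetry.metrics);
```

Each signal answers a different question: the logs explain what went wrong in `job-2`, the counters show one success and one failure, and the spans show which step each job reached.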
You can solve some of this by designing APIs and building a custom dashboard to plug them into, but scalability then becomes a problem of its own. And in fact, we did design the APIs.
For the challenge of managing complex, long-running backend workflows, where failures must be recoverable and state must be durable, Inngest became our architectural salvation. It fundamentally reframed our approach: each long-running background job becomes a background function, triggered by a specific event.
For instance, a Transcription.request event triggers a TranscribeAudio function. This function might contain step runs for fetch_audio_metadata, deepgram_transcribe, parse_save_transcription, and notify_user.
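The core durability primitive is the step run. A background function is internally broken down into step runs, each containing a minimal, atomic block of logic. Because the result of each completed step is persisted, a failure retries only the step that failed rather than rerunning the whole function.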
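The replay idea behind step runs can be illustrated with a toy model. This is an assumption-laden sketch, not Inngest's implementation: `runDurably` and `StepApi` are hypothetical, memoized results live in an in-memory `Map` rather than durable storage, and the toy retries the whole replay, whereas Inngest tracks retries per step. It reuses the step names from the TranscribeAudio example, with a simulated transient failure in the transcription step.

```typescript
// Toy model of durable step execution (NOT Inngest's implementation).
// Step results are memoized by step id, so a retried run replays the
// function from the top but skips every step that already succeeded.

interface StepApi {
  run<T>(id: string, work: () => T): T;
}

function runDurably<R>(fn: (step: StepApi) => R, maxAttempts: number, executed: string[]): R {
  const memo = new Map<string, unknown>(); // stands in for persisted step state
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const step: StepApi = {
      run<T>(id: string, work: () => T): T {
        if (memo.has(id)) return memo.get(id) as T; // replay: reuse stored result
        executed.push(id);                          // record a real execution
        const result = work();                      // may throw
        memo.set(id, result);
        return result;
      },
    };
    try {
      return fn(step);
    } catch (err) {
      lastError = err; // the next attempt re-enters fn with the same memo
    }
  }
  throw lastError;
}

// Usage: the transcription step fails once (simulated), then succeeds.
const executed: string[] = [];
let transcribeCalls = 0;
const transcript = runDurably((step) => {
  const meta = step.run("fetch_audio_metadata", () => ({ durationSec: 1800 }));
  const text = step.run("deepgram_transcribe", () => {
    transcribeCalls++;
    if (transcribeCalls === 1) throw new Error("simulated transient failure");
    return `transcript of ${meta.durationSec}s of audio`;
  });
  step.run("parse_save_transcription", () => text.length);
  step.run("notify_user", () => true);
  return text;
}, 4, executed);
console.log(executed);
```

Only the failed step runs twice: `fetch_audio_metadata` is not repeated on the second attempt because its stored result is replayed instead.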
An abstract Inngest function:

```typescript
// Assumed: a local module exporting the app's configured Inngest client,
// e.g. `export const inngest = new Inngest({ id: 'my-app' });`
import { inngest } from './inngest-client';

export const createMyFunction = (dependencies) => {
  return inngest.createFunction(
    {
      id: 'my-function',
      name: 'My Example Function',
      retries: 3, // each failing step is retried up to 3 times after its first attempt
      concurrency: { limit: 5 }, // cap concurrent execution of this function at 5
      onFailure: async ({ event, error, step }) => {
        // Runs once retries are exhausted and the run has failed;
        // the triggering event is available at event.data.event.
        await step.run('handle-error', async () => {
          console.error('Error processing event:', error);
        });
      },
    },
    { event: 'my/event.triggered' },
    async ({ event, step }) => {
      const { payload } = event.data;

      // Step 1: its result is persisted, so a retry of a later step does not redo it
      const step1Result = await step.run('step-1', async () => {
        // logic for step 1
        return `Processed ${payload}`;
      });

      // Step 2: uses the stored output of step 1
      const step2Result = await step.run('step-2', async () => {
        // logic for step 2
        return step1Result + ' -> step 2';
      });

      // Step N: continue as needed
      await step.run('final-step', async () => {
        // finalization logic
        console.log('Finished processing:', step2Result);
      });

      return { success: true };
    },
  );
};
```
The event-driven model of Inngest provides granular insight into every workflow execution, down to the individual step runs.
The caveat to relying on pure event processing is that, while Inngest efficiently queues function executions, the events themselves are not queued internally in the sense of a traditional message broker. Without an explicit event queue, high-traffic scenarios risk race conditions, or dropped events if the ingestion endpoint is overwhelmed.
To address this and enforce strict event durability, we implemented a dedicated queuing system as a buffer.
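The failure mode that motivates the buffer can be shown with a toy simulation. The numbers are hypothetical and `simulateIngestion` is an illustrative name; this is not a measurement of Inngest. An endpoint that accepts a fixed number of events per tick is hit by a burst, first with no buffer and then with a FIFO buffer in front of it.

```typescript
// Toy model with hypothetical numbers: an ingestion endpoint accepts at most
// `capacity` events per tick. Without a buffer, overflow is dropped; with a
// FIFO buffer, overflow waits and is drained on later ticks.

interface IngestionResult {
  accepted: number;
  dropped: number;
  ticks: number;
}

function simulateIngestion(arrivalsPerTick: number[], capacity: number, buffered: boolean): IngestionResult {
  if (capacity <= 0) throw new RangeError("capacity must be positive");
  let backlog = 0; // events waiting in the buffer (stays 0 when unbuffered)
  let accepted = 0;
  let dropped = 0;
  let tick = 0;
  while (tick < arrivalsPerTick.length || backlog > 0) {
    const offered = backlog + (arrivalsPerTick[tick] ?? 0);
    const taken = Math.min(capacity, offered);
    accepted += taken;
    if (buffered) {
      backlog = offered - taken; // keep the overflow for the next tick
    } else {
      dropped += offered - taken; // overflow is lost
    }
    tick++;
  }
  return { accepted, dropped, ticks: tick };
}

// A burst of 50 events against an endpoint that takes 20 per tick.
const unbuffered = simulateIngestion([50, 0, 0], 20, false); // accepted 20, dropped 30
const withBuffer = simulateIngestion([50, 0, 0], 20, true);  // accepted 50, dropped 0
console.log(unbuffered, withBuffer);
```

The buffer trades latency for durability: every event is eventually accepted, some of them a few ticks later.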
Amazon Simple Queue Service (SQS) was our system of choice given our existing infrastructure on AWS, though any robust queuing system would do. We architected a two-queue system: a Main Queue and a Dead Letter Queue (DLQ).
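In SQS, a dead-letter queue is attached to a source queue through the source queue's `RedrivePolicy` attribute, which names the DLQ's ARN and a `maxReceiveCount`. The helper below is a hypothetical sketch that only builds that attribute map; the ARN, count, and timeout are placeholders, not the values used in this system.

```typescript
// Hypothetical helper (not part of any AWS SDK): builds the attribute map
// that attaches a DLQ to the Main Queue. Attribute names are SQS's own;
// the values in the usage below are placeholders.

interface MainQueueConfig {
  dlqArn: string;               // ARN of the dead-letter queue
  maxReceiveCount: number;      // receives allowed before SQS moves a message to the DLQ
  visibilityTimeoutSec: number; // how long a received message stays hidden from other consumers
}

function mainQueueAttributes(cfg: MainQueueConfig): Record<string, string> {
  if (!Number.isInteger(cfg.maxReceiveCount) || cfg.maxReceiveCount < 1) {
    throw new RangeError("maxReceiveCount must be a positive integer");
  }
  // SQS takes every attribute value as a string; RedrivePolicy is itself a JSON document.
  return {
    RedrivePolicy: JSON.stringify({
      deadLetterTargetArn: cfg.dlqArn,
      maxReceiveCount: cfg.maxReceiveCount,
    }),
    VisibilityTimeout: String(cfg.visibilityTimeoutSec),
  };
}

const attributes = mainQueueAttributes({
  dlqArn: "arn:aws:sqs:us-east-1:123456789012:media-events-dlq", // placeholder ARN
  maxReceiveCount: 5,
  visibilityTimeoutSec: 300,
});
console.log(attributes);
```

With the AWS SDK for JavaScript v3, a map like this could be passed as `Attributes` to `CreateQueueCommand` or `SetQueueAttributesCommand` from `@aws-sdk/client-sqs`. The DLQ must be of the same type (standard or FIFO) and in the same account and region as the Main Queue.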
We set up an Elastic Beanstalk (EB) Worker Environment configured to consume messages directly from the Main Queue. Unlike a standard EB web server environment, its sole responsibility is message consumption and processing: in this case, forwarding each consumed message to the Inngest API endpoint. If the EB Worker fails to process a message a set number of times, SQS automatically moves it from the Main Queue to the DLQ. This ensures that no event is permanently lost if it fails to trigger, or to be picked up by, Inngest.
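The worker's contract can be sketched as follows. In an EB worker environment, a daemon (sqsd) POSTs each SQS message body to the application over HTTP; a `200 OK` response deletes the message, and any other response leaves it on the queue to reappear after the visibility timeout. The names `handleWorkerPost`, `deliver`, and the injected `forward` callback are hypothetical, and `deliver` is only a toy stand-in for SQS redelivery. The code is kept synchronous for brevity; a real handler would await the forwarding call, for example the Inngest SDK's `inngest.send()`.

```typescript
// Sketch of the EB worker's contract (hypothetical names). A 200 response
// tells sqsd to delete the message; anything else leaves it on the queue,
// so it is redelivered and, past maxReceiveCount, moved to the DLQ.

type AppEvent = { name: string; data: Record<string, unknown> };
type Forward = (event: AppEvent) => boolean; // true when upstream accepted the event

function handleWorkerPost(rawBody: string, forward: Forward): number {
  let parsed: Partial<AppEvent> | null;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return 400; // malformed body: never succeeds, so it ends up in the DLQ
  }
  if (!parsed || typeof parsed.name !== "string" || parsed.name === "") return 400;
  return forward({ name: parsed.name, data: parsed.data ?? {} }) ? 200 : 502;
}

// Toy redelivery loop for one message: up to maxReceiveCount receives, then DLQ.
function deliver(body: string, handler: (body: string) => number, maxReceiveCount: number) {
  for (let receives = 1; receives <= maxReceiveCount; receives++) {
    if (handler(body) === 200) return { outcome: "deleted", receives };
  }
  return { outcome: "dead-lettered", receives: maxReceiveCount };
}

// Usage: upstream is unavailable for the first delivery run, then recovers.
let upstreamUp = false;
const forwarded: AppEvent[] = [];
const forward: Forward = (event) => {
  if (upstreamUp) forwarded.push(event);
  return upstreamUp;
};
const handler = (body: string) => handleWorkerPost(body, forward);
const message = JSON.stringify({ name: "Transcription.request", data: { mediaId: "m-1" } });

const first = deliver(message, handler, 5);  // dead-lettered after 5 receives
upstreamUp = true;
const second = deliver(message, handler, 5); // deleted on the first receive
const bad = deliver("not json", handler, 5); // dead-lettered: never parses
console.log(first, second, bad);
```

Returning a non-200 status, rather than swallowing errors, is what lets SQS and the DLQ do the bookkeeping instead of the application.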
An understated but pertinent part of building enterprise-scale infrastructure is that the workloads are resource-hungry and long-running. A microservices architecture lets you scale each service independently, but storage, RAM, and resource timeouts still come into play. Our AWS instance type, for example, moved quickly from t3.micro to t3.small and is now pegged at t3.medium. For long-running, CPU-intensive background jobs, horizontal scaling with tiny instances fails because the bottleneck is the time it takes to process a single job, not the volume of new jobs entering the queue.
Jobs such as transcoding and embedding are typically both CPU-bound and memory-bound: CPU-bound because they require sustained, intense CPU usage, and memory-bound because they often need substantial RAM to load large models or to handle large files and payloads efficiently.
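The two paragraphs above can be turned into a back-of-the-envelope sizing check. The instance figures are the published t3 specifications (all three sizes have 2 vCPUs; memory goes from 1 to 2 to 4 GiB), while the job profile and the 0.5 GiB reserve are hypothetical assumptions, not measurements from this system.

```typescript
// Published t3 sizes: all three have 2 vCPUs; only memory grows.
const T3 = {
  "t3.micro": { vcpus: 2, memGiB: 1 },
  "t3.small": { vcpus: 2, memGiB: 2 },
  "t3.medium": { vcpus: 2, memGiB: 4 },
} as const;
type T3Type = keyof typeof T3;

// Hypothetical job profile: peak memory and minutes of single-core CPU time.
interface JobProfile {
  peakMemGiB: number;
  coreMinutes: number;
}

// Jobs one instance can run at once, limited by both cores and RAM.
// reservedGiB is an assumed allowance for the OS and runtime.
function slotsPerInstance(type: T3Type, job: JobProfile, reservedGiB = 0.5): number {
  const { vcpus, memGiB } = T3[type];
  const byMemory = Math.floor((memGiB - reservedGiB) / job.peakMemGiB);
  return Math.max(0, Math.min(vcpus, byMemory));
}

// Fleet throughput in jobs per hour. Per-job latency stays at job.coreMinutes
// however many instances are added; only the number of parallel slots changes.
function jobsPerHour(type: T3Type, instances: number, job: JobProfile): number {
  return instances * slotsPerInstance(type, job) * (60 / job.coreMinutes);
}

// An embedding job assumed to peak at 1.5 GiB and need 10 core-minutes.
const embedJob: JobProfile = { peakMemGiB: 1.5, coreMinutes: 10 };
console.log(jobsPerHour("t3.micro", 10, embedJob)); // 0: the job never fits in 1 GiB
console.log(jobsPerHour("t3.small", 2, embedJob));  // 12
console.log(jobsPerHour("t3.medium", 1, embedJob)); // 12
```

Under these assumptions, ten t3.micro instances deliver nothing for a job that cannot fit in their memory, and no number of instances shortens a single job's 10 minutes. The model also ignores that t3 instances are burstable: sustained CPU load above the baseline draws on CPU credits, which a real sizing exercise would need to account for.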
Ultimately, this augmented architecture, placing the durability of SQS and the controlled execution of an EB Worker environment directly upstream of the Inngest API, provided essential resiliency. We achieved strict event ownership, eliminated race conditions during traffic spikes, and gained a non-volatile dead letter mechanism. We leveraged Inngest for its workflow orchestration and debugging capabilities, while relying on AWS primitives for maximum message throughput and durability. The resulting system is not only scalable but highly auditable, successfully translating complex, long-running backend jobs into secure, observable, and failure-tolerant micro-steps.
Building Spotify for Sermons. was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.


