Beyond Buttons: Using AI Agents To Augment Web Apps

Guillaume Pierson
January 31, 2025
#ai #js #ux

How can we integrate the new capabilities offered by LLMs into existing web applications? Most products either add buttons that trigger AI agents or transform the UI into a chatbot. We've explored both approaches in previous articles (see Using AI To Autofill Forms With Wikipedia and Speech-to-Text Web App) but they have limitations. For complex tasks, the button interface is too rigid, and the text interface of chatbots is too limited.

What if we could use AI agents to help the user achieve complex tasks on a web app without changing the UI? This article explores this idea and shows how to integrate AI agents into a web app using a conversational approach.

The following video illustrates this idea with a text-based agent filling and submitting a form based on a conversation with the user:

This article shows how to implement a similar agent to fill out a contact form. I went one step further and replaced the text input with a voice interaction. It makes the AI agent more discreet and user-friendly. This could be particularly useful for people with difficulty typing or for developing voice-controlled applications.

A Classic Web App

We will first create a simple contact form that the user can fill out by hand. For simplicity, we'll use plain HTML and JavaScript.

<form>
    <label for="first_name">First Name: <input type="text" id="first_name" name="first_name" /></label>
    <label for="last_name">Last Name: <input type="text" id="last_name" name="last_name" /></label>
    <label for="email">Email: <input type="email" id="email" name="email" /></label>
    <label for="age">Age: <input type="number" id="age" name="age" /></label>
    <label for="phone_number">Phone Number: <input type="tel" id="phone_number" name="phone_number" /></label>
    <label for="address">Address: <input type="text" id="address" name="address" /></label>
    <label for="city">City: </label><input type="text" id="city" name="city" /></label>
    <label for="zip_code">Zip Code: <input type="text" id="zip_code" name="zip_code" /></label>
    <label for="country">Country: <input type="text" id="country" name="country" /></label>
    <label for="free_form">Free Form:  <textarea id="free_form" name="free_form"></textarea></label>
</form>

Nothing fancy here—just a simple form with a few fields.

Tip: You can find the complete implementation on GitHub: marmelab/fill-form-with-ai.

Adding A Conversational UI

Next, we add a text input where users can ask the AI for assistance:

<div id="messages"></div>
<textarea id="prompt" row="3"> </textarea>
<button type="button" id="askModel">Submit</button>

The div with the ID messages will be used to display the conversation with the AI. The textarea with the ID prompt is where users can type their requests.

We'll add an event listener to the submit button to call the AI agent.

document.getElementById("askModel").onclick = async () => {
    const promptInput = document.getElementById('prompt') as HTMLTextAreaElement;
    const prompt = promptInput.value;

    if (prompt === '') {
        return;
    }

    promptInput.value = '';
    await handlePrompt(prompt);
};

Creating A Web Agent

To handle the user's requests, we need an AI agent.

We must first initialize the OpenAI client and create a thread to store the conversation with the AI.

const client = new OpenAI({
    apiKey: import.meta.env.VITE_OPENAI_API_KEY,
    // For the sake of simplicity, the AI assistant runs client side, but you shouldn't expose your API key
    dangerouslyAllowBrowser: true,
});
const thread = await client.beta.threads.create();

Warning: The app uses the OpenAI API key directly in the front end. However, this approach is not recommended for production environments. Instead, you should use a back end to handle requests to the API securely.
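For reference, here is a minimal sketch of what such a back end could look like, assuming an Express server and a hypothetical /api/assistant route. It only forwards a prompt to an existing thread and returns the run; a real implementation would also relay tool calls back to the browser.

// Hypothetical Express proxy: the browser calls /api/assistant,
// and only the server knows the OpenAI API key.
import express from 'express';
import OpenAI from 'openai';

const app = express();
app.use(express.json());

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

app.post('/api/assistant', async (req, res) => {
    const { threadId, prompt } = req.body;
    // Add the user message to the thread
    await client.beta.threads.messages.create(threadId, {
        role: 'user',
        content: prompt,
    });
    // Run the assistant and return the result to the browser
    const run = await client.beta.threads.runs.createAndPoll(threadId, {
        assistant_id: process.env.ASSISTANT_ID!,
    });
    res.json(run);
});

app.listen(3000);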

Next, we will create the agent using OpenAI's Assistants API (currently in beta). An assistant is a customized AI model with a specific prompt and tools to perform actions.

const inputsName = [
    'first_name',
    'last_name',
    'email',
    'age',
    'phone_number',
    'address',
    'city',
    'zip_code',
    'country',
    'free_form',
];

const assistant = await client.beta.assistants.create({
    model: 'gpt-4o-mini',
    instructions: `You are a form assistant. Interpret the question and call the functions to fill the form. The form has the following fields: ${inputsName.join(
        ', ',
    )}.`,
    tools: [
        {
            type: 'function',
            function: {
                name: 'fillForm',
                description: 'Fill the form with the given data.',
                parameters: {
                    type: 'object',
                    // We give all fields available in the form
                    properties: Object.fromEntries(
                        inputsName.map(name => [name, { type: 'string' }]),
                    ),
                },
            },
        },
    ],
});

The assistant is created using the gpt-4o-mini model. You can also use newer models or ones with a larger context window for potentially better results.

We allow the assistant to interact with the form via function calling. In this case, we define one function capable of filling the form based on the conversation, fillForm, which we'll see later.

Handling the User Request

Now, let's look at the handlePrompt function, which handles the user's request.

This function hooks the form and the agent. It sends the request to the AI, displays the response, and handles any actions requested by the AI.

let allMessages: OpenAI.Beta.Threads.Messages.Message[] = [];

export const handlePrompt = async (prompt: string) => {
    const content = prompt.trim();
    if (content === '') return;
    // Disable prompt and show loading state
    disableInteractions();
    // Create a message in the thread
    const message = await client.beta.threads.messages.create(thread.id, {
        role: 'user',
        content,
    });
    // Add the message to the list
    allMessages.push(message);
    // Render the conversation again
    renderConversation(allMessages);

    // Create a run with the assistant
    // This queries the OpenAI API and polls until the run completes or requires an action
    const run = await client.beta.threads.runs.createAndPoll(thread.id, {
        assistant_id: assistant.id,
    });
    // Process the response from the assistant
    const messages = await handleRunStatus(run);
    if (!messages) return;
    // Make sure the last message received will be at the end
    allMessages = messages.reverse();
    // Render the conversation again
    renderConversation(allMessages);
    // Re-enable the prompt, now that we got the messages from the AI
    enableInteractions();
};
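The disableInteractions and enableInteractions helpers are not detailed in this article. A minimal version could simply toggle the disabled attribute on the prompt and the submit button while the request is in flight:

// Minimal sketch: prevent new requests while the assistant is responding
const disableInteractions = () => {
    (document.getElementById('prompt') as HTMLTextAreaElement).disabled = true;
    (document.getElementById('askModel') as HTMLButtonElement).disabled = true;
};

const enableInteractions = () => {
    (document.getElementById('prompt') as HTMLTextAreaElement).disabled = false;
    (document.getElementById('askModel') as HTMLButtonElement).disabled = false;
};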

Handling The Agent's Response

The heart of the agent is the handleRunStatus function. It processes the response from the AI and calls the appropriate functions to handle the actions requested by the AI.

const handleRunStatus = async (run: OpenAI.Beta.Threads.Runs.Run) => {
    // Check if the run is completed
    if (run.status === 'completed') {
        const messages = await client.beta.threads.messages.list(thread.id);
        return messages.data;
    }
    if (run.status === 'requires_action') {
        return await handleRequiresAction(run);
    }
    console.error('Run did not complete:', run);
};

If the assistant responds with an action, the run will look like the following:

{
    "status": "requires_action",
    "required_action": {
        "submit_tool_outputs": {
            "tool_calls": [
                {
                    "id": "tool_call_0",
                    "function": {
                        "name": "fillForm",
                        "arguments": "{\"first_name\":\"John\",\"last_name\":\"Doe\"}"
                    }
                }
            ]
        }
    }
}

In this case, the handleRequiresAction function will check if the requested function is fillForm and execute it to populate the form accordingly.

const handleRequiresAction = async (
    run: OpenAI.Beta.Threads.Runs.Run,
): Promise<OpenAI.Beta.Threads.Messages.Message[] | undefined> => {
    // Check if there are tools that require outputs
    if (run.required_action?.submit_tool_outputs?.tool_calls) {
        // Loop through each tool in the required action section
        const toolOutputs = run.required_action.submit_tool_outputs.tool_calls.map(
            tool => {
                if (tool.function.name === 'fillForm') {
                    return fillForm(tool);
                }
                console.error('Unknown function:', tool.function.name);
                return {
                    tool_call_id: tool.id,
                    output: `Unknown function: ${tool.function.name}`,
                };
            },
        );

        const runToSubmit =
            toolOutputs.length > 0
                ? await client.beta.threads.runs.submitToolOutputsAndPoll(
                      thread.id,
                      run.id,
                      { tool_outputs: toolOutputs },
                  )
                : run;

        // Check status after submitting tool outputs
        return handleRunStatus(runToSubmit);
    }
};

Interacting With The Form

Let's look at the fillForm function, which fills the form based on the AI's request.

const fillForm = (
    tool: OpenAI.Beta.Threads.Runs.RequiredActionFunctionToolCall,
) => {
    const parameters = JSON.parse(tool.function.arguments) as Record<string, string>;
    const functionCallsOutput = Object.entries(parameters)
        .map(([param, value]) => {
            if (!inputsName.includes(param)) {
                return `Unknown parameter: ${param}`;
            }
            if (value == null || value === '') {
                return '';
            }
            const hasFilledFormInput = fillFormInput(param, value);
            return hasFilledFormInput
                ? `Filled form input: ${param}`
                : `Failed to fill form input: ${param}. No input found.`;
        })
        .join('\n');
    return {
        tool_call_id: tool.id,
        output: functionCallsOutput,
    };
};

For each parameter, the fillForm function calls fillFormInput.

const fillFormInput = (inputName: string, value: string) => {
    const input = document.querySelector(
        `input[name="${inputName}"], textarea[name="${inputName}"], select[name="${inputName}"]`,
    ) as HTMLInputElement;
    if (!input) {
        // Skip if the input is not found
        return false;
    }
    input.value = value;
    return true;
};

Rendering The Responses

The final part is the renderConversation function, called by handlePrompt. It is responsible for rendering the messages to and from the AI. In our example, it simply renders each message in a paragraph element.

export const renderConversation = (
    messages: OpenAI.Beta.Threads.Messages.Message[],
) => {
    const messagesContainer = document.getElementById('messages')!;
    messagesContainer.innerHTML = '';
    for (const message of messages) {
        const messageElement = document.createElement('div');
        messageElement.classList.add(
            message.role === 'assistant' ? 'assistant' : 'user',
        );
        for (const content of message.content) {
            if (content.type !== 'text') {
                console.error('Unsupported content type:', content.type);
                continue;
            }
            const contentElement = document.createElement('p');
            contentElement.textContent = content.text.value;
            messageElement.appendChild(contentElement);
        }
        messagesContainer.appendChild(messageElement);
    }
};

Now, everything is set up to let the user chat with the assistant and fill out the form.

One Step Further: Voice Interaction

Now that we have an agent that can fill out the form upon user request, we can take it a step further by controlling the agent via voice interaction.

For this, we'll use the Web Speech API, which enables speech recognition and synthesis capabilities directly in the browser.

This is a key difference from a previous article, Adding Voice Recognition To A Web App: unlike that approach, we do not rely on an external service like the OpenAI speech recognition API. Instead, we use the built-in browser API.

However, not all browsers support speech recognition, so it's important to check for compatibility before using it.

<button type="button" id="speech">Speech to form</button>
<img id="speechIcon" src="/microphone.svg" alt="microphone" class="invisible" />

Here is the JS code to handle the speech recognition:

const SpeechRecognition = // @ts-ignore types do not exist yet
    window.SpeechRecognition || window.webkitSpeechRecognition;

if (SpeechRecognition) {
    const recognition = new SpeechRecognition();
    recognition.lang = 'fr-FR';
    recognition.maxAlternatives = 1;

    let listening = false;

    recognition.onresult = (event: any) => {
        const speechMessage = event.results[0][0].transcript;
        handlePrompt(speechMessage);
    };

    recognition.onerror = (event: any) => {
        console.error('Speech recognition error', event);
        markMicAsNotListening();
        listening = false;
    };

    recognition.onspeechend = () => {
        recognition.stop();
        markMicAsNotListening();
        listening = false;
    };

    document.getElementById('speech')!.onclick = () => {
        if (listening) {
            recognition.stop();
            markMicAsNotListening();
            listening = false;
            return;
        }
        markMicAsListening();
        recognition.start();
        listening = true;
    };
} else {
    document.getElementById('speech')!.style.display = 'none';
}
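The markMicAsListening and markMicAsNotListening helpers are not shown above. One possible implementation toggles the invisible class on the microphone icon declared earlier, assuming that class hides the element via CSS:

// Possible implementation: show the microphone icon while listening
const markMicAsListening = () => {
    document.getElementById('speechIcon')!.classList.remove('invisible');
};

const markMicAsNotListening = () => {
    document.getElementById('speechIcon')!.classList.add('invisible');
};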

On Chrome, voice recognition is processed on Google servers, so an internet connection is required.

Once the user stops speaking, the browser calls recognition.onresult, which in turn calls the handlePrompt function with the recognized speech as an argument. The user message and the AI's response will be displayed in the conversation.

In my tests, the stop detection isn't perfect—it may stop too quickly at times.

And that's it! You can now fill out the form using your voice.

Conclusion

This article demonstrates how to integrate AI agents into a web app using a conversational approach. While this approach is still in its early stages, it shows that AI can help users achieve common tasks on existing web applications.

The AI can assist by fetching data, filling out forms, and submitting them. It can also answer users' questions, making interactions more dynamic and user-friendly.

Although it's a promising start, there's still much work to be done to make it practical. For instance, Web Speech Recognition could be improved—at least on Linux. I'm using Ubuntu as my daily work environment, and the experience is not seamless.

A significant drawback is that the AI can make mistakes and is slow to respond. In the demo, it took about a minute to fill out the form, which is too slow for production use. Nonetheless, it's an encouraging beginning.

Did you like this article? Share it!