Improving Data Categorization in Marketo Engage Using Fine-Tuned AI Models
As a Revenue Ops professional, you may be struggling with SPAM form submissions, keyword matching in job titles to determine personas, or messy open-text fields that make it hard to extract insights from your data. These data categorization challenges hinder segmentation, personalization, and reporting, preventing your team from leveraging your data and making it difficult to send tailored content to your audience.
Explore how fine-tuned Large Language Models (LLMs) can help address these persistent data problems. Learn how custom-trained models can significantly boost the accuracy of SPAM filtering, automate persona classification, and intelligently categorize unstructured inputs, and be confident about bringing AI into Marketo Engage.
You will learn about:
- Real-world use cases where AI meaningfully improves data categorization in Marketo Engage.
- How to fine-tune an LLM using your own data (featuring OpenAI as an example).
- Using the Fine-Tuned model in Marketo Engage via Webhooks.
Hi, everyone. Welcome to today's presentation on improving data categorization in Marketo Engage using fine-tuned AI models. My name is Tyran Pretorius, and today I'm going to be walking you through three use cases for fine-tuned models in Marketo Engage: how you can detect spam form fills, how you can match job titles to personas, and how you can categorize open text fields. I'll show you why you should start using these three use cases in your instance, and then I'll also show you how to set them up: using webhooks to make the OpenAI requests, how you can prepare your training data set for the OpenAI fine-tuning interface, how you can create the fine-tuned model, and then finally I'll speak to some of the limitations of webhooks and why you might want to use self-service flow steps to overcome them.
So as you can probably tell, there's a bit of a discrepancy between my name and my accent. I was born in South Africa, raised in Ireland, and I now live in San Diego, so I've bounced around quite a bit. I love volleyball, dislike surfing. And before I joined the marketing operations world, I was a mechanical engineer. I love problem solving and data analysis, so now I'm just doing that in the business domain instead of the engineering domain.
And I'm also a blogger in my free time at The Workflow Pro, where I talk about all the AI and Marketo projects that I'm working on. And there's a picture of me presenting at Summit with my friend Lucas Mercado.
And now without further ado, let's dive into the first use case of how we can detect spam form submissions using OpenAI. So if any of you have used the in-built CAPTCHA integration with Marketo, you might have noticed that sometimes a clear spam form fill like this gets a very high CAPTCHA score, and it gets the trusted label.
And on the other hand, you might have genuine form submissions like this, where you can clearly see that someone is interested in SIM cards for vehicle tracking and they've got a legit-looking email address, website, and phone number, but they're still being marked as suspicious by CAPTCHA and given a very low CAPTCHA score.
Where this causes issues downstream is if you use the "CAPTCHA normalized score is not suspicious" constraint on your triggers. What this results in is that the genuine person who is looking for SIM cards won't make it through to the sales team because of their low CAPTCHA score. And on the other hand, that clear spam form submission with all the random characters in all the fields gets classified as genuine, so it will make it through this trigger and into the rest of the flow of your smart campaign. This wastes salespeople's time on all these spam form submissions, and then you've got genuine leads who aren't making it to sales, which can lead to lost revenue. So that's why it's important that we start using OpenAI to help us detect these spam form submissions. And the thing I like the most is that with CAPTCHA, it's a bit of a black box. You don't really know why Google classified a lead as trusted or suspicious. Whereas when you define your own fine-tuned model, you specify the rules that determine whether someone is going to be a spam form submission or genuine.
And you can also prompt it to give you an explanation of why a lead was marked as genuine or a bot.
And I'll show you how to set up the webhook later. But the main idea is that once your smart campaign is triggered, we'll call the form fill categorization webhook and map the output of this webhook. You'll see there's this field here: if this spam categorization field, which we map to the output of the webhook, contains bot, then we're going to stop the person from progressing any further down the rest of the flow. So that's how we can screen out all these spam form submissions from our Marketo smart campaigns once we've used the OpenAI spam categorization fine-tuned model that we've created.

The next use case I'm going to talk about is persona matching based on job title. So if any of you have tried to do persona matching based on job title, where you're trying to match on certain keywords in the title, you've likely run into inadvertent matches. Let's say you're trying to match on the chief operating officer persona and you do job title contains COO; that can have inadvertent matches for things like cook and coordinator, which you obviously don't want associated with the chief operating officer persona.
And then what if someone enters a job title that's not in English, if it's in Spanish or French? Or what if they misspell their job title and say chef operating officer instead of chief operating officer? These are all issues that exist with the current keyword matching on job title to try and get the corresponding personas.
But these can all be solved by using AI. And the powerful thing is that AI is smart enough to know what job titles and job title acronyms correspond to a chief operating officer. So if someone misspells it and says chef operating officer, it's smart enough to know that that still maps to the chief operating officer persona.
And it can also handle any language. So that one there, director de operaciones, I think that's Spanish for chief operating officer, and it's still smart enough to know that it maps to your same COO persona.
And the way our smart campaign flow would work is that when anyone is created with a job title populated, or their job title changes, we're going to call the persona categorization webhook to make that request to OpenAI. And once we get that persona value back, the persona field, which you can see in the flow here, is mapped to the output of the webhook. So when we get our persona back, we can maybe do some lead scoring based off of that and give people more points if they're in the C-suite versus if they're an engineer.
And the powerful thing about building our fine-tuned models on top of the broader base model that OpenAI offers, let's say GPT-4o, is that even though our training data set might not contain misspellings or different languages, the model is smart enough to extrapolate beyond our training data set. So if there are different languages or misspellings, it's still smart enough to do the correct persona matching, because we're building our fine-tuned model and that fine-tuned data set on top of the broader base model.
Now the third and final use case is open text field categorization. So at my company Telnyx, we have this "How did you hear about us?" field, and it's an open text field. So imagine for a second that it wasn't an open text field and was just a drop-down picklist. What you're really getting there is reinforcement for attribution sources that you already know about: paid search, organic, social media, and maybe LLM bots now as another option. But they're just reinforcing all the sources you already know about; you're not really gaining new insights. Whereas if you leave this as an open text field, you'll be able to see new insights like YouTube influencers who might be referencing your products, apps that are listing you as an integration partner, or blogs linking to your website. You wouldn't be able to find out about all these unique new sources if you had a drop-down. So this is the power of having an open text field for something like this.
But then the challenge downstream is that it makes it difficult to segment, personalize, and report on your data, and it makes it difficult if you want to send tailored content to people based on all these attribution values. And you'll notice here how stratified each of the bars is in the bar chart. That's because of all the infinite values that people can type in this open text field. To give a very concrete example, people will misspell Google; they'll type things like Goggle or Gogle. And although these should all match to the exact same value, because we know they all should go to Google, they all appear as distinct lines on the bar chart here. So that's just an example of how it can be difficult to analyze this open text field data. And obviously, if you're getting fewer than 10 form fills a day, it might be possible for you to manually review these fields and see insights and patterns. But if you've got hundreds of form fills a day, you're obviously not going to have the bandwidth to manually go through all these open text field values to do the categorization. And that's where AI comes in: it can do this categorization for us.
And as you can see here, I've just used the example where we're categorizing into a few different buckets like organic, referral, and unknown. And as I mentioned before, AI is very powerful when it comes to fuzzy matching, so it can handle misspellings; it will know to put Google and Goggle in the organic bucket. And it can also handle different languages. So, "Je vous ai trouvé en cherchant en ligne." I took French for five years in high school, so hopefully I'm not too rusty there, but that basically means "I found you while searching online," so that should also go in the organic bucket, and it's smart enough to put it there. And here, just to keep it simple, I've only got it bucketing into these higher-level buckets like organic, referral, and unknown. But you can also prompt it to give you a lead source detail if you want. So the lead source could be organic, and then you could prompt it to give you a lead source detail of Google.
Okay, so now that I've shown you the three use cases, and hopefully inspired you to start thinking of ways that you could categorize your own data within Marketo, I want to take a step back and show you how we need to set up the webhook in Marketo so that we can leverage OpenAI to do this categorization for us. So this is what the webhook looks like here. You'll see that in the URL field, we're specifying the OpenAI URL. And there's a Champion blog post, which I'll show you in the resources at the end of this presentation, where you can access this URL. And you'll also need it later on to access things like your OpenAI API key.
So don't worry about that. There's a blog post that shares all these resources with you.
And I'm going to go to the next slide because it's easier to look at the payload of the webhook when it's zoomed in like this. So the model value here, this is going to be the fine-tuned model ID, which we'll get after we create the model in OpenAI. The temperature ranges from zero to two. If you put zero in, then the output will be very robotic and deterministic, so if you put the same input in, you're very likely to get the same output time after time. But if you use a value of two, then it's going to be very creative and random, so if you put the same input in, you're going to get different outputs every time you run it.
I'll speak to the max completion tokens on the next slide and what that enables us to do. But I'll skip forward to the messages parameter here. So within messages, we specify the system prompt, and here we're just saying your job is to categorize a given job title into one of the following personas: C-suite, engineer, manager, and other. And then we also give it the user value, and here that's just going to be the job title, which we're bringing in using the lead token.
And OpenAI gives you something called the tokenizer. Whenever you make a prompt to OpenAI or any large language model, it transforms the characters and the words you send it into tokens. So we can see here that C-suite uses up two tokens. And I put in the three other personas we had: engineer, other, and manager. All four of these personas consume two tokens at most. So that's why I've constrained the max tokens parameter here to be two, because sometimes large language models can be a bit verbose and give you back more than you want. In order to prevent that, and to ensure it only gives me one of those four personas, I'm constraining the maximum output tokens to be two.
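If you want to sanity-check those token counts yourself rather than eyeballing the web tokenizer, a quick sketch in Python (assuming the tiktoken package, and a recent enough version that it knows the GPT-4o encoding) would be:

```python
import tiktoken

# Load the tokenizer used by GPT-4o-class models (assumption: your fine-tune is based on gpt-4o)
enc = tiktoken.encoding_for_model("gpt-4o")

for persona in ["C-suite", "Engineer", "Manager", "Other"]:
    tokens = enc.encode(persona)
    print(f"{persona!r} -> {len(tokens)} token(s)")

# If every label encodes to at most two tokens, capping the output at two tokens is safe.
```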
And then the last part of our webhook configuration is we need to set our API key here. So in that authorization header, where you see Bearer and then the three Xs, you're going to replace those three Xs with your OpenAI API key. And as I mentioned before, the link to get your API key from OpenAI will be in that Champion blog post.
And then at the bottom, you'll notice that we're mapping the Marketo persona field to a response attribute. And that response attribute, choices[0].message.content, looks a little bit complicated, but the only reason we need to do this is because of the structure of the data we receive back from OpenAI. You'll notice that it gives us a choices array; we want the first index of that array, so index zero, then we want the message parameter, and then within the message parameter we want the content value. That's what actually contains the persona we're looking for. So this notation here is just allowing us to pull the value we want out of the OpenAI response.
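To make the payload and that response mapping concrete, here is a rough Python equivalent of what the webhook does; the model ID and the example job title are placeholders, and the endpoint is OpenAI's standard chat completions URL:

```python
import os
import requests

API_KEY = os.environ["OPENAI_API_KEY"]        # your OpenAI API key
MODEL_ID = "ft:gpt-4o:your-org:persona:xxxx"  # placeholder: the fine-tuned model ID from OpenAI

payload = {
    "model": MODEL_ID,
    "temperature": 0,            # deterministic, repeatable output
    "max_completion_tokens": 2,  # every persona label fits in two tokens
    "messages": [
        {"role": "system",
         "content": "Your job is to categorize a given job title into one of the "
                    "following personas: C-suite, Engineer, Manager, Other."},
        # In the Marketo webhook this is the job title lead token
        {"role": "user", "content": "Chef Operating Officer"},
    ],
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=30,
)

# Same path as the Marketo response attribute: choices[0].message.content
persona = response.json()["choices"][0]["message"]["content"]
print(persona)  # e.g. "C-suite"
```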
So now we know how to set up our webhook. We're going to move on to creating the training data set that we'll need to create the fine-tuned model in OpenAI.
So the way we're going to do this is we're going to create a smart list. For the use case where we want to categorize spam form fills, we're going to extract 30 days' worth of contact sales data to train our model. So here I'm including all the form fields that are present in that form. I'll speak to PII and data privacy concerns with LLMs a bit later on. But for now, know that if you're concerned about PII and sharing that with large language models, then you can remove the full name and email fields and only send the company, phone, website, and additional information fields, because those alone should be enough for the large language model to detect whether the form fill was a spam submission or not.
So we're going to export this data from Marketo. We're going to download it as a CSV.
And then we're going to import it into a Google Sheet. One thing to note is that the file we're going to upload to OpenAI later on needs to be a JSON Lines file, or JSONL file for short.
And they say you need at least 10 examples in order to create a fine-tuned model, and 50 are recommended. And if you've got near 100 examples, I'd recommend splitting it 80-20, so you'll have 80% of your examples in a training data set and the other 20% in a validation data set.
And with the structure of your sheet here, the user value, in this case, is going to be all the job titles that you brought in from Marketo. And then the assistant value is going to be the desired output that you'd like the job title to be matched to.
And it's important here, when you're defining the assistant column, that you'll manually have to go in and say for each job title, this is what I want the persona to be.
And you should be consistent here. Let's say, for example, you've got a software manager. If you map that to the manager persona in one row, but then later on you see software manager again and map it to the engineering persona, that's going to confuse the AI model because it's inconsistent: in one place you're saying it should map to the manager persona, and in the other you're saying it should map to the engineering persona. So be as consistent as possible here when you're going through and manually setting the assistant value for each row.
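One simple way to catch that kind of inconsistency before you upload anything is to scan the labeled rows for any user value that has been given more than one assistant label. A minimal sketch in Python, assuming you have exported the user and assistant columns to a CSV called labels.csv:

```python
import csv
from collections import defaultdict

# Collect every assistant label assigned to each user value (e.g. each job title)
labels = defaultdict(set)
with open("labels.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):          # expects "user" and "assistant" columns
        labels[row["user"].strip().lower()].add(row["assistant"].strip())

# Flag any input that was labeled inconsistently across rows
for user_value, assigned in labels.items():
    if len(assigned) > 1:
        print(f"Inconsistent labels for '{user_value}': {sorted(assigned)}")
```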
And then the system prompt, this is going to be the exact same across all the rows in your spreadsheet. So you can kind of just fill it out once at the top and then drag it down.
And then we'll use a Google Sheets formula, which I'll show you in a second when I hop out of the presentation, to join the system prompt, the user value (which in this case is the job title), and the assistant value (which in this case is the desired persona). We're going to join those all together with a formula in the JSON field here to form a JSON object. And then all these lines in the Google Sheet, each one a JSON object, are what will form the JSONL file that we upload to OpenAI later on.
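For reference, each line of the finished JSONL file is one chat-format training example. For the persona use case, a single line would look roughly like the output of this snippet (built with Python's json module, which also handles the escaping for you):

```python
import json

example = {
    "messages": [
        {"role": "system",
         "content": "Your job is to categorize a given job title into one of the "
                    "following personas: C-suite, Engineer, Manager, Other."},
        {"role": "user", "content": "Senior Software Engineer"},
        {"role": "assistant", "content": "Engineer"},
    ]
}

# json.dumps produces one valid JSON object per line, which is exactly the JSONL format
print(json.dumps(example))
```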
So I'm going to jump out of the presentation now to show you what this looks like in Google Sheets.
So once we're in Google Sheets here, you'll notice that we've got a lot of gray columns. These gray columns are essentially helper columns that are going to help us form the desired JSON object that we need to upload to OpenAI. This particular sheet is for mapping the "How did you hear about us?" values to the desired source values, but it looks exactly the same whether we're doing the job title mapping or the spam form fill classification; the layout of the sheet is the same every time. So I'll use the job title one as my example, since it's the simplest. We've got our system prompt here, which as I mentioned before you can just drag down for all the rows of your sheet, and we've got all our job titles here. Then we have to manually come in and type out, in the assistant column, what persona we want each of these job titles to match to. And then these helper columns A through D here are concatenated using this concat function, along with the system, user, and assistant values, to form the required JSON object that OpenAI needs.
And you'll notice here that I'm using something called the 2JsonSafe function. This is a custom function which I've created, because sometimes there can be illegal characters present.
Particularly, let's say for example in this form's additional information field, people can put new line characters in here, they can put double quotes, and those kinds of illegal characters. Once we concatenate them together and put them here in the JSON object, they could break the JSON syntax, and the file would later on be rejected by OpenAI.
So this is a function that basically just replaces any of these illegal characters, like backslashes, double quotes, new lines, carriage returns, and tabs, so they won't break JSON syntax. And we call it here when we're doing the concatenation: we're calling it on the user value, we're calling it on the assistant value, and we're also calling it on the system prompt.
So that will ensure that this JSON object we create is valid.
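The Apps Script itself is linked in the Champion blog post, but the escaping it performs is standard JSON escaping. A rough Python equivalent of the same idea looks like this:

```python
def to_json_safe(value: str) -> str:
    """Escape characters that would otherwise break JSON syntax inside a string."""
    return (
        value.replace("\\", "\\\\")   # backslashes first, so later escapes aren't doubled
             .replace('"', '\\"')     # double quotes
             .replace("\n", "\\n")    # new lines
             .replace("\r", "\\r")    # carriage returns
             .replace("\t", "\\t")    # tabs
    )

print(to_json_safe('Line one\nLine "two"'))  # -> Line one\nLine \"two\"
```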
And the next step, before we download our JSON Lines file, is to just copy all of these lines.
And then we're going to paste them into the sheet here, and then click this big validate button. And you can see it says the input is valid JSON Lines format. So this is exactly what we're looking for. However, if there was some sort of issue in here, let's say for example, I just delete something here.
It's now telling me invalid JSON on line nine.
So that means I have to go to line nine, which in this case is going to be this one here. And then I know, okay, it's line nine, which corresponds to row 10 in my spreadsheet, so I need to look at this particular row to find the error. And to help us with that, we can copy it in here.
So obviously, this one is still correct, because I didn't make the deletion I made in this browser in the actual Google Sheet, so this is still accurate. But if I, let's say, delete a character here, the nice thing is it flags for you where the error is occurring, and it'll give you an error message down here.
And if you don't know enough about JSON syntax to fix it, that's where large language models are a godsend: you can copy and paste this JSON body, and you can paste the error message into ChatGPT, or you can use Claude or any large language model you want, like Grok offered by X. Paste it in there and it will help you fix the JSON object. You could even prompt it to just give you the fixed JSON object, and then you copy that.
And then that's what you'd use here in column N.
And then the final part of this is, once you've validated that all of these JSON objects are in the correct format, you can copy them.
And then here, we're going to download these values as a JSON Lines file. We just click Download here, and this is going to be the JSON Lines file that we're going to import into OpenAI in the next step. So I'm going to jump back into the presentation now and show you exactly how to do that.
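If you'd rather validate locally instead of pasting into an online validator, a few lines of Python do the same job of pointing you at the broken line (the file name here is a placeholder):

```python
import json

# Check every line of the JSONL file and report any that fail to parse
with open("training_data.jsonl", encoding="utf-8") as f:
    for line_number, line in enumerate(f, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as err:
            print(f"Invalid JSON on line {line_number}: {err}")
```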
OK, so I just showed you option one, which is using a Google Sheet to manually create the JSONL file you need. I will also say that in the Champion blog post, which I share in the resources, I share Python code that makes it much easier to create the desired JSONL file format. All you need for the Python code is to give it the user column and the desired assistant values. So in this case, you can see here we're going to give it all the job titles in the user column and then all the corresponding personas in the assistant column. That's all it needs.
And then you just need to modify the system prompt if you want to, and also modify the locations of all these files. You can see here it's referencing my local downloads folder; you'd obviously want to change that to your own folder, and you can rename some of these files here. But once you've done that, it's going to create the JSONL files for you in the locations that you specify in the train output file and the validate output file. And it's also nice because if you've got more than 100 examples, it'll automatically split your data set, 80% into the training output file and 20% into the validate output file. So if you're competent with running code, I recommend that as the method to create your JSON Lines files. And even if you're not, it's becoming easier and easier nowadays to use an LLM to guide you through running and executing code, so I'd explore this option if you're comfortable using LLMs for programming.
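The blog post has the full script, but to give a feel for what it does, here is a minimal sketch under the same assumptions: a CSV export with user and assistant columns, a fixed system prompt, and an 80/20 split. All the file paths and the prompt are placeholders you would change:

```python
import csv
import json
import random

SYSTEM_PROMPT = ("Your job is to categorize a given job title into one of the "
                 "following personas: C-suite, Engineer, Manager, Other.")
INPUT_CSV = "job_titles.csv"   # placeholder: export with "user" and "assistant" columns
TRAIN_OUTPUT = "train.jsonl"   # placeholder output paths
VALIDATE_OUTPUT = "validate.jsonl"

# Build one chat-format training example per row
examples = []
with open(INPUT_CSV, newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        examples.append({
            "messages": [
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user", "content": row["user"]},
                {"role": "assistant", "content": row["assistant"]},
            ]
        })

# Shuffle, then split 80/20 into training and validation sets
random.shuffle(examples)
split = int(len(examples) * 0.8)

for path, subset in [(TRAIN_OUTPUT, examples[:split]), (VALIDATE_OUTPUT, examples[split:])]:
    with open(path, "w", encoding="utf-8") as out:
        for example in subset:
            out.write(json.dumps(example) + "\n")
```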
OK, so now we're getting to, I don't know if you could classify this as the more exciting part, but maybe the most crucial part of the whole presentation, which is actually creating the fine-tuned model in OpenAI.
And before we talk about creating the model and I show you how to do that, I wanted to speak to OpenAI's data retention policies and security practices. When you share data with OpenAI using the API, it does not use your data to improve its models. And in Marketo, when we're making the webhook request to OpenAI, we're using its API, so in this case, whatever data we send it over the webhook is fine; it won't be used to improve its models.
And when you're using ChatGPT in the browser, it may use the data for model improvement, but there's an option to deactivate this in the settings. And as I'll show you on the next slide, if you have a paid company account, there'll always be a little blurb at the bottom of the chat interface saying that it doesn't use your company's data for training its models.
Whether you use the API or the browser, OpenAI will always retain your data for 30 days just to make sure you're not violating any of their policies.
But this is my high-level overview. If you have any concerns, I'd recommend talking to the compliance officials in your company and seeking legal advice. And another best-practice step is to anonymize the data wherever possible. Like I showed you earlier on, you can remove the full name and email fields so the data isn't associated with a person.
And this is what I was referring to before. If you have a paid company account, you'll always see this blurb at the bottom, which says OpenAI doesn't use Telnyx's official workspace data to train its models.
OK, so now we've gotten all the legal stuff out of the way, I'll show you what the interface looks like when you're creating a fine-tuned model. So the base model is the one you choose; in this case, I've just chosen GPT-4o. The suffix you can set to whatever you want it to be, but make sure it's something that easily corresponds to what the fine-tuned model does. So in this case, it's persona matching, and then I just put the date I'm creating the fine-tuned model. You can leave the seed blank; its job is basically just to help increase the reproducibility of the outputs of the fine-tuned model.
But if you leave this blank, OpenAI handles it for you, so I'd recommend just leaving it blank. You can upload your training data file and then your validation data file. As I mentioned, you should have an 80-20 split between those, but I'd only worry about splitting it this way if you've got about 100 examples.
And then for the hyperparameters, unless you're very familiar with how LLMs work in the back end and what each of these does, I'd recommend just using the auto configuration here.
And then hit Create. And then go off and do something else because it usually takes a while for the fine-tuned model to generate. But they will email you once the model has been completed.
And then you can come in, and you'll see the output model ID. This is the value we're going to copy and use in our Marketo webhooks like I showed you earlier on. And then you'll also see the hyperparameters that were used here, so if you want to improve performance later on, you can try tweaking these values.
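If you prefer, the same job can be kicked off programmatically rather than through the UI. A minimal sketch with the official OpenAI Python SDK, assuming the JSONL files from earlier (the base model, suffix, and file names are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Upload the training and validation files created earlier
train_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")
validate_file = client.files.create(file=open("validate.jsonl", "rb"), purpose="fine-tune")

# Create the fine-tuning job on top of a GPT-4o base model
job = client.fine_tuning.jobs.create(
    model="gpt-4o-2024-08-06",   # placeholder: whichever base model you have access to
    training_file=train_file.id,
    validation_file=validate_file.id,
    suffix="persona-matching",   # ends up embedded in the fine-tuned model ID
)

# Once the job finishes, this returns the model ID to paste into the Marketo webhook
print(client.fine_tuning.jobs.retrieve(job.id).fine_tuned_model)
```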
And when you're trying to decide if performance is better from one model compared to another, what you're going to look at is this metrics section here. And you want these values to be as low as possible.
You want them to be very low decimals. Something like 0.0029 is quite good, but I've seen better.
And if you're very curious, you can ask OpenAI what each of these values means and how to improve each one, and it'll give you recommendations for that. But the main guiding principle here is that the more examples you have, the better, and the more consistent you are when you're defining the assistant column, like I showed you earlier on, the better the performance.
So bear those things in mind when you're trying to improve performance.
I mentioned webhooks as the main method by which we're going to make these requests to OpenAI, but there are two main limitations to this. The first is that OpenAI can sometimes take a while to return the results to Marketo, and in that case we can run into a 30-second timeout limit, because if it takes that long, Marketo thinks the webhook has failed, and then you'll never get the categorized data that you were looking for.
And the second issue is that in Marketo, we can't dig any deeper than the content parameter that's returned from OpenAI. So even if OpenAI gives us a hear source and a hear source detail within the content parameter, unfortunately we can't dig any deeper in Marketo. This would have been nice for the "How did you hear about us?" categorization, where we could have a hear source value like organic and then a hear source detail value like Google.
It'd be really nice to get that extra level of detail, but unfortunately, with webhooks, it's not possible to dig in and get this. So the solution I recommend if you're running into either of these two issues, the 30-second timeout or wanting to map multiple values from OpenAI to multiple Marketo fields, is to use self-service flow steps to get around them.
Okay, so what do I want you to take away from today's presentation? Hopefully you've been inspired by the three use cases I showed you today, and they've started you thinking about different challenges you're having with data categorization at your organization. So I'd like you to pick one of them, and then export the relevant data from Marketo Engage using a smart list like I showed you before. Then I want you to prepare your data for fine-tuning using either the Google Sheet approach, like I demonstrated, or the Python script that's shared in the Champion blog. Then you're going to go to OpenAI and create your fine-tuned model, and you're going to get that output model ID; that's what you're going to use in your Marketo Engage webhook or your self-service flow step. And then finally, you're going to call this webhook in your smart campaign to categorize your data and finally start reaping the benefits of AI for data categorization. So that's everything I want you to take away from today's presentation. Thank you for your attention today. I hope this was beneficial for you, and I'm now happy to answer any questions that you might have.
Thanks so much, Tyran. So much to take away from that discussion. But before we move on to our next session, I want to give you the opportunity to answer some questions from the audience. So I'm going to ask you a few questions. And if you're in the chat, go ahead and submit some now, because Tyran's going to answer them live. And then Tyran, I'll read them to you, and you can answer them for the audience. Sure, sounds good.
Great. What sort of maintenance do you have to do to maintain these models that you created? Not a lot of maintenance. Once you set it up, it continually runs in the background. But if you ever do want to retrain the model, if you see some examples coming through and you're not happy with the output, you can take those examples, use them to retrain the model, and then you'll get a new model ID for the fine-tuned model and you can swap that in. So that would really be the only time you'd need to update the webhook. Or if OpenAI just released a new model, like the GPT-5 model last week, and you want to update to that, that would be another example. But for the most part, it's pretty maintenance-free.
Great, great, great. And then what are your recommendations if you're trying to pull non-native-language or non-English information into the model? And how does that language get handled if it's coming in Japanese and you want to output it in English, or anything like that? Yeah, I saw the question about the Japanese job titles. And I'd say that's the power of AI: it's not going to convert from Japanese to English and then try and do the persona matching. It's smart enough to understand what the Japanese job title means and map it to the correct persona. So that's the power of using AI for this job title matching; it's smart enough to understand all the cultural context or whatever there is behind a job title and map it to the correct persona that you have.
Cool. Very great.
So we had a question for this, similar to one for Josh, about using third-party data. How would you incorporate something like a RingLead for data categorization or fuzzy matching? And how do you use that with OpenAI to create superior data quality improvements, anything like that? The nice thing about using OpenAI is you've got a lot more control, I'd say, than if you're using RingLead, because with the fine-tuned model that I showed, you can train it on examples to say, this is exactly what we saw in our Marketo instance, and this is what we want the response to be. And then you can obviously use that fine-tuned model in your Marketo instance, so you're mapping job titles to personas in the exact same way that you want, and you have full control over how that mapping is done. So I'd say maybe that's the advantage of using OpenAI: you've got full control over the prompt you give the AI and all the training examples that you give it.
Great. Great, great, great. All right, on to the next question. Talking about more of that persona matching we were just discussing, can you have it use a combination of job title, job role, and department? Absolutely, yep. In that Google Sheet example I showed you, where you're bringing in all the values you exported from Marketo in one column, you just need to bring in those two extra fields of job role and department, and then, when you're modifying the assistant column, also specify the desired output for all those job title, job role, and department combinations. So you just have to bring in the extra fields, basically, and then it will work right off the bat, same as before.
Awesome. Great, great, great. The next one then is, I'm assuming that you've created some sort of step-by-step guide people can follow to build this on their own, knowing you. Where can they get that information from you? Yep, so for that 2JsonSafe function, there's a Marketo Champion blog that should have just been released today on how to use fine-tuned models with Marketo Engage, and in that blog post I linked to GitHub, where I share the Google script you need for that 2JsonSafe function, so you can just copy it straight from there and put it into the Apps Script extension in Google Sheets.
Awesome. Great, great, great. And then, speaking of JSON, how do you validate that JSON format when you send it through the webhook for the form submissions when they come through? When you're configuring your webhook in Marketo, there's an option called Request Token Encoding, and in that drop-down you can select JSON. Then, if there are any illegal characters like I was talking about before, it will convert them to a JSON-safe version, so your webhook will always send successfully to OpenAI.
Sweet, sweet, sweet. Awesome.
Now, do you do any self-service flow steps with this, or is it all webhooks? That's one advantage of self-service flow steps: sometimes OpenAI can take a long time to send the information back to Marketo, and if it takes longer than 30 seconds, Marketo will mark the webhook as failed. So if you're seeing that quite often, that's where you should switch to self-service flow steps, because they circumvent that 30-second timeout limit. You send the payload to OpenAI, and OpenAI can take as long as it wants to send back the response to the self-service flow step, so you'll never run into that timeout issue. And it also gives you a lot more flexibility: if you want to map to multiple fields from the OpenAI response, like a hear source and a hear source detail for your attribution, you can only do that with a self-service flow step, because with the Marketo webhook configuration you can only map to a single field from the OpenAI response. Using a self-service flow step unlocks the ability to map the OpenAI response to multiple fields.
Got it, got it, got it. All right. Speaking of OpenAI, next question. How do you suggest someone get started? Obviously, you are not just getting started; you've been doing this for a long time, very much an expert. But let's say someone is green, they're a novice in this. Where would you recommend they get started? I'm assuming you probably have some more content to share with them. I do, yeah. There's a blog post in the resources for this presentation, which is called Integrating Marketo with ChatGPT. And that walks through the basics of where do I get my OpenAI API key that I need to use when I'm configuring the Marketo webhook. And then it sets you up by walking you through how to create that webhook, reference the OpenAI API key, and it uses the example we went through today of categorizing your contact sales form fills. So that's the place to start. It walks you through all the setup in your OpenAI account and then in Marketo to get started.
Great, great, great. Can you combine a couple of different data normalization operations using a single model, or do you recommend fine-tuning each model so that it's specific for each use case? I think specificity is usually better.
In order to answer this properly, I'd need more information on the sort of normalization you're trying to do. But in general, if you're trying to categorize your attribution data like the hear source, it's better to do that with a separate model than trying to combine it with one that does contact sales form fills or one that does the job titles. I think it's best to have a specific model for each one.
Got it, got it, got it. Now, have you utilized OpenAI to do any spam identification, so it's not passing bad fields or bad values over into your Marketo instance or over to your CRM? Yeah, it's similar to the contact sales example I walked through today, where you could get it to validate all the fields that have been submitted. And then, if you know there are certain things to look out for, you can prompt it to look for them and send back a flag so you can change them. I haven't done any specific use cases like that; I have done the one where I'm just trying to categorize, in general, is this a spam submission or a genuine one. But I'm glad someone asked the question, because it seems like they've started to think of ways that they could use OpenAI for this sort of task. But yeah, that's definitely something you can prompt it to look for when you do send it all your Marketo information: these are the red flags for certain fields, please highlight these for us so we can take a look.
Got it, got it, got it. How long did it take you to train your models using OpenAI to get this all ready to go? Yeah, the most intensive part of the fine-tuned models is just creating the data set, getting all the examples of, basically, these are the inputs and these are the desired outputs. It just takes time to manually go row by row and do all of that.
So I'd say for some things I've worked on, it would take me maybe an hour to two hours just to get all the data in the sheet. But then once you have the data, it's very easy to use the Python script or that Google Sheets process I showed in the video to upload it to OpenAI. And then it just takes a while to generate the model. So creating the model once you have the data set is very easy; the most manually intensive and time-consuming part is just saying, for all these inputs, this is the desired output.
Got it, got it, got it. Well, Tyran, thank you so much for answering our questions. We're going to end it there. We will see you soon. And if everyone has any more questions, Tyran's on LinkedIn; shoot him a message. He's very active there. Thanks, everyone. Appreciate it. See you, Tyran.
AI Use Cases for Data Categorization
- Spam Detection: AI models outperform CAPTCHA, reducing false positives/negatives and saving sales teams time.
- Persona Matching: AI accurately maps job titles (even with misspellings or in other languages) to personas, improving lead scoring and segmentation.
- Open Text Field Categorization: AI buckets diverse attribution sources, handling misspellings and languages, enabling richer insights and reporting.
- Customization: Fine-tuned models allow you to define rules and explanations for each categorization, giving you full control over outcomes.