How Structure and Language Choices Impact Prompt Engineering for LLMs


Choose a prompt:

Your output should embody brevity.

Or:

Your output should demonstrate brevity.

This isn’t a trick question. It’s a practical problem in the age of ChatGPT, Gemini, Claude — or whatever flavor of generative AI (genAI) large language model (LLM) you prefer.

Because interestingly (and frustratingly), when you interact with genAI LLMs, the way you structure prompts and your approach to wording and phrasing matter much more than you probably want them to. It’s an inevitable consequence of your interface.

If you control your genAI using language, then the language you use matters.

The knobs and levers, the switches and sliders: they’re gone. You get maximal freedom in instructing your bot. If you do it the right way, that is.

What’s the Right Way to Structure Prompts?

It largely depends on what you want to achieve.

AI is a tool, not a magic bullet. That’s what Gemini told me yesterday. Anyway, there is no singular “right way” to structure prompts, but there are general best practices that help it along.

These organizational crutches improve prompt design (and we’ll talk about WHY within each of the sections below):

Scaffolding

True to its name, scaffolding for LLMs does exactly what it sounds like: it provides structural support for prompts. One way to look at it is as an outline, except it doesn’t necessarily lead to sequential execution.

Consider this simple prompt (we’ll call this our “naive” prompt):

Write a 300-word passage on international recruitment via Employer of Record (EOR) services meant to be read by SMB owners with digital-first businesses who want to expand their team. The main topic is why EORs are a good option for remote teams.

That seems reasonably thought-out. At least it gives the LLM an idea about the target audience and an angle on the topic (i.e. “why”) — and it even provides output expectations in the form of a word count.

Let’s see what Gemini would give us for this:
[Image: Gemini's output to the naive prompt]

For many companies’ blogs, this might work. And I emphasize “might.”

Indeed, for years the Search Engine Optimization (SEO) game implicitly encouraged surface-level, mediocre-quality content like this for its own sake: it’s a piece of content that provides an internal link, metadata, and context for Google to better understand a company’s space, its authority in said space, and a small part of its content mix.

Of course, SEO experts would point out that white-hat best practice says otherwise. But that’s an entirely different thread we’re not interested in at the moment.

If you wanted to improve on the naive prompt above based on what you wanted to tweak in its output, you might:

  1. Address some tonal concerns; say you want it to sound more authoritative instead of sales-y.
  2. Add specific pain points that you want it to mention and how EORs address those, so maybe you look at your buyer personas and pick one.
  3. Add a bit of structure and direction to the discussion because, for instance, you want to talk about three specific things (because your offering does those things).

So now, for example, you cram that all into an “updated” prompt:

Write a 300-word passage on international recruitment via Employer of Record (EOR) services meant to be read by SMB owners with digital-first businesses who want to expand their team but aren’t sure what’s the best way to do that. The main topic is why EORs are a good option for remote teams. Discuss in the passage how local incorporation is a concern, how they can choose between FTEs and contractors, and how they need to carefully integrate payroll or invoicing. Be conversational but also authoritative, like giving advice; don’t make a sales pitch.

So how would this updated prompt fare?
[Image: Gemini's output to the updated prompt]

At this point, you’ve already begun to provide scaffolding. Or at least a step ladder, if not a scaffold.

Fast forward and you can imagine how this would develop into a structured approach and a much lengthier prompt with intentional design and engineering.
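For instance, one way that structured approach develops is by assembling the prompt from reusable parts instead of hand-writing one long paragraph. Here's a minimal Python sketch of that idea; the names and wording are purely illustrative, not part of any particular tool:

```python
# A minimal sketch of assembling the "updated" prompt from reusable parts
# instead of hand-writing one long paragraph. The names and wording here are
# purely illustrative; this is plain Python, not any particular prompt tool.

BASE = (
    "Write a 300-word passage on international recruitment via Employer of "
    "Record (EOR) services meant to be read by SMB owners with digital-first "
    "businesses who want to expand their team but aren't sure what's the best "
    "way to do that. The main topic is why EORs are a good option for remote teams."
)

PAIN_POINTS = [
    "how local incorporation is a concern",
    "how they can choose between FTEs and contractors",
    "how they need to carefully integrate payroll or invoicing",
]

TONE = (
    "Be conversational but also authoritative, like giving advice; "
    "don't make a sales pitch."
)


def build_prompt(base, pain_points, tone):
    """Layer talking points and tone guidance onto a base brief."""
    points = ", ".join(pain_points)
    return f"{base} Discuss in the passage {points}. {tone}"


print(build_prompt(BASE, PAIN_POINTS, TONE))
```

Keeping the audience brief, the talking points, and the tone guidance as separate pieces makes it easier to tweak one of them without rewriting the whole prompt.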

Note on output quality:

You might notice that the updated prompt somehow stripped away Gemini’s favorable formatting with bullet points. Because our naive prompt essentially lets Gemini decide what points to discuss and how, its output reflects the weights it assigned to those specific talking points and its decision to use bullet points. Once we started to override its more granular considerations, we can reasonably assume a few things:

  • It completely disregarded its own talking points in favor of ours,
  • It did not receive explicit instructions regarding readability (formatting like boldface and bullet points), and
  • As a result, it did not offer such formatting of its own volition, probably because our updated prompt overrode its original naive disposition (which is a culmination of its training data and the weights, guardrails, and reinforcement learning applied by Google).

Demarcation and Delineation

An inherent part of scaffolding prompts is demarcating where sections start and stop. If you want to make the sections more explicit in structure so the LLM “understands” the distinction between them, you could also provide delineation.

Let’s demarcate (and reorganize a bit) our prompt above, with delineation in the form of a line of hashtags:

We’re going to write a blog post for a company that provides Employer of Record (EOR) services. The target audience is SMB owners with digital-first businesses who want to expand their team but aren’t sure what’s the best way to do that. We want to explain to them why EORs are a good option for building remote teams.

##########

The target 300-word post needs to be conversational but also authoritative, like giving advice; don’t make a sales pitch.

Discuss in the passage:

1 – how local incorporation is a concern

2 – how they can choose between FTEs and contractors

3 – how they need to carefully integrate payroll or invoicing

In this example, it’s actually a little overkill to use delineation because the LLM probably won’t have trouble understanding section demarcation even without it. But for hefty, complex prompts, delineation can become much more complicated, as we will explore in a later article.

So what about this prompt? How’s its output?
[Image: Gemini's output to the scaffolded prompt]

In the same way that using delineation here didn’t really make that much of a difference, providing well-structured demarcation also didn’t result in significant improvements. We’re only showing the progression for the sake of demonstration, though — given more complex prompt requirements, demarcation and delineation can meaningfully influence output.

They also help commodify prompts. Well-demarcated, well-delineated prompt design lends itself to templating. Even without API integration into your own SaaS platform or straight into your CMS, if you have a lineup of prompt templates and boilerplates ready to go, you can reliably leverage genAI.

Notably, I think any improvements you see in the newest output above (compared to the last one) are mostly due to the updated prompt clarifying intent. The new prompt specifies that it’s a blog post for a provider of EOR services.
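To make the templating point concrete, here's a minimal Python sketch of a demarcated, delineated boilerplate. The delimiter, field names, and template wording are illustrative choices, not a standard; swap in whatever fits your own briefs:

```python
# A minimal templating sketch for a demarcated, delineated prompt boilerplate.
# The delimiter, field names, and template wording are illustrative choices,
# not a standard.

DELIMITER = "#" * 10

TEMPLATE = """\
We're going to write a blog post for a company that provides {service}.
The target audience is {audience}. We want to explain to them {goal}.

{delimiter}

The target {word_count}-word post needs to be {tone}.

Discuss in the passage:
{talking_points}
"""


def render_prompt(service, audience, goal, word_count, tone, talking_points):
    """Fill the boilerplate so the same scaffold can be reused across briefs."""
    points = "\n".join(f"{i} - {p}" for i, p in enumerate(talking_points, start=1))
    return TEMPLATE.format(
        service=service,
        audience=audience,
        goal=goal,
        word_count=word_count,
        tone=tone,
        delimiter=DELIMITER,
        talking_points=points,
    )


print(render_prompt(
    service="Employer of Record (EOR) services",
    audience="SMB owners with digital-first businesses expanding their team",
    goal="why EORs are a good option for building remote teams",
    word_count=300,
    tone="conversational but also authoritative, like giving advice; not a sales pitch",
    talking_points=[
        "how local incorporation is a concern",
        "how they can choose between FTEs and contractors",
        "how they need to carefully integrate payroll or invoicing",
    ],
))
```

Once the scaffold lives in a template like this, only the fields change from brief to brief, which is what makes the output reliable at scale.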

So now would be a good time to talk about word choice and phrasing in prompt engineering.

Verbiage and Phrasing: All about Feature Activation in LLMs

Chair.

You just read the word, “chair.” What came to mind?

Probably a visualization of a real world chair. The word in the English language is just a linguistic representation of the actual physical object.

For LLMs, the word, the language, is all they “know.” They can’t map that back to the real world because they’re language models. So they instead know the computational relationships of the word “chair” to other words and phrases that constitute related concepts.

In an LLM’s head, the word “chair,” for instance, is computationally, relationally closer to the word “table” than it is to the phrase “I think, therefore I am.” And the concept represented by the words “World Wrestling Entertainment” is computationally between those two distances — probably close to the related phrase “folding chair.”

LLMs don’t even have a visual representation of the word “chair” — just pure math.
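If you want a rough intuition for what "computationally closer" means, cosine similarity over embedding vectors is the usual yardstick. The sketch below uses hand-made three-dimensional vectors purely for illustration; a real LLM's feature space has thousands of learned dimensions, and these numbers are not from any actual model:

```python
# A toy illustration of "computational closeness" in an embedding space.
# The vectors below are made up purely for demonstration; a real model's
# embeddings are high-dimensional and learned from training data.
import math


def cosine(a, b):
    """Cosine similarity: near 1.0 means closely related, near 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hand-made 3-d "embeddings" (furniture-ness, abstraction, wrestling-ness).
chair = [0.9, 0.1, 0.2]
table = [0.8, 0.1, 0.0]
cogito = [0.0, 0.9, 0.0]          # "I think, therefore I am"
folding_chair = [0.7, 0.1, 0.8]   # furniture, but also a WWE prop

print(cosine(chair, table))          # high: close concepts
print(cosine(chair, cogito))         # low: distant concepts
print(cosine(chair, folding_chair))  # in between, pulled by both
```

With those made-up vectors, "chair" scores close to "table," far from the Descartes line, and in between for "folding chair," which is the intuition behind the distances described above.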

These words, phrases, concepts, and tacit, computational clumps of “knowledge” within an LLM are called “features.” These features are spread out over digital neurons that are then “activated” when the feature they represent is tapped for a completion or generation task. Parts of the human brain light up to signify which areas are involved in what activities. In the same vein, LLM vector spaces — their “minds” — reflect feature activation when they are called upon to generate output.

This is, of course, a simplification. Forgive any lax interpretation, but in general: when processing a prompt, the LLM calculates which features to activate to then guide its output.


So what does feature activation mean for prompt design and engineering?

Influencing LLMs Means Activating the Right Features

When you’re trying to wrangle genAI LLMs to output in just the right way, you’re trying to make it activate the right features.

If you say:

“Write [X] in a happy tone.”

The feature activations you achieve all relate to the concept “happy.” The LLM will scour all the right words, phrases, and pieces of work in its training data that it computationally knows fall under the category “happy” to extract elements it can emulate.

If you change it to:

“Write [X] in an angry tone.”

It’s the same song and dance with feature activations for “angry.”

As the output you require (and its prerequisite prompt) get more complex, the wiggle room for interpretation and misaligned expectations grows. This is where a lot of prompt designers and engineers get frustrated.

The “Prompt Report” provides a bunch of best practices to keep in mind:

  • Be mindful of exemplar quantity. Don’t overload the LLM with too many examples. Start with a smaller set (around 10-15) and experiment to see if adding more actually improves performance; note that the paper identifies 20 exemplars as a sweet spot, beyond which gains sharply diminish.
  • Prioritize similar exemplars. Choose examples that are very similar to the task you want the LLM to perform. This helps activate the right features and guide the output.
  • Mind the order of exemplars. LLMs can emulate patterns pretty well, so if you have a set of exemplars, it’s better to include more positive examples than negative ones. Additionally, order examples from most to least important, starting with the ones the AI should follow most closely.
  • Experiment with exemplar format. The way you present examples matters. The common “Q: {input}, A: {label}” format is a good starting point (see the sketch after this list), but don’t be afraid to try other formats that might be more effective for your task.
  • Leverage role prompting. Tell the LLM to act as a specific persona. This can dramatically improve the style and tone of the output. Want a more formal tone? Tell the LLM to act as a professor. Want a more creative tone? Tell it to act as a poet.
  • Specify the desired style. Don’t leave it up to the LLM to guess the style you want. Explicitly state whether you want something conversational, formal, informative, persuasive, etc.
  • Use thought-inducing phrases. If you want the LLM to show its reasoning process, use phrases like “Let’s think step by step” or “Walk me through this.” This activates the LLM’s reasoning features and leads to more insightful outputs.
  • Provide CoT exemplars. If you want even better reasoning, show the LLM examples of how to do it. Include exemplars that demonstrate the thought process you want to see.
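To make a few of these concrete, here's a minimal Python sketch that combines role prompting, the Q:/A: exemplar format, most-to-least ordering, an explicit style directive, and a thought-inducing phrase. The persona, exemplars, and wording are illustrative assumptions, not prescriptions from the Prompt Report:

```python
# A minimal sketch combining several practices above: role prompting, the
# "Q:/A:" exemplar format, most-to-least ordering, a style directive, and a
# thought-inducing phrase. The persona and exemplars are illustrative only.

ROLE = "You are a senior HR advisor writing for small-business owners."
STYLE = "Write in a conversational but authoritative tone; no sales pitch."
THINK = "Let's think step by step."

# Ordered most-to-least important; positive examples of the desired answers.
EXEMPLARS = [
    ("Why use an EOR for a first international hire?",
     "It handles local employment law, payroll, and contracts so you don't "
     "have to incorporate abroad."),
    ("FTE or contractor for a long-term remote role?",
     "Long-term, integrated roles usually warrant an FTE via an EOR to avoid "
     "misclassification risk."),
]


def build_prompt(question):
    """Assemble role, style, exemplars, and the new question into one prompt."""
    shots = "\n".join(f"Q: {q}\nA: {a}" for q, a in EXEMPLARS)
    return f"{ROLE}\n{STYLE}\n\n{shots}\n\nQ: {question}\nA: {THINK}"


print(build_prompt("How should payroll integration work with an EOR?"))
```

The point isn't the specific wording; it's that each best practice above maps to a concrete, swappable piece of the prompt.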

Let me just close on this takeaway: all of this highlights the need for genAI expertise if you want to make it work for your content purposes. In an AI-driven world where content generation naturally involves LLMs, the human in the loop is essential, and you’d better make sure that specific human knows what they’re doing.

In a later piece:

We’ll explore more advanced prompt design and engineering that covers:

  • Optimizing pseudocode for minimizing tokenization
  • Leveraging hyper-interfacing and markup to develop non-linear scaffolding
  • Providing archetypes and frameworks for the LLM to emulate
  • Using grounding and output schema