4.1 Prompt Engineering

OpenAI’s chat completions API requires an API key. I created one at https://platform.openai.com/api-keys and saved it to .Renviron as OPENAI_API_KEY. You can open that file for editing with usethis::edit_r_environ().
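Before making any calls, it can help to confirm that the key is visible to the R session. The following is a minimal sketch rather than part of the original setup: `has_api_key()` is a hypothetical helper name, and the .Renviron line in the comment uses a placeholder instead of a real key.

```r
# .Renviron holds one NAME=value pair per line, for example:
#   OPENAI_API_KEY=<your-key-here>
# Restart R after editing the file so the new value is loaded.

# Hypothetical helper: TRUE when the environment variable is set and non-empty.
has_api_key <- function(name = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(name))
}
```

Calling has_api_key() with no arguments checks the variable name used in the rest of this section.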

You can prompt OpenAI models to respond in a particular format. Large models like GPT-3.5 will adapt their responses to the format you specify, given the right cues. Let’s try a few examples. First, set up a function to make the call.

library(httr2)  # request(), req_*(), resp_body_json()
library(purrr)  # pluck()
library(glue)   # glue(), used to build the prompts below

# Send a list of chat messages to the API and return the reply text.
get_openai_response <- function(message_list) {
  request("https://api.openai.com/v1/chat/completions") %>%
    req_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))) %>%
    req_body_json(list(
      model = "gpt-3.5-turbo",
      temperature = 1, # 0 = predictable .. 2 = creative
      messages = message_list
    )) %>%
    req_perform() %>%
    resp_body_json() %>%
    pluck("choices", 1, "message", "content")
}
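The final pluck() step reaches into the parsed JSON for the reply text. As a rough illustration, the sketch below builds a small list by hand that mimics the relevant part of a chat completions response. `mock_resp` and its contents are made up for this example, and a real response carries additional fields.

```r
library(purrr)

# Hand-built stand-in for the parsed response body (assumed shape, made-up text).
mock_resp <- list(
  choices = list(
    list(
      index = 0,
      message = list(role = "assistant", content = "Example reply text."),
      finish_reason = "stop"
    )
  )
)

# Same path as in get_openai_response(): first choice -> message -> content.
reply <- pluck(mock_resp, "choices", 1, "message", "content")
```

Note that pluck() returns NULL instead of raising an error when any level of the path is missing, so a malformed response would surface as NULL rather than as a failure at this step.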

A standard prompt with a system message like “You are an expert in Major League Baseball.” followed by a user message like “Who won the 2016 World Series?” might return a free-form text response like this:

get_openai_response(
  message_list = list(
    list(role = "system", content = "You are an expert in Major League Baseball."),
    list(role = "user", content = "Who won the 2016 World Series?")
  )
) %>% cat()

## The Chicago Cubs won the 2016 World Series. It was their first championship title since 1908, ending a 108-year drought. The Cubs defeated the Cleveland Indians in a thrilling seven-game series.

You can tweak the system prompt to request a formatted response. The following is called a one-shot prompt because it gives the model a single example to imitate.

get_openai_response(
  message_list = list(
    list(role = "system", content = glue(
    "You are an expert in Major League Baseball.\n\n",
    "Who won the 2016 World Series?\n",
    "Winning Team: Chicago Cubs.\n",
    "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 2016 World Series?")
  )
) %>% cat()

## The Chicago Cubs won the 2016 World Series.

That didn’t work: the answer is correct, but the model ignored the format. How about a two-shot?

get_openai_response(
  message_list = list(
    list(role = "system", content = glue(
    "You are an expert in Major League Baseball.\n\n",
    "Who won the 2016 World Series?\n",
    "Winning Team: Chicago Cubs.\n",
    "Losing Team: Cleveland Indians.\n\n",
    "Who won the 1997 World Series?\n",
    "Winning Team: Florida Marlins.\n",
    "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 2016 World Series?")
  )
) %>% cat()

## The Chicago Cubs won the 2016 World Series.

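I thought that would do it, but the format was ignored again. There is another way. The model picks up a pattern more readily when the examples are presented as conversation turns, so use the assistant role to show it the replies you expect.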

get_openai_response(
  message_list = list(
    list(role = "system", content = "You are an expert in Major League Baseball."),
    list(role = "user", content = "Who won the 2016 World Series?"),
    list(role = "assistant", content = glue("Winning Team: Chicago Cubs.\n",
                                            "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 1997 World Series?"),
    list(role = "assistant", content = glue("Winning Team: Florida Marlins.\n",
                                            "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 2016 World Series?")
  )
) %>% cat()

## Winning Team: Chicago Cubs.
## Losing Team: Cleveland Indians.

To operationalize this, put the system message and the user-assistant example pairs inside the function definition so that only the final question varies. One more option is to supply explicit instructions. Below, I combine explicit instructions with the two-shot conversation.

get_openai_response(
  message_list = list(
    list(role = "system", content = glue("You are an expert in Major League Baseball.\n\n",
                                         "Answer the question with the winning team ",
                                         "and losing team. Use the following format:\n",
                                         "Winning Team: \n",
                                         "Losing Team: ")),
    list(role = "user", content = "Who won the 2016 World Series?"),
    list(role = "assistant", content = glue("Winning Team: Chicago Cubs.\n",
                                            "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 1997 World Series?"),
    list(role = "assistant", content = glue("Winning Team: Florida Marlins.\n",
                                            "Losing Team: Cleveland Indians.")),
    list(role = "user", content = "Who won the 2016 World Series?")
  )
) %>% cat()

## Winning Team: Chicago Cubs.
## Losing Team: Cleveland Indians.
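One way to package this pattern, as suggested earlier, is a small helper that holds the system message and the two example pairs fixed and takes only the new question as input. The sketch below is an illustration under assumptions: `build_ws_messages()` is a hypothetical name, and it uses base R paste0() in place of glue() so that it runs without extra packages.

```r
# Hypothetical helper: wrap a question in the fixed system message and
# two example user-assistant pairs, so only the final question varies.
build_ws_messages <- function(question) {
  list(
    list(role = "system", content = paste0(
      "You are an expert in Major League Baseball.\n\n",
      "Answer the question with the winning team ",
      "and losing team. Use the following format:\n",
      "Winning Team: \n",
      "Losing Team: "
    )),
    list(role = "user", content = "Who won the 2016 World Series?"),
    list(role = "assistant", content = paste0(
      "Winning Team: Chicago Cubs.\n",
      "Losing Team: Cleveland Indians."
    )),
    list(role = "user", content = "Who won the 1997 World Series?"),
    list(role = "assistant", content = paste0(
      "Winning Team: Florida Marlins.\n",
      "Losing Team: Cleveland Indians."
    )),
    list(role = "user", content = question)
  )
}

# Build the message list for a question that is not among the examples.
msgs <- build_ws_messages("Who won the 2001 World Series?")
```

The resulting list can be passed directly to get_openai_response(), for example get_openai_response(msgs) %>% cat(). No output is shown here because the reply depends on the model and the temperature setting.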