TL;DR: check the decision tree and the summary.

Intro

For most beginners, CMake’s if command is a nightmare. Sometimes it treats a quoted string as a variable name, and sometimes it treats it as a plain string literal, as most other languages do.

In CMake, everything is a string. You may have noticed that a CACHE entry can be given a type, such as BOOL, STRING, or PATH. As far as I understand, those types mainly serve cmake-gui. As the documentation for cache entries says:

BOOL
Boolean ON/OFF value. cmake-gui(1) offers a checkbox.
FILEPATH
Path to a file on disk. cmake-gui(1) offers a file dialog.
PATH
Path to a directory on disk. cmake-gui(1) offers a file dialog.
STRING
A line of text. cmake-gui(1) offers a text field or a drop-down selection if the STRINGS cache entry property is set.
INTERNAL
A line of text. cmake-gui(1) does not show internal entries. They may be used to store variables persistently across runs. Use of this type implies FORCE.
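
To see that the type is mostly a hint for the GUI, here is a small sketch meant to be dropped into a CMakeLists.txt. DEMO_FLAG and demo_type are names I made up for illustration. As far as I know, CMake does not validate the value against the declared type:

```cmake
# DEMO_FLAG is a hypothetical cache entry declared as BOOL,
# yet its value is an arbitrary string.
set(DEMO_FLAG "definitely not a boolean" CACHE BOOL "demo entry")

# The stored value is still just a string.
message(STATUS "value: ${DEMO_FLAG}")

# The declared type is kept as a property of the cache entry.
get_property(demo_type CACHE DEMO_FLAG PROPERTY TYPE)
message(STATUS "type: ${demo_type}")
```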

Is there a good rule for avoiding the brain-twisting confusion between quoted and unquoted strings? I’ll try to give an answer here.


string arguments for a command

Strings play a very important role in CMake. Every command argument is a string and every variable in CMake is a string. You could say everything in CMake is a string, which is similar to Bash and Make.

set(OKAY "Oll Korrect")
if("OKAY")
message("Oll Korrect")
endif()

Let’s run this script with cmake --trace --debug-output -P Okay.cmake:

/Users/guihaoliang/Work/CMakeDemo/Okay.cmake[4]: set(OKAY Oll Korrect )
/Users/guihaoliang/Work/CMakeDemo/Okay.cmake[5]: if(OKAY )
CMake Warning (dev) at Okay.cmake:5 (if):
 Policy CMP0054 is not set: Only interpret if() arguments as variables or
 keywords when unquoted. Run "cmake --help-policy CMP0054" for policy
 details. Use the cmake_policy command to set the policy and suppress this
 warning.

 Quoted variables like "OKAY" will no longer be dereferenced when the policy is set to NEW. Since the policy is not set the OLD behavior will be used.
This warning is for project developers. Use -Wno-dev to suppress it.
/Users/guihaoliang/Work/CMakeDemo/Okay.cmake[6]: message(Oll Korrect )
Oll Korrect

We will cover the warning shortly. Basically, with the policy unset, CMake treats if(OKAY) and if("OKAY") as the same thing. Equivalently, CMake strips the double quotes.

That’s right: everything is a string, and no explicit double quotes are needed. The CMake engine checks its variable definition map to see whether OKAY is defined. If it is, the if command expands it to its value.

However, suppose I only want "OKAY" to be a plain string and never a variable name. If double quotes are used explicitly, CMake should recognize that the argument is just a string and not interpret it any further. That is exactly what policy CMP0054 does, if you read the warning carefully.

Since everything is a string, let’s discuss its 2 different forms: unquoted and quoted (or string literal, in most languages).
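
Here is a minimal sketch of what changes once the policy is NEW, assuming CMake 3.1 or newer (where CMP0054 exists) and a script run with cmake -P. The message texts are made up for illustration:

```cmake
# Opt in to the NEW behavior of both policies discussed in this post.
cmake_policy(SET CMP0012 NEW)
cmake_policy(SET CMP0054 NEW)

set(OKAY "Oll Korrect")

if("OKAY")                     # quoted: a plain string, not the variable OKAY
  message("quoted: TRUE")
else()
  message("quoted: FALSE")     # this branch runs
endif()

if(OKAY)                       # unquoted: still dereferenced to "Oll Korrect"
  message("unquoted: TRUE")    # this branch runs
else()
  message("unquoted: FALSE")
endif()
```

The quoted form is no longer looked up as a variable, and since the string OKAY is not a true constant either, the condition is false.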


unquoted string

An unquoted string can be treated either as a single string entity (one without any whitespace that cannot be parsed as a list) or as a variable that refers to some string value. It’s confusing.

set(VAR a) # same as set("VAR" "a")
message(${VAR})

In this example, VAR is a variable and a is a string. That makes it easy to distinguish a variable from a string: if we use ${}, it’s a variable reference; otherwise, it’s a string.

Things get complicated when CMake’s if comes into play:

if(<variable|string>)
True if the variable is defined to a value that is not a false constant. False otherwise. (Note macro arguments are not variables.)

The if command was written very early in CMake’s history, predating the ${} variable evaluation syntax, and for convenience, (if command) evaluates variables named by its arguments as shown in the above signatures.

In other words, since there was no variable evaluation syntax when if was developed, CMake had to look up a variable’s value implicitly, because the explicit ${} was unavailable at that time. That led to this super weird implicit variable evaluation. As the Zen of Python says, explicit is better than implicit.

If you read the documentation carefully, a plain string in if(<string>) that is neither a constant nor a defined variable always evaluates to FALSE, whether or not it is empty. That looks weird coming from other languages, especially C/C++, where a string literal used as a condition is a non-null pointer and evaluates to true.

if(<variable|string>)
True if the variable is defined to a value that is not a false constant. False otherwise. (Note macro arguments are not variables.)

As a result, in the if(unquoted) clause, you cannot tell whether unquoted is a variable or a string used to evaluate the condition. If unquoted is treated as a string, the result is FALSE.

For the simple if(unquoted) form, this turns out to be convenient and easy to remember. You can blindly treat unquoted as a variable if it’s not a constant. Why? if(unquoted) is only TRUE when unquoted is a defined variable whose value is not a false constant. Just remember this condition; every other situation evaluates to FALSE.
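
A quick sketch of the three situations, with hypothetical variable names FEATURE_A, FEATURE_B, and FEATURE_C (the last one is deliberately never set):

```cmake
cmake_policy(SET CMP0012 NEW)
cmake_policy(SET CMP0054 NEW)

set(FEATURE_A hello)   # defined, value is not a false constant
set(FEATURE_B OFF)     # defined, but the value is a false constant
# FEATURE_C is intentionally left undefined

if(FEATURE_A)
  message("FEATURE_A: TRUE")     # this branch runs
else()
  message("FEATURE_A: FALSE")
endif()

if(FEATURE_B)
  message("FEATURE_B: TRUE")
else()
  message("FEATURE_B: FALSE")    # this branch runs
endif()

if(FEATURE_C)
  message("FEATURE_C: TRUE")
else()
  message("FEATURE_C: FALSE")    # this branch runs
endif()
```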

But, but, but, other than this simple form if(unquoted),

# explained later
cmake_policy(SET CMP0054 NEW)
cmake_policy(SET CMP0012 NEW)

if(unquoted STREQUAL "unquoted")
message(TRUE)
else()
message(FALSE)
endif()

you cannot blindly treat unquoted as a variable. Annoying. No simple rule covers every case.
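
To make the problem concrete, here is a small sketch that runs the same comparison twice, once before and once after a variable named unquoted exists. It assumes a fresh cmake -P run where nothing else defines that variable:

```cmake
cmake_policy(SET CMP0054 NEW)

# No variable named "unquoted" exists yet, so the left-hand side
# falls back to the literal string "unquoted".
if(unquoted STREQUAL "unquoted")
  message("before set: TRUE")     # this branch runs
else()
  message("before set: FALSE")
endif()

# Now the same token names a variable and is dereferenced instead.
set(unquoted "something else")

if(unquoted STREQUAL "unquoted")
  message("after set: TRUE")
else()
  message("after set: FALSE")     # this branch runs
endif()
```

The condition is textually identical in both places. Only the variable table changed.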


quoted string

Since everything is a string, can we strip the double quotes from a string literal?

It is okay in most cases, except when whitespace or the variable reference form ${} is involved. Let me explain.


quoted string as variable name

Each argument is a string. Quotes come into play when an argument contains spaces or variable references. A quoted argument is treated as one entity. Otherwise, whitespace splits it into several arguments, which a command such as set stores as a list, and a list is merely a semicolon-delimited string.

When you pass a quoted string to a command, you are explicitly telling CMake not to split the argument on whitespace into multiple arguments and treat them as parts of a list.

Let’s first look at quoted string arguments with whitespaces inside.

We can use set command to set up a variable:

set(OKAY "Oll Korrect") # set("OKAY" "Oll Korrect")
message(${OKAY})

Thanks to quoting, we can even set a variable whose name contains a space (which is bad practice):

set("Oll Korrect" okay)
# message(${Oll Korrect}) # error: ${} does not allow whitespace in the name
set(Indirect_Var "Oll Korrect")
message(${${Indirect_Var}}) # prints okay

Even though it violates the CMake variable naming convention, under which a valid variable name contains no spaces, we can still reference that invalid name indirectly. If you are still confused, expand ${${Indirect_Var}} to ${Oll Korrect}, which you cannot write directly.


quoted string as input argument

As in Bash or Make, quotes can be used to explicitly group several words into one single argument:

# VAR is a string
set(VAR "a b c")
message(${VAR}) # prints a b c

# VAR is a list
set(VAR a b c)
message(${VAR}) # prints abc

a b c in the first set is treated as a single entity, whereas in the second set, it’s treated as 3 separate entities.


quoted variable reference

What about quoted variable references?

You can substitute a variable with its value by using ${variable}, where variable is not a quoted string literal. If you do ${"variable"}, you will get an error.

# VAR is a list
set(VAR a b c)
message(${VAR}) # equivalent to message(a b c) or message(a;b;c), prints abc
message("${VAR}") # prints list a;b;c

As the example above shows, if the variable reference ${VAR} is unquoted, its value, which is a list, is expanded into 3 separate arguments to the message command, whereas the quoted reference passes the list as a single entity. This behavior is quite similar to how Bash treats command arguments.
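
One way to watch the splitting happen is to count the arguments a command actually receives. Below is a small sketch using a helper function I made up for this purpose, count_args, which relies on the built-in ARGC variable available inside functions:

```cmake
# count_args is a hypothetical helper: it only reports how many
# arguments it was called with.
function(count_args)
  message("argument count: ${ARGC}")
endfunction()

set(VAR a b c)          # VAR holds the list a;b;c

count_args(a b c)       # argument count: 3
count_args("a b c")     # argument count: 1
count_args(${VAR})      # argument count: 3 (the list is expanded)
count_args("${VAR}")    # argument count: 1 (the list stays one entity)
```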


quoted string used to refer to a variable

In old versions (the dark days of CMake), a quoted string could also be used in the if clause to name a variable.

It’s chaos when both quoted and unquoted strings can be treated as variables and dereferenced to get a value. It’s painful when the quoted string happens to be a variable’s name and you only intend to use it as a string.

set(GUI gui)

if ("GUI" STREQUAL "gui")
# "GUI" will be dereferenced to "gui" since it's defined
# whereas "gui" is treated as string since there's no variable
# named as "gui"
message(TRUE)
else()
message(FALSE)
endif()
# prints TRUE

This behavior is annoying. Luckily, CMake warns about it. Policy CMP0054 can be set to NEW to stop a quoted string from being interpreted as a variable and dereferenced when that variable is defined. That way, quoted strings are purely strings.

What if the quoted string is a boolean constant?
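
Here is the same GUI example again as a sketch with CMP0054 set to NEW (this needs CMake 3.1 or newer). The message labels are made up:

```cmake
cmake_policy(SET CMP0054 NEW)

set(GUI gui)

if("GUI" STREQUAL "gui")        # both sides are plain strings now
  message("quoted name: TRUE")
else()
  message("quoted name: FALSE") # this branch runs: "GUI" is not "gui"
endif()

if(GUI STREQUAL "gui")          # unquoted GUI is dereferenced to gui
  message("unquoted name: TRUE")     # this branch runs
else()
  message("unquoted name: FALSE")
endif()

if("${GUI}" STREQUAL "gui")     # explicit, quoted variable reference
  message("explicit reference: TRUE")     # this branch runs
else()
  message("explicit reference: FALSE")
endif()
```

The last form says what it means: the value is substituted first, and the quotes keep the result from being looked up a second time.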

if(<constant>)
True if the constant is 1, ON, YES, TRUE, Y, or a non-zero number. False if the constant is 0, OFF, NO, FALSE, N, IGNORE, NOTFOUND, the empty string, or ends in the suffix -NOTFOUND. Named boolean constants are case-insensitive. If the argument is not one of these constants, it is treated as a variable.

if ("ON")
# "ON" is treated as variable ON
message(TRUE)
else()
message(FALSE)
endif()
# prints FALSE

CMake tries to interpret ON as a variable, which is undefined, then treats it as a string, and finally evaluates the condition as FALSE.

set(ON ON)
if ("ON")
# ON is dereferenced by ${ON}
message(TRUE)
endif()
# prints TRUE

In CMake versions 2.6.4 and lower the if() command implicitly dereferenced arguments corresponding to variables, even those named like numbers or boolean constants, except for 0 and 1.

To disable this behavior, set policy CMP0012 to NEW so that boolean constants are no longer dereferenced. That way, a boolean constant, quoted or not, is treated as a boolean constant.

Later versions of CMake prefer to treat numbers and boolean constants literally, so they should not be used as variable names.

cmake_policy(SET CMP0012 NEW)
set(ON OFF)

# if(ON) behaves the same: ON never gets a chance to be treated as a variable.
if ("ON")
# "ON" -> ON, won't dereference it to OFF
message(TRUE)
endif()

# prints TRUE

decision tree for string input

To address this ambiguity, here is a decision tree that lets you think like CMake:

  1. Parse all string entities. If policy CMP0054 is set to NEW, remember each quoted string as a single entity that cannot be used as <variable> and dereferenced later.

  2. Translate to the equivalent form with quotes stripped, because CMake doesn’t see double quotes when it processes string inputs internally.

  3. If the string entity can be accepted by the command as <constant>, treat it as one. If CMP0012 is set to OLD, most boolean constants, such as ON, TRUE, and FALSE, are not recognized.

  4. Otherwise, fall back: if it can be accepted as <variable>, check the definition table. If it is found, dereference it to its value.

  5. Otherwise, it is used as a pure <string>.

The same thing applies to CMake while.
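
As a minimal sketch of the while case, the unquoted counter below is dereferenced on every iteration, exactly as it would be in if. The variable name counter is made up:

```cmake
cmake_policy(SET CMP0012 NEW)
cmake_policy(SET CMP0054 NEW)

set(counter 3)

# The unquoted name counter is looked up in the definition table each
# time the condition is evaluated.
while(counter GREATER 0)
  message("counter = ${counter}")
  math(EXPR counter "${counter} - 1")
endwhile()
# prints counter = 3, counter = 2, counter = 1
```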

Here’s a simple example to verify the rules above:

set("Oll Korrect" ON)

if("Oll Korrect") # will be expanded if CMP0054 is not set
message("Oll Korrect")
endif()

  1. “Oll Korrect” is one entity.
  2. “Oll Korrect” becomes Oll Korrect, still one entity.
  3. if() can accept a constant, but Oll Korrect is not a constant.
  4. Oll Korrect is a defined variable, so it is expanded to ON. Stop.

with trace output:

/Users/guihaoliang/Work/CMakeDemo/Okay.cmake[4]: set(Oll Korrect ON )
/Users/guihaoliang/Work/CMakeDemo/Okay.cmake[5]: if(Oll Korrect )
/Users/guihaoliang/Work/CMakeDemo/hello.cmake[6]: message(Oll Korrect )

You see, CMake knows Oll Korrect is one string entity in if(Oll Korrect ), instead of 2.

Now let’s set CMP0054 to NEW:

cmake_policy(SET CMP0054 NEW)
set("Oll Korrect" ON)

if("Oll Korrect")
message("Oll Korrect")
else()
message("Not Oll Korrect")
endif()

  1. “Oll Korrect” is one entity, which is a string literal and cannot be treated as a variable.
  2. “Oll Korrect” becomes Oll Korrect, still one entity.
  3. if() can accept a constant, but Oll Korrect is not a constant.
  4. Oll Korrect cannot be a variable, so skip this step.
  5. Oll Korrect is a plain string, so the condition is FALSE and Not Oll Korrect is printed.

Another example,

cmake_policy(SET CMP0054 NEW)
cmake_policy(SET CMP0012 NEW)

set(ON "NOT ON")

if(ON STREQUAL "ON")
message(TRUE)
else()
message(FALSE)
endif()

  1. The first ON can be a variable or a string. The second “ON” is a string.
  2. “ON” becomes ON, but it is not the same as the first ON. To distinguish them, call the first ON ON1 and the second ON ON2.
  3. The if(<variable|string> STREQUAL <variable|string>) form doesn’t accept constants, so skip this step.
  4. ON2 cannot be a variable, but ON1 can be dereferenced to the string “NOT ON”.
  5. ON2 is a string.

The net result is if("NOT ON" STREQUAL "ON"), which prints FALSE.

A take-home question for you: work through our decision tree and predict the result in your head, or check it with cmake -P ;-)

cmake_policy(SET CMP0054 NEW)
cmake_policy(SET CMP0012 NEW)

set(gui GUI)

if(GUI STREQUAL gui)
message(TRUE)
else()
message(FALSE)
endif()

if(GUI STREQUAL "gui")
message(TRUE)
else()
message(FALSE)
endif()

All these examples show that our decision tree works well as hell (it might even be the internal logic of CMake’s C++ implementation, who knows).

Update (2020-01-05): my guess was right. My posted question got answered!

bool cmConditionEvaluator::GetBooleanValue(
  cmExpandedCommandArgument& arg) const
{
  // Check basic constants.
  if (arg == "0") {
    return false;
  }
  if (arg == "1") {
    return true;
  }

  // Check named constants.
  if (cmIsOn(arg.GetValue())) {
    return true;
  }
  if (cmIsOff(arg.GetValue())) {
    return false;
  }

  // Check for numbers.
  if (!arg.empty()) {
    char* end;
    double d = strtod(arg.c_str(), &end);
    if (*end == '\0') {
      // The whole string is a number.  Use C conversion to bool.
      return static_cast<bool>(d);
    }
  }

  // Check definition.
  const char* def = this->GetDefinitionIfUnquoted(arg);
  return !cmIsOff(def);
}

The CMake source code above is what evaluates if(<arg>).

The arg carries state that tells whether it was quoted or not (under the hood, its value is stored in a std::string). If you are curious, check the definition of this->GetDefinitionIfUnquoted:
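
Read top to bottom, the function checks constants and numbers before it ever looks at the definition table. The sketch below exercises that order with CMP0012 set to NEW. The variables named YES and 42 are deliberately awful names, chosen only to show that they are never consulted, and Foo-NOTFOUND is a made-up value:

```cmake
cmake_policy(SET CMP0012 NEW)
cmake_policy(SET CMP0054 NEW)

set(YES NO)    # a variable that happens to be named like a constant
set(42 OFF)    # a variable that happens to be named like a number

if(YES)              # named constant wins, the variable YES is ignored
  message("YES: TRUE")            # this branch runs
else()
  message("YES: FALSE")
endif()

if(42)               # a non-zero number, the variable 42 is ignored
  message("42: TRUE")             # this branch runs
else()
  message("42: FALSE")
endif()

if(0.0)              # parsed as a number equal to zero
  message("0.0: TRUE")
else()
  message("0.0: FALSE")           # this branch runs
endif()

if(Foo-NOTFOUND)     # ends in -NOTFOUND, so it is a false constant
  message("Foo-NOTFOUND: TRUE")
else()
  message("Foo-NOTFOUND: FALSE")  # this branch runs
endif()
```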

const char* cmConditionEvaluator::GetDefinitionIfUnquoted(
  cmExpandedCommandArgument const& argument) const
{
  // skip if it's quoted when CMP0054 is set to NEW
  if ((this->Policy54Status != cmPolicies::WARN &&
       this->Policy54Status != cmPolicies::OLD) &&
      argument.WasQuoted()) {
    return nullptr;
  }

  const char* def = this->Makefile.GetDefinition(argument.GetValue());

  if (def && argument.WasQuoted() &&
      this->Policy54Status == cmPolicies::WARN) {
    // ...
    // warnings
  }

  return def;
}

All in all

  1. Set CMP0012 and CMP0054 to NEW so that CMake behaves the way you would expect from a normal programming language. :-)
  2. Don’t use a quoted string to refer to a variable.
  3. Don’t use a quoted string to refer to a constant.
  4. Quote variable references to avoid list expansion.
  5. Quote string arguments that contain whitespace so they are not treated as a list of separate strings.
  6. Use our decision tree model!
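
Put together, the rules might look like the sketch below. It assumes CMake 3.10 or newer is an acceptable minimum, because requiring any version from 3.1 upward sets both CMP0012 and CMP0054 to NEW. BUILD_KIND and SOURCES are hypothetical names:

```cmake
# Requiring a version >= 3.1 sets CMP0012 and CMP0054 to NEW.
cmake_minimum_required(VERSION 3.10)

set(BUILD_KIND "Release")     # hypothetical variable
set(SOURCES main.c util.c)    # hypothetical list: main.c;util.c

# Unquoted name on the left is the variable, quoted text on the right
# is the literal.
if(BUILD_KIND STREQUAL "Release")
  message("release build")
endif()

# Quoting the reference keeps the list as one argument.
message("sources: ${SOURCES}")   # prints sources: main.c;util.c
```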

some great resources

I recommend reading the book Mastering CMake. It covers many key and fundamental concepts of CMake. Personally, I like to learn things systematically. The first few chapters of this book explain CMake’s internal working mechanism and its basic language syntax extremely well and clearly.

For practitioners, I recommend getting your hands dirty by following examples once you have been exposed to some CMake fundamentals. Besides, I found this wonderful post quite useful for getting started with CMake.