Detecting canary files using Python

Let's face it, defenders have their backs against the wall and are resorting to awesome new techniques to detect malicious actors.  One of those tactics is using file canaries that alert the defenders when the file is opened.  Normally defenders give these files juicy names like 2020-passwords-backup.docx or important-urls.docx and fill them with junk data to lure an attacker into interacting with them.

Canary Services

There are a bunch of canary services out there, some free, some pay-to-win.  For the sake of this post, though, I am focusing on https://canarytokens.org/ from Thinkst Canary (https://canary.tools/).  It is a lightweight version of their paid product with a lower barrier to entry for defenders, so it may be more prevalent in environments that don't have budget for paid tools.

How defender detection works

At the most basic level, the defender visits https://canarytokens.org and selects the type of token they want to create, in this case a Microsoft Word document.  Canary Tokens then generates a new Word document with a unique identifier embedded in it.  When the attacker opens the file, Word loads a resource (an image) from a URL that contains the canary's unique identifier.  Canary Tokens monitors these URLs, and when it sees a hit on that unique ID it triggers an alert.

The canary URL is loaded from one of the section XML files inside the docx archive (a .docx is just a zip container), in this case footer2.xml.  The payload in the document looks like this:

<w:instrText xml:space="preserve"> INCLUDEPICTURE  "http://canarytokens.com/about/REDACTED/submit.aspx" \d  \* MERGEFORMAT </w:instrText>
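
As a rough illustration, here is a minimal sketch of pulling the URL out of a field instruction like the one above.  The regex and the extract_picture_urls name are made up for this example and are not part of detect-canary.py.  It also assumes the whole instruction sits in a single w:instrText element with literal double quotes, which may not hold for every document.

```python
import re

# Hypothetical helper, not part of detect-canary.py.
# Captures the quoted target that follows an INCLUDEPICTURE field code.
INCLUDEPICTURE_RE = re.compile(r'INCLUDEPICTURE\s+"([^"]+)"')


def extract_picture_urls(xml_text):
    """Return every quoted target that follows INCLUDEPICTURE in xml_text."""
    return INCLUDEPICTURE_RE.findall(xml_text)


# The payload shown above, with the token ID still redacted.
sample = (
    r'<w:instrText xml:space="preserve"> INCLUDEPICTURE  '
    r'"http://canarytokens.com/about/REDACTED/submit.aspx" \d  \* MERGEFORMAT </w:instrText>'
)

print(extract_picture_urls(sample))
# ['http://canarytokens.com/about/REDACTED/submit.aspx']
```

Seeing the URL without opening the document in Word is the whole point: the image is only fetched when Word renders the field, not when the XML is read as text.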

Detecting basic Microsoft Word Canaries

Now that we know how detection works from the defender side, how can we detect these items as attackers?  Enter detect-canary.py.  You can find all the code at the bottom of this post or at https://github.com/n3tsurge/detect-canary

How it works at a high level

The script takes a base path and a file pattern, so it can look at a single file or at multiple files in a directory.  It opens each file as a zip archive and looks for footer2.xml, then checks whether certain keywords exist in that file.  In this case the keyword is INCLUDEPICTURE, but it could be any keyword.
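
To make that concrete without touching a real canary, here is a small sketch of the same check run against a fake docx built in memory.  The find_keywords and make_fake_docx names, the part_suffix parameter, and the example.invalid URL are all made up for this illustration.  The full script at the bottom of the post is the real thing.

```python
import io
import zipfile


def find_keywords(docx, keywords=("INCLUDEPICTURE",), part_suffix="footer2.xml"):
    """Return the keywords found in archive members ending with part_suffix."""
    hits = set()
    with zipfile.ZipFile(docx) as z:
        for name in z.namelist():
            if not name.endswith(part_suffix):
                continue
            # Read the member as text without ever rendering the document.
            text = z.read(name).decode("utf-8", errors="replace")
            hits.update(k for k in keywords if k in text)
    return sorted(hits)


def make_fake_docx(footer_xml):
    """Build a tiny zip in memory that mimics the layout of a .docx."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("word/document.xml", "<w:document/>")
        z.writestr("word/footer2.xml", footer_xml)
    buf.seek(0)
    return buf


clean = make_fake_docx("<w:ftr/>")
baited = make_fake_docx(
    '<w:instrText> INCLUDEPICTURE "http://example.invalid/x" </w:instrText>'
)

print(find_keywords(clean))   # []
print(find_keywords(baited))  # ['INCLUDEPICTURE']
```

zipfile.ZipFile accepts a path or a file-like object, so the same function would work on files from disk.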

Usage

  1. Open a command prompt
  2. Run python detect-canary.py --path <base_path> --search "*.docx" (the quotes stop your shell from expanding the pattern before the script sees it)
  3. Wait
  4. Don't open the files flagged as canaries

If a canary is found, the script logs a warning, as seen in the screenshot below.

What's next?

I plan to expand on this tool and add some necessary features:

  • A signature engine to craft different detection methods for different canary types
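
None of this exists yet, but one possible shape for such an engine is sketched below.  The Signature class, its fields, and the match_signatures function are purely hypothetical and only meant to show the idea: each signature names the archive members to inspect and a pattern to search for.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical signature format; not part of detect-canary.py."""
    name: str       # label to report when the signature matches
    part_glob: str  # which archive members to inspect
    pattern: str    # regular expression searched for in each member


SIGNATURES = [
    Signature("word-includepicture", "word/footer*.xml", r"INCLUDEPICTURE"),
]


def match_signatures(parts, signatures=SIGNATURES):
    """parts maps archive member names to decoded text.

    Returns the names of the signatures that matched at least one member.
    """
    matched = []
    for sig in signatures:
        for name, text in parts.items():
            if fnmatch.fnmatchcase(name, sig.part_glob) and re.search(sig.pattern, text):
                matched.append(sig.name)
                break  # one hit per signature is enough
    return matched


print(match_signatures({"word/footer2.xml": 'INCLUDEPICTURE "http://example.invalid/x"'}))
# ['word-includepicture']
```

Supporting a new canary type would then mean adding an entry to the list instead of changing the scanning code.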

The Script

import os
import sys
import zipfile
import glob
import logging
import argparse

def get_files(base_path=".", pattern="*.docx"):
    '''
    Use a base_path and a glob pattern to create a list
    of files we want to work on
    '''

    full_path = os.path.join(base_path, pattern)
    logging.info("Searching {}".format(full_path))

    # glob expands the pattern into a list of matching paths
    files = glob.glob(full_path)
    return files


if __name__ == "__main__":

    logging.basicConfig(
        format='%(asctime)s - %(levelname)s - %(message)s', level=logging.INFO)
    
    parser = argparse.ArgumentParser()
    parser.add_argument('--path', type=str, help="The base path where to search")
    parser.add_argument('--search', type=str, help="The pattern to search for e.g. *.docx", default="*.docx")
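    # nargs='+' collects one or more values into a list; type=list would
    # split a single value into individual characters
    parser.add_argument('--keywords', nargs='+', help="One or more keywords to look for that may indicate the file is a canary", default=['INCLUDEPICTURE'])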

    args = parser.parse_args()
    if not args.path:
        logging.error("Need to supply a search path using the --path parameter")
        logging.error("Exiting...")
        sys.exit(1)

    # Create a list of files to work on
    files = get_files(args.path, args.search)

    logging.info("Found {} files".format(len(files)))
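    # For each file, open the zip container and look
    # for a file named footer2.xml
    for f in files:
        logging.info("Working on \"{}\"".format(f))
        try:
            with zipfile.ZipFile(f) as z:
                footers = [m for m in z.namelist() if m.endswith('footer2.xml')]
                if not footers:
                    continue
                # Read the footer once; a second read() on the same handle
                # returns nothing, so later keywords would never match
                content = z.read(footers[0]).decode('utf-8', errors='replace')
        except zipfile.BadZipFile:
            logging.error("\"{}\" is not a valid zip container, skipping".format(f))
            continue
        # Warn once per file, no matter how many keywords match
        if any(k in content for k in args.keywords):
            logging.warning("\"{}\" is a canary file!".format(f))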