Building Custom Company-Specific Wordlists

Building a company-specific wordlist is a skill both defenders and attackers need. When defending, it is vital to ensure that users are not protecting assets with weak passwords. We need to think like an attacker and develop the same wordlists an attacker would use against us in brute-force attacks. A custom company-specific wordlist enables a defender to conduct password audits or block weak passwords from being set. In this post, we will think like an attacker and create a company-specific wordlist to test our company password policies.

Wordlist Generating Approach

The guiding principle of this site is offense-informing defensive strategies. An honest understanding of our weaknesses will better prepare us against an attacker. With this in mind, we need to generate a wordlist as if we were attacking our own organization. Every organization is guilty of using weak, generic passwords containing public organization-related words. Think of the default passwords your organization uses when setting up a new system or the password the Help-Desk team sets when doing a user password reset. Do those passwords contain words related to your organization?
Some public organization-related words are as follows.

  • Organization Name
  • Organization name abbreviated
  • Organization mascot name
  • Year the organization was established
  • The Organization’s phone number
  • The Organization’s slogan or motto
  • Zip codes where the organization resides
  • City names where the organization resides
  • The name of a product the organization sells
  • The numbers of the organization’s headquarters address
  • etc…

The takeaway is that if it is public, it should not be used as a password.
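
To make that takeaway concrete, here is a minimal sketch of a check a defender might run before accepting a new password. The function name, the substitution table, and the sample terms ("Acme Widgets", "1987", "Rocky") are all hypothetical and only illustrate the idea; a real deployment would hook into whatever password filter your environment supports.

```python
def contains_public_term(password, public_terms, min_len=4):
    """Return the first public term found inside the password, or None."""
    # Undo a few common character substitutions (illustrative, not exhaustive).
    leet = str.maketrans({
        "0": "o", "1": "l", "3": "e", "4": "a",
        "5": "s", "7": "t", "@": "a", "$": "s",
    })
    lowered = password.lower()
    # Check both the raw lowercase form and the de-substituted form,
    # so numeric terms such as years and zip codes still match.
    candidates = {lowered, lowered.translate(leet)}

    for term in public_terms:
        needle = term.lower().replace(" ", "")
        if len(needle) < min_len:
            continue  # very short terms would cause too many false positives
        if any(needle in candidate for candidate in candidates):
            return term
    return None


# Hypothetical public details for an imaginary organization.
terms = ["Acme Widgets", "1987", "Rocky"]
print(contains_public_term("Acm3Widgets!", terms))
print(contains_public_term("correct horse battery", terms))
```

The check is deliberately simple: it is a substring match, so it flags any password that embeds a public term no matter what is added before or after it.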


Wordlist Building Tools

There are many tools for building custom wordlists for password dictionary attacks. I want to focus on the few tools geared toward building a wordlist against a specific target, like a business organization. So wordlist generator tools for building generalized wordlists, like Crunch, will be ignored. We are building a highly targeted wordlist for attacking an organization. We want to narrow our focus as much as possible. Here are the two tools we will be using in this process.

CeWL (Custom Word List generator)

Custom Word List generator, CeWL (pronounced “cool”), is a Ruby app that spiders a given URL up to a specified depth and returns a list of words that can be used with password crackers such as John the Ripper. CeWL is in the default Kali repository, so you should have no trouble installing it if it is not already present.

LongTongue

LongTongue is a simple Python script that can help you generate wordlists for a specific target. It can build lists for individual users or for companies. To get the tool, download or clone its GitHub repository. Once you have the LongTongue script, you can run it with python3.


Building a Targeted Wordlist

Let’s review how we will build our company-targeted wordlist with our tools.

  1. Using CeWL, scrape the company websites for their most frequently used words. Save the output to a file.
  2. In a bash terminal, filter the scraped wordlist down to words that appear more than 200 times, and prepare those words for use with LongTongue.
  3. Run the LongTongue tool providing it with the company details and our website scraped words.

1. Scrape the Company Websites for Keywords

We are going to crawl our target website for keywords using CeWL. We will use the generated list later in the wordlist build process. Run the following command to begin crawling the target’s website.

cewl -c -a -m 5 -w data-output.txt https://example.com/
  • -c, Show the count for each word found.
  • -a, Include metadata.
  • -m, Minimum word length; we set it to 5.
  • -w, Write the output to the file.
Using CeWL to build a custom wordlist

2. Filter the Output File

We had CeWL add the number of times each word was found to our output file. This word count lets us narrow the wordlist down further. The command below keeps only words found more than 200 times, converts them to lowercase, removes duplicates, and joins them with commas.

awk -F',' '{ if ($2 > 200) print $1 }' data-output.txt | tr '[:upper:]' '[:lower:]' | sort -u | tr '\n' ','
Filtering CeWL output for a custom wordlist

Keep the output of the filtered wordlist. We will use these words in the next step.
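
If you prefer to do this filtering in Python, the same steps can be sketched roughly as follows. This assumes each line of the CeWL count output has the form "word, count", which is what the awk command above relies on; the function name filter_cewl_counts is hypothetical. Unlike the shell pipeline, this version does not leave a trailing comma.

```python
def filter_cewl_counts(lines, min_count=200):
    """Keep words seen more than min_count times.

    Returns a lowercase, de-duplicated, sorted, comma-separated string.
    """
    keep = set()
    for line in lines:
        # Assumed line format: "word, count"
        word, sep, count = line.rpartition(",")
        if not sep:
            continue  # no comma on this line, so it is not a word/count pair
        try:
            occurrences = int(count.strip())
        except ValueError:
            continue  # count column is not a number
        if occurrences > min_count:
            keep.add(word.strip().lower())
    return ",".join(sorted(keep))


# Small in-memory sample standing in for data-output.txt.
sample = ["Widgets, 350", "widgets, 10", "Acme, 201", "about, 200"]
print(filter_cewl_counts(sample))
```

To run it against the real file, pass an open file handle instead of the sample list, for example filter_cewl_counts(open("data-output.txt")).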

3. Generate the Final Wordlist with LongTongue

Lastly, we will use the Python script LongTongue. The script takes in details about the target company and generates our wordlist.

LongTongue info page.
longtongue.py file settings

You may have noticed in the image above that some settings need to be updated inside the Python script itself; that is, we need to edit the “longtongue.py” file directly. Open “longtongue.py” and go to line 81. Edit the settings to match your target. The only settings I recommend changing are the years. Set the start year to the year the company was founded. If the target company is a hundred-plus years old, consider covering only the last ten years.
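
The year recommendation can be written down as a small helper. This is only a sketch of the rule of thumb described above, not of how LongTongue handles years internally; the function name and its parameters are hypothetical.

```python
from datetime import date


def year_range(founded, current=None, max_age=100, recent_span=10):
    """Pick the years worth covering for a company founded in `founded`."""
    if current is None:
        current = date.today().year
    if current - founded >= max_age:
        # Very old company: only cover the most recent years.
        start = current - recent_span + 1
    else:
        # Otherwise start at the founding year.
        start = founded
    return list(range(start, current + 1))


# Hypothetical companies, evaluated as of 2024.
print(year_range(2015, current=2024))
print(year_range(1890, current=2024))
```

Whatever start and end years this suggests would then be entered by hand in the settings section of “longtongue.py”.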

Now we should be set to start the script up using the below command.

python longtongue.py -c -y -l -n
run longtongue.py at the command line.

We gave LongTongue the “-c” option, which means we will target a company. LongTongue will then ask us questions about the target company.

When you get to the “Useful Keywords” prompt, paste in the list we generated in step 2. Make sure you remove the trailing comma (,).

running longtongue.py to target a company

After you enter all the data, the script will start building the wordlist. The script may appear to hang here, but it has not. Building the wordlist is CPU-intensive and may take some time; just wait for the script to complete.


Useful Keywords

A quick note about additional “Useful Keywords” before moving on. In the Wordlist Generating Approach section above, I discussed public company details that may leak into passwords. Those details, like city, zip code, mascot, etc., must be added to the “Useful Keywords” input in the LongTongue script.
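
One way to prepare that combined input is sketched below. It merges the comma-separated output of the filtering step with a hand-collected list of public details, drops empty entries (such as the one left by a trailing comma) and duplicates, and returns a single comma-separated string to paste at the prompt. The function name and the sample values are hypothetical.

```python
def build_keyword_input(scraped_csv, extra_terms):
    """Merge scraped keywords with manually collected public details."""
    seen = set()
    merged = []
    for term in scraped_csv.split(",") + list(extra_terms):
        cleaned = term.strip().lower()
        # Skip blanks (e.g. from a trailing comma) and repeated terms.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            merged.append(cleaned)
    return ",".join(merged)


# Hypothetical scraped output and public details.
scraped = "acme,widgets,"
extras = ["Springfield", "90210", "Acme"]
print(build_keyword_input(scraped, extras))
```

Lowercasing here simply mirrors the filtering step; adjust it if you want to preserve the original casing of names.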


Targeted Wordlist Summary

Building a wordlist to target an organization is not just a Red Team or adversarial skill. Defenders need to be able to think and act like an adversary to build better defenses. The wordlists we build can be used in several tools, such as…

Lastly, remember that this exercise can also help better inform internal company staff on their password practices. Simply demonstrating to others in the company how attackers think will guide them to make better security decisions.