Creating fake words: a pseudoword generator

I wanted a way to create a word that isn't a real word. I ended up writing a function that generates a random syllable, then joining two or three of those syllables together. It works well enough. Here's what I came up with in Python:

import random

vowels = ["a", "e", "i", "o", "u"]
consonants = ["b", "c", "d", "f", "g", "h", "j", "k", "l", "m", "n", "p", "q",
              "r", "s", "t", "v", "w", "x", "y", "z"]

def _vowel():
    return random.choice(vowels)

def _consonant():
    return random.choice(consonants)

def _cv():
    # Consonant + vowel, e.g. "ba"
    return _consonant() + _vowel()

def _cvc():
    # Consonant + vowel + consonant, e.g. "bat"
    return _cv() + _consonant()

def _syllable():
    # Pick one of the three syllable shapes (V, CV, CVC) and call it.
    return random.choice([_vowel, _cv, _cvc])()

def create_fake_word():
    """Generate a fake word by joining two or three random syllables."""
    syllables = [_syllable() for _ in range(random.randint(2, 3))]
    return "".join(syllables)

if __name__ == "__main__":
    print(create_fake_word())

The first four functions are the building blocks: _vowel() and _consonant() each pick a single letter, while _cv() and _cvc() build the CV and CVC syllable shapes. _syllable() then chooses one of the three shapes (V, CV, or CVC) at random. Finally, create_fake_word() calls _syllable() two or three times and joins the results. Here is some example output:

hojocina
eliphaa
deaketyed
ciboa
tiuzi
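Because the script uses the module-level random functions, every run produces different words. If you want reproducible output (for tests, or to get the same names each time), one option is to pass an explicit random.Random instance through the same V/CV/CVC scheme. This is a minimal sketch under that assumption; make_syllable, make_word and their parameters are hypothetical names, not part of the script above.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"  # same 21 letters as the original list

def make_syllable(rng: random.Random) -> str:
    """Return one syllable shaped V, CV or CVC, each shape equally likely."""
    shape = rng.choice(["V", "CV", "CVC"])
    # Fill each slot of the shape with a random vowel or consonant.
    return "".join(rng.choice(VOWELS if slot == "V" else CONSONANTS) for slot in shape)

def make_word(rng: random.Random, min_syllables: int = 2, max_syllables: int = 3) -> str:
    """Join between min_syllables and max_syllables random syllables."""
    count = rng.randint(min_syllables, max_syllables)
    return "".join(make_syllable(rng) for _ in range(count))

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed: the same words on every run
    print([make_word(rng) for _ in range(5)])
```

With two or three syllables of one to three letters each, every word is between 2 and 9 letters long.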

I have no idea whether there's a better way to generate words that look somewhat real. If you know of one, I'd love to hear it!

 

 

13 Comments

  1. Roberto Bonvallet wrote,

    Markov chains are the typical method for random text generation:
    http://en.wikipedia.org/wiki/Markov_chain#Markov_text_generators
    http://www.codinghorror.com/blog/archives/001132.html
    http://apocryph.org/2008/03/10/trying_have_fun_with_markov/

    Cheers.

  2. Seo Sanghyeon wrote,

    This is good enough. If you want something more real, you want to study something called "phonotactics". Wikipedia has a good summary: http://en.wikipedia.org/wiki/English_phonology#Phonotactics

  3. ΤΖΩΤΖΙΟΥ wrote,

    In a hangman game I've written (whose first version I wrote many, many years ago in another galaxy and language :), the phase where the computer guesses a word chosen by the human breaks all known words into 4-letter windows (e.g. "hangman" breaks into "hang", "angm", "ngma", "gman"). I've included an easter egg that uses these to produce random words that sound natural (I use it to create real-life object names for various purposes). So, basically: open your /usr/share/dict/words file, gather statistics, and start building words. Some examples that I just produced: altifix, exter, cackgrow, dalown, poonfoibilise.

  4. someguy wrote,

    This goes in a somewhat similar direction: http://diagrammes-modernes.blogspot.com/2007/08/friendly-readable-id-strings.html

  5. Chris Neugebauer wrote,

    A somewhat different approach here — the words look considerably more "organic" than the ones your script generates, hopefully it's along the lines of what you want:

    http://books.google.com/books?id=Q0s6Vgb98CQC&lpg=PA396&ots=hc1099VmuA&dq=python%20cookbook%20pastiche&pg=PA394

  6. Bryan Rasmussen wrote,

    Ciboa is a real word, although not in English, so I guess you are OK (but it is also a place name and a person's name).

    An improvement I could see would be to generate the fake word and run it against WordNet; if it exists, drop it from the results, because it is not fake.

    Another way to make it look more real would be to have a list of common prefixes and suffixes, add syllable types that mark a word as needing a prefix or suffix, and then generate it.

    For example, I think the word
    antihojocina
    or
    hojocinaism
    looks more real than just hojocina.

  7. Roger wrote,

    Pronounceable password generators have been doing this for a few decades. For example here is some background and Java code on one used in Multics (an inspiration for UNIX). http://www.multicians.org/thvv/gpw.html

    I actually use this approach for naming my projects. Using "APG Online" (a web interface to a password generator) I get long lists of words and then pick nice ones and do a Google search on them. It is amazing just how many hits you get on what appear to be nonsense words. If I get zero hits then I name the project after that and it means that the name is unambiguous. It should be possible to also program what type of word you want. For example some words will seem more scientific than others, some will seem more like food, some will appear to be Italian derived etc.

    There are also programs that take various passages and generate new random ones based on them. One example is DadaDodo. There was even a prank pulled by some folks from MIT who generated a random computer-science paper and had it accepted at a real conference.

  8. rgz wrote,

    Langmaker

    As said above, it's called phonotactics. Back when I was interested in making my own language, I found this gem:

    http://www.softpedia.com/get/Others/Home-Education/LangMaker.shtml

    It's Windows-only, though. Ah, how nostalgic…

  9. Wes Maldonado wrote,

    This small-syllable approach sounds similar to a Ruby gem called rufus-mnemo, formerly known as openwferu-kotoba: http://github.com/jmettraux/rufus-mnemo/tree/master

    > openwferu-kotoba turns (large) integer numbers into japanese-sounding words and vice-versa (but it’s not meant to turn any japanese word into an integer).

    I found this via this blog post that might be interesting to people making wordish phrases: http://blog.logeek.fr/2009/7/2/creating-small-unique-tokens-in-ruby

    Bubble Babble (also mentioned in that post) might also be a good fit for this problem space: http://en.wikipedia.org/wiki/Bubble_Babble and here is a Python implementation: http://code.activestate.com/recipes/299133/

    Fun stuff!

  10. Christopher Pound wrote,

    Have a look at http://www.ruf.rice.edu/~pound
    and since you prefer Python, see this in particular http://www.ruf.rice.edu/~pound/lc.py
    It is in essence a Markov chain text generator.

    A phonotactic solution in perl is
    http://www.ruf.rice.edu/~pound/werd
    when used with
    http://www.ruf.rice.edu/~pound/w-english

  11. CJHurst wrote,

    This is rather wonderful.

    When I build a city, it will be named using this method, as shall all its streets.

  12. Python Markov Chains and how to use them | Evan Fosmark wrote,

    [...] due to recommendations to use Markov chains for text generation after my last little script on creating fake words, I have finally gotten around to learning and [...]

  13. emmanuel wrote,

    Take a look at http://crr.ugent.be/Wuggy. This is a pseudoword generator creating very nice pseudowords for different languages. It's written entirely in Python (and with a WxPython interface).
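Several commenters (Roberto Bonvallet, ΤΖΩΤΖΙΟΥ, Christopher Pound) point toward Markov chains built from real-word statistics. A minimal character-level sketch of that idea follows, assuming a small hard-coded training list (in practice you might load a word list such as /usr/share/dict/words, as ΤΖΩΤΖΙΟΥ suggests). train and generate are hypothetical names. Each state is the previous `order` letters; "^" pads the start of a word and "$" marks its end.

```python
import random
from collections import defaultdict

START, END = "^", "$"

def train(words, order=2):
    """Map each `order`-letter context to every letter seen right after it."""
    model = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model

def generate(model, order=2, max_len=12, rng=random):
    """Walk the chain from the start context until END or max_len letters."""
    context = START * order
    letters = []
    while len(letters) < max_len:
        # Duplicates in the list make common continuations more likely.
        nxt = rng.choice(model[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt  # slide the window forward one letter
    return "".join(letters)

if __name__ == "__main__":
    sample = ["python", "markov", "syllable", "random", "generator",
              "language", "phonology", "password", "alphabet", "dictionary"]
    model = train(sample, order=2)
    print([generate(model, order=2) for _ in range(5)])
```

With order=3, each transition covers a 4-letter window, which corresponds to the windows ΤΖΩΤΖΙΟΥ describes. Higher orders copy the training words more faithfully; lower orders produce stranger output. The generator only follows transitions it saw during training, so it never reaches a context with no recorded continuation.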
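Seo Sanghyeon and rgz both suggest looking at phonotactics, the rules for which sound sequences a language allows. A rough way to borrow that idea is to build each syllable from an optional onset cluster, a vowel nucleus, and an optional coda cluster, each drawn from an allowed inventory, rather than from single random consonants. The inventories below are small hand-picked lists and are only illustrative. They are not a complete description of English phonotactics, and the function names are hypothetical.

```python
import random

# Illustrative inventories only; real English allows many more clusters.
ONSETS = ["", "b", "bl", "br", "ch", "d", "dr", "f", "fl", "g", "gr", "h", "k",
          "l", "m", "n", "p", "pl", "r", "s", "sh", "sl", "st", "t", "tr", "v", "w"]
NUCLEI = ["a", "e", "i", "o", "u", "ai", "ea", "oo", "ou"]
# The repeated "" entries bias syllables toward having no coda (open syllables).
CODAS = ["", "", "", "ck", "d", "ft", "l", "m", "n", "nd", "ng", "nt", "r",
         "sh", "st", "t"]

def phonotactic_syllable(rng=random):
    """Onset + nucleus + coda, each chosen from its allowed inventory."""
    return rng.choice(ONSETS) + rng.choice(NUCLEI) + rng.choice(CODAS)

def phonotactic_word(rng=random, syllables=(2, 3)):
    """Join a random number of syllables in the inclusive range `syllables`."""
    count = rng.randint(*syllables)
    return "".join(phonotactic_syllable(rng) for _ in range(count))

if __name__ == "__main__":
    rng = random.Random(0)
    print([phonotactic_word(rng) for _ in range(5)])
```

One limitation of this sketch is that it only constrains clusters within a syllable. A coda followed by the next onset can still produce sequences English would not allow (for example "nd" + "st"), so a fuller version would also check the clusters formed at syllable boundaries.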
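Bryan Rasmussen suggests two refinements: drop any output that turns out to be a real word, and add common prefixes or suffixes so that results look more like real words. A minimal sketch of both follows, assuming the dictionary is a Python container of lowercase known words. He suggests WordNet; loading a word list into a set would also work. It also assumes the generator is any zero-argument function, such as create_fake_word() from the post. PREFIXES, SUFFIXES, decorate and fresh_fake_word are hypothetical names, and the affix lists are only illustrative.

```python
import random

# Illustrative affix lists; extend to taste.
PREFIXES = ["anti", "pre", "re", "un", "sub"]
SUFFIXES = ["ism", "ize", "ity", "ous", "ment"]

def decorate(word, rng=random, p_prefix=0.3, p_suffix=0.3):
    """Optionally wrap a word in a common prefix and/or suffix."""
    if rng.random() < p_prefix:
        word = rng.choice(PREFIXES) + word
    if rng.random() < p_suffix:
        word += rng.choice(SUFFIXES)
    return word

def fresh_fake_word(generate_word, known_words, rng=random, max_tries=100):
    """Return a decorated candidate that is not in known_words.

    known_words is assumed to hold lowercase entries; any container
    that supports `in` (set, dict, ...) works.
    """
    for _ in range(max_tries):
        candidate = decorate(generate_word(), rng)
        if candidate.lower() not in known_words:
            return candidate
    raise RuntimeError("no unused word found; is the dictionary too broad?")

if __name__ == "__main__":
    rng = random.Random(1)
    known = {"ciboa", "banana"}  # stand-in for a real dictionary

    def stand_in():
        # Stand-in generator; create_fake_word() would go here.
        return rng.choice(["ciboa", "hojocina", "tiuzi"])

    print([fresh_fake_word(stand_in, known, rng) for _ in range(5)])
```

The retry limit prevents an infinite loop if the generator keeps producing words that are already in the dictionary.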
