Package 'keyToEnglish' reference manual

Title:	Convert Data to Memorable Phrases
Description:	Convert keys and other values to memorable phrases. Includes some methods to build lists of words.
Authors:	Max Candocia
Maintainer:	Max Candocia <[email protected]>
License:	GPL (>= 2)
Version:	0.2.1
Built:	2025-02-24 03:31:58 UTC
Source:	https://github.com/mcandocia/keytoenglish

Corpora to Word List

Description

Converts a collection of documents to a word list

Usage

corpora_to_word_list(
  paths,
  ascii_only = TRUE,
  custom_regex = NA,
  max_word_length = 20,
  stopword_fn = DEFAULT_STOPWORDS,
  min_word_count = 5,
  max_size = 16^3,
  min_word_length = 3,
  output_file = NA,
  json_path = NA
)

Arguments

`paths`	Paths of plaintext documents
`ascii_only`	Will omit non-ascii characters if TRUE
`custom_regex`	If not NA, will override ascii_only and this will determine what a valid word consists of
`max_word_length`	Maximum length of extracted words
`stopword_fn`	Filename containing stopwords to use or a list of stopwords (if length > 1)
`min_word_count`	Minimum number of occurrences for a word to be added to word list
`max_size`	Maximum size of list
`min_word_length`	Minimum length of words
`output_file`	File to write list to
`json_path`	If not NA, a 'character' vector of JSON keys to follow; the input text is then parsed as JSON

Value

A 'character' vector of words
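
The filtering that the arguments above describe can be sketched in base R. This is not the package's implementation: `sketch_word_list` is a hypothetical helper, and lowercasing the text, splitting on non-letters, and keeping the most frequent words first are all assumptions made for illustration.

```r
# Hypothetical helper; a sketch of the filtering steps, NOT the package's code.
sketch_word_list <- function(text, min_word_length = 3, max_word_length = 20,
                             min_word_count = 5, max_size = 16^3,
                             stopwords = character(0)) {
  # Lowercase and split on anything that is not an ASCII letter (assumption)
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  # Keep words within the length bounds and drop stopwords
  n <- nchar(words)
  words <- words[n >= min_word_length & n <= max_word_length]
  words <- words[!(words %in% stopwords)]
  # Count occurrences and keep words that occur often enough
  counts <- sort(table(words), decreasing = TRUE)
  counts <- counts[counts >= min_word_count]
  # Truncate to the maximum list size
  head(names(counts), max_size)
}

docs <- c("the cat saw the cat", "a cat and a dog saw the dog")
wl <- sketch_word_list(docs, min_word_count = 2, stopwords = c("the", "and"))
print(wl)
```

The default `max_size = 16^3` matches the number of words that a `hash_subsection_size` of 3 can index in `keyToEnglish()`.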

Greatest Common Divisor

Description

Calculates the greatest common divisor of a list of numbers

Usage

GCD(...)

Arguments

`...`	Any number of 'numeric' vectors or nested 'list's containing such

Value

A 'numeric' that is the greatest common divisor of the input values
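
For illustration, the same result can be computed in base R with the Euclidean algorithm. The names `gcd_pair` and `gcd_all` are hypothetical, and this sketch is not necessarily how `GCD()` is implemented; it only assumes that vectors and nested lists are flattened before reduction.

```r
# Euclidean algorithm for two non-negative integers (hypothetical helper)
gcd_pair <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Flatten vectors and nested lists, then fold the pairwise GCD over the values
gcd_all <- function(...) {
  values <- unlist(list(...))
  Reduce(gcd_pair, values)
}

print(gcd_all(12, 18, list(24, list(30))))
```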

Generate Random Sentences

Description

Randomly generate sentences with a specific structure

Usage

generate_random_sentences(n, punctuate = TRUE, fast = FALSE)

Arguments

`n`	'numeric' number of sentences to generate
`punctuate`	'logical' value of whether to add spaces, capitalize first letter, and append period
`fast`	'logical'

Value

'character' vector of randomly generated sentences

Hash to Sentence

Description

Hashes data to a sentence that contains 54 bits of entropy

Usage

hash_to_sentence(x, ...)

Arguments

`x`	Input data, which will be converted to 'character' if not already 'character'
`...`	Other parameters to pass to 'keyToEnglish()', besides 'word_list', 'hash_subsection_size', and 'hash_function'

Value

'character' vector of hashed field resembling phrases
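
The entropy of a phrase built from several word lists is the sum of the base-2 logarithms of the list sizes, provided each word is chosen uniformly and independently. The sketch below shows the arithmetic only; the particular combination of list sizes is a hypothetical one chosen to add up to 54 bits, not the documented structure of the sentence.

```r
# Hypothetical combination of word-list sizes (an assumption for illustration):
# lists of 256 words contribute 8 bits each, lists of 2048 words 11 bits each.
list_sizes <- c(256, 256, 2048, 256, 256, 2048)

# Bits contributed by each position, and the total for the phrase
bits_per_word <- log2(list_sizes)
total_bits <- sum(bits_per_word)
print(total_bits)

# Number of distinct phrases that this many bits can represent
n_phrases <- prod(list_sizes)
```

The resulting number of distinct phrases is the kind of set size that `uniqueness_probability()` and `uniqueness_max_size()` take as 'N'.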

Key to English

Description

Hashes field to sequence of words from a list.

Usage

keyToEnglish(
  x,
  hash_function = "md5",
  phrase_length = 5,
  corpus_path = NA,
  word_list = wl_common,
  hash_subsection_size = 3,
  sep = "",
  word_trans = "camel",
  suppress_warnings = FALSE,
  hash_output_length = NA,
  forced_limit = NA,
  numeric_append_range = NA
)

Arguments

`x`	Field to hash
`hash_function`	'character' name of hash function or hash 'function' itself, returning a hexadecimal character
`phrase_length`	'numeric' number of words to use in each hashed key
`corpus_path`	'character' path to word list, as a single-column text file with one word per row
`word_list`	'character' list of words to use in phrases
`hash_subsection_size`	'numeric' length of each subsection of hash to use for word index. 16^N unique words can be used for a size of N. This value times phrase_length must be less than or equal to the length of the hash output. Must be less than 14.
`sep`	'character' separator to use between each word.
`word_trans`	A 'function', 'list' of functions, or 'camel' (for CamelCase). If a list is used, then the index of the word of each phrase is mapped to the corresponding function with that index, recycling as necessary
`suppress_warnings`	'logical' value indicating if warning of non-character input should be suppressed
`hash_output_length`	optional 'numeric' if the provided hash function is not a 'character'. This is used to send warnings if the hash output is too small to provide full range of all possible combinations of outputs.
`forced_limit`	for multiple word lists, this is the maximum number of values used for calculating the index (prior to taking the modulus) for each word in a phrase. Using this may speed up processing longer word lists with a large least-common-multiple among individual word list lengths. This will introduce a small amount of bias into the randomness. This value should be much larger than any individual word list whose length is not a factor of this value.
`numeric_append_range`	optional 'numeric' value of two integers indicating range of integers to append onto data

Value

'character' vector of hashed field resembling phrases
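
The relationship between 'hash_subsection_size', 'phrase_length', and the word list can be sketched in base R. This is an illustration under assumptions, not the package's code: the hex string is just an example 32-character digest, the word list is a toy one, and taking consecutive subsections from the start of the digest with a modulus is an assumed mapping.

```r
# Example 32-character hexadecimal digest (stand-in for a hash of the input)
hex_digest <- "5d41402abc4b2a76b9719d911017c592"
# Toy word list (assumption; the package default is wl_common)
word_list <- c("apple", "brick", "cloud", "delta", "ember", "frost", "grape", "haven")

phrase_length <- 5
hash_subsection_size <- 3

# The subsections must fit inside the digest
stopifnot(phrase_length * hash_subsection_size <= nchar(hex_digest))

# Cut the digest into consecutive subsections, one per word
starts <- seq(1, by = hash_subsection_size, length.out = phrase_length)
chunks <- substring(hex_digest, starts, starts + hash_subsection_size - 1)

# Each subsection is a number in 0..(16^N - 1); reduce it to a 1-based index
indices <- strtoi(chunks, base = 16L) %% length(word_list) + 1
words <- word_list[indices]

# CamelCase transformation with an empty separator, as in the defaults
camel <- paste0(toupper(substring(words, 1, 1)), substring(words, 2))
phrase <- paste(camel, collapse = "")
print(phrase)
```

With a subsection size of 3, each subsection can address up to 16^3 = 4096 words, which is why a word list longer than that would not be fully used.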

Examples

# hash the numbers 1 through 5
keyToEnglish(1:5)

# alternate upper and lowercase, 3 words only
keyToEnglish(1:5, word_trans=list(tolower, toupper), phrase_length=3)

Least Common Multiple

Description

Calculates least common multiple of a list of numbers

Usage

LCM(...)

Arguments

`...`	Any number of 'numeric' vectors or nested 'list's containing such

Value

A 'numeric' that is the least common multiple of the input values
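
As a rough base-R illustration, the least common multiple can be folded over the inputs using lcm(a, b) = a / gcd(a, b) * b. The helper names are hypothetical and this is not necessarily how `LCM()` is implemented. The second call uses made-up word list lengths to show how quickly the value can grow when lengths share few factors, which is the situation the 'forced_limit' argument of `keyToEnglish()` addresses.

```r
# Euclidean algorithm for two positive integers (hypothetical helper)
gcd_pair <- function(a, b) {
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Flatten vectors and nested lists, then fold the pairwise LCM over the values
lcm_all <- function(...) {
  values <- unlist(list(...))
  Reduce(function(a, b) a / gcd_pair(a, b) * b, values)
}

print(lcm_all(4, 6))
# Made-up word list lengths: powers of two combine cheaply, 100 does not
print(lcm_all(256, 2048, 100))
```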

Uniqueness Max Size

Description

Returns approximate number of elements that you can select out of a set of size 'N' if the probability of there being any duplicates is less than or equal to 'p'

Usage

uniqueness_max_size(N, p)

Arguments

`N`	'numeric' size of set elements are selected from, or a 'list' of 'list's of 'character' vectors (e.g., 'wml_animals')
`p`	'numeric' probability that there are any duplicate elements

Value

'numeric' value indicating size. Value will most likely be non-integer

Examples

# how many values from 1-1,000 can I randomly select before
# I have a 10% chance of having at least one duplicate?

uniqueness_max_size(1000,0.1)
# 14.51
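
The figure in the example above is consistent with the standard birthday-problem approximation r ≈ sqrt(2 N ln(1 / (1 - p))). The sketch below is an assumption about the kind of approximation involved, not a statement of the package's actual formula; `sketch_max_size` is a hypothetical name.

```r
# Birthday-problem approximation (assumed; not taken from the package source):
# P(any duplicate among r draws from N) is roughly 1 - exp(-r^2 / (2 * N)),
# so solving for r at a given p gives sqrt(2 * N * log(1 / (1 - p))).
sketch_max_size <- function(N, p) {
  sqrt(2 * N * log(1 / (1 - p)))
}

approx_size <- sketch_max_size(1000, 0.1)
print(approx_size)
```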

Uniqueness Probability

Description

Calculates the probability that all 'r' elements selected with replacement from a set of size 'N' are unique

Usage

uniqueness_probability(N, r)

Arguments

`N`	'numeric' size of set. Becomes unstable for values greater than 10^16.
`r`	'numeric' number of elements selected with replacement

Value

'numeric' probability that all 'r' elements are unique
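
The exact probability is the product (N/N) * ((N-1)/N) * ... * ((N-r+1)/N). A minimal base-R sketch follows, summing logarithms to reduce underflow for large 'r'. The function name is hypothetical and the package may compute the value differently.

```r
# Exact probability that r draws with replacement from N items are all distinct.
# Hypothetical helper; not the package's implementation.
sketch_uniqueness_probability <- function(N, r) {
  # More draws than items guarantees a duplicate
  if (r > N) return(0)
  # Sum of log((N - k) / N) for k = 0 .. r - 1, then exponentiate
  exp(sum(log1p(-(seq_len(r) - 1) / N)))
}

# Classic birthday problem: 23 people, 365 possible birthdays
p_unique <- sketch_uniqueness_probability(365, 23)
print(p_unique)
```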

Clean JSON text from Wikipedia

Description

Clean JSON text from Wikipedia

Usage

wiki_clean(x)

Arguments

`x`	'character' JSON text

Value

'character' JSON text

Non-Origin Adjectives Wordlist

Description

Word list of 256 adjectives that do not describe origin, so they can usually be used prior to visual/origin adjectives without breaking any grammar rules

Usage

data(wl_adjectives_nonorigin)

Format

A 'character' vector

Visual Adjectives Wordlist

Description

Word list of 256 adjectives that visually describe an object.

Usage

data(wl_adjectives_visual)

Format

A 'character' vector

Animal word list

Description

Word list generated by processing several animal-related pages on Wikipedia

Usage

data(wl_animal)

Format

An object of class 'character'

Examples

data(wl_animal)
keyToEnglish(1:5, word_list=wl_animal)

Common word list

Description

Public domain word list of common words

Usage

data(wl_common)

Format

An object of class 'character'

References

Public Domain Word Lists. Michael Wehar https://github.com/MichaelWehar/Public-Domain-Word-Lists

Examples

data(wl_common)
keyToEnglish(1:5, word_list=wl_common)

Freq 5663 word list

Description

Public domain word list of common words, slightly truncated from original version

Usage

data(wl_freq5663)

Format

An object of class 'character'

References

Public Domain Word Lists. Michael Wehar https://github.com/MichaelWehar/Public-Domain-Word-Lists

Examples

data(wl_freq5663)
keyToEnglish(1:5, word_list=wl_freq5663)

Literature word list

Description

Word list generated by processing several works of literature on Project Gutenberg

Usage

data(wl_literature)

Format

An object of class 'character'

References

Project Gutenberg. Project Gutenberg

Examples

data(wl_literature)
keyToEnglish(1:5, word_list=wl_literature)

Concrete Nouns Wordlist

Description

Word list of 2048 singular, concrete nouns, largely excluding materials and liquids that cannot be referred to in the singular form

Usage

data(wl_nouns_concrete)

Format

A 'character' vector

Plural Concrete Nouns Wordlist

Description

Word list of 2048 concrete nouns in plural form, largely excluding materials and liquids that cannot be referred to in the singular form.

Usage

data(wl_nouns_concrete_plural)

Format

A 'character' vector

Science word list

Description

Word list generated by processing several science-related pages on Wikipedia

Usage

data(wl_science)

Format

An object of class 'character'

Examples

data(wl_science)
keyToEnglish(1:5, word_list=wl_science)

Transitive Verbs in Gerund Form

Description

Word list of 256 transitive verbs in gerund form (i.e., "ing" at end)

Usage

data(wl_verbs_transitive_gerund)

Format

A 'character' vector

Transitive Verbs in Infinitive Form

Description

Word list of 256 transitive verbs in infinitive form (minus the "to")

Usage

data(wl_verbs_transitive_infinitive)

Format

A 'character' vector

Transitive Verbs in Present Form

Description

Word list of 256 transitive verbs in present tense

Usage

data(wl_verbs_transitive_present)

Format

A 'character' vector

Animal Phrase Structure Word Multilist

Description

Word lists of sizes, colors, animals, and attributes to construct memorable phrases

Usage

data(wml_animals)

Format

A 'list' of 'character' vectors

Examples

keyToEnglish(1:5, word_list=wml_animals)

Cute Physics Multilist

Description

List of word lists that combine cute words with physics-related words

Usage

data(wml_cutephysics)

Format

A 'list' of 'character' vectors

Long Sentence Multilist

Description

List of word lists that can be used to make a 54-bit, often humorous, sentence

Usage

data(wml_long_sentence)

Format

A 'list' of 'character' vectors
