extract_datetimetz
Extracts the date, time, and timezone in the Received field(s) from an .eml file. extract_datetimetz takes in a string and returns a datetime.datetime object from the input string.
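To illustrate the kind of parsing described above, the following standalone sketch pulls a timestamp out of a Received header using only the standard library. The helper name `parse_received_datetime`, the regular expression, and the sample header are assumptions made for this example and are not taken from the library's implementation.

```python
import re
from datetime import datetime

# Assumed timestamp shape: "Fri, 26 Mar 2021 11:04:09 +1200"
TIMESTAMP_PATTERN = re.compile(
    r"[A-Za-z]{3}, \d{1,2} [A-Za-z]{3} \d{4} \d{2}:\d{2}:\d{2} [+-]\d{4}"
)


def parse_received_datetime(text):
    """Hypothetical helper: return a timezone-aware datetime, or None if no timestamp is found."""
    match = TIMESTAMP_PATTERN.search(text)
    if match is None:
        return None
    # %z turns the "+1200" offset into tzinfo on the resulting datetime.
    return datetime.strptime(match.group(0), "%a, %d %b %Y %H:%M:%S %z")


received = (
    "from ABC.DEF.local ([192.0.2.1]) by ABC.DEF.local2 ([192.0.2.2]) "
    "with mapi id 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"
)
parsed = parse_received_datetime(received)
print(parsed.isoformat())
```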
For more information about the extract_datetimetz function, see the source code.
extract_email_address
Extracts email addresses from a string input and returns a list of all the email addresses in the input string.
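The behavior described above can be approximated with a single regular expression. This is a minimal sketch under stated assumptions: the name `find_email_addresses` and the pattern are hypothetical, and the pattern is deliberately simpler than full address validation.

```python
import re

# Assumed, simplified address shape: local-part@domain.tld
EMAIL_PATTERN = re.compile(r"[a-z0-9.\-+_]+@[a-z0-9.\-+_]+\.[a-z]+", re.IGNORECASE)


def find_email_addresses(text):
    """Hypothetical helper: return every email-like substring, in order of appearance."""
    return EMAIL_PATTERN.findall(text)


emails = find_email_addresses("Me me@email.com and You <You@email.com>")
print(emails)
```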
For more information about the extract_email_address function, see the source code.
extract_ip_address
Extracts IPv4 and IPv6 addresses from the input string and returns a list of all IP addresses found.
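One way to sketch this behavior without hand-written IPv4 and IPv6 patterns is to split the text into tokens and let the standard `ipaddress` module decide which ones are valid. The helper name `find_ip_addresses` and the set of delimiters are assumptions for this example, not the library's approach.

```python
import ipaddress
import re


def find_ip_addresses(text):
    """Hypothetical helper: return tokens that parse as IPv4 or IPv6 addresses."""
    found = []
    # Split on whitespace and the bracket/punctuation characters that
    # commonly surround addresses in mail headers.
    for token in re.split(r"[\s\[\]()<>,;]+", text):
        try:
            ipaddress.ip_address(token)
        except ValueError:
            continue  # not a valid address (this also skips empty tokens)
        found.append(token)
    return found


header = "from mail.example ([192.0.2.10]) by relay.example ([2001:db8::1])"
addresses = find_ip_addresses(header)
print(addresses)
```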
For more information about the extract_ip_address function, see the source code.
extract_ip_address_name
Extracts the name associated with each IP address in the Received field(s) from an .eml file. extract_ip_address_name takes in a string and returns a list of all IP address names in the input string.
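As a rough illustration, the sketch below assumes that each name in a Received header is the host token that directly precedes a bracketed address such as `([192.0.2.1])`. That assumption, the helper name `find_ip_address_names`, and the sample header are all hypothetical.

```python
import re

# Assumed layout: "<name> ([<ip address>])"; the lookahead keeps only the name.
NAME_PATTERN = re.compile(r"[\w.-]+(?=\s*\(\[)")


def find_ip_address_names(text):
    """Hypothetical helper: return host names that precede a bracketed IP address."""
    return NAME_PATTERN.findall(text)


header = "from ABC.DEF.local ([192.0.2.1]) by ABC.DEF.local2 ([192.0.2.2]) with mapi id 32.88.5467.123"
names = find_ip_address_names(header)
print(names)
```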
For more information about the extract_ip_address_name function, see the source code.
extract_mapi_id
Extracts the MAPI ID in the Received field(s) from an .eml file. extract_mapi_id takes in a string and returns a list containing the MAPI ID string found in the input.
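A minimal sketch of this extraction, assuming the ID is a four-part dotted number that follows the literal words "mapi id" in the header. The helper name `find_mapi_id` and the pattern are assumptions for illustration.

```python
import re

# Assumed ID shape: four dot-separated groups of digits after "mapi id".
MAPI_PATTERN = re.compile(r"mapi id\s+(\d+(?:\.\d+){3})")


def find_mapi_id(text):
    """Hypothetical helper: return a list of MAPI ID strings found in the text."""
    return MAPI_PATTERN.findall(text)


header = "by ABC.DEF.local2 with mapi id 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"
mapi_ids = find_mapi_id(header)
print(mapi_ids)
```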
For more information about the extract_mapi_id function, see the source code.
extract_ordered_bullets
Extracts alphanumeric bullets from the beginning of text up to three “sub-section” levels.
Examples:
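The standalone sketch below shows one way to split a leading bullet into up to three levels. The return shape (a three-element tuple padded with None), the helper name `ordered_bullet_levels`, and the assumption that each level is one or two alphanumeric characters are choices made for this example and may differ from the library.

```python
import re

BULLET_PATTERN = re.compile(
    r"(?=[0-9A-Za-z]{1,2}\.)"                          # first level must be followed by a period
    r"([0-9A-Za-z]{1,2}(?:\.[0-9A-Za-z]{1,2}){0,2})"   # one to three period-separated levels
    r"\.?\s"                                           # optional trailing period, then whitespace
)


def ordered_bullet_levels(text):
    """Hypothetical helper: return (section, sub_section, sub_sub_section)."""
    match = BULLET_PATTERN.match(text)  # match() anchors at the start of the text
    if match is None:
        return (None, None, None)
    levels = match.group(1).split(".")
    levels += [None] * (3 - len(levels))  # pad missing levels with None
    return tuple(levels)


print(ordered_bullet_levels("1.1 This is a very important point"))
print(ordered_bullet_levels("a.1.b. This is a nested point"))
```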
For more information about the extract_ordered_bullets function, see the source code.
extract_text_after
Extracts text that occurs after the specified pattern.
Options:
- If index is set, extract after the (index + 1)th occurrence of the pattern. The default is 0.
- Strips leading whitespace from the extracted text if strip is set to True. The default is True.
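The index and strip options can be pictured with the standalone sketch below. The helper name `text_after_sketch` is hypothetical, and for brevity it trims whitespace on both ends when strip is true, which is an assumption of this sketch rather than a statement about the library.

```python
import re


def text_after_sketch(text, pattern, index=0, strip=True):
    """Hypothetical helper: return the text following the (index + 1)th match of pattern."""
    matches = list(re.finditer(pattern, text))
    if index >= len(matches):
        raise ValueError(
            f"pattern matched {len(matches)} time(s); index {index} is out of range"
        )
    remainder = text[matches[index].end():]
    return remainder.strip() if strip else remainder


transcript = "SPEAKER 1: Look at me! SPEAKER 2: I'm flying!"
after_first = text_after_sketch(transcript, r"SPEAKER \d:")            # index=0, first match
after_second = text_after_sketch(transcript, r"SPEAKER \d:", index=1)  # second match
print(after_first)
print(after_second)
```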
For more information about the extract_text_after function, see the source code.
extract_text_before
Extracts text that occurs before the specified pattern.
Options:
- If index is set, extract before the (index + 1)th occurrence of the pattern. The default is 0.
- Strips trailing whitespace from the extracted text if strip is set to True. The default is True.
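A comparable sketch for the "before" direction follows. Again, `text_before_sketch` is a hypothetical name, and trimming both ends of the result is a simplification made for this example.

```python
import re


def text_before_sketch(text, pattern, index=0, strip=True):
    """Hypothetical helper: return the text preceding the (index + 1)th match of pattern."""
    matches = list(re.finditer(pattern, text))
    if index >= len(matches):
        raise ValueError(
            f"pattern matched {len(matches)} time(s); index {index} is out of range"
        )
    prefix = text[:matches[index].start()]
    return prefix.strip() if strip else prefix


message = "Here I am! STOP Look at me! STOP I'm flying! STOP"
before_first = text_before_sketch(message, r"STOP")
before_second = text_before_sketch(message, r"STOP", index=1)
print(before_first)
print(before_second)
```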
For more information about the extract_text_before function, see the source code.
extract_us_phone_number
Extracts a phone number from a section of text.
Examples:
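As an illustration of the idea, the sketch below finds the first US-style phone number in a string with a regular expression. The pattern, the helper name `find_us_phone_number`, and the choice to return an empty string when nothing matches are assumptions for this example.

```python
import re

# Assumed formats: optional +1 country code, optional parentheses around the
# area code, and spaces, dots, or hyphens as separators.
US_PHONE_PATTERN = re.compile(
    r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"
)


def find_us_phone_number(text):
    """Hypothetical helper: return the first phone-number-like substring, or ''."""
    match = US_PHONE_PATTERN.search(text)
    return match.group(0) if match else ""


print(find_us_phone_number("Phone number: 215-867-5309"))
```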
For more information about the extract_us_phone_number function, see the source code.
group_broken_paragraphs
Groups together paragraphs that are broken up with line breaks for visual or formatting purposes. This is common in .txt files. By default, group_broken_paragraphs groups together lines split by \n. You can change that behavior with the line_split kwarg. The function considers \n\n to be a paragraph break by default. You can change that behavior with the paragraph_split kwarg.
Examples:
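The grouping logic described above can be sketched in a few lines: split the text on the paragraph break first, then join the lines inside each paragraph. The helper name `group_paragraphs_sketch` is hypothetical, and accepting the two kwargs as regular expression strings is an assumption of this sketch.

```python
import re


def group_paragraphs_sketch(text, line_split=r"\s*\n\s*", paragraph_split=r"\s*\n\s*\n\s*"):
    """Hypothetical helper: join visually wrapped lines while keeping paragraph breaks."""
    grouped = []
    for paragraph in re.split(paragraph_split, text):
        if not paragraph.strip():
            continue  # skip empty chunks produced by leading or trailing breaks
        # Replace each single line break inside the paragraph with one space.
        grouped.append(re.sub(line_split, " ", paragraph.strip()))
    return "\n\n".join(grouped)


wrapped = (
    "The big brown fox\nwas walking down the lane.\n\n"
    "At the end of the lane, the\nfox met a bear."
)
print(group_paragraphs_sketch(wrapped))
```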
For more information about the group_broken_paragraphs function, see the source code.
remove_punctuation
Removes ASCII and unicode punctuation from a string.
Examples:
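One way to cover both ASCII and Unicode punctuation is to drop every character whose Unicode general category starts with "P". This is a sketch of that technique, not necessarily how the library does it, and `remove_punctuation_sketch` is a hypothetical name. Note that symbols such as "$" or "+" belong to the "S" categories and are left in place by this approach.

```python
import unicodedata


def remove_punctuation_sketch(text):
    """Hypothetical helper: drop characters in the Unicode punctuation categories (P*)."""
    return "".join(
        char for char in text if not unicodedata.category(char).startswith("P")
    )


# \u201c and \u201d are the left and right curly double quotes.
print(remove_punctuation_sketch("\u201cA lovely quote!\u201d"))
```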
For more information about the remove_punctuation function, see the source code.
replace_unicode_quotes
Replaces unicode quote characters such as \x91 in strings.
Examples:
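Characters such as \x91 typically appear when Windows-1252 bytes for curly quotes are decoded as if they were Latin-1. The sketch below maps the four Windows-1252 quote positions to their proper Unicode characters. The helper name `repair_cp1252_quotes` and the choice of replacement characters are assumptions; the library's actual replacement table is not shown in this document.

```python
# Windows-1252 code points for curly quotes, mapped to the Unicode characters
# they were meant to represent.
CP1252_QUOTES = str.maketrans(
    {
        "\x91": "\u2018",  # left single quote
        "\x92": "\u2019",  # right single quote
        "\x93": "\u201c",  # left double quote
        "\x94": "\u201d",  # right double quote
    }
)


def repair_cp1252_quotes(text):
    """Hypothetical helper: replace mis-decoded quote characters with Unicode quotes."""
    return text.translate(CP1252_QUOTES)


repaired = repair_cp1252_quotes("\x93A lovely quote!\x94")
print(repaired)
```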
For more information about the replace_unicode_quotes function, see the source code.
translate_text
The translate_text cleaning function translates text between languages. It uses the Helsinki NLP MT models from transformers for machine translation and works for Russian, Chinese, Arabic, and many other languages.
Parameters:
- text: the input string to translate.
- source_lang: the two-letter language code for the source language of the text. If source_lang is not specified, the language will be detected using langdetect.
- target_lang: the two-letter language code for the target language for translation. Defaults to "en".
For more information about the translate_text function, see the source code.
