extract_datetimetz
Extracts the date, time, and timezone in the Received field(s) from an .eml file. extract_datetimetz takes in a string and returns a datetime.datetime object from the input string.
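The behavior described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: the name sketch_extract_datetimetz, the RECEIVED_DATE pattern, and the sample header are assumptions made for illustration.

```python
import re
from email.utils import parsedate_to_datetime

# Assumed shape of the date in a Received header,
# e.g. "Fri, 26 Mar 2021 11:04:09 +1200".
RECEIVED_DATE = re.compile(
    r"[A-Za-z]{3},\s+\d{1,2}\s+[A-Za-z]{3}\s+\d{4}\s+\d{2}:\d{2}:\d{2}\s+[+-]\d{4}"
)


def sketch_extract_datetimetz(text):
    """Return a timezone-aware datetime for the first date found, or None."""
    match = RECEIVED_DATE.search(text)
    if match is None:
        return None
    return parsedate_to_datetime(match.group(0))


# Hypothetical Received header used only for illustration.
header = (
    "from ABC.DEF.local ([ba23::58b5:2236:45g2:88h2]) by ABC.DEF.local "
    "with mapi id 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"
)
parsed = sketch_extract_datetimetz(header)
print(parsed.isoformat())
```

The returned object keeps the UTC offset from the header, so it can be compared safely with other timezone-aware datetimes.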
To learn more about the extract_datetimetz function, you can check the source code.
extract_email_address
To learn more about the extract_email_address function, you can check the source code.
extract_ip_address
To learn more about the extract_ip_address function, you can check the source code.
extract_ip_address_name
Extracts the name of each IP address in the Received field(s) from an .eml file. extract_ip_address_name takes in a string and returns a list of all IP address names in the input string.
To learn more about the extract_ip_address_name function, you can check the source code.
extract_mapi_id
Extracts the mapi id in the Received field(s) from an .eml file. extract_mapi_id takes in a string and returns a list containing the mapi id string found in the input string.
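As a rough illustration of the input and output shapes, here is a minimal sketch using a regular expression. It is not the library's implementation: sketch_extract_mapi_id and the MAPI_ID pattern are hypothetical, and the pattern assumes the mapi id is four dot-separated numbers terminated by a semicolon.

```python
import re

# Assumption: the mapi id looks like "32.88.5467.123;" inside the header.
MAPI_ID = re.compile(r"(\d+\.\d+\.\d+\.\d+);")


def sketch_extract_mapi_id(text):
    """Return a list with every mapi-id-like token, without the semicolon."""
    return MAPI_ID.findall(text)


# Hypothetical Received header used only for illustration.
header = (
    "from ABC.DEF.local ([ba23::58b5:2236:45g2:88h2]) by ABC.DEF.local "
    "with mapi id 32.88.5467.123; Fri, 26 Mar 2021 11:04:09 +1200"
)
mapi_ids = sketch_extract_mapi_id(header)
print(mapi_ids)
```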
To learn more about the extract_mapi_id function, you can check the source code.
extract_ordered_bullets
To learn more about the extract_ordered_bullets function, you can check the source code.
extract_text_after
If index is set, extract after the (index + 1)th occurrence of the pattern. The default is 0.
If strip is set to True, whitespace is stripped from the extracted text. The default is True.
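The index and strip options can be illustrated with a short sketch. This is not the library's code: sketch_extract_text_after is a hypothetical stand-in, and it assumes that stripping applies to the whitespace left next to the removed pattern.

```python
import re


def sketch_extract_text_after(text, pattern, index=0, strip=True):
    """Return the text that follows the (index + 1)th match of pattern."""
    matches = list(re.finditer(pattern, text))
    if index >= len(matches):
        raise ValueError(
            f"pattern occurs {len(matches)} time(s); index {index} is out of range"
        )
    after = text[matches[index].end():]
    # Assumption: only the whitespace adjacent to the pattern is removed.
    return after.lstrip() if strip else after


text = "SPEAKER 1: Look at me! SPEAKER 2: I'm flying!"
first = sketch_extract_text_after(text, r"SPEAKER \d:")
second = sketch_extract_text_after(text, r"SPEAKER \d:", index=1)
unstripped = sketch_extract_text_after(text, r"SPEAKER \d:", index=1, strip=False)
```

With index=0 everything after the first speaker tag is returned, while index=1 skips ahead to the text after the second tag.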
To learn more about the extract_text_after function, you can check the source code.
extract_text_before
If index is set, extract before the (index + 1)th occurrence of the pattern. The default is 0.
If strip is set to True, whitespace is stripped from the extracted text. The default is True.
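A matching sketch for the "before" direction is shown here. Again, sketch_extract_text_before is a hypothetical stand-in rather than the library's code, and it assumes that stripping applies to the whitespace left next to the removed pattern.

```python
import re


def sketch_extract_text_before(text, pattern, index=0, strip=True):
    """Return the text that precedes the (index + 1)th match of pattern."""
    matches = list(re.finditer(pattern, text))
    if index >= len(matches):
        raise ValueError(
            f"pattern occurs {len(matches)} time(s); index {index} is out of range"
        )
    before = text[:matches[index].start()]
    # Assumption: only the whitespace adjacent to the pattern is removed.
    return before.rstrip() if strip else before


text = "Here I am! STOP Look at me! STOP I'm flying! STOP"
first = sketch_extract_text_before(text, r"STOP")
second = sketch_extract_text_before(text, r"STOP", index=1)
unstripped = sketch_extract_text_before(text, r"STOP", strip=False)
```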
To learn more about the extract_text_before function, you can check the source code.
extract_us_phone_number
To learn more about the extract_us_phone_number function, you can check the source code.
group_broken_paragraphs
Groups together paragraphs that are broken up by line breaks, which is common in .txt files. By default, group_broken_paragraphs groups together lines split by \n. You can change that behavior with the line_split kwarg. The function considers \n\n to be a paragraph break by default. You can change that behavior with the paragraph_split kwarg.
Examples:
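The grouping behavior can be sketched with the standard library. This is an illustration rather than the library's implementation: sketch_group_broken_paragraphs is hypothetical, and it assumes that line_split and paragraph_split are compiled regular expressions.

```python
import re

# Assumed defaults: a single newline breaks a line, a blank line breaks a paragraph.
LINE_SPLIT = re.compile(r"\s*\n\s*")
PARAGRAPH_SPLIT = re.compile(r"\s*\n\s*\n\s*")


def sketch_group_broken_paragraphs(
    text, line_split=LINE_SPLIT, paragraph_split=PARAGRAPH_SPLIT
):
    """Join lines inside each paragraph and keep paragraphs separated."""
    grouped = []
    for paragraph in paragraph_split.split(text):
        if not paragraph.strip():
            continue  # skip empty chunks produced by leading or trailing breaks
        grouped.append(line_split.sub(" ", paragraph.strip()))
    return "\n\n".join(grouped)


text = (
    "The big brown fox\nwas walking down the lane.\n\n"
    "At the end of the lane, the\nfox met a bear."
)
grouped = sketch_group_broken_paragraphs(text)

# A custom paragraph break: three or more newlines instead of two.
custom = sketch_group_broken_paragraphs(
    "First\nline.\n\n\nSecond\nline.",
    paragraph_split=re.compile(r"\s*\n\s*\n\s*\n\s*"),
)
```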
To learn more about the group_broken_paragraphs function, you can check the source code.
remove_punctuation
To learn more about the remove_punctuation function, you can check the source code.
replace_unicode_quotes
Replaces unicode quote characters, such as \x91, in strings.
Examples:
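The idea can be sketched as a small character mapping. This is not the library's implementation: sketch_replace_unicode_quotes and QUOTE_MAP are hypothetical, and the mapping only covers the four Windows-1252 quote bytes, which is an assumption about scope.

```python
# Windows-1252 quote bytes mapped to their proper Unicode characters.
QUOTE_MAP = str.maketrans(
    {
        "\x91": "\u2018",  # left single quotation mark
        "\x92": "\u2019",  # right single quotation mark
        "\x93": "\u201c",  # left double quotation mark
        "\x94": "\u201d",  # right double quotation mark
    }
)


def sketch_replace_unicode_quotes(text):
    """Swap mis-decoded quote characters for their Unicode equivalents."""
    return text.translate(QUOTE_MAP)


cleaned = sketch_replace_unicode_quotes("\x93A lovely quote!\x94")
print(cleaned)
```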
To learn more about the replace_unicode_quotes function, you can check the source code.
translate_text
The translate_text cleaning function translates text between languages. translate_text uses the Helsinki NLP MT models from transformers for machine translation. It works for Russian, Chinese, Arabic, and many other languages.
Parameters:
text: the input string to translate.
source_lang: the two-letter language code for the source language of the text. If source_lang is not specified, the language will be detected using langdetect.
target_lang: the two-letter language code for the target language for translation. Defaults to "en".
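How the parameters above might interact can be sketched without loading any model. This is an illustration only: sketch_pick_model is hypothetical, the detect argument is a stand-in for a language detector such as langdetect, and the sketch assumes checkpoints follow the Helsinki-NLP/opus-mt-{source}-{target} naming used on the Hugging Face Hub.

```python
def sketch_pick_model(text, source_lang=None, target_lang="en", detect=None):
    """Return (source, target, model name) that a translation call might use."""
    if source_lang is None:
        if detect is None:
            raise ValueError("source_lang not given and no detector supplied")
        # Fall back to detection only when the caller did not name the language.
        source_lang = detect(text)
    model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    return source_lang, target_lang, model_name


# Explicit source language, default target language.
explicit = sketch_pick_model("Привет", source_lang="ru")

# Detected source language, using a fake detector for illustration.
detected = sketch_pick_model("Привет", detect=lambda _text: "ru")
```

Keeping language resolution separate from the translation step makes it easy to check which model would be requested before any weights are downloaded.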
To learn more about the translate_text function, you can check the source code.