Dependencies
This Python script has the following dependencies:
- imapclient: library for accessing the Outlook inbox via IMAP
- email: Python standard library for working with emails
- os: Python standard library for interacting with the operating system
- dotenv: library for loading environment variables from a .env file
- bs4 (BeautifulSoup): library for extracting information from HTML pages
- urllib: Python standard library for handling URLs
Environment Variables
This script uses the following environment variables:
- EMAIL_ADDRESS: email address of the Outlook account to be used
- EMAIL_PASSWORD: password of the Outlook account to be used
- IMAP_SERVER: address of the Outlook IMAP server
- IMAP_SSL: indicates whether to use an SSL/TLS connection to the IMAP server (True) or not (False)
- UNSEEN (optional): indicates whether to search only for unread emails (True) or all emails (False). If this variable is not defined or its value is not "True", all emails will be searched.
- FOLDERS_EMAIL: comma-separated list of folders to be checked for emails containing unsubscribe links
- KEYWORDS_FILE: path to the text file containing the keywords to be searched for in the email links. Each keyword should be on a separate line.
The environment variables are loaded from a .env file in the root of the project.
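Because environment variables always arrive as strings, values such as IMAP_SSL and UNSEEN need to be converted before use. The following is a minimal sketch of that conversion, not the script's own code: load_settings and parse_bool are hypothetical helpers, and the defaults (SSL enabled, keywords.txt as the keywords file) are assumptions made for illustration.

```python
from typing import Mapping, Optional


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Return True only when the value is the text "true" (case-insensitive)."""
    if value is None:
        return default
    return value.strip().lower() == "true"


def load_settings(env: Mapping[str, str]) -> dict:
    """Turn raw environment strings into typed settings (hypothetical helper)."""
    raw_folders = env.get("FOLDERS_EMAIL", "")
    folders = [name.strip() for name in raw_folders.split(",") if name.strip()]
    return {
        "email_address": env.get("EMAIL_ADDRESS"),
        "email_password": env.get("EMAIL_PASSWORD"),
        "imap_server": env.get("IMAP_SERVER"),
        # Assumption: SSL stays on unless IMAP_SSL is set to something else
        "imap_ssl": parse_bool(env.get("IMAP_SSL"), default=True),
        # Undefined or anything other than "True" means "search all emails"
        "unseen_only": parse_bool(env.get("UNSEEN"), default=False),
        "folders": folders,
        # Assumption: fall back to keywords.txt when KEYWORDS_FILE is unset
        "keywords_file": env.get("KEYWORDS_FILE", "keywords.txt"),
    }


example = {
    "IMAP_SERVER": "outlook.office365.com",
    "IMAP_SSL": "False",
    "FOLDERS_EMAIL": "Inbox, Sent Items",
}
settings = load_settings(example)
print(settings["imap_ssl"], settings["unseen_only"], settings["folders"])
```

In a real run, os.environ could be passed as the mapping after load_dotenv() has been called.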
How it works
Searching for emails
The script connects to the Outlook account and goes through the folders listed in FOLDERS_EMAIL, searching for unread emails (if UNSEEN is "True") or all emails (otherwise).
For each email found, the script checks whether the email has an HTML body. If it does, it extracts the links from the HTML using the BeautifulSoup library. Then, it checks whether the text of any of these links contains any of the keywords listed in the file indicated in KEYWORDS_FILE.
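The check described above can be sketched with the standard library alone. This is an illustration rather than the script's code: it swaps BeautifulSoup for html.parser so it runs without installing anything, and find_html_body, LinkCollector, and matching_links are hypothetical names.

```python
from email.message import EmailMessage
from html.parser import HTMLParser


def find_html_body(msg):
    """Return the first text/html part of a message as a string, or None."""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            payload = part.get_payload(decode=True)
            charset = part.get_content_charset() or "utf-8"
            return payload.decode(charset, errors="replace")
    return None


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text)))
            self._href = None


def matching_links(html_body, keywords):
    """Return hrefs whose link text contains any keyword (case-insensitive)."""
    collector = LinkCollector()
    collector.feed(html_body)
    collector.close()
    return [
        href
        for href, text in collector.links
        if any(keyword in text.lower() for keyword in keywords)
    ]


# Build a small multipart message to exercise the helpers
msg = EmailMessage()
msg["From"] = "news@example.com"
msg["Subject"] = "Weekly digest"
msg.set_content("Plain-text fallback")
msg.add_alternative(
    '<p><a href="https://example.com/read">Read more</a> '
    '<a href="https://example.com/unsub?id=1">Unsubscribe</a></p>',
    subtype="html",
)

body = find_html_body(msg)
found = matching_links(body, ["unsubscribe", "opt-out"])
print(found)
```

Note that the keywords are compared against the visible link text, not the URL, so a link labelled "Unsubscribe" matches even when its address contains no keyword.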
If any link contains one of the keywords, the script stores the email information (sender and subject) and the link in a dictionary and adds that dictionary to a list.
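Sender and subject headers are often RFC 2047 encoded, so they need decoding before they are stored. The sketch below shows one way to decode them and to avoid recording the same link twice; decode_field and record_link are hypothetical helpers, and falling back to UTF-8 when no usable charset is declared is an assumption.

```python
from email.header import decode_header


def decode_field(raw):
    """Decode an RFC 2047 header value into a plain string."""
    if raw is None:
        return ""
    pieces = []
    for part, encoding in decode_header(raw):
        if isinstance(part, bytes):
            # Assumption: fall back to UTF-8 when no usable charset is declared
            if encoding in (None, "unknown-8bit"):
                encoding = "utf-8"
            part = part.decode(encoding, errors="replace")
        pieces.append(part)
    return "".join(pieces)


def record_link(links_info, sender, subject, href):
    """Append a new entry unless the link was already recorded."""
    if any(info["link"] == href for info in links_info):
        return False
    links_info.append(
        {"from": decode_field(sender), "subject": decode_field(subject), "link": href}
    )
    return True


links_info = []
record_link(
    links_info,
    "news@example.com",
    "=?utf-8?q?Promo=C3=A7=C3=A3o?=",
    "https://example.com/unsub?id=1",
)
# Same link from a second email: ignored
record_link(links_info, "news@example.com", "Another mail", "https://example.com/unsub?id=1")
print(links_info)
```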
Generating the HTML file
With the list of unsubscribe link information in hand, the script generates an HTML file from a template located in templates/template.html.
The HTML template contains a table with information about the sender, subject, and unsubscribe link. The script fills the rows of this table with the information stored in the list of dictionaries.
The generated HTML file is saved in unsubscribe_links.html.
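The template step can be sketched as follows. The contents of templates/template.html are not shown in this document, so TEMPLATE below is a hypothetical stand-in; the only detail taken from the script is the {table_rows} placeholder filled through str.format. Escaping values with html.escape is an addition in this sketch so that senders such as "Shop <news@example.com>" display correctly.

```python
import html

# Hypothetical minimal template; the real templates/template.html is not shown
TEMPLATE = (
    "<table>\n"
    "<tr><th>Sender</th><th>Subject</th><th>Link</th></tr>\n"
    "{table_rows}\n"
    "</table>\n"
)


def render_rows(links_info):
    """Build one escaped <tr> per stored link."""
    rows = []
    for info in links_info:
        rows.append(
            '<tr><td>{}</td><td>{}</td><td><a href="{}">Unsubscribe</a></td></tr>'.format(
                html.escape(info["from"]),
                html.escape(info["subject"]),
                html.escape(info["link"], quote=True),
            )
        )
    return "\n".join(rows)


def render_page(template, links_info):
    """Fill the {table_rows} placeholder of the template."""
    return template.format(table_rows=render_rows(links_info))


page = render_page(
    TEMPLATE,
    [
        {
            "from": "Shop <news@example.com>",
            "subject": "Sale & more",
            "link": "https://example.com/unsub?id=1&u=2",
        }
    ],
)
print(page)
```

Because str.format treats braces as placeholders, any literal braces in the template (for example in inline CSS) have to be doubled as {{ and }}.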
Usage
Before running the script, it is necessary to define the environment variables in the .env file and create a keywords.txt file with the keywords that should be searched for in the email links.
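Reading the keywords file can be sketched as below. load_keywords is a hypothetical helper; it lowercases each keyword and skips blank lines, since an empty string is a substring of every link text and would make every link match.

```python
import os
import tempfile


def load_keywords(path):
    """Read one keyword per line, lowercased, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as handle:
        return [line.strip().lower() for line in handle if line.strip()]


# Write a throwaway keywords file to demonstrate the loader
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "keywords.txt")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("Unsubscribe\n\nopt-out\n  cancel subscription  \n")
    keywords = load_keywords(path)

print(keywords)
```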
The unsubscribe_links.html file will be generated in the same folder as the script, containing the table with the unsubscribe links found in the emails.
Steps to execute the script:
- Rename the .env.sample file to .env;
mv .env.sample .env
- Fill in the .env file with the necessary Outlook credentials and environment variables;
Example .env file:
EMAIL_ADDRESS=johndoe@outlook.com
EMAIL_PASSWORD=mypassword
IMAP_SERVER=outlook.office365.com
IMAP_SSL=True
UNSEEN=True
FOLDERS_EMAIL=Inbox,Sent Items
KEYWORDS_FILE=keywords.txt
- Create a keywords.txt file in the root of the project with the keywords to be searched for in the email links;
Example keywords.txt file:
unsubscribe
opt-out
cancel subscription
unsubscribe from this list
update your preferences
- Install the project dependencies using the command pip install -r requirements.txt in the terminal;
- Save the content below in a file called script.py:
import email
import html
import os
from email.header import decode_header

import imapclient
from bs4 import BeautifulSoup
from dotenv import load_dotenv

# Load the environment variables from the .env file
load_dotenv()

# Outlook credentials, read from the environment
email_address = os.getenv('EMAIL_ADDRESS')
email_password = os.getenv('EMAIL_PASSWORD')

# Read the keywords from the file named in KEYWORDS_FILE, skipping blank lines
keywords_file = os.getenv('KEYWORDS_FILE', 'keywords.txt')
with open(keywords_file, 'r', encoding='utf-8') as file:
    keywords = [line.strip().lower() for line in file if line.strip()]

# Search only unread emails when UNSEEN is "True"; otherwise search all emails
unseen_only = os.getenv('UNSEEN', '')
if unseen_only.strip().lower() == 'true':
    search_criteria = ['UNSEEN']
else:
    search_criteria = ['ALL']


def decode_field(value):
    """Decode an RFC 2047 encoded header into a plain string."""
    if value is None:
        return ''
    pieces = []
    for part, encoding in decode_header(value):
        if isinstance(part, bytes):
            # 'unknown-8bit' is not a real codec, so fall back to UTF-8
            if encoding in (None, 'unknown-8bit'):
                encoding = 'utf-8'
            part = part.decode(encoding, errors='replace')
        pieces.append(part)
    return ''.join(pieces)


# Connect to the Outlook IMAP server
imap_server = os.getenv('IMAP_SERVER')
# os.getenv returns a string, so convert it to a real boolean
imap_ssl = os.getenv('IMAP_SSL', 'True').strip().lower() == 'true'
client = imapclient.IMAPClient(imap_server, ssl=imap_ssl)
client.login(email_address, email_password)

folders = [name.strip() for name in os.getenv('FOLDERS_EMAIL', '').split(',') if name.strip()]
links_info = []

for folder in folders:
    # Open the current folder read-only
    client.select_folder(folder, readonly=True)
    # Find the emails that match the search criteria
    messages = client.search(search_criteria)
    # Iterate over the emails
    for msg_id in messages:
        msg_data = client.fetch(msg_id, ['RFC822'])
        msg = email.message_from_bytes(msg_data[msg_id][b'RFC822'])
        # Find the HTML body, if the email has one
        html_content = None
        if msg.is_multipart():
            for part in msg.walk():
                if part.get_content_type() == 'text/html':
                    html_content = part.get_payload(decode=True)
                    break
        elif msg.get_content_type() == 'text/html':
            html_content = msg.get_payload(decode=True)
        # Skip emails without an HTML body
        if not html_content:
            continue
        # Extract the links using BeautifulSoup
        soup = BeautifulSoup(html_content, 'html.parser')
        for link in soup.find_all('a', href=True):
            href = link['href']
            text = link.text.lower()
            # Check whether any keyword appears in the link text
            if any(keyword in text for keyword in keywords):
                # Skip links that were already recorded
                if href not in [info['link'] for info in links_info]:
                    # Store the decoded sender, subject, and link in a dictionary
                    link_info = {
                        'from': decode_field(msg['From']),
                        'subject': decode_field(msg['Subject']),
                        'link': href,
                    }
                    links_info.append(link_info)

# Close the connection to the email server
client.logout()

# Load the HTML template
template_path = os.path.join(os.path.dirname(__file__), 'templates', 'template.html')
with open(template_path, 'r', encoding='utf-8') as f:
    html_template = f.read()

# Build the table rows, escaping values so they display correctly in HTML
table_rows = ''
for info in links_info:
    row = (
        f'<tr><td>{html.escape(info["from"])}</td>'
        f'<td>{html.escape(info["subject"])}</td>'
        f'<td><a href="{html.escape(info["link"])}">Click here to unsubscribe</a></td></tr>'
    )
    table_rows += row

# Combine the HTML template and the table rows
html_content = html_template.format(table_rows=table_rows)

# Save the HTML content to a file
with open('unsubscribe_links.html', 'w', encoding='utf-8') as file:
    file.write(html_content)

print('HTML file generated: unsubscribe_links.html')
- Run the script using the command python script.py in the terminal;
- Check that the unsubscribe_links.html file was successfully generated in the root of the project.