Design¶
This is a quick prototype that turned out to be quite usable. The
design is minimal: a home-made ORM for the feed storage, crude
parallelism with the multiprocessing module and a simple plugin API
using importlib.
The parallel design, in particular, may be a little clunky and is
certainly less tested, which is why it is disabled by default (use
--parallel to enable it). I had multiple designs in mind: the current
one (multiprocessing.Pool and pool.apply_async) vs aiohttp (on the
asyncio branch) vs pool.map (on the threadpoolmap branch). The aiohttp
design was very hard to diagnose and debug, which made me abandon the
whole thing. After reading up on Curio and Trio, I’m tempted to give
async/await a try again, but that would mean completely dropping 2.7
compatibility. The pool.map design is just badly adapted, as it would
load all the feeds’ data structures in memory before processing them.
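
The apply_async approach described above can be sketched roughly as follows. This is only an illustration, not the program's actual code: process_feed, iter_feeds and dispatch are hypothetical names, and the sketch uses multiprocessing.pool.ThreadPool (which exposes the same apply_async interface as multiprocessing.Pool) so that it runs anywhere without pickling or process start-up concerns.

```python
from multiprocessing.pool import ThreadPool


def process_feed(feed):
    """Stand-in for fetching and parsing one feed (hypothetical)."""
    return feed["name"].upper()


def iter_feeds():
    """Yield feeds one at a time, as a storage layer might (hypothetical)."""
    for name in ("alpha", "beta", "gamma"):
        yield {"name": name}


def dispatch(pool, feeds):
    """Submit each feed with apply_async as it is read, then collect results."""
    pending = [pool.apply_async(process_feed, (feed,)) for feed in feeds]
    # get() blocks until the matching task is done and re-raises its errors.
    return [result.get() for result in pending]


# ThreadPool is an assumption made for portability of this sketch; the
# design described in the text uses multiprocessing.Pool instead.
with ThreadPool(processes=2) as pool:
    results = dispatch(pool, iter_feeds())
print(results)
```

Results come back in submission order because the pending handles are kept in a list, regardless of which worker finishes first.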
Comparison¶
feed2exec is a fairly new and minimal program, so features you may
expect from another feed reader may not be present. I chose to write a
new program because, when I started, both existing alternatives were
in a questionable state: feed2imap was mostly abandoned and
rss2email’s maintainer was also unresponsive. Both were missing the
features I was looking for, which was to unify my feed parsers in a
single program: I needed something that could deliver mail, run
commands and send tweets. The latter isn’t done yet, but I am hoping
to complete this eventually.
The program may not be for everyone, however, so I made these comparison tables to clarify what feed2exec does compared to the alternatives.
General information:
Program | Version | Date | SLOC | Language |
---|---|---|---|---|
feed2exec | 0.2.3 | 2017 | 1231 | Python |
feed2imap | 2.5 | 2015 | 3348 | Ruby |
rss2email | 3.9 | 2014 | 1754 | Python |
- version: the version analysed
- date: the date of that release
- SLOC: Source Lines of Code as counted by sloccount
- Language: primary programming language
Delivery options:
Program | Maildir | IMAP | SMTP | sendmail | exec |
---|---|---|---|---|---|
feed2exec | ✓ | ✗ | ✗ | ✗ | ✓ |
feed2imap | ✓ | ✓ | ✗ | ✗ | ✗ |
rss2email | ✗ | ✓ | ✓ | ✓ | ✗ |
- maildir: writing to Maildir folders. r2e has a pull request to implement maildir support, but it’s not merged at the time of writing
- IMAP: sending emails to IMAP servers
- SMTP: delivering emails over the SMTP protocol, with authentication
- sendmail: delivering locally using the local MTA
- exec: run arbitrary commands on new entries. feed2imap has an execurl parameter to execute commands, but it receives an unparsed dump of the feed instead of individual entries
Features:
Program | Pause | OPML | Retry | Images | Filter | Reply | Digest |
---|---|---|---|---|---|---|---|
feed2exec | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
feed2imap | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
rss2email | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
- pause: feed reading can be disabled temporarily by the user. In feed2exec, this is implemented with the pause configuration setting. The catchup option can also be used to catch up with feed entries.
- retry: tolerate temporary errors. For example, feed2imap will report errors only after 10 failures.
- images: download images found in the feed. feed2imap can download images and attach them to the email.
- filter: whether arbitrary filters can be applied to the feed output. feed2imap can apply filters to the unparsed dump of the feed.
- reply: whether the generated email ‘from’ header is usable to make a reply. rss2email has a use-publisher-email setting (off by default) for this, for example. feed2exec does this by default.
- digest: possibility of sending a single email per run instead of one per entry
Note
feed2imap supports only importing OPML feeds; exporting is supported
by a third-party plugin.
Known issues¶
This is an early prototype and may break in your setup, as the
feedparser library isn’t as solid as I expected. In particular, I
had issues with feeds without dates and without guid.
Unit test coverage is incomplete, but still pretty decent, above 80%.
The exec plugin itself is not well tested and may have serious
security issues.
The API, commandline interface, configuration file syntax and database format can change at any moment.
The program is written mainly targeting Python 3.5 and should work in 3.6 but hasn’t been explicitly tested there. Tests fail on Python 2.7 and the maildir handler may specifically be vulnerable to header injections.
API documentation¶
This is the API documentation of the program. It should explain how to create new plugins and navigate the code.
Feeds module¶
This is the core module that processes all feeds and takes care of the storage. It’s where most of the logic lies.
fast feed parser that offloads tasks to plugins and commands
feed2exec.feeds.fetch(url)
Fetch the given URL.
Exceptions should be handled by the caller.
Todo: this should be moved to a plugin so it can be overridden, but so far I haven’t found a use case for this.
Parameters: url (str) – the URL to fetch
Return bytes, tuple: the body of the URL and the modification timestamp
feed2exec.feeds.parse(body, feed, lock=None, force=False)
Parse the body of the feed.
This calls the filter and output plugins and updates the cache with the found items.
Todo: this could be moved to a plugin, but then we’d need to take out the cache checking logic, which would remove most of the code here...
Return dict: the parsed data
Main entry point¶
The main entry point of the program is in the feed2exec.__main__
module. This is to make it possible to call the program directly from
the source code through the Python interpreter with:
python -m feed2exec
All this code is here rather than in __init__.py to avoid requiring
too many dependencies in the base module, which contains useful
metadata for setup.py.
This uses the click module to define the base command and options.
fast feed parser that offloads tasks to plugins and commands
feed2exec.__main__.main()
Plugins¶
Plugin interface¶
In this context, a “plugin” is simply a Python module with a defined interface.
feed2exec.plugins.output(feed, item, lock=None)
Load and run the given plugin with the given arguments.
An “output plugin” is a simple Python module with an output callable defined, which will process arguments and should output them somewhere, for example by email or through another command. The plugin is called when a new item is found, unless the cache is flushed or ignored.
The “callable” can be a class, in which case only the constructor is called, or a function. The *args and **kwargs parameters SHOULD be used in the function definition for forward-compatibility (i.e. to make sure new parameters added do not cause a regression).
Plugins should also expect to be called in parallel and should use the provided lock (a multiprocessing.Lock object) to acquire and release locks around contentious resources.
The following keywords are usually replaced in the arguments:
- %(link)s
- %(title)s
- %(description)s
- %(published)s
- %(updated)s
- %(guid)s
The full list of such parameters is determined by the feedparser module.
Caution
None of those parameters are sanitized in any way other than what feedparser does, so plugins writing files, executing code or talking to the network should be careful to sanitize the input appropriately.
The feed and items are also passed to the plugin as keyword arguments.
Return object: the loaded plugin
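
As an illustration of the interface described above, here is a minimal sketch of what an output plugin could look like. It is not taken from the program: the expand helper, the path argument and the log file are hypothetical, and a threading.Lock stands in for the multiprocessing lock so that the example stays self-contained.

```python
import os
import tempfile
import threading
from contextlib import nullcontext


def expand(args, item):
    """Replace %(key)s keywords in each argument with values from the item."""
    return [arg % item for arg in args]


def output(path, *args, lock=None, **kwargs):
    """Hypothetical output plugin: append the arguments as one line to a file."""
    # Extra keyword arguments (feed, item, anything added later) are accepted
    # and ignored here, which is the forward-compatibility the text asks for.
    with lock if lock is not None else nullcontext():
        # The file is the contentious resource, so it is only touched
        # while the lock is held.
        with open(path, "a") as handle:
            handle.write(" ".join(args) + "\n")
    return True


item = {"title": "Hello", "link": "http://example.com/1"}
args = expand(["%(title)s", "%(link)s"], item)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "items.log")
    # threading.Lock is an assumption of this sketch; any lock usable as a
    # context manager would do.
    output(path, *args, lock=threading.Lock(), feed={"name": "demo"}, item=item)
    with open(path) as handle:
        logged = handle.read()
print(logged, end="")
```

Note that the expanded values are written as-is: as the caution above says, a real plugin would need to sanitize them before using them in file names, commands or network requests.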
Echo¶
class feed2exec.plugins.echo.output(*args, **kwargs)
This plugin outputs, to standard output, the arguments it receives. It can be useful to test your configuration. It also creates a side effect for the test suite to determine if the plugin was called.
This plugin does a similar thing when acting as a filter.
Error¶
Exec¶
feed2exec.plugins.exec.output(command, *args, **kwargs)
The exec plugin is the ultimate security disaster. It simply executes whatever you feed it without any sort of sanitization. It does avoid calling the shell and executes the command directly, however. Feed contents are also somewhat sanitized by the feedparser module, see the Sanitization documentation for more information in that regard. That is limited to stripping out hostile HTML tags, however.
You should be careful when sending arbitrary parameters to other programs. Even if we do not use the shell to execute the program, a hostile feed could still inject commandline flags to change the program behavior without injecting shell commands themselves.
For example, if a program can write files with the -o option, a feed could set its title to -oevil to overwrite the evil file. The only way to work around that issue is to carefully craft the commandline so that this cannot happen.
Alternatively, writing a Python plugin is much safer as you can sanitize the arguments yourself.
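
The flag injection described above can be reproduced with a toy argument parser. The sketch below uses argparse to play the role of the called program (a hypothetical "victim" with a -o option) and shows one common mitigation, the conventional -- end-of-options marker. Whether a given real program honors -- is an assumption that has to be checked for each command.

```python
import argparse


def build_parser():
    """Toy stand-in for an external program that accepts -o (hypothetical)."""
    parser = argparse.ArgumentParser(prog="victim")
    parser.add_argument("-o", "--output")
    parser.add_argument("title", nargs="?")
    return parser


hostile_title = "-oevil"
parser = build_parser()

# Naive commandline: the title is placed where flags are still parsed, so
# the hostile value is interpreted as the -o option.
naive = parser.parse_args([hostile_title])

# Guarded commandline: everything after "--" is treated as a positional
# argument, so the same value is just a title.
guarded = parser.parse_args(["--", hostile_title])

print("naive:", naive.output, naive.title)
print("guarded:", guarded.output, guarded.title)
```

In a feed2exec configuration, the equivalent would be to put the -- marker before any %(title)s style keyword in the command arguments, for programs that support it.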
Html2text¶
Maildir¶
feed2exec.plugins.maildir.make_message(feed, entry, to_addr=None, cls=<class 'email.message.Message'>)
Generate a message from the feed.
Todo: figure out a way to render multi-element Atom feeds.
Todo: should be moved to utils?
class feed2exec.plugins.maildir.output(to_addr=None, feed=None, entry=None, lock=None, *args, **kwargs)
The maildir plugin will save a feed item into a Maildir folder.
The configuration is a little clunky, but it should be safe against hostile feeds.
Parameters:
- to_addr (str) – the email to use as “to” (defaults to USER@localdomain)
- feed (dict) – the feed
- item (dict) – the updated item
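
To give an idea of what saving a feed item into a Maildir folder involves, here is a minimal sketch built on the standard library mailbox and email modules. It is not the plugin's code: make_message_sketch, deliver, the addresses and the dictionary keys are hypothetical, and collapsing whitespace in header values is just one simple way to guard against header injection.

```python
import mailbox
import os
import tempfile
from email.message import Message


def make_message_sketch(feed, entry, to_addr="user@localhost"):
    """Build a message from a feed entry (hypothetical helper)."""
    msg = Message()
    # Collapse all whitespace, including newlines, so that a hostile title
    # cannot smuggle extra header lines into the message.
    msg["Subject"] = " ".join(entry["title"].split())
    msg["From"] = "%s <feeds@localhost>" % " ".join(feed["name"].split())
    msg["To"] = to_addr
    msg.set_payload(entry["summary"])
    return msg


def deliver(maildir_path, feed, entry):
    """Write one entry to the Maildir, creating the folder if needed."""
    box = mailbox.Maildir(maildir_path, create=True)
    key = box.add(make_message_sketch(feed, entry))
    return box, key


feed = {"name": "demo"}
entry = {"title": "Hello\nBcc: evil@example.com", "summary": "body text"}
with tempfile.TemporaryDirectory() as tmp:
    box, key = deliver(os.path.join(tmp, "Maildir"), feed, entry)
    stored_subject = box[key]["Subject"]
print(stored_subject)
```

The hostile newline in the title ends up as a plain space inside the subject, so no separate Bcc header is created.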
Utilities¶
Those are various utilities reused in multiple modules that did not fit anywhere else.
various reusable utilities
feed2exec.utils.slug(text)
Make a URL-safe, human-readable version of the given text.
This will do the following:
- decode unicode characters into ASCII
- shift everything to lowercase
- strip whitespace
- replace other non-word characters with dashes
- strip extra dashes
This somewhat duplicates the Google.slugify() function but slugify is not as generic as this one, which can be reused elsewhere.
>>> slug('test')
'test'
>>> slug('Mørdag')
'mordag'
>>> slug("l'été c'est fait pour jouer")
'l-ete-c-est-fait-pour-jouer'
>>> slug(u"çafe au lait (boisson)")
'cafe-au-lait-boisson'
>>> slug(u"Multiple spaces -- and symbols! -- merged")
'multiple-spaces-and-symbols-merged'
This is a simpler, one-liner version of the slugify module.
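
The steps listed above can be approximated with the standard library alone. The sketch below is an assumption about how such a function might be written, not the actual implementation, and is named slug_sketch to make that clear. One known difference: it relies on Unicode decomposition, so characters without one (such as ø) are dropped rather than transliterated, and the 'Mørdag' example above would not be reproduced.

```python
import re
import unicodedata


def slug_sketch(text):
    """Approximate the documented slug() steps with the standard library."""
    # Decompose accented characters, then drop whatever is not ASCII.
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # Lowercase, strip whitespace, turn runs of non-word characters into a
    # single dash, then strip leading and trailing dashes.
    return re.sub(r"\W+", "-", ascii_text.lower().strip()).strip("-")


for sample in ("test", "l'été c'est fait pour jouer", "çafe au lait (boisson)"):
    print(slug_sketch(sample))
```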