Design
This is a quick prototype that turned out to be quite usable. The
design is minimal: a home-made ORM for the feed storage, crude
parallelism with the multiprocessing module, and a simple plugin API
using importlib.
More information about known issues and limitations is available in the feed2exec manual page.
Plugin system
Plugins are documented in the Plugins section. You can also refer to the Writing new plugins section if you wish to write a new plugin or extend an existing one.
The plugin system uses a simple importlib-based architecture where
plugins are plain Python modules loaded at runtime based on a module
path provided by the user. This pattern was inspired by a
StackOverflow discussion.
The following options were also considered:
- pluggy: used by py.test, tox and devpi
- yapsy
- PluginBase
- plugnplay
- click-plugins: relevant only to add new commands
- PyPA plugin discovery
Those options were ultimately not used because they add an additional
dependency and are more complicated than a simple import. We also
did not need plugin listing or discovery, which greatly simplifies our
design.
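The "simple import" approach can be sketched roughly as follows. This is a minimal illustration, not the feed2exec implementation: the helper names `load_plugin` and `call_output`, the `demo_plugin` module, and the assumption that a plugin exposes a module-level `output` callable are all hypothetical.

```python
import importlib
import sys
import types


def load_plugin(module_path):
    """Import the module named by a dotted path supplied by the user."""
    return importlib.import_module(module_path)


def call_output(module_path, *args, **kwargs):
    """Load a plugin and call its module-level ``output`` callable.

    Assumption: the plugin is a plain module exposing ``output``.
    """
    plugin = load_plugin(module_path)
    return plugin.output(*args, **kwargs)


# Hypothetical in-memory plugin, registered so the example needs no files.
demo = types.ModuleType("demo_plugin")
demo.output = lambda *args, **kwargs: "got %d argument(s)" % len(args)
sys.modules["demo_plugin"] = demo
```

With this in place, `call_output("demo_plugin", "a", "b")` imports the module by name and dispatches to it. No registry or discovery step is involved, which is the simplification described above.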
There is some code duplication between different parts (e.g. the
feed2exec.plugins.output() and feed2exec.plugins.filter() plugin
interfaces, the maildir and mbox plugins, etc), but never more than
twice.
Concurrent processing
The threading design may be a little clunky and is certainly less
tested, which is why it is disabled by default (use --parallel to
enable it). There are known deadlock issues in high-concurrency
scenarios (e.g. with catchup enabled).
I had multiple designs in mind: the current one (multiprocessing.Pool
and pool.apply_async) vs aiohttp (on the asyncio branch) vs pool.map
(on the threadpoolmap branch). The aiohttp design was very hard to
diagnose and debug, which made me abandon the whole thing. After
reading up on Curio and Trio, I’m tempted to give async/await a try
again, but that would mean completely dropping 2.7 compatibility. The
pool.map design is just badly adapted, as it would load all the feeds’
data structures in memory before processing them.
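To make the pool.apply_async pattern concrete, here is a minimal sketch. It is not the feed2exec code: `parse_feed`, `process_feeds` and the `pool_factory` parameter are hypothetical names, and the worker only stands in for fetching and parsing a feed.

```python
import multiprocessing


def parse_feed(feed):
    """Hypothetical worker: stands in for fetching and parsing one feed."""
    return feed["name"], len(feed["url"])


def process_feeds(feeds, worker=parse_feed, pool_factory=multiprocessing.Pool):
    """Submit feeds one at a time with apply_async and collect the results."""
    with pool_factory(processes=2) as pool:
        # Each feed is submitted as soon as it is read from the iterable;
        # pool.map, by contrast, first turns an iterable without a length
        # into a list.
        pending = [pool.apply_async(worker, (feed,)) for feed in feeds]
        # get() blocks until the task is done and re-raises worker errors.
        return [result.get(timeout=30) for result in pending]
```

Calling `process_feeds(feeds)` with any iterable of feed dictionaries returns the worker results in submission order. The `pool_factory` argument is only there so that a thread-based pool (`multiprocessing.pool.ThreadPool`) can be swapped in when debugging.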
Test suite
The test suite lives in feed2exec/tests, with additional doctest
comments in some functions imported from the ecdysis project. You can
run all the tests with pytest, using, for example:
pytest-3
This is also hooked into the setup.py command, so this also works:
python3 setup.py test
Note that some tests will fail in Python 2, as the code is written and tested in Python 3. Furthermore, the feed output is taken from an up-to-date (5.2.1) feedparser version, so the tests are marked as expected to fail for lower versions. You should, naturally, run tests before submitting patches.
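For readers unfamiliar with doctest comments, the sketch below shows the general shape. The `make_slug` function is hypothetical and only illustrates the format; it is not taken from feed2exec or ecdysis.

```python
import doctest


def make_slug(text):
    """Turn a title into a lowercase, dash-separated string.

    The examples below are executed as tests by the doctest module.

    >>> make_slug("Hello World")
    'hello-world'
    >>> make_slug("  spaced   out  ")
    'spaced-out'
    """
    return "-".join(text.lower().split())


if __name__ == "__main__":
    # Runs every example found in the docstrings of this module.
    print(doctest.testmod())
```

pytest can collect such examples as well when invoked with its `--doctest-modules` option.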
The test suite also uses the betamax module to cache HTTP requests
locally so the test suite can run offline. If a new test requires
networking, you can simply add a new test doing requests with the
right fixture (feed2exec.tests.fixtures.betamax()), and a new
recording will be added to the source tree. Note that you can also use
the normal betamax_session() fixture provided upstream if you are
going to do standalone HTTP requests (not going through the feed2exec
libraries). If a new request is added to an existing test, you may
need to set the recording mode (in feed2exec/tests/conftest.py) to
new_episodes:
config.default_cassette_options['record_mode'] = 'new_episodes'
We commit the recordings in git so the test suite actually runs
offline, so be careful about the content added there. Ideally, the
license of that content should be documented in debian/copyright.
vcr was first used for tests since it was simpler and didn’t require
using a global requests.sessions.Session object. But in the end
betamax seems better maintained and more flexible: it supports pytest
fixtures, for example, and multiple cassette storage formats
(including vcr backwards compatibility). Configuration is also easier,
done in feed2exec/tests/conftest.py. Using a session also allows us to
use a custom user agent.
Comparison
feed2exec is a fairly new and minimal program, so features you may
expect from another feed reader may not be present. I chose to write a
new program because, when I started, both existing alternatives were
in a questionable state: feed2imap was mostly abandoned and
rss2email’s maintainer was also unresponsive. Both were missing the
feature I was looking for, which was to unify my feed parsers in a
single program: I needed something that could deliver mail, run
commands and send tweets. The latter isn’t done yet, but I am hoping
to complete this eventually.
The program may not be for everyone, however, so I made these comparison tables to clarify what feed2exec does compared to the alternatives.
General information:
Program | Version | Date | SLOC | Language |
---|---|---|---|---|
feed2exec | 0.5 | 2017 | 1417 | Python |
feed2imap | 1.2.5 | 2015 | 3249 | Ruby |
rss2email | 3.9 | 2014 | 1986 | Python |
- version: the version analysed
- date: the date of that release
- SLOC: Source Lines of Code as counted by sloccount, only counting the dominant language (e.g. excluding XML from test feeds)
- Language: primary programming language
Delivery options:
Program | Maildir | Mbox | IMAP | SMTP | sendmail | exec |
---|---|---|---|---|---|---|
feed2exec | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
feed2imap | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ |
rss2email | ✗ | ✗ | ✓ | ✓ | ✓ | ✗ |
- maildir: writing to Maildir folders. r2e has a pull request to implement maildir support, but it’s not merged at the time of writing
- IMAP: sending emails to IMAP servers
- SMTP: delivering emails over the SMTP protocol, with authentication
- sendmail: delivering locally using the local MTA
- exec: run arbitrary commands on new entries. feed2imap has an execurl parameter to execute commands, but it receives an unparsed dump of the feed instead of individual entries. rss2email has a postprocess filter, a Python plugin that can act on individual (or digest) messages and could possibly be extended to support arbitrary commands, but that is rather difficult to implement for normal users.
Features:
Program | Pause | OPML | Retry | Images | Filter | Reply | Digest |
---|---|---|---|---|---|---|---|
feed2exec | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ | ✗ |
feed2imap | ✗ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ |
rss2email | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ |
- pause: feed reading can be disabled temporarily by the user. In feed2exec, this is implemented with the pause configuration setting. The catchup option can also be used to catch up with feed entries.
- retry: tolerate temporary errors. For example, feed2imap will report errors only after 10 failures.
- images: download images found in the feed. feed2imap can download images and attach them to the email.
- filter: whether arbitrary filters can be applied to the feed output. feed2imap can apply filters to the unparsed dump of the feed.
- reply: whether the generated email ‘from’ header is usable to make a reply. rss2email has a use-publisher-email setting (off by default) for this, for example. feed2exec does this by default.
- digest: possibility of sending a single email per run instead of one per entry
Note
feed2imap supports only importing OPML feeds; exporting is supported
by a third-party plugin.