Design

This is a quick prototype that turned out to be quite usable. The design is minimal: some home-made ORM for the feed storage, crude parallelism with the multiprocessing module and a simple plugin API using importlib.

The parallel design, in particular, may be a little clunky and is certainly less tested, which is why it is disabled by default (use --parallel to enable it). I had multiple designs in mind: the current one (multiprocessing.Pool and pool.apply_async) vs aiohttp (on the asyncio branch) vs pool.map (on the threadpoolmap branch). The aiohttp design was very hard to diagnose and debug, which made me abandon the whole thing. After reading up on Curio and Trio, I’m tempted to give async/await a try again, but that would mean completely dropping 2.7 compatibility. The pool.map design is just badly adapted, as it would load all the feeds’ data structures in memory before processing them.
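As a rough illustration of the apply_async approach mentioned above, the sketch below submits one job per feed while the feeds are being iterated, then collects the results. It is not feed2exec's actual code: the process function and the feed dictionaries are hypothetical stand-ins, and a thread-backed pool (multiprocessing.pool.ThreadPool, which offers the same apply_async interface as multiprocessing.Pool) is used so that the snippet runs without any pickling concerns.

```python
from multiprocessing.pool import ThreadPool


def process(feed):
    """Hypothetical stand-in for fetching and parsing one feed."""
    return "processed %s" % feed["name"]


def dispatch(pool, feeds):
    """Submit one job per feed as the iterable is consumed, then collect."""
    pending = []
    for feed in feeds:
        # apply_async returns immediately with an AsyncResult handle
        pending.append(pool.apply_async(process, (feed,)))
    # get() blocks until the job is done and re-raises any exception
    return [result.get() for result in pending]


def example():
    # a generator, so feed objects are produced one at a time
    feeds = ({"name": "feed%d" % i} for i in range(3))
    with ThreadPool(2) as pool:
        return dispatch(pool, feeds)
```

Calling example() returns the results in submission order, because the AsyncResult handles are collected in the order the jobs were submitted.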

Comparison

feed2exec is a fairly new and minimal program, so features you may expect from another feed reader may not be present. I chose to write a new program because, when I started, both existing alternatives were in a questionable state: feed2imap was mostly abandoned and rss2email’s maintainer was also unresponsive. Both were missing the features I was looking for, which was to unify my feed parsers in a single program: I needed something that could deliver mail, run commands and send tweets. The latter isn’t done yet, but I am hoping to complete this eventually.

The program may not be for everyone, however, so I made these comparison tables to clarify what feed2exec does compared to the alternatives.

General information:

Program    Version  Date  SLOC  Language
feed2exec  0.2.3    2017  1231  Python
feed2imap  2.5      2015  3348  Ruby
rss2email  3.9      2014  1754  Python
  • version: the version analysed
  • date: the date of that release
  • SLOC: Source Lines of Code as counted by sloccount
  • Language: primary programming language

Delivery options:

Program Maildir IMAP SMTP sendmail exec
feed2exec
feed2imap
rss2email
  • maildir: writing to Maildir folders. r2e has a pull request to implement maildir support, but it’s not merged at the time of writing
  • IMAP: sending emails to IMAP servers
  • SMTP: delivering emails over the SMTP protocol, with authentication
  • sendmail: delivering locally using the local MTA
  • exec: run arbitrary commands on new entries. feed2imap has an execurl parameter to execute commands, but it receives an unparsed dump of the feed instead of individual entries

Features:

Program Pause OPML Retry Images Filter Reply Digest
feed2exec
feed2imap
rss2email
  • pause: feed reading can be temporarily disabled by the user. In feed2exec, this is implemented with the pause configuration setting. The catchup option can also be used to catch up with feed entries.
  • retry: tolerate temporary errors. For example, feed2imap will report errors only after 10 failures.
  • images: download images found in feed. feed2imap can download images and attach them to the email.
  • filter: whether arbitrary filters can be applied to the feed output. feed2imap can apply filters to the unparsed dump of the feed.
  • reply: whether the generated email ‘from’ header is usable to make a reply. rss2email has a use-publisher-email setting (off by default) for this, for example. feed2exec does this by default.
  • digest: possibility of sending a single email per run instead of one per entry

Note

feed2imap supports only importing OPML feeds; exporting is supported by a third-party plugin.

Known issues

This is an early prototype and may break in your setup, as the feedparser library isn’t as solid as I expected. In particular, I had issues with feeds without dates and without guid.

Unit test coverage is incomplete, but still pretty decent, above 80%.

The exec plugin itself is not well tested and may have serious security issues.

API, commandline interface, configuration file syntax and database format can be changed at any moment.

The program is written mainly targeting Python 3.5 and should work in 3.6 but hasn’t been explicitly tested there. Tests fail on Python 2.7 and the maildir handler may specifically be vulnerable to header injections.

API documentation

This is the API documentation of the program. It should explain how to create new plugins and navigate the code.

Feeds module

This is the core module that processes all feeds and takes care of the storage. It’s where most of the logic lies.

fast feed parser that offloads tasks to plugins and commands

feed2exec.feeds.fetch(url)[source]

fetch the given URL

exceptions should be handled by the caller

Todo: this should be moved to a plugin so it can be overridden, but so far I haven’t found a use case for this.

Parameters:
  • url (str) – the URL to fetch
Return bytes, tuple:

the body of the URL and the modification timestamp

feed2exec.feeds.parse(body, feed, lock=None, force=False)[source]

parse the body of the feed

this calls the filter and output plugins and updates the cache with the found items.

Todo: this could be moved to a plugin, but then we’d need to take out the cache checking logic, which would remove most of the code here...

Parameters:
  • body (bytes) – the body of the feed, as returned by fetch()
  • feed (dict) – a feed object passed to plugins and used for debugging
Return dict:

the parsed data

feed2exec.feeds.safe_serial(obj)[source]

JSON serializer for objects not serializable by default json code
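For readers unfamiliar with the pattern, a fallback serializer like this is normally handed to the json module through its default argument. The sketch below is a minimal illustration and not the actual safe_serial code: which types feed2exec handles is not stated here, so the datetime handling and the serial_sketch name are assumptions.

```python
import datetime
import json


def serial_sketch(obj):
    """Fallback serializer: json calls this only for objects it cannot encode.

    Handling dates is an assumption made for this example.
    """
    if isinstance(obj, (datetime.datetime, datetime.date)):
        return obj.isoformat()
    raise TypeError("%r is not JSON serializable" % (obj,))


# hypothetical item with a field the default encoder rejects
item = {"title": "hello", "updated": datetime.datetime(2017, 1, 1, 12, 0)}
encoded = json.dumps(item, default=serial_sketch, sort_keys=True)
```

Raising TypeError for unknown types keeps the standard json behaviour for anything the fallback does not understand.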

Main entry point

The main entry point of the program is in the feed2exec.__main__ module. This is to make it possible to call the program directly from the source code through the Python interpreter with:

python -m feed2exec

All this code is here rather than in __init__.py to avoid requiring too many dependencies in the base module, which contains useful metadata for setup.py.

This uses the click module to define the base command and options.

fast feed parser that offloads tasks to plugins and commands

feed2exec.__main__.main()

Plugins

Plugin interface

In this context, a “plugin” is simply a Python module with a defined interface.
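To make the idea concrete, here is a minimal sketch of how a callable can be looked up in a module by name with importlib. The load_callable helper is hypothetical and only illustrates the mechanism; it is not the loader feed2exec uses. The standard json module stands in for a plugin module so that the example is self-contained.

```python
import importlib


def load_callable(module_name, attribute="output"):
    """Import a module by its dotted name and return one of its attributes.

    Hypothetical helper: raises ImportError if the module is missing and
    AttributeError if it does not define the requested callable.
    """
    module = importlib.import_module(module_name)
    return getattr(module, attribute)


# the json module stands in for a plugin module here
dumps = load_callable("json", "dumps")
```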

feed2exec.plugins.output(feed, item, lock=None)[source]

load and run the given plugin with the given arguments

An “output plugin” is a simple Python module with an output callable defined, which will process arguments and should output them somewhere, for example by email or through another command. The plugin is called when a new item is found, unless the cache is flushed or ignored.

The “callable” can be a function or a class, in which case only the constructor is called. The *args and **kwargs parameters SHOULD be used in the function definition for forward-compatibility (i.e. to make sure new parameters added do not cause a regression).

Plugins should also expect to be called in parallel and should use the provided lock (a multiprocessing.Lock object) to acquire and release locks around contentious resources.

The following keywords are usually replaced in the arguments:

  • %(link)s
  • %(title)s
  • %(description)s
  • %(published)s
  • %(updated)s
  • %(guid)s

The full list of such parameters is determined by the feedparser module.
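The %(name)s syntax is the one used by Python's mapping-based string formatting. As a sketch of what such a substitution looks like, assuming plain % formatting against the item dictionary (the expand helper and the sample item are hypothetical, and the real substitution code may differ):

```python
def expand(args, item):
    """Replace %(name)s placeholders in each argument with item fields."""
    # arguments without placeholders pass through unchanged
    return [arg % item for arg in args]


# hypothetical item with two of the fields listed above
item = {"link": "http://example.com/1", "title": "Hello"}
expanded = expand(["--title", "%(title)s", "%(link)s"], item)
```

Note that with this approach a placeholder naming a field absent from the item raises a KeyError.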

Caution

None of those parameters are sanitized in any way other than what feedparser does, so plugins writing files, executing code or talking to the network should be careful to sanitize the input appropriately.

The feed and items are also passed to the plugin as keyword arguments.

Parameters:
  • feed (dict) – the feed metadata
  • item (dict) – the updated item
Return object:

the loaded plugin
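Putting the points above together, a minimal output plugin could look like the following sketch. It is a hypothetical example, not one of the shipped plugins: the path argument, the "name" key of the feed dictionary and the line format are assumptions, and the entry keyword follows the signature of the maildir plugin documented below.

```python
import contextlib


def output(path, *args, feed=None, entry=None, lock=None, **kwargs):
    """Hypothetical output plugin: append one line per new item to a file."""
    feed = feed or {}
    # the "name" key is an assumption about the feed metadata
    line = "%s: %s\n" % (feed.get("name", "unknown"), " ".join(args))
    # fall back to a no-op context manager when no lock was provided
    guard = lock if lock is not None else contextlib.nullcontext()
    with guard:
        # the file is a contentious resource when plugins run in parallel
        with open(path, "a") as logfile:
            logfile.write(line)
    return line
```

Accepting *args and **kwargs, as recommended above, means parameters added later are ignored by the plugin rather than raising a TypeError.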

feed2exec.plugins.filter(feed, item, lock=None)[source]

common code with output() should be factored out, but output() takes arguments...

Echo

class feed2exec.plugins.echo.output(*args, **kwargs)[source]

This plugin outputs, to standard output, the arguments it receives. It can be useful to test your configuration. It also creates a side effect for the test suite to determine if the plugin was called.

This plugin does a similar thing when acting as a filter.

feed2exec.plugins.echo.filter

This filter just keeps the feed unmodified. It is just there for testing purposes.

alias of output

Error

feed2exec.plugins.error.output(*args, **kwargs)[source]

The error plugin is a simple plugin which raises an exception when called. It is designed for use in the test suite and should generally not be used elsewhere.

Exec

feed2exec.plugins.exec.output(command, *args, **kwargs)[source]

The exec plugin is the ultimate security disaster. It simply executes whatever you feed it without any sort of sanitization. It does avoid calling the shell and executes the command directly, however. Feed contents are also somewhat sanitized by the feedparser module, see the Sanitization documentation for more information in that regard. That is limited to stripping out hostile HTML tags, however.

You should be careful when sending arbitrary parameters to other programs. Even if we do not use the shell to execute the program, a hostile feed could still inject commandline flags to change the program behavior without injecting shell commands themselves.

For example, if a program can write files with the -o option, a feed could set their title to -oevil to overwrite the evil file. The only way to work around that issue is to carefully craft the commandline so that this cannot happen.

Alternatively, writing a Python plugin is much safer as you can sanitize the arguments yourself.
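One common way to craft such a commandline, assuming the target program follows the convention of treating -- as the end of options (many do, but not all, so check its documentation), is to place every untrusted value after that marker. The build_command helper and the program name below are hypothetical:

```python
def build_command(program, options, untrusted):
    """Place untrusted values after "--" so they are read as operands.

    This only helps if the target program honours the "--" convention.
    """
    return [program] + list(options) + ["--"] + list(untrusted)


# a hostile title such as "-oevil" ends up after the end-of-options marker
command = build_command("some-program", ["-v"], ["-oevil"])
```

The resulting list can be passed to subprocess without a shell, so the hostile value is neither interpreted by a shell nor parsed as an option by a compliant program.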

Html2text

class feed2exec.plugins.html2text.filter(*args, feed=None, entry=None, **kwargs)[source]

This filter plugin takes a given feed item and replaces the content with its HTML parsed as text.

static parse(html=None)[source]

Parse HTML to text according to our preferences. This is where subclasses can override the HTML2Text settings or use a completely different parser.

Maildir

feed2exec.plugins.maildir.make_message(feed, entry, to_addr=None, cls=<class 'email.message.Message'>)[source]

generate a message from the feed

Todo

figure out a way to render multi-element Atom feeds.

Todo

should be moved to utils?

class feed2exec.plugins.maildir.output(to_addr=None, feed=None, entry=None, lock=None, *args, **kwargs)[source]

The maildir plugin will save a feed item into a Maildir folder.

The configuration is a little clunky, but it should be safe against hostile feeds.
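As an illustration of the general technique, and not of the plugin's actual code, the sketch below builds a message with the standard email package and stores it with mailbox.Maildir. The "title" and "summary" keys, the helper names and the whitespace-collapsing approach to header values are all assumptions made for this example.

```python
import mailbox
from email.message import Message


def clean_header(value):
    """Collapse all whitespace, newlines included, into single spaces."""
    return " ".join(str(value).split())


def make_message_sketch(feed, entry, to_addr="user@localdomain"):
    """Build a message from hypothetical feed and entry dictionaries."""
    msg = Message()
    msg["To"] = to_addr
    # values from the feed are untrusted: keep them on a single line
    msg["Subject"] = clean_header(entry.get("title", ""))
    msg.set_payload(entry.get("summary", ""))
    return msg


def deliver(path, msg):
    """Store the message in the Maildir at path, creating it if needed."""
    box = mailbox.Maildir(path, create=True)
    # add() returns the key under which the message was stored
    return box.add(msg)
```

Collapsing newlines in header values is one simple precaution against a hostile title trying to smuggle in extra headers.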

Parameters:
  • to_addr (str) – the email to use as “to” (defaults to USER@localdomain)
  • feed (dict) – the feed
  • item (dict) – the updated item

Null

feed2exec.plugins.null.output(*args, **kwargs)[source]

This plugin does nothing. It can be useful in cases where you want to catch up with imported feeds.

feed2exec.plugins.null.filter(entry=None, *args, **kwargs)[source]

The null filter removes all elements from a feed item

Utilities

Those are various utilities reused in multiple modules that did not fit anywhere else.

various reusable utilities

feed2exec.utils.slug(text)[source]

Make a URL-safe, human-readable version of the given text

This will do the following:

  1. decode unicode characters into ASCII
  2. shift everything to lowercase
  3. strip whitespace
  4. replace other non-word characters with dashes
  5. strip extra dashes

This somewhat duplicates the Google.slugify() function but slugify is not as generic as this one, which can be reused elsewhere.

>>> slug('test')
'test'
>>> slug('Mørdag')
'mordag'
>>> slug("l'été c'est fait pour jouer")
'l-ete-c-est-fait-pour-jouer'
>>> slug(u"çafe au lait (boisson)")
'cafe-au-lait-boisson'
>>> slug(u"Multiple  spaces -- and symbols! -- merged")
'multiple-spaces-and-symbols-merged'

This is a simpler, one-liner version of the slugify module.
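The five steps listed above can be approximated with the standard library alone, as in the sketch below. This is not the function's actual implementation: it assumes Unicode NFKD decomposition for the first step, which handles accented letters such as é and ç but simply drops characters that have no decomposition, so the 'Mørdag' doctest above would not pass with this version.

```python
import re
import unicodedata


def slug_sketch(text):
    """Approximate the five documented steps using only the stdlib."""
    # 1. decompose accented characters, then drop anything non-ASCII
    text = unicodedata.normalize("NFKD", text)
    text = text.encode("ascii", "ignore").decode("ascii")
    # 2. and 3. lowercase and strip surrounding whitespace
    text = text.lower().strip()
    # 4. replace each run of non-word characters with a single dash
    text = re.sub(r"\W+", "-", text)
    # 5. strip dashes left at either end
    return text.strip("-")
```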

feed2exec.utils.make_dirs_helper(path)[source]

Create the directory if it does not exist

Return True if the directory was created, False if it was already present, and raise an OSError exception if it cannot be created.
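A helper with this contract can be sketched as follows. This is an assumption about how such a function might be written on Python 3, not the actual source; the make_dirs_sketch name is hypothetical.

```python
import os


def make_dirs_sketch(path):
    """Return True if path was created, False if it already was a directory."""
    try:
        os.makedirs(path)
    except FileExistsError:
        if os.path.isdir(path):
            return False
        # something that is not a directory is in the way
        raise
    # other OSError subclasses (e.g. PermissionError) propagate to the caller
    return True
```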