API documentation

This is the API documentation of the program. It should explain how to create new plugins and navigate the code.

Feeds module

This is the core module that processes all feeds and takes care of the storage. It’s where most of the logic lies.

fast feed parser that offloads tasks to plugins and commands

feed2exec.feeds.default_config_dir()[source]

the default configuration directory

this is conforming to the XDG base directory specification

Todo: this more or less conforms: the feed database is also stored in this directory, whereas the database may be better stored in XDG_CACHE_HOME or XDG_RUNTIME_DIR.
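As a rough illustration of the XDG lookup, the sketch below resolves a configuration directory from XDG_CONFIG_HOME and falls back to ~/.config. The name xdg_config_dir and its parameters are hypothetical; this is not the body of default_config_dir().

```python
import os


def xdg_config_dir(app="feed2exec", environ=None):
    """Hypothetical helper: resolve the per-application XDG config directory."""
    environ = os.environ if environ is None else environ
    # The XDG spec falls back to ~/.config when XDG_CONFIG_HOME is unset or empty.
    base = environ.get("XDG_CONFIG_HOME") or os.path.join(os.path.expanduser("~"), ".config")
    return os.path.join(base, app)
```

Passing a custom environ mapping makes the lookup easy to exercise in tests without touching the real environment.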
feed2exec.feeds.fetch(url)[source]

fetch the given URL

this is a simple wrapper around the requests module.

exceptions should be handled by the caller.

Todo: this could be moved to a plugin so it can be overridden, but so far I haven’t found a use case for this.
Parameters: url (str) – the URL to fetch
Return bytes: the body of the fetched URL
feed2exec.feeds.normalize_item(feed=None, item=None)[source]

normalize feeds a little more than what feedparser provides.

we do the following operations:

  1. add more defaults to item dates (issue #113):
    • created_parsed of the item
    • updated_parsed of the feed
  2. work around a missing GUID in some feeds (issue #112)
  3. work around link normalization failing on some feeds, particularly GitHub, where links are relative (/foo) instead of absolute (https://github.com/foo). unreported for now.
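The three steps above can be sketched on plain dictionaries as follows. This is only an illustration: the name normalize_sketch, the choice of updated_parsed as the field receiving the default, the link used as fallback GUID and the url key on the feed are all assumptions, not the actual behaviour of normalize_item().

```python
from urllib.parse import urljoin


def normalize_sketch(feed, item):
    """Hypothetical stand-in for the three normalization steps, on plain dicts."""
    # 1. assumption: default the item date to its creation date, then to the feed date
    if not item.get("updated_parsed"):
        item["updated_parsed"] = item.get("created_parsed") or feed.get("updated_parsed")
    # 3. resolve relative links such as /foo against the feed URL
    if item.get("link"):
        item["link"] = urljoin(feed.get("url", ""), item["link"])
    # 2. assumption: reuse the (normalized) link when the feed provides no GUID
    if not item.get("id"):
        item["id"] = item.get("link")
    return item


feed = {"url": "https://github.com/foo/releases.atom",
        "updated_parsed": (2020, 1, 1, 0, 0, 0, 2, 1, 0)}
item = normalize_sketch(feed, {"link": "/foo"})
```

The link is resolved before the GUID fallback so that the fallback identifier is an absolute URL.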
feed2exec.feeds.parse(body, feed, lock=None, force=False)[source]

parse the body of the feed

this parses the given body using feedparser and calls the plugins configured in the feed (using feed2exec.plugins.output() and feed2exec.plugins.filter()). it updates the cache with the found items if the output plugin succeeds (returns True) and if the filter plugin doesn’t set the skip element in the feed item.

Todo:

this could be moved to a plugin, but then we’d need to take out the cache checking logic, which would remove most of the code here...

Parameters:
  • body (bytes) – the body of the feed, as returned by fetch()
  • feed (dict) – a feed object used to pass to plugins and debugging
  • lock (object) – a multiprocessing.Lock object previously initialized. if None, the global LOCK variable will be used: this is used in the test suite to avoid having to pass locks all the way through the API. this lock is in turn passed to plugin calls.
  • force (bool) – force plugin execution even if entry was already seen
Return dict:

the parsed data
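The interplay between the cache, the filter plugin and the output plugin can be sketched as below. Everything here is a hypothetical simplification: process_items, the set used as a cache and the plain callables standing in for plugins are illustrative only, and whether the real code still calls the output plugin for skipped items is an assumption.

```python
def process_items(items, cache, filter_plugin, output_plugin, force=False):
    """Hypothetical sketch of the filter -> output -> cache flow."""
    handled = []
    for item in items:
        guid = item["id"]
        if guid in cache and not force:
            continue  # already seen and not forced
        filter_plugin(item)
        if item.get("skip"):
            continue  # assumption: skipped items are neither output nor cached
        if output_plugin(item):
            cache.add(guid)  # only remember items whose output succeeded
            handled.append(item)
    return handled


def skip_ads(item):
    # toy filter: mark items titled "ad" as skipped
    if item.get("title") == "ad":
        item["skip"] = True


cache = {"a"}
items = [{"id": "a"}, {"id": "b"}, {"id": "c", "title": "ad"}]
handled = process_items(items, cache, skip_ads, lambda item: True)
```

With force=True the first item would be processed again even though it is already in the cache.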

feed2exec.feeds.fetch_feeds(pattern=None, parallel=False, force=False, catchup=False)[source]

main entry point for the feed fetch routines.

this iterates through all feeds configured in the feed2exec.feeds.FeedStorage that match the given pattern.

This will call logging.warning() for the requests.exceptions.Timeout and requests.exceptions.ConnectionError exceptions, as they are transient errors that the user may want to ignore.

Other exceptions raised from requests.exceptions (like TooManyRedirects or HTTPError, but basically any other exception) may indicate a configuration error or a more permanent failure, so they are signaled with logging.error().
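The split between transient and permanent failures could look like the following sketch. The helper log_fetch_error is hypothetical, and the built-in TimeoutError and ConnectionError are used as stand-ins so the example runs without third-party packages or network access.

```python
import logging


def log_fetch_error(exc, url, transient=(TimeoutError, ConnectionError)):
    """Hypothetical helper: warn on transient errors, report anything else as an error."""
    level = logging.WARNING if isinstance(exc, transient) else logging.ERROR
    logging.log(level, "failed to fetch %s: %s", url, exc)
    return level
```

With the requests module, the transient tuple would be (requests.exceptions.Timeout, requests.exceptions.ConnectionError), matching the two exceptions named above.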

Parameters:
feed2exec.feeds.opml_import(opmlfile, storage)[source]

import the feeds listed in an OPML file stream into the given config storage
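A minimal sketch of reading feeds out of an OPML stream, using only the standard library. The generator opml_feeds is hypothetical and does not reflect how opml_import() is actually written; it assumes feeds are outline elements carrying an xmlUrl attribute, as is conventional in OPML subscription lists.

```python
import io
import xml.etree.ElementTree as ET


def opml_feeds(opmlfile):
    """Hypothetical helper: yield (name, url) pairs from an OPML file stream."""
    tree = ET.parse(opmlfile)
    for outline in tree.iter("outline"):
        url = outline.get("xmlUrl")
        if url:  # folders are outlines without an xmlUrl attribute
            yield outline.get("title") or outline.get("text"), url


sample = io.BytesIO(
    b'<opml version="2.0"><body><outline text="folder">'
    b'<outline text="Example" xmlUrl="http://example.com/feed"/>'
    b'</outline></body></opml>'
)
found = list(opml_feeds(sample))
```

Each resulting pair could then be handed to the add(name, url) method of the storage.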

class feed2exec.feeds.ConfFeedStorage(pattern=None)[source]

Feed configuration stored in a config file.

This derives from configparser.RawConfigParser and uses the .ini file set in the path member to read and write settings.

Changes are committed immediately and no locking is performed, so reading should be safe but concurrent editing is not.

What sets this configuration apart is its iterator, which yields only the entries matching the pattern substring provided in the constructor.

path = '~/.config/feed2exec/feed2exec.ini'

default ConfFeedStorage path

add(name, url, output=None, args=None, filter=None, filter_args=None, folder=None, mailbox=None)[source]

add the designated feed to the configuration

this is not thread-safe.

set(section, option, value=None)[source]

override parent to make sure we immediately write changes

not thread-safe

remove_option(section, option)[source]

override parent to make sure we immediately write changes

not thread-safe

remove(name)[source]

convenient alias for configparser.RawConfigParser.remove_section()

not thread-safe

commit()[source]

write the feed configuration

see configparser.RawConfigParser.write()
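The write-on-change behaviour and the pattern-matching iterator described for this class can be approximated as follows. IniStorageSketch is a hypothetical stand-in, not the real ConfFeedStorage: its constructor takes an explicit path, and the shape of the yielded dictionaries is an assumption.

```python
import configparser
import os
import tempfile


class IniStorageSketch(configparser.RawConfigParser):
    """Hypothetical stand-in: write on every change, iterate over matching sections."""

    def __init__(self, path, pattern=None):
        super().__init__()
        self.path = path
        self.pattern = pattern
        self.read(self.path)  # read() silently ignores a missing file

    def set(self, section, option, value=None):
        super().set(section, option, value)
        self.commit()  # write immediately; no locking, so not thread-safe

    def commit(self):
        with open(self.path, "w") as configfile:
            self.write(configfile)

    def __iter__(self):
        # yield only the sections whose name contains the pattern substring
        for name in self.sections():
            if self.pattern is None or self.pattern in name:
                yield dict(self.items(name), name=name)


path = os.path.join(tempfile.mkdtemp(), "feeds.ini")
storage = IniStorageSketch(path)
storage.add_section("planet-python")
storage.set("planet-python", "url", "http://example.com/python")
storage.add_section("news")
storage.set("news", "url", "http://example.com/news")
matches = list(IniStorageSketch(path, pattern="python"))
```

Because every set() rewrites the whole file, two processes editing at once can overwrite each other's changes, which is why the methods above are marked as not thread-safe.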

feed2exec.feeds.FeedStorage

Feed storage used.

An alias to feed2exec.feeds.ConfFeedStorage, but can be overridden by plugins

alias of ConfFeedStorage

Main entry point

The main entry point of the program is in the feed2exec.__main__ module. This is to make it possible to call the program directly from the source code through the Python interpreter with:

python -m feed2exec

All this code is here rather than in __init__.py to avoid requiring too many dependencies in the base module, which contains useful metadata for setup.py.

This uses the click module to define the base command and options.

Plugins

Plugin interface

In this context, a “plugin” is simply a Python module with a defined interface.

feed2exec.plugins.output(feed, item, lock=None)[source]

load and run the given plugin with the given arguments

an “output plugin” is a simple Python module with an output callable defined which will process arguments and should output them somewhere, for example by email or through another command. the plugin is called (from feed2exec.feeds.parse()) when a new item is found, unless cache is flushed or ignored.

The “callable” can be a function or a class; in the latter case, only the constructor is called. The *args and **kwargs parameters SHOULD be used in the function definition for forward-compatibility (i.e. to make sure newly added parameters do not cause a regression).

Plugins should also expect to be called in parallel and should use the provided lock (a multiprocessing.Lock object) to acquire and release locks around contentious resources.

The following keywords are usually replaced in the arguments:

  • {item.link}
  • {item.title}
  • {item.description}
  • {item.published}
  • {item.updated}
  • {item.guid}

The full list of such parameters is determined by the feedparser module.

Similarly, feed parameters from the configuration file are accessible.
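To give an idea of how such keywords expand, the sketch below assumes the replacement behaves like Python's str.format() with the item passed as a keyword argument. That mechanism is an assumption, and SimpleNamespace is only a stand-in for the item object produced by feedparser.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a parsed item; the real object comes from feedparser.
item = SimpleNamespace(title="Release 1.0", link="http://example.com/1.0")
args = "--subject {item.title} --url {item.link}"
expanded = args.format(item=item)
```

Note that the expanded title contains a space and nothing quotes it, which is exactly the kind of input a plugin has to handle itself.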

Caution

None of those parameters are sanitized in any way other than what feedparser does, so plugins writing files, executing code or talking to the network should be careful to sanitize the input appropriately.

The feed and items are also passed to the plugin as keyword arguments.

Parameters:
  • feed (dict) – the feed metadata
  • item (dict) – the updated item
Return object:

the loaded plugin
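A minimal sketch of what an output plugin module might contain, following the interface described above. The behaviour (appending to a file named by the first argument) is purely illustrative, and the usage lines call the function directly rather than going through feed2exec.plugins.output().

```python
import contextlib
import os
import tempfile


def output(*args, feed=None, item=None, lock=None, **kwargs):
    """Hypothetical output plugin: append the item title and link to a file."""
    path = args[0]  # assumption: the first configured argument is the target file
    guard = lock if lock is not None else contextlib.nullcontext()
    with guard:  # serialize writes when plugins run in parallel
        with open(path, "a") as log:
            log.write("{}\t{}\n".format(item.get("title"), item.get("link")))
    return True  # a true value signals success to the caller


logfile = os.path.join(tempfile.mkdtemp(), "items.log")
result = output(logfile, feed={"name": "test"},
                item={"title": "hello", "link": "http://example.com/"})
with open(logfile) as log:
    written = log.read()
```

Accepting *args and **kwargs keeps the plugin working if new parameters are passed in later versions.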

feed2exec.plugins.filter(feed, item, lock=None)[source]

call filter plugins.

very similar to the output plugin, but just calls the filter module member instead of output
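A filter plugin could be sketched along these lines. The BLOCKED word list and the matching rule are hypothetical; the only behaviour taken from this documentation is that setting the skip element on the item tells the caller to ignore it.

```python
BLOCKED = ("sponsored",)  # hypothetical word list


def filter(*args, feed=None, item=None, lock=None, **kwargs):
    """Hypothetical filter plugin: skip items whose title contains a blocked word."""
    title = (item.get("title") or "").lower()
    if any(word in title for word in BLOCKED):
        item["skip"] = True  # tells the caller to ignore this item


kept = {"title": "Release notes"}
dropped = {"title": "Sponsored: buy now"}
filter(feed={}, item=kept)
filter(feed={}, item=dropped)
```

The function shadows the built-in filter() inside the plugin module, which is harmless there since the plugin interface looks the member up by that name.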

Todo

common code with output() should be factored out, but output() takes arguments...

Note

actual plugins are documented in the Plugins document.

Utilities

These are various utilities reused in multiple modules that did not fit anywhere else.

various reusable utilities

feed2exec.utils.slug(text)[source]

Make a URL-safe, human-readable version of the given text

This will do the following:

  1. decode unicode characters into ASCII
  2. shift everything to lowercase
  3. strip whitespace
  4. replace other non-word characters with dashes
  5. strip extra dashes

This somewhat duplicates the Google.slugify() function but slugify is not as generic as this one, which can be reused elsewhere.

>>> slug('test')
'test'
>>> slug('Mørdag')
'mordag'
>>> slug("l'été c'est fait pour jouer")
'l-ete-c-est-fait-pour-jouer'
>>> slug(u"çafe au lait (boisson)")
'cafe-au-lait-boisson'
>>> slug(u"Multiple  spaces -- and symbols! -- merged")
'multiple-spaces-and-symbols-merged'

This is a simpler, one-liner version of the slugify module.

taken from ecdysis
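The five steps listed above can be approximated with the standard library alone. slug_sketch is a hypothetical approximation and not the actual implementation: characters without an ASCII decomposition, such as ø, are dropped here rather than transliterated, so it does not reproduce every documented example.

```python
import re
import unicodedata


def slug_sketch(text):
    """Hypothetical stdlib-only approximation of the steps listed above."""
    # 1. decompose accented characters and drop whatever is not ASCII
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # 2-5. lowercase, strip whitespace, dash-separate, strip extra dashes
    return re.sub(r"\W+", "-", ascii_text.lower().strip()).strip("-")
```

Runs of non-word characters collapse into a single dash because the pattern matches one or more of them at a time.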

feed2exec.utils.make_dirs_helper(path)[source]

Create the directory if it does not exist

Return True if the directory was created, False if it was already present, and raise an OSError exception if it cannot be created

>>> import tempfile
>>> import os
>>> import os.path as p
>>> d = tempfile.mkdtemp()
>>> make_dirs_helper(p.join(d, 'foo'))
True
>>> make_dirs_helper(p.join(d, 'foo'))
False
>>> make_dirs_helper(p.join('/dev/null', 'foo')) 
Traceback (most recent call last):
    ...
NotADirectoryError: [Errno 20] Not a directory: ...
>>> os.rmdir(p.join(d, 'foo'))
>>> os.rmdir(d)
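One way to obtain the documented return values is sketched below. make_dirs_sketch is a hypothetical equivalent written for illustration, not the actual body of make_dirs_helper().

```python
import os
import tempfile


def make_dirs_sketch(path):
    """Hypothetical equivalent: True if created, False if the directory already existed."""
    try:
        os.makedirs(path)
    except FileExistsError:
        if not os.path.isdir(path):
            raise  # something that is not a directory is in the way
        return False
    return True


base = tempfile.mkdtemp()
target = os.path.join(base, "foo")
first = make_dirs_sketch(target)
second = make_dirs_sketch(target)
os.rmdir(target)
os.rmdir(base)
```

Other failures, such as a parent path that is not a directory, propagate as OSError subclasses, in line with the doctest above.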
feed2exec.utils.find_test_file(name)[source]

needs to be updated from ecdysis

feed2exec.utils.find_parent_module()[source]

find the name of the first module calling this module

if we cannot find it, we return the current module’s name (__name__) instead.

taken from ecdysis