API documentation¶
This is the API documentation of the program. It should explain how to create new plugins and navigate the code.
Controller module¶
This is the core module that processes all feeds and talks to the storage. It’s where most of the logic lies, although the parsing is still currently done inside the model. It dispatches the plugin logic to the plugin module.
fast feed parser that offloads tasks to plugins and commands
class feed2exec.controller.FeedManager(conf_path, db_path, pattern=None, session=None)[source]¶
a feed manager fetches and stores feeds.
this is a “controller” in a “model-view-controller” pattern. it derives from the “model” (feed2exec.model.FeedConfStorage) for simplicity’s sake, and there is no real “view” (except maybe __main__).
on initialization, a new requests.Session object is created to be used across all requests. it is passed to plugins during dispatch as a session parameter so it can be reused.
sessionConfig(cache=True)[source]¶
our custom session configuration
we change the user agent and set the file:// handler. extra configuration may be performed in the future and will override your changes.
this can be used to configure sessions used externally, for example by plugins.
This can also be used to disable the cache.
session¶
the session property
fetch(parallel=False, force=False, catchup=False)[source]¶
main entry point for the feed fetch routines.
this iterates through all feeds configured in the linked feed2exec.model.FeedConfStorage that match the given pattern, fetches the feeds and dispatches the parsing, which in turn dispatches the plugins.
Parameters:
- parallel (bool) – parse feeds in parallel, using multiprocessing
- force (bool) – force plugin execution even if entry was already seen. passed to feed2exec.feeds.parse as is
- catchup (bool) – set the catchup flag on the feed, so that output plugins can avoid doing any permanent changes.
fetch_one(feed)[source]¶
fetch the feed content and return the body, in binary
This will call logging.warning() for exceptions requests.exceptions.Timeout and requests.exceptions.ConnectionError as they are transient errors and the user may want to ignore those.
Other exceptions raised from requests.exceptions (like TooManyRedirects or HTTPError but basically any other exception) may be a configuration error or a more permanent failure so will be signaled with logging.error().
this will return the body on success, or None on failure and for cached entries
dispatch(feed, data, lock=None, force=False)[source]¶
process parsed entries and execute plugins
This handles locking, caching, and filter and output plugins.
This calls the plugins configured in the feed (using feed2exec.plugins.output() and feed2exec.plugins.filter()). It also updates the cache with the found items if the output plugin succeeds (returns True) and if the filter plugin doesn’t set the skip element in the feed item.
Parameters:
- lock (object) – a multiprocessing.Lock object previously initialized. if None, the global LOCK variable will be used: this is used in the test suite to avoid having to pass locks all the way through the API. this lock is in turn passed to plugin calls.
- force (bool) – force plugin execution even if entry was already seen. passed to feed2exec.feeds.parse as is
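The caching rule described for dispatch can be illustrated with a minimal sketch. Everything in it is hypothetical: dispatch_sketch, the filter_plugin and output_plugin callables, and the seen set standing in for the cache are assumptions made for illustration, not feed2exec's actual implementation.

```python
def dispatch_sketch(items, filter_plugin, output_plugin, seen, force=False):
    """Run plugins on each item and record the ones that succeeded.

    `seen` is a plain set standing in for the cache (an assumption).
    """
    processed = []
    for item in items:
        guid = item["guid"]
        if guid in seen and not force:
            continue  # already handled on a previous run
        filter_plugin(item)  # a filter may mark the item with "skip"
        if item.get("skip"):
            continue  # skipped items are neither output nor cached
        if output_plugin(item):
            seen.add(guid)  # cache only when the output plugin succeeded
            processed.append(guid)
    return processed
```

In this sketch an item whose output plugin returns a false value is left out of the cache, so it would be retried on the next run.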
Model¶
The “model” keeps track of feeds and their items. It handles configuration and cache storage.
data structures and storage for feed2exec
class feed2exec.model.Feed(name, *args, **kwargs)[source]¶
basic data structure representing an RSS or Atom feed.
it derives from the base feedparser.FeedParserDict but forces the element to have a name, which is the unique name for that feed in the feed2exec.controller.FeedManager. We also add convenience functions to parse (in parallel) and normalize feed items.
For all intents and purposes, this can be considered like a dict() unless otherwise noted.
get(key, default=None)[source]¶
override upstream getter
in my own configuration, the maildir output plugin is default, yet it doesn’t have a maildir or folder defined. somehow in there, the getter here ends up returning None for those, because those keys do not exist.
that’s not exactly what we expect here: what we want is to return the default in those cases.
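A small sketch of what such an override might look like. QuirkyDict is a hypothetical stand-in for an upstream class whose getter drops the default; it is not feedparser's real code, and FeedSketch is likewise an illustrative name rather than the actual Feed class.

```python
class QuirkyDict(dict):
    """Hypothetical upstream class whose get() ignores `default`."""

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return None  # the caller's default is silently dropped


class FeedSketch(QuirkyDict):
    """Restores the usual dict.get() contract for missing keys."""

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


feed = FeedSketch(name="example", output="maildir")
print(feed.get("mailbox", "~/Maildir"))  # prints ~/Maildir
```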
normalize(item=None)[source]¶
normalize feeds a little more than what feedparser provides.
we do the following operations:
- add more defaults to item dates (issue #113)
- handle missing GUID in some feeds (issue #112)
- link normalization fails on some feeds, particularly GitHub, where feeds are /foo instead of https://github.com/foo. unreported for now.
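The link and GUID cases above can be sketched with the standard library alone. This is an illustration under stated assumptions: normalize_sketch and its base_url parameter are hypothetical, and falling back to the link when the GUID is missing is a guess at a reasonable policy, not a description of what feed2exec does.

```python
from urllib.parse import urljoin


def normalize_sketch(item, base_url):
    """Return a copy of `item` with an absolute link and a GUID."""
    fixed = dict(item)
    link = fixed.get("link")
    if link:
        # urljoin leaves absolute links untouched and resolves relative ones
        fixed["link"] = urljoin(base_url, link)
    if not fixed.get("guid"):
        # assumption: fall back to the link when the feed provides no GUID
        fixed["guid"] = fixed.get("link")
    return fixed


print(normalize_sketch({"link": "/foo"}, "https://github.com/"))
```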
class feed2exec.model.FeedConfStorage(path, pattern=None)[source]¶
Feed configuration stored in a config file.
This derives from configparser.RawConfigParser and uses the .ini file set in the path member to read and write settings.
Changes are committed immediately, and no locking is performed so loading here should be safe but not editing.
The particular thing about this configuration is that there is an iterator that will yield entries matching the pattern substring provided in the constructor.
add(name, url, output=None, args=None, filter=None, filter_args=None, folder=None, mailbox=None)[source]¶
add the designated feed to the configuration
this is not thread-safe.
set(section, option, value=None)[source]¶
override parent to make sure we immediately write changes
not thread-safe
remove_option(section, option)[source]¶
override parent to make sure we immediately write changes
not thread-safe
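The immediate-write behaviour and the pattern iterator described for FeedConfStorage could look roughly like the following. ImmediateConf, commit() and matching() are hypothetical names chosen for this sketch; only the configparser.RawConfigParser methods being overridden are real. Like the original, it performs no locking.

```python
import configparser


class ImmediateConf(configparser.RawConfigParser):
    """Hypothetical stand-in, not the real FeedConfStorage."""

    def __init__(self, path, pattern=None):
        super().__init__()
        self.path = path
        self.pattern = pattern
        self.read(self.path)  # a missing file is silently ignored

    def commit(self):
        # rewrite the whole file; no locking, so not thread-safe
        with open(self.path, "w") as conf:
            self.write(conf)

    def set(self, section, option, value=None):
        super().set(section, option, value)
        self.commit()

    def remove_option(self, section, option):
        removed = super().remove_option(section, option)
        self.commit()
        return removed

    def matching(self):
        """Yield section names containing the pattern substring."""
        for name in self.sections():
            if self.pattern is None or self.pattern in name:
                yield name
```

Because every set() rewrites the file, a second ImmediateConf opened on the same path sees the change right away, at the cost of one full write per modification.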
Main entry point¶
The main entry point of the program is in the feed2exec.__main__ module. This is to make it possible to call the program directly from the source code through the Python interpreter with:
python -m feed2exec
All this code is here rather than in __init__.py to avoid requiring too many dependencies in the base module, which contains useful metadata for setup.py.
This uses the click module to define the base command and options.
fast feed parser that offloads tasks to plugins and commands
Plugins¶
Plugin interface¶
In this context, a “plugin” is simply a Python module with a defined interface.
feed2exec.plugins.output(feed, item, lock=None, session=None)[source]¶
load and run the given plugin with the given arguments
an “output plugin” is a simple Python module with an output callable defined which will process arguments and should output them somewhere, for example by email or through another command. the plugin is called (from feed2exec.feeds.parse()) when a new item is found, unless cache is flushed or ignored.
The “callable” can be a function or a class; in the latter case, only the constructor is called. The *args and **kwargs parameters SHOULD be used in the function definition for forward-compatibility (i.e. to make sure new parameters added do not cause a regression).
Plugins should also expect to be called in parallel and should use the provided lock (a multiprocessing.Lock object) to acquire and release locks around contentious resources.
Finally, the FeedManager will pass along its own session that should be reused by plugins to do requests. This allows plugins to be unit-tested and leverages the built-in cache as well.
The following keywords are usually replaced in the arguments:
- {item.link}
- {item.title}
- {item.description}
- {item.published}
- {item.updated}
- {item.guid}
The full list of such parameters is determined by the feedparser module.
Similarly, feed parameters from the configuration file are accessible.
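The placeholder syntax above matches Python's str.format attribute lookup. Assuming that is the mechanism (this page does not say so explicitly), the expansion could be sketched as follows; the item object and the argument list are made up for illustration.

```python
from types import SimpleNamespace

# hypothetical item; real items come from feedparser
item = SimpleNamespace(
    title="Hello world",
    link="https://example.com/hello",
)

# arguments as they might appear in a feed's configuration
args = ["--title", "{item.title}", "--url", "{item.link}"]

# each placeholder is replaced by the matching attribute of `item`
expanded = [arg.format(item=item) for arg in args]
print(expanded)
# prints ['--title', 'Hello world', '--url', 'https://example.com/hello']
```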
Caution
None of those parameters are sanitized in any way other than what feedparser does, so plugins writing files, executing code or talking to the network should be careful to sanitize the input appropriately.
The feed and items are also passed to the plugin as keyword arguments. Plugins should especially respect the catchup argument that, when set, forbids plugins from doing any permanent activity. For example, plugins MUST NOT run commands, write files, or make network requests. In general, “catchup mode” should be fast: it allows users to quickly catch up with new feeds without firing plugins, but it should also allow users to test configurations, so plugins SHOULD give information to the user about what would have been done by the plugin without catchup.
Parameters: Return object: the loaded plugin
Note
more information about plugin design is in the Writing new plugins document.
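As a rough illustration of the interface described above, here is a minimal output plugin sketch. It rests on several assumptions: that the first positional argument is a file path, that the catchup flag can be read from the feed with feed.get("catchup"), and that returning True signals success. None of these details are confirmed by this page beyond the general description.

```python
import contextlib
import logging


def output(*args, feed=None, item=None, lock=None, session=None, **kwargs):
    """Hypothetical output plugin that appends item titles to a file."""
    path = args[0]  # assumption: the first plugin argument is a file path
    title = item.get("title", "")
    if feed.get("catchup"):
        # catchup mode: report what would happen, change nothing
        logging.info("would append %r to %s", title, path)
        return True
    # use the provided lock if any, otherwise a no-op context manager
    guard = lock if lock is not None else contextlib.nullcontext()
    with guard:  # serialize writes to the shared file
        with open(path, "a") as out:
            out.write(title + "\n")
    return True
```

Accepting *args and **kwargs means the function keeps working if new keyword parameters are passed to plugins later.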
feed2exec.plugins.filter(feed, item, lock=None, session=None)[source]¶
call filter plugins.
very similar to the output plugin, but just calls the filter module member instead of output
Todo
common code with output() should be factored out, but output() takes arguments…
feed2exec.plugins.resolve(plugin)[source]¶
resolve a short plugin name to a loadable plugin path
Some parts of feed2exec allow shorter plugin names. For example, on the commandline, users can pass maildir instead of feed2exec.plugins.maildir.
Plugin resolution works like this:
- search for the module in the feed2exec.plugins namespace
- if that fails, consider the module to be an absolute path
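The two resolution steps above can be sketched with importlib. resolve_sketch and its namespace parameter are hypothetical; the parameter exists only so the example can run against a standard library package without feed2exec installed.

```python
import importlib


def resolve_sketch(plugin, namespace="feed2exec.plugins"):
    """Return the dotted module path that `plugin` refers to."""
    candidate = namespace + "." + plugin
    try:
        # step 1: look for the module inside the plugin namespace
        importlib.import_module(candidate)
    except ImportError:
        # step 2: fall back to treating the name as an absolute path
        return plugin
    return candidate


print(resolve_sketch("decoder", namespace="json"))  # prints json.decoder
print(resolve_sketch("os.path", namespace="json"))  # prints os.path
```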
Note
actual plugins are documented in the Plugins document.
Utilities¶
These are various utilities reused in multiple modules that did not fit anywhere else.
various reusable utilities
feed2exec.utils.slug(text)[source]¶
Make a URL-safe, human-readable version of the given text
This will do the following:
- decode unicode characters into ASCII
- shift everything to lowercase
- strip whitespace
- replace other non-word characters with dashes
- strip extra dashes
This somewhat duplicates the Google.slugify() function but slugify is not as generic as this one, which can be reused elsewhere.
>>> slug('test')
'test'
>>> slug('Mørdag')
'mordag'
>>> slug("l'été c'est fait pour jouer")
'l-ete-c-est-fait-pour-jouer'
>>> slug(u"çafe au lait (boisson)")
'cafe-au-lait-boisson'
>>> slug(u"Multiple spaces -- and symbols! -- merged")
'multiple-spaces-and-symbols-merged'
This is a simpler, one-liner version of the slugify module.
taken from ecdysis
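The steps listed for slug can be approximated with the standard library. This is a sketch, not the actual implementation: it uses Unicode NFKD decomposition, which only handles characters that split into a base letter plus a combining mark. Characters with no such decomposition, such as the ø in the example above, are dropped instead of transliterated, so it does not reproduce every documented result.

```python
import re
import unicodedata


def slug_sketch(text):
    """Approximate slug() using only the standard library."""
    # split accented characters into base letter + mark, then drop non-ASCII
    decomposed = unicodedata.normalize("NFKD", text)
    ascii_text = decomposed.encode("ascii", "ignore").decode("ascii")
    # lowercase, trim, and collapse runs of non-word characters into one dash
    dashed = re.sub(r"\W+", "-", ascii_text.lower().strip())
    # remove dashes left over at either end
    return dashed.strip("-")


print(slug_sketch("çafe au lait (boisson)"))  # prints cafe-au-lait-boisson
```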
feed2exec.utils.make_dirs_helper(path)[source]¶
Create the directory if it does not exist
Return True if the directory was created, False if it was already present, raise an OSError exception if it cannot be created
>>> import tempfile
>>> import os
>>> import os.path as p
>>> d = tempfile.mkdtemp()
>>> make_dirs_helper(p.join(d, 'foo'))
True
>>> make_dirs_helper(p.join(d, 'foo'))
False
>>> make_dirs_helper('')
False
>>> make_dirs_helper(p.join('/dev/null', 'foo')) # doctest: +ELLIPSIS
Traceback (most recent call last):
    ...
NotADirectoryError: [Errno 20] Not a directory: ...
>>> os.rmdir(p.join(d, 'foo'))
>>> os.rmdir(d)