pollux documentation

This site provides documentation for three related projects:

pollux: a web application for text analysis
corpkit: a Python backend for pollux
pollux-cl: a command-line natural language interpreter

With pollux, you can create parsed, structured and metadata-annotated corpora, and then search them for complex lexicogrammatical patterns. Search results can be quickly edited, sorted and visualised, saved and loaded within projects, or exported to formats that other tools can handle. You can also work with any dataset in CoNLL-U format, including the freely available, multilingual Universal Dependencies treebanks.
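For readers unfamiliar with CoNLL-U, the following is a minimal sketch of reading it in plain Python. It does not use pollux or corpkit: the function name `read_conllu` and the sample sentence are hypothetical, and only the ten tab-separated column names come from the CoNLL-U format itself.

```python
# Column names defined by the CoNLL-U format (ten tab-separated fields per token).
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]

# A tiny hand-written sample sentence, for illustration only.
SAMPLE = "\n".join([
    "# text = Dogs bark.",
    "1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\t_",
    "2\tbark\tbark\tVERB\tVBP\t_\t0\troot\t_\tSpaceAfter=No",
    "3\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_",
])


def read_conllu(text):
    """Return a list of sentences, each a list of token dicts (hypothetical helper)."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif line.startswith("#"):
            # Sentence-level comments and metadata are skipped in this sketch.
            continue
        else:
            tokens.append(dict(zip(CONLLU_FIELDS, line.split("\t"))))
    if tokens:
        sentences.append(tokens)
    return sentences


sentences = read_conllu(SAMPLE)
# Collect the word forms of all tokens whose dependency relation is nsubj.
subjects = [t["form"] for s in sentences for t in s if t["deprel"] == "nsubj"]
print(subjects)
```

The sketch ignores multiword-token ranges and empty nodes, which real treebanks can contain; a dedicated CoNLL-U reader is a better choice for production use.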

Concordancing is extended to allow the user to query and display grammatical features alongside tokens. Keywording can be restricted to certain word classes or positions within the clause. If your corpus contains multiple documents or subcorpora, you can identify keywords in each, compared to the corpus as a whole.
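To illustrate the idea of comparing a subcorpus against the corpus as a whole, here is a minimal sketch of keyword scoring with the log-likelihood statistic. This is an assumption for illustration: the documentation above does not say which keyness measure pollux uses, and the names `log_likelihood` and `keywords` and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness.

    a: frequency of the word in the subcorpus
    b: frequency of the word in the reference corpus
    c: total tokens in the subcorpus
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the subcorpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:
            ll += observed * math.log(observed / expected)
    return 2 * ll


def keywords(subcorpus_tokens, corpus_tokens):
    """Rank the words of a subcorpus by keyness against the whole corpus."""
    sub, ref = Counter(subcorpus_tokens), Counter(corpus_tokens)
    n_sub, n_ref = len(subcorpus_tokens), len(corpus_tokens)
    scores = {}
    for word, a in sub.items():
        b = ref[word]
        score = log_likelihood(a, b, n_sub, n_ref)
        # Make the score negative when the word is under-represented.
        if a / n_sub < b / n_ref:
            score = -score
        scores[word] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: the subcorpus is one part of the whole corpus.
corpus = "the wolf ran the bear slept the wolf howled the boy laughed".split()
subcorpus = "the wolf ran the wolf howled".split()
ranked = keywords(subcorpus, corpus)
print(ranked[0][0])
```

Using the whole corpus (which includes the subcorpus) as the reference is a simplification made here to match the description above; restricting the tokens passed in to one word class would give the kind of filtered keywording the paragraph mentions.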

Installation

Via pip:

$ pip install pollux

Via Git:

$ git clone https://www.github.com/interrogator/pollux
$ cd pollux
$ python setup.py install

Parsing and interrogation of parse trees will also require Stanford CoreNLP. pollux can download and install it for you automatically.

Running the app

After installation, pollux can be started from the command line with:

# load sample project
$ pollux-quickstart

You can parse your own corpus from within the web app, or via the command line:

# parse the corpus
$ pollux-parse path/to/corpus
# add the parsed corpus to the database
$ mkdir -p ~/corpora
$ cp -R path/to/corpus-parsed ~/corpora
$ pollux-build
# open the tool
$ pollux

pollux-cl is a bit like the Corpus Workbench. You can open it with:

$ pollux-cl
# or, alternatively:
$ python -m pollux.cl

And then start working with natural language commands:

> set junglebook as corpus
> parse junglebook with outname as jb
> set jb as corpus
> search corpus for deps matching "f/nsubj/ <- f/ROOT/"
> calculate result as percentage of self
> plot result as line chart with title as 'Example figure'
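As a rough illustration of the `calculate result as percentage of self` step, the sketch below converts raw counts into percentages of each subcorpus's own total. This is one plausible reading of the command, not a description of the interpreter's actual implementation; the function `percentage_of_self`, the nested-dict data layout, and the counts are all hypothetical.

```python
def percentage_of_self(counts):
    """Convert {subcorpus: {entry: frequency}} into percentages of each subcorpus total."""
    result = {}
    for subcorpus, freqs in counts.items():
        total = sum(freqs.values())
        result[subcorpus] = {
            # Guard against empty subcorpora to avoid dividing by zero.
            entry: (100.0 * n / total if total else 0.0)
            for entry, n in freqs.items()
        }
    return result


# Invented counts of two search results in two subcorpora.
counts = {
    "chapter1": {"wolf": 3, "bear": 1},
    "chapter2": {"wolf": 1, "bear": 1},
}
relative = percentage_of_self(counts)
print(relative["chapter1"]["wolf"])
```

Relative frequencies of this kind make subcorpora of different sizes comparable before plotting.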

From the interpreter, you can enter ipython, jupyter notebook or gui to switch between interfaces, preserving the local namespace and data where possible.

Information about the interpreter syntax is available in the Overview.

Web app

API

Interpreter

API reference

Corpus classes
- Corpus
- File
- LoadedCorpus
- Results