User Guide

This user guide walks you through common tasks you can accomplish with JF. The guide is arranged by topic:

Getting started
List of transformations
Modules
Input and output types
Importing your custom modules
Using JF as a library

Basic usage

Filter selected fields

$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject}'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}
...

Filter selected items

$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject},
        (x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}
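As a rough plain-Python illustration of what the two queries above compute, the sketch below rebuilds them as chained generators. It is not how JF is implemented: the input records are hypothetical, reconstructed from the fields the queries access (`x.id`, `x.fields.subject`), and `read_jsonl`, `select_fields` and `only_id` are illustrative names.

```python
import io
import json

# Hypothetical input records, reconstructed from the fields the queries use
SAMPLES = io.StringIO(
    '{"id": "87086895", "fields": {"subject": "Swedish children stories"}}\n'
    '{"id": "87114792", "fields": {"subject": "New Finnish storybooks"}}\n'
)


def read_jsonl(stream):
    """Yield one parsed object per non-empty line of a JSON Lines stream."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


def select_fields(items):
    """Rough equivalent of '{id: x.id, subject: x.fields.subject}'."""
    for x in items:
        yield {"id": x["id"], "subject": x["fields"]["subject"]}


def only_id(items, wanted):
    """Rough equivalent of '(x.id == "87114792")', applied to mapped items."""
    for x in items:
        if x["id"] == wanted:
            yield x


result = list(only_id(select_fields(read_jsonl(SAMPLES)), "87114792"))
for obj in result:
    print(json.dumps(obj))  # {"id": "87114792", "subject": "New Finnish storybooks"}
```

Each stage consumes the output of the previous one, so the filter sees the already-mapped objects; it can still test `id` because the mapping kept that key.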

For more examples, please see Examples.

How it works

JF works by passing JSON or YAML data through a map/filter pipeline. The pipeline is compiled from a query string containing a comma-separated list of filters and mappers. The query parser assumes that each function in the pipeline reads items from a generator, which is passed as the function's last non-keyword argument, so “map(conversion)” is interpreted as “map(conversion, input_generator)”. The output of each function becomes the input generator of the next function in the pipeline. The pipeline construction is sketched below in Python:

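def build_pipeline(items, conversions):
    # Each conversion wraps the generator produced by the previous one
    pipeline = items
    for convert in conversions:
        pipeline = convert(pipeline)
    # With lazy conversions, no items are processed until the result is iterated
    return pipeline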
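To see the "generator as last argument" convention in action, the sketch below uses Python's built-in `map` and `filter`, which already take their iterable last. `functools.partial` fixes the leading argument, mirroring how “map(conversion)” is read as “map(conversion, input_generator)”. Using `partial` and `reduce` here is an illustrative choice, not necessarily what JF's parser does internally.

```python
from functools import partial, reduce


def build_pipeline(items, conversions):
    # Same loop as the build_pipeline above, written as a fold
    return reduce(lambda gen, convert: convert(gen), conversions, items)


# partial(map, f)(gen) calls map(f, gen): the generator arrives as the last
# positional argument, like "map(conversion)" -> "map(conversion, input_generator)"
conversions = [
    partial(map, lambda x: {"id": x["id"], "n": x["n"] * 2}),
    partial(filter, lambda x: x["n"] > 2),
]

source = ({"id": i, "n": i} for i in range(4))
pipeline = build_pipeline(source, conversions)  # lazy: nothing evaluated yet
result = list(pipeline)
print(result)  # [{'id': 2, 'n': 4}, {'id': 3, 'n': 6}]
```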

The pipeline returned by this function is then iterated, and each resulting item is printed. The basic building blocks of a pipeline are:

{val} = map each item to a new object
(condition) = keep only the items matching the condition
{val, ...} = update values in each item
del x.key = delete 'key' from each item
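The update and delete building blocks can be approximated with generator functions over dictionaries, while the map and filter blocks correspond roughly to Python's built-in `map` and `filter`. `update_items` and `delete_key` are hypothetical names, and passing each new value as a function of the item is an assumption of this sketch.

```python
def update_items(items, **updates):
    """Sketch of {val, ...}: keep every existing key and set the given ones.

    Assumption: each new value is computed by a function of the item."""
    for x in items:
        yield {**x, **{key: compute(x) for key, compute in updates.items()}}


def delete_key(items, key):
    """Sketch of 'del x.key': drop one key from each item."""
    for x in items:
        yield {k: v for k, v in x.items() if k != key}


records = [{"id": "1", "title": "draft notes", "draft": True}]
updated = list(update_items(records, title=lambda x: x["title"].upper()))
cleaned = list(delete_key(updated, "draft"))
print(cleaned)  # [{'id': '1', 'title': 'DRAFT NOTES'}]
```

Building a new dictionary per item instead of mutating it leaves the input records unchanged in this sketch.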

The signatures of some built-in functions have been adapted to fit the pipeline model. The most noticeable is sorted(), which normally takes the key as a keyword argument. In JF, items can be sorted by id by writing “sorted(x.id)” rather than “sorted(key=lambda x: x.id)”, which reads more naturally in a query. Similar changes have been made to some other useful functions:

islice(stop) => islice(arr, start=0, stop, step=1)
islice(start, stop, step=1) => islice(arr, start, stop, step)
first(N=1) => islice(N)
last(N=1) => iter(deque(arr, maxlen=N))
yield_from(x) => yield items from x
group_by(key) => group items by the value of key
chain() => combine items into a list
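Several of these functions can be approximated with the standard library. The versions below are hypothetical plain-Python sketches, not JF's implementations: they take the input first, as ordinary Python functions do, instead of following JF's query syntax, and the dict returned by `group_by` is an assumed output shape.

```python
from collections import deque
from itertools import islice

# Argument order follows ordinary Python (input first), not JF's query syntax.


def sorted_by(items, key):
    # JF writes sorted(x.id); plain Python needs sorted(items, key=...)
    return iter(sorted(items, key=key))


def first(items, n=1):
    return islice(items, n)


def last(items, n=1):
    # A deque with maxlen=n keeps only the final n items it consumes
    return iter(deque(items, maxlen=n))


def group_by(items, key):
    # Assumed output shape: dict mapping each key value to a list of items
    groups = {}
    for x in items:
        groups.setdefault(key(x), []).append(x)
    return groups


def yield_from(items, get):
    # Flatten: yield each element of get(x) for every item x
    for x in items:
        yield from get(x)


books = [
    {"id": 3, "lang": "fi", "tags": ["a"]},
    {"id": 1, "lang": "sv", "tags": ["b", "c"]},
    {"id": 2, "lang": "fi", "tags": []},
]
ids = [b["id"] for b in sorted_by(books, key=lambda x: x["id"])]
print(ids)                     # [1, 2, 3]
print(list(first(ids, 2)))     # [1, 2]
print(list(last(ids, 2)))      # [2, 3]
by_lang = group_by(books, key=lambda x: x["lang"])
print({k: len(v) for k, v in by_lang.items()})       # {'fi': 2, 'sv': 1}
print(list(yield_from(books, lambda x: x["tags"])))  # ['a', 'b', 'c']
```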

For datetime processing, two useful helper functions are imported by default:

date(string) for parsing a string into a Python datetime object
age(string) for calculating the timedelta between now() and date(string)

These are useful for sorting or filtering items by timestamp. Some of the functions listed above also have predefined aliases, such as head() and tail().
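The two datetime helpers can be sketched as follows, assuming ISO 8601 input parsed with `datetime.fromisoformat` and treating timestamps without a UTC offset as UTC. JF's own date() may accept other formats and handle time zones differently. The `now` parameter is a hypothetical addition that makes the example reproducible, and the records are made up.

```python
from datetime import datetime, timedelta, timezone


def date(string):
    # Assumption: ISO 8601 input; a missing UTC offset is treated as UTC
    parsed = datetime.fromisoformat(string)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def age(string, now=None):
    # timedelta between now() and date(string); `now` is injectable for testing
    if now is None:
        now = datetime.now(timezone.utc)
    return now - date(string)


now = datetime(2024, 1, 10, tzinfo=timezone.utc)
items = [
    {"id": "a", "updated": "2024-01-09T12:00:00+00:00"},
    {"id": "b", "updated": "2023-12-01T00:00:00+00:00"},
]
# Filter: keep items updated within the last week
recent = [x["id"] for x in items if age(x["updated"], now=now) < timedelta(days=7)]
# Sort: newest first
newest_first = [x["id"] for x in sorted(items, key=lambda x: date(x["updated"]), reverse=True)]
print(recent)        # ['a']
print(newest_first)  # ['a', 'b']
```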