User Guide

This user guide is intended to walk you through many common tasks that you might want to accomplish using JF. The guide is arranged by topic:

  • Getting started
  • List of transformations
  • Modules
  • Input and output types
  • Importing your custom modules
  • Using JF as a library

Basic usage

Filter selected fields

$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject}'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}
...

Filter selected items

$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject},
        (x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}

For more examples, please see Examples.

How it works

JF works by passing JSON or YAML data through a map/filter pipeline. The pipeline is compiled from a string containing a comma-separated list of filters and mappers. The query parser assumes that each function in the pipeline reads items from a generator. The generator is passed as the last non-keyword argument of the function, so “map(conversion)” is interpreted as “map(conversion, input_generator)”. The result of each function becomes the input generator for the next function in the pipeline. The pipeline construction is shown below as pseudocode:

def build_pipeline(source, conversions):
    # Start from the input generator and wrap it with each stage in turn.
    pipeline = source
    for convert in conversions:
        pipeline = convert(pipeline)
    return pipeline

The pipeline returned by this function is then iterated, and each resulting item is printed. The basic building blocks of a pipeline are:

  • {val} = map item to new object
  • (condition) = filter to show only items matching condition
  • {val, …} = update item values
  • del x.key = delete ‘key’ from each item
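To make the chaining concrete, here is a minimal Python sketch of a map stage followed by a filter stage, mirroring the two queries in the basic usage section. The helper names map_stage and filter_stage are hypothetical and are not part of JF; they only illustrate how each stage wraps the generator produced by the previous one.

```python
def build_pipeline(source, conversions):
    # Wrap the input with each stage in turn; nothing runs until iteration.
    pipeline = source
    for convert in conversions:
        pipeline = convert(pipeline)
    return pipeline


def map_stage(fn):
    # Hypothetical helper: a stage that transforms every item.
    return lambda items: (fn(x) for x in items)


def filter_stage(pred):
    # Hypothetical helper: a stage that keeps only matching items.
    return lambda items: (x for x in items if pred(x))


items = [
    {"id": "87086895", "fields": {"subject": "Swedish children stories"}},
    {"id": "87114792", "fields": {"subject": "New Finnish storybooks"}},
]

stages = [
    map_stage(lambda x: {"id": x["id"], "subject": x["fields"]["subject"]}),
    filter_stage(lambda x: x["id"] == "87114792"),
]

result = list(build_pipeline(items, stages))
for item in result:
    print(item)
```

Because every stage is a generator, items flow through the pipeline one at a time instead of being collected between stages.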

The signatures of some built-in functions have been changed to fit the framework more intuitively. The most noticeable is sorted, which normally takes its key as a keyword argument. The change was made because it is more natural to sort items by id by writing “sorted(x.id)” than “sorted(key=lambda x: x.id)”. Similar changes have been made to some other useful functions:

  • islice(stop) => islice(arr, start=0, stop, step=1)
  • islice(start, stop, step=1) => islice(arr, start, stop, step)
  • first(N=1) => islice(N)
  • last(N=1) => iter(deque(arr, maxlen=N))
  • yield_from(x) => yield items from x
  • group_by(key) => group items by data key value
  • chain() => combine items into a list
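The equivalences listed above can be approximated in plain Python as follows. This is only a sketch built on the standard library; the wrapper names and argument order are assumptions made for illustration and do not reproduce JF's actual implementation.

```python
from collections import deque
from itertools import islice


def first(arr, n=1):
    # Roughly first(N): take the first n items lazily.
    return islice(arr, n)


def last(arr, n=1):
    # Roughly last(N): a bounded deque keeps only the final n items.
    return iter(deque(arr, maxlen=n))


def sorted_by(arr, key):
    # Roughly sorted(x.id): the key is positional rather than a keyword.
    return sorted(arr, key=key)


items = [{"id": 3}, {"id": 1}, {"id": 2}]
ordered = sorted_by(items, lambda x: x["id"])
first_two = list(first(ordered, 2))
last_one = list(last(ordered))
print(first_two, last_one)
```

Note that last must consume the whole input before it can yield anything, while first stops reading as soon as it has enough items.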

For datetime processing, two useful helper functions are imported by default:

  • date(string) parses a string into a Python datetime object
  • age(string) calculates the timedelta between now() and date(string)

These are useful for sorting or filtering items based on timestamps. Some of the functions above have predefined aliases, such as head() and tail().
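As an illustration of what the date and age helpers do, the sketch below re-creates them with the standard library. It assumes ISO 8601 timestamps and treats naive values as UTC; the parsing that JF itself performs may accept other formats, and the optional now argument is a hypothetical addition that makes the example reproducible.

```python
from datetime import datetime, timedelta, timezone


def date(string):
    # Assumption: the input is an ISO 8601 timestamp.
    parsed = datetime.fromisoformat(string)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def age(string, now=None):
    # 'now' is a hypothetical parameter used to pin the reference time.
    now = now or datetime.now(timezone.utc)
    return now - date(string)


reference = date("2020-01-03T00:00:00")
items = [
    {"id": "a", "created": "2020-01-01T00:00:00"},
    {"id": "b", "created": "2020-01-02T12:00:00"},
]

# Keep only items younger than one day relative to the reference time.
recent = [x for x in items if age(x["created"], now=reference) < timedelta(days=1)]
print(recent)
```

The same helpers work as sort keys, for example sorting items by date(x["created"]).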