User Guide
This user guide walks you through common tasks that you might want to accomplish using JF. The guide is arranged by topic:
- Getting started
- List of transformations
- Modules
- Input and output types
- Importing your custom modules
- Using JF as a library
Basic usage
Filter selected fields
$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject}'
{"id": "87086895", "subject": "Swedish children stories"}
{"id": "87114792", "subject": "New Finnish storybooks"}
...
Filter selected items
$ cat samples.jsonl | jf '{id: x.id, subject: x.fields.subject},
(x.id == "87114792")'
{"id": "87114792", "subject": "New Finnish storybooks"}
For more examples, please see Examples.
How does it work
JF works by passing JSON or YAML data through a map/filter pipeline. The pipeline is compiled from a string containing a comma-separated list of filters and mappers. The query parser assumes that each function in the pipeline reads items from a generator. The generator is passed as the last non-keyword argument to the function, so “map(conversion)” is interpreted as “map(conversion, input_generator)”. The result of each function becomes the input generator for the next function in the pipeline. The pipeline construction is shown below as pseudocode:
def build_pipeline(input, conversions):
    pipeline = input
    for convert in conversions:
        pipeline = convert(pipeline)
    return pipeline
The pipeline returned by build_pipeline is then iterated, and each resulting item is printed to the user. The basic building blocks of a pipeline are:
- {val} = map item to new object
- (condition) = filter to show only items matching condition
- {val, …} = update item values
- del x.key = delete ‘key’ from each item
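The chaining described above can be sketched in plain Python. This is a minimal illustration, not JF's actual implementation: the helper names map_items and filter_items are hypothetical, and the sample items use ordinary dictionary indexing instead of JF's x.id syntax. Each stage takes the input generator as its last positional argument, so functools.partial can bind everything except the generator.

```python
from functools import partial


def build_pipeline(source, conversions):
    """Feed the output of each conversion into the next one."""
    pipeline = source
    for convert in conversions:
        pipeline = convert(pipeline)
    return pipeline


def map_items(func, items):
    """Hypothetical mapper stage: yield func(item) for every item."""
    for item in items:
        yield func(item)


def filter_items(predicate, items):
    """Hypothetical filter stage: yield only items matching predicate."""
    for item in items:
        if predicate(item):
            yield item


samples = [
    {"id": "87086895", "fields": {"subject": "Swedish children stories"}},
    {"id": "87114792", "fields": {"subject": "New Finnish storybooks"}},
]

# Bind every argument except the trailing input generator.
conversions = [
    partial(map_items, lambda x: {"id": x["id"], "subject": x["fields"]["subject"]}),
    partial(filter_items, lambda x: x["id"] == "87114792"),
]

result = list(build_pipeline(iter(samples), conversions))
for item in result:
    print(item)
```

Because every stage is a generator, nothing is computed until the final pipeline is iterated, which is why the items can be printed one at a time as they arrive.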
The signatures of some built-in functions have been remodeled to fit the framework more intuitively. The most noticeable is the sorted function, which normally takes its key as a keyword argument. This was done because it seems more logical to sort items by id by writing “sorted(x.id)” than “sorted(key=lambda x: x.id)”. Similar changes have been made to some other useful functions:
- islice(stop) => islice(arr, start=0, stop, step=1)
- islice(start, stop, step=1) => islice(arr, start, stop, step)
- first(N=1) => islice(N)
- last(N=1) => iter(deque(arr, maxlen=N))
- yield_from(x) => yield items from x
- group_by(key) => group items by data key value
- chain() => combine items into a list
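As a rough sketch of what such remodeled signatures could look like, the wrappers below take the input generator as the last positional argument, following the convention described earlier. The names sort_by, take_first and take_last are hypothetical and chosen only for this example; JF's own definitions may differ.

```python
from collections import deque
from itertools import islice


def sort_by(key, items):
    """Sort with the key as a positional argument instead of key=..."""
    return iter(sorted(items, key=key))


def take_first(n, items):
    """Yield at most the first n items, without consuming the rest."""
    return islice(items, n)


def take_last(n, items):
    """Yield the last n items; a bounded deque keeps memory use at n items."""
    return iter(deque(items, maxlen=n))


items = [{"id": 3}, {"id": 1}, {"id": 2}]

ordered = list(sort_by(lambda x: x["id"], iter(items)))
head = list(take_first(2, iter(ordered)))
tail = list(take_last(1, iter(ordered)))
```

Note that sorting and taking the last items both have to consume the whole input before producing anything, while taking the first items can stop early.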
For datetime processing, two useful helper functions are imported by default:
- date(string) for parsing a string into a Python datetime object
- age(string) for calculating the timedelta between now() and date(string)
These are useful for sorting or filtering items based on timestamps. Some of the functions listed above have predefined aliases, such as head() and tail().
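To illustrate how helpers like date and age can drive sorting and filtering, here is a minimal sketch. It assumes ISO 8601 timestamps and uses datetime.fromisoformat from the standard library; the parser JF actually uses, and the formats it accepts, may differ. The field name created and the sample data are made up for the example.

```python
from datetime import datetime, timedelta, timezone


def date(string):
    """Parse an ISO 8601 string; treat naive values as UTC (an assumption)."""
    parsed = datetime.fromisoformat(string)
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def age(string):
    """Return the timedelta between the current time and date(string)."""
    return datetime.now(timezone.utc) - date(string)


items = [
    {"id": "a", "created": "2020-01-01T00:00:00+00:00"},
    {"id": "b", "created": "2021-06-15T12:30:00+00:00"},
]

# Sort newest first by the parsed timestamp.
newest_first = sorted(items, key=lambda x: date(x["created"]), reverse=True)

# Keep only items created more than a year ago.
older_than_a_year = [x for x in items if age(x["created"]) > timedelta(days=365)]
```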