jf package

Submodules

jf.input module

JF io library

jf.input.colorize_json_error(ex)

Colorize input data syntax error

jf.input.format_xml(parent)

Recursive operation that returns a tree formatted as dicts and lists. A list is produced when the word ‘List’ appears in the parent tag name.

>>> tree = etree.fromstring('<doc><a>1</a></doc>')
>>> format_xml(tree)
{'a': '1'}
jf.input.import_error()

Logging function for import errors

jf.input.read_file(fn, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)

Convert an input file to a data source

jf.input.read_input(args, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)

Read JSON, JSONL and YAML data from the file defined in args

jf.input.yield_json_and_json_lines(inp)

Yield items from JSON and JSON lines input
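
A minimal sketch of what yielding both whole-document JSON and JSON lines (JSONL) can look like. This is a hypothetical illustration of the behavior, not jf's actual implementation:

```python
import json

def yield_json_and_json_lines(inp):
    """Yield parsed objects from input lines that form either a single
    JSON document or newline-delimited JSON (JSONL)."""
    text = "".join(inp)
    try:
        # Try parsing the whole input as one JSON document first
        doc = json.loads(text)
        if isinstance(doc, list):
            yield from doc
        else:
            yield doc
    except json.JSONDecodeError:
        # Fall back to one JSON object per non-empty line
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
```

Given `['[{"a": 1}, {"a": 2}]']` or `['{"a": 1}\n', '{"a": 2}']`, both yield the two dicts.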

jf.meta module

class jf.meta.JFTransformation(*args, fn=None, **kwargs)

Bases: object

Base class for JF transformations

fit(X, y=None)
transform(X, y=None, gen=False, **kwargs)
class jf.meta.Struct(**entries)

Bases: object

Class representation of a dict

dict()

Convert item to dict

hide(dct)

Mark item attribute as hidden

update(dct)

Update item with key/values from a dict
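
A minimal sketch of a dict-backed attribute-access object mirroring the interface above. Names and behavior are assumptions for illustration; in particular, the real `hide()` may accept more than a single key:

```python
class Struct:
    """Sketch: a dict exposed as attributes, convertible back to a dict."""

    def __init__(self, **entries):
        self._hidden = set()
        self.__dict__.update(entries)

    def dict(self):
        # Convert the item back to a plain dict, skipping hidden keys
        return {k: v for k, v in self.__dict__.items()
                if k != "_hidden" and k not in self._hidden}

    def hide(self, key):
        # Mark an item attribute as hidden so dict() omits it
        self._hidden.add(key)
        return self

    def update(self, dct):
        # Update the item with key/values from a dict
        self.__dict__.update(dct)
        return self
```

For example, `Struct(a=1, b=2).hide("b").dict()` gives `{'a': 1}`.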

class jf.meta.StructEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

Try to convert everything to JSON

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
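
The `default()` pattern above can be exercised directly with the standard library's `json.dumps` and a custom encoder class:

```python
import json

class IterEncoder(json.JSONEncoder):
    """Encoder following the default() pattern shown above: any
    iterable is converted to a list before the base class raises."""

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

print(json.dumps({"vals": range(3)}, cls=IterEncoder))
```

This prints `{"vals": [0, 1, 2]}`, since `range` is iterable but not natively serializable.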
jf.meta.to_struct(val)

Convert val to a class representing it

jf.meta.to_struct_gen(arr, ordered_dict=False)

Convert all items in arr to Struct objects

jf.ml module

class jf.ml.ColumnSelector(column, default=['unk'])

Bases: object

fit(X, y=None)
transform(X, y=None)
class jf.ml.importResolver

Bases: object

class jf.ml.model_loader(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.persistent_trainer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.persistent_transformation(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.trainer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.transform(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.output module

JF python json/yaml query engine

class jf.output.browser(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.csv(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.excel(*args, **kwargs)

Bases: jf.output.pandas_writer

Convert input to an Excel (xlsx) file

>>> list(excel("/tmp/test.xlsx").transform([{'a': 1}, {'a': 3}]))
['data written to /tmp/test.xlsx']
class jf.output.ipy(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.md(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.pandas_writer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.parquet(*args, **kwargs)

Bases: jf.output.pandas_writer

Convert input to a parquet file

>>> list(parquet("/tmp/test.parq").transform([{'a': 1}, {'a': 3}]))
['data written to /tmp/test.parq']
jf.output.peek(data, count=100)

Slice and memoize data head
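
A sketch of head-memoization with `itertools`, assuming the intent is to inspect the first items of a one-shot iterator without losing them. The return shape here is an assumption for illustration, not jf's actual signature:

```python
from itertools import chain, islice

def peek(data, count=100):
    """Read up to `count` items from a possibly one-shot iterator and
    return both the memoized head and an iterator that replays the
    head before yielding the remaining items."""
    it = iter(data)
    head = list(islice(it, count))
    return head, chain(head, it)
```

For example, `head, rest = peek(iter(range(5)), 2)` gives `head == [0, 1]` while `rest` still yields all five values.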

jf.output.print_results(data, args)

Print results

class jf.output.profile(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.output.result_cleaner(val)

Cleanup the result

>>> result_cleaner({'a': 1})
{'a': 1}

jf.process module

JF python json/yaml query engine

class jf.process.Col(k=None)

Bases: object

Object representing a column

This object is used to define column selection operations. For example, if you want to select the ‘id’ field from your data, you can do it as follows:

>>> x = Col()
>>> x.id({"id": 235})
235
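
A minimal sketch of how such a column selector can work: attribute access records the key, and calling the object extracts it from a dict. This is an illustration only; the real `Col` also supports comparison operators (e.g. `x.id > 100`, used by `Filter`), which are omitted here:

```python
class Col:
    """Sketch: attribute access builds a selector, calling applies it."""

    def __init__(self, k=None):
        self._k = k

    def __getattr__(self, name):
        # Each attribute access returns a new selector for that key
        return Col(name)

    def __call__(self, item):
        # Applying the selector to a dict extracts the recorded key
        return item[self._k]
```

With this sketch, `Col().id({"id": 235})` returns `235` as in the doctest above.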
class jf.process.Filter(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Filter input data based on a column value

>>> x = Col()
>>> Filter(x.id > 100).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}]
class jf.process.First(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show only the first (N) value(s)

>>> First().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 99, 'a': 1}]
class jf.process.Firstnlast(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show first and last (N) items

>>> Firstnlast(2).transform([1,2,3,4,5])
[[1, 2], [4, 5]]
class jf.process.Flatten(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Flatten array

Parameters: args – array to flatten
Returns: array of flattened items
>>> from pprint import pprint
>>> pprint(list(Flatten().transform([{'a': 1, 'b':{'c': 2}}])))
[{'a': 1, 'b.c': 2}]
class jf.process.FlattenItem(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Make item flat

Parameters:
  • it – item
  • root – root node
Returns:
  flattened version of the item

>>> FlattenItem().transform("foo")
'foo'
>>> FlattenItem().transform({"a": 1})
{'a': 1}
>>> from pprint import pprint
>>> pprint(FlattenItem().transform({"a": 1, "b":{"c":2}}))
{'a': 1, 'b.c': 2}
>>> list(sorted(FlattenItem().transform({"a": 1, "b":{"c":2}}).items()))
[('a', 1), ('b.c', 2)]
>>> list(sorted(FlattenItem().transform({"a": 1, "b":[1,2]}).items()))
[('a', 1), ('b.0', 1), ('b.1', 2)]
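
The dotted-key flattening shown in the doctests above can be sketched as a short recursive function. The name and exact edge-case handling are assumptions, not jf's implementation:

```python
def flatten_item(it, root=""):
    """Collapse nested dicts and lists into a single flat dict whose
    keys are dotted paths; scalars pass through unchanged."""
    if not isinstance(it, (dict, list)):
        return it
    flat = {}
    pairs = it.items() if isinstance(it, dict) else enumerate(it)
    for key, val in pairs:
        name = f"{root}{key}"
        if isinstance(val, (dict, list)):
            # Recurse with the dotted prefix extended by this key
            flat.update(flatten_item(val, f"{name}."))
        else:
            flat[name] = val
    return flat
```

This reproduces the outputs above: `{"a": 1, "b": {"c": 2}}` becomes `{'a': 1, 'b.c': 2}` and lists flatten to indexed keys like `'b.0'`.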
jf.process.Fn(fn)

Wrapper to convert a plain function to work with the column selector

This is used internally to enable concise syntax in the command-line tool

>>> Fn(len)("123")
3
>>> x = Col()
>>> Fn(len)(x.id)({"id": "123"})
3
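
One way to sketch this wrapper: applied to a plain value it calls the function directly; applied to a selector (any callable) it composes, so the result is itself a selector. This demonstration uses a plain lambda in place of `Col` and is an assumption about the mechanism, not jf's code:

```python
def Fn(fn):
    """Lift fn so it works on both plain values and column selectors."""
    def wrapped(arg):
        if callable(arg):
            # Compose: return a new selector that applies fn afterwards
            return lambda item: fn(arg(item))
        return fn(arg)
    return wrapped
```

So `Fn(len)("123")` is `3`, while `Fn(len)(lambda d: d["id"])` yields a selector that returns `3` for `{"id": "123"}`.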
class jf.process.GenProcessor(igen, filters)

Bases: object

Make a generator pipeline

add_filter(fun)

Add filter to pipeline

process()

Process items
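
A sketch of the generator pipeline described above, assuming each filter is a function that takes an iterable and returns a new one. Names mirror the interface; the implementation is illustrative:

```python
class GenProcessor:
    """Lazy pipeline: filters wrap the input generator in order."""

    def __init__(self, igen, filters):
        self._igen = igen
        self._filters = list(filters)

    def add_filter(self, fun):
        # Append another stage to the pipeline
        self._filters.append(fun)

    def process(self):
        pipeline = self._igen
        for fun in self._filters:
            # Wrap the current generator in the next stage
            pipeline = fun(pipeline)
        return pipeline
```

Nothing is evaluated until the result of `process()` is consumed, so large streams are handled item by item.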

class jf.process.GroupBy(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Group items by value

>>> arr = [{'item': '1', 'v': 2},{'item': '2', 'v': 3},{'item': '1', 'v': 3}]
>>> x = Col()
>>> list(sorted(map(lambda x: len(x['items']), GroupBy(x.item).transform(arr))))
[1, 2]
class jf.process.Hide(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Hide elements from items

>>> Hide("a").transform([{"a": 1, "id": 1}, {"a": 2, "id": 3}])
[{'id': 1}, {'id': 3}]
class jf.process.Identity(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.Jfislice(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf wrapper for itertools.islice

class jf.process.Last(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show only the last (N) value(s)

>>> Last().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}]
jf.process.Len(it)
class jf.process.Map(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Apply simple map transformation to input data

>>> x = Col()
>>> list(Map(x.a).transform([{"a": 1}]))
[1]
class jf.process.Pipeline(transformations)

Bases: object

Make a pipeline from the transformations

A pipeline in this context is a list of transformations that are applied, in order, to the input data stream.

transform(data, **kwargs)
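
The "applied in order" semantics can be sketched with a fold over the transformation list; this assumes each stage exposes the `transform()` method of JFTransformation and is an illustration, not jf's implementation:

```python
from functools import reduce

class Pipeline:
    """Sketch: feed the output of each transformation into the next."""

    def __init__(self, transformations):
        self._transformations = transformations

    def transform(self, data, **kwargs):
        # Left fold: stage N consumes the output of stage N-1
        return reduce(lambda d, t: t.transform(d, **kwargs),
                      self._transformations, data)
```

Any objects with a compatible `transform(data, **kwargs)` method can be chained this way.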
class jf.process.Print(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Print (n) values

This prints n values to stderr, but passes all data through unchanged.

>>> Print().transform([1, 2, 3, 4])
[1, 2, 3, 4]
class jf.process.ReduceList(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.Sorted(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Sort items based on the column value

>>> x = Col()
>>> Sorted(x.a, reverse=True).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}, {'id': 99, 'a': 1}]
jf.process.Str(it)
jf.process.TitleCase(it)
class jf.process.Transpose(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Transpose input

>>> arr = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
>>> list(sorted(map(lambda x: list(x.items()), Transpose().transform(arr)), key=lambda x: x[0][1]))
[[(0, 1), (1, 2)], [(0, 2), (1, 3)]]
class jf.process.Unique(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Calculate unique according to function

>>> data = [{"a": 5, "b": 123}, {"a": 4, "b": 120}, {"a": 2, "b": 120}]
>>> x = Col()
>>> len(list(Unique(x.b).transform(data)))
2
class jf.process.Update(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.YieldAll(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Yield all subitems of all items

>>> list(YieldAll(Col().data).transform([{"data": [1,2,3]}]))
[1, 2, 3]
jf.process.age(datecol)

Try to guess the age of the date in datecol

>>> x = Col()
>>> isinstance(age(x.datetime)({"datetime": "2011-04-01T12:12"}), timedelta)
True
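
The core of the age computation can be sketched for a plain ISO 8601 string: parse it and subtract it from the current time, yielding a timedelta as in the doctest. The real helper also works on column selectors (as shown above) and may accept fuzzier date formats:

```python
from datetime import datetime, timedelta

def age(datestr):
    """Guess the age of an ISO-formatted date string as a timedelta."""
    then = datetime.fromisoformat(datestr)
    # Elapsed time between the parsed date and now
    return datetime.now() - then
```

For example, `age("2011-04-01T12:12")` returns a timedelta of well over a decade.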
jf.process.evaluate_col(col, x)
jf.process.fn_mod(mod)
jf.process.parse_value(val)

Parse value to complex types

jf.query_parser module

JF query parser

This module contains tools for parsing the input query when using the JF command-line tool.

jf.query_parser.filter_tree(node)

Filter interesting nodes from a parse tree

jf.query_parser.flatten(tree)

Flatten tree

jf.query_parser.join_tokens(arr)

Join tokens if joined tokens contain the same instructions

jf.query_parser.make_param_list(part)

Make a parameter list from tokens

jf.query_parser.maxdepth(tree)

Calculate tree depth

jf.query_parser.merge_lambdas(arr)

Merge jf lambdas to mappers and filters

jf.query_parser.merge_not(arr, char=', ')

Merge items until the given character is detected, then yield them.

jf.query_parser.parse_part(function)

Parse a part of pipeline definition

jf.query_parser.parse_query(string)

Parse a query string and convert it to an evaluatable pipeline argument

jf.query_parser.tag_keywords(val)

Tag keywords

jf.service module

class jf.service.RESTful(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.sklearn_import module

jf.sklearn_import.import_from(obj_name, module_name)
jf.sklearn_import.import_from_sklearn(obj_name)
jf.sklearn_import.load_sklearn_modules()

Module contents

JF python json/yaml query engine

This module contains the main functions of the JF command-line query tool

jf.colorize(ex)

Colorize syntax error

jf.query_convert(query)

Convert query for evaluation

jf.run_query(query, data, imports=None, import_from=None, ordered_dict=False)

Run a query against given data