jf package

Submodules

jf.input module

JF io library

jf.input.colorize_json_error(ex)

Colorize input data syntax error

jf.input.format_xml(parent)

Recursive operation that returns a tree formatted as dicts and lists. A list is produced when the word ‘List’ appears in the parent tag name.

>>> tree = etree.fromstring('<doc><a>1</a></doc>')
>>> format_xml(tree)
{'a': '1'}
jf.input.import_error()

Logging function for import errors

jf.input.read_file(fn, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)

Convert an input file to a data source

jf.input.read_input(args, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)

Read JSON, JSONL and YAML data from the file defined in args

jf.input.yield_json_and_json_lines(inp)

Yield items from JSON and JSON lines input
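
A minimal sketch of what yielding both whole-document JSON and JSON lines (JSONL) can look like. This is a hypothetical illustration of the behavior, not jf's actual implementation:

```python
import json

def yield_json_and_json_lines(inp):
    """Yield parsed objects from input lines that form either a single
    JSON document or newline-delimited JSON (JSONL)."""
    text = "".join(inp)
    try:
        # Try parsing the whole input as one JSON document first
        doc = json.loads(text)
        if isinstance(doc, list):
            yield from doc
        else:
            yield doc
    except json.JSONDecodeError:
        # Fall back to one JSON object per non-empty line
        for line in text.splitlines():
            if line.strip():
                yield json.loads(line)
```

Given `['[{"a": 1}, {"a": 2}]']` or `['{"a": 1}\n', '{"a": 2}']`, both yield the two dicts.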

jf.meta module

class jf.meta.JFTransformation(*args, fn=None, **kwargs)

Bases: object

Base class for JF transformations

fit(X, y=None)
transform(X, y=None, gen=False, **kwargs)
class jf.meta.Struct(**entries)

Bases: object

Class representation of a dict

dict()

Convert item to dict

hide(dct)

Mark item attribute as hidden

update(dct)

Update item with key/values from a dict
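
A minimal sketch of a dict-backed attribute-access object mirroring the interface above. Names and behavior are assumptions for illustration; in particular, the real `hide()` may accept more than a single key:

```python
class Struct:
    """Sketch: a dict exposed as attributes, convertible back to a dict."""

    def __init__(self, **entries):
        self._hidden = set()
        self.__dict__.update(entries)

    def dict(self):
        # Convert the item back to a plain dict, skipping hidden keys
        return {k: v for k, v in self.__dict__.items()
                if k != "_hidden" and k not in self._hidden}

    def hide(self, key):
        # Mark an item attribute as hidden so dict() omits it
        self._hidden.add(key)
        return self

    def update(self, dct):
        # Update the item with key/values from a dict
        self.__dict__.update(dct)
        return self
```

For example, `Struct(a=1, b=2).hide("b").dict()` gives `{'a': 1}`.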

class jf.meta.StructEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)

Bases: json.encoder.JSONEncoder

Try to convert everything to JSON

default(obj)

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return JSONEncoder.default(self, o)
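
The `default()` pattern above can be exercised directly with the standard library's `json.dumps` and a custom encoder class:

```python
import json

class IterEncoder(json.JSONEncoder):
    """Encoder following the default() pattern shown above: any
    iterable is converted to a list before the base class raises."""

    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

print(json.dumps({"vals": range(3)}, cls=IterEncoder))
```

This prints `{"vals": [0, 1, 2]}`, since `range` is iterable but not natively serializable.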
jf.meta.to_struct(val)

Convert val to a class representing it

jf.meta.to_struct_gen(arr, ordered_dict=False)

Convert all items in arr to Struct objects

jf.ml module

class jf.ml.ColumnSelector(column, default=['unk'])

Bases: object

fit(X, y=None)
transform(X, y=None)
class jf.ml.importResolver

Bases: object

class jf.ml.model_loader(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.persistent_trainer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.persistent_transformation(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.trainer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.ml.transform(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.output module

JF python json/yaml query engine

class jf.output.browser(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.csv(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.excel(*args, **kwargs)

Bases: jf.output.pandas_writer

Convert input to an Excel (xlsx) file

>>> list(excel("/tmp/test.xlsx").transform([{'a': 1}, {'a': 3}]))
['data written to /tmp/test.xlsx']
class jf.output.ipy(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.md(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.pandas_writer(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.output.parquet(*args, **kwargs)

Bases: jf.output.pandas_writer

Convert input to a parquet file

>>> list(parquet("/tmp/test.parq").transform([{'a': 1}, {'a': 3}]))
['data written to /tmp/test.parq']
jf.output.peek(data, count=100)

Slice and memoize data head
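
A sketch of head-memoization with `itertools`, assuming the intent is to inspect the first items of a one-shot iterator without losing them. The return shape here is an assumption for illustration, not jf's actual signature:

```python
from itertools import chain, islice

def peek(data, count=100):
    """Read up to `count` items from a possibly one-shot iterator and
    return both the memoized head and an iterator that replays the
    head before yielding the remaining items."""
    it = iter(data)
    head = list(islice(it, count))
    return head, chain(head, it)
```

For example, `head, rest = peek(iter(range(5)), 2)` gives `head == [0, 1]` while `rest` still yields all five values.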

jf.output.print_results(data, args)

Print results

class jf.output.profile(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.output.result_cleaner(val)

Cleanup the result

>>> result_cleaner({'a': 1})
{'a': 1}

jf.process module

JF python json/yaml query engine

class jf.process.Col(k=None)

Bases: object

Object representing a column

This object is used to define column selection operations. For example, if you want to select the ‘id’ field from your data, you can do it as follows:

>>> x = Col()
>>> x.id({"id": 235})
235
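
A minimal sketch of how such a column selector can work: attribute access records the key, and calling the object extracts it from a dict. This is an illustration only; the real `Col` also supports comparison operators (e.g. `x.id > 100`, used by `Filter`), which are omitted here:

```python
class Col:
    """Sketch: attribute access builds a selector, calling applies it."""

    def __init__(self, k=None):
        self._k = k

    def __getattr__(self, name):
        # Each attribute access returns a new selector for that key
        return Col(name)

    def __call__(self, item):
        # Applying the selector to a dict extracts the recorded key
        return item[self._k]
```

With this sketch, `Col().id({"id": 235})` returns `235` as in the doctest above.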
class jf.process.Filter(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Filter input data based on a column value

>>> x = Col()
>>> Filter(x.id > 100).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}]
class jf.process.First(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show only the first (N) value(s)

>>> First().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 99, 'a': 1}]
class jf.process.Firstnlast(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show first and last (N) items

>>> Firstnlast(2).transform([1,2,3,4,5])
[[1, 2], [4, 5]]
class jf.process.Flatten(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Flatten array

Parameters: args – array to flatten
Returns: array of flattened items
>>> from pprint import pprint
>>> pprint(list(Flatten().transform([{'a': 1, 'b':{'c': 2}}])))
[{'a': 1, 'b.c': 2}]
class jf.process.FlattenItem(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Make item flat

Parameters:
  • it – item
  • root – root node
Returns:
  flattened version of the item

>>> FlattenItem().transform("foo")
'foo'
>>> FlattenItem().transform({"a": 1})
{'a': 1}
>>> from pprint import pprint
>>> pprint(FlattenItem().transform({"a": 1, "b":{"c":2}}))
{'a': 1, 'b.c': 2}
>>> list(sorted(FlattenItem().transform({"a": 1, "b":{"c":2}}).items()))
[('a', 1), ('b.c', 2)]
>>> list(sorted(FlattenItem().transform({"a": 1, "b":[1,2]}).items()))
[('a', 1), ('b.0', 1), ('b.1', 2)]
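
The dotted-key flattening shown in the doctests above can be sketched as a short recursive function. The name and exact edge-case handling are assumptions, not jf's implementation:

```python
def flatten_item(it, root=""):
    """Collapse nested dicts and lists into a single flat dict whose
    keys are dotted paths; scalars pass through unchanged."""
    if not isinstance(it, (dict, list)):
        return it
    flat = {}
    pairs = it.items() if isinstance(it, dict) else enumerate(it)
    for key, val in pairs:
        name = f"{root}{key}"
        if isinstance(val, (dict, list)):
            # Recurse with the dotted prefix extended by this key
            flat.update(flatten_item(val, f"{name}."))
        else:
            flat[name] = val
    return flat
```

This reproduces the outputs above: `{"a": 1, "b": {"c": 2}}` becomes `{'a': 1, 'b.c': 2}` and lists flatten to indexed keys like `'b.0'`.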
jf.process.Fn(fn)

Wrapper to convert a plain function to work with the column selector

This is used internally to enable concise syntax in the command-line tool

>>> Fn(len)("123")
3
>>> x = Col()
>>> Fn(len)(x.id)({"id": "123"})
3
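
One way to sketch this wrapper: applied to a plain value it calls the function directly; applied to a selector (any callable) it composes, so the result is itself a selector. This demonstration uses a plain lambda in place of `Col` and is an assumption about the mechanism, not jf's code:

```python
def Fn(fn):
    """Lift fn so it works on both plain values and column selectors."""
    def wrapped(arg):
        if callable(arg):
            # Compose: return a new selector that applies fn afterwards
            return lambda item: fn(arg(item))
        return fn(arg)
    return wrapped
```

So `Fn(len)("123")` is `3`, while `Fn(len)(lambda d: d["id"])` yields a selector that returns `3` for `{"id": "123"}`.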
class jf.process.GenProcessor(igen, filters)

Bases: object

Make a generator pipeline

add_filter(fun)

Add filter to pipeline

process()

Process items
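
A sketch of the generator pipeline described above, assuming each filter is a function that takes an iterable and returns a new one. Names mirror the interface; the implementation is illustrative:

```python
class GenProcessor:
    """Lazy pipeline: filters wrap the input generator in order."""

    def __init__(self, igen, filters):
        self._igen = igen
        self._filters = list(filters)

    def add_filter(self, fun):
        # Append another stage to the pipeline
        self._filters.append(fun)

    def process(self):
        pipeline = self._igen
        for fun in self._filters:
            # Wrap the current generator in the next stage
            pipeline = fun(pipeline)
        return pipeline
```

Nothing is evaluated until the result of `process()` is consumed, so large streams are handled item by item.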

class jf.process.GroupBy(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Group items by value

>>> arr = [{'item': '1', 'v': 2},{'item': '2', 'v': 3},{'item': '1', 'v': 3}]
>>> x = Col()
>>> list(sorted(map(lambda x: len(x['items']), GroupBy(x.item).transform(arr))))
[1, 2]
class jf.process.Hide(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Hide elements from items

>>> Hide("a").transform([{"a": 1, "id": 1}, {"a": 2, "id": 3}])
[{'id': 1}, {'id': 3}]
class jf.process.Identity(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.Jfislice(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf wrapper for itertools.islice

class jf.process.Last(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Show only the last (N) value(s)

>>> Last().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}]
jf.process.Len(it)
class jf.process.Map(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Apply simple map transformation to input data

>>> x = Col()
>>> list(Map(x.a).transform([{"a": 1}]))
[1]
class jf.process.Pipeline(transformations)

Bases: object

Make a pipeline from the transformations

A pipeline in this context is a list of transformations that are applied, in order, to the input data stream.

transform(data, **kwargs)
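
The "applied in order" semantics can be sketched with a fold over the transformation list; this assumes each stage exposes the `transform()` method of JFTransformation and is an illustration, not jf's implementation:

```python
from functools import reduce

class Pipeline:
    """Sketch: feed the output of each transformation into the next."""

    def __init__(self, transformations):
        self._transformations = transformations

    def transform(self, data, **kwargs):
        # Left fold: stage N consumes the output of stage N-1
        return reduce(lambda d, t: t.transform(d, **kwargs),
                      self._transformations, data)
```

Any objects with a compatible `transform(data, **kwargs)` method can be chained this way.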
class jf.process.Print(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Print (n) values

This prints n values to stderr, but passes all data through unchanged.

>>> Print().transform([1, 2, 3, 4])
[1, 2, 3, 4]
class jf.process.ReduceList(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.Sorted(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Sort items based on the column value

>>> x = Col()
>>> Sorted(x.a, reverse=True).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
[{'id': 199, 'a': 2}, {'id': 99, 'a': 1}]
jf.process.Str(it)
jf.process.TitleCase(it)
class jf.process.Transpose(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Transpose input

>>> arr = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
>>> list(sorted(map(lambda x: list(x.items()), Transpose().transform(arr)), key=lambda x: x[0][1]))
[[(0, 1), (1, 2)], [(0, 2), (1, 3)]]
class jf.process.Unique(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Calculate unique according to function

>>> data = [{"a": 5, "b": 123}, {"a": 4, "b": 120}, {"a": 2, "b": 120}]
>>> x = Col()
>>> len(list(Unique(x.b).transform(data)))
2
class jf.process.Update(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

class jf.process.YieldAll(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

Yield all subitems of all items

>>> list(YieldAll(Col().data).transform([{"data": [1,2,3]}]))
[1, 2, 3]
jf.process.age(datecol)

Try to guess the age of the date in datecol

>>> x = Col()
>>> isinstance(age(x.datetime)({"datetime": "2011-04-01T12:12"}), timedelta)
True
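
The core of the age computation can be sketched for a plain ISO 8601 string: parse it and subtract it from the current time, yielding a timedelta as in the doctest. The real helper also works on column selectors (as shown above) and may accept fuzzier date formats:

```python
from datetime import datetime, timedelta

def age(datestr):
    """Guess the age of an ISO-formatted date string as a timedelta."""
    then = datetime.fromisoformat(datestr)
    # Elapsed time between the parsed date and now
    return datetime.now() - then
```

For example, `age("2011-04-01T12:12")` returns a timedelta of well over a decade.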
jf.process.evaluate_col(col, x)
jf.process.fn_mod(mod)
jf.process.parse_value(val)

Parse value to complex types

jf.query_parser module

JF query parser

This module contains tools for parsing the input query when using the JF command-line tool.

jf.query_parser.filter_tree(node)

Filter interesting nodes from a parse tree

jf.query_parser.flatten(tree)

Flatten tree

jf.query_parser.join_tokens(arr)

Join tokens if joined tokens contain the same instructions

jf.query_parser.make_param_list(part)

Make a parameter list from tokens

jf.query_parser.maxdepth(tree)

Calculate tree depth

jf.query_parser.merge_lambdas(arr)

Merge jf lambdas to mappers and filters

jf.query_parser.merge_not(arr, char=', ')

Merge items until the given character is detected, then yield them.

jf.query_parser.parse_part(function)

Parse a part of pipeline definition

jf.query_parser.parse_query(string)

Parse a query string and convert it to an evaluatable pipeline argument

jf.query_parser.tag_keywords(val)

Tag keywords

jf.service module

class jf.service.RESTful(*args, fn=None, **kwargs)

Bases: jf.meta.JFTransformation

jf.sklearn_import module

jf.sklearn_import.import_from(obj_name, module_name)
jf.sklearn_import.import_from_sklearn(obj_name)
jf.sklearn_import.load_sklearn_modules()

Module contents

JF python json/yaml query engine

This module contains the main functions of the JF command-line query tool

jf.colorize(ex)

Colorize syntax error

jf.query_convert(query)

Convert query for evaluation

jf.run_query(query, data, imports=None, import_from=None, ordered_dict=False)

Run a query against given data