jf package

Submodules

jf.input module

JF I/O library.

jf.input.colorize_json_error(ex)
    Colorize the syntax error found in the input data.

jf.input.format_xml(parent)
    Recursive operation which returns a tree formatted as dicts and
    lists. A node is turned into a list when the word 'List' appears
    in the parent tag.

    >>> tree = etree.fromstring('<doc><a>1</a></doc>')
    >>> format_xml(tree)
    {'a': '1'}

jf.input.import_error()
    Logging function for import errors.

jf.input.read_file(fn, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)
    Convert an input file to a data source.

jf.input.read_input(args, openhook=<function hook_compressed>, ordered_dict=False, **kwargs)
    Read JSON, JSONL and YAML data from the file defined in args.

jf.input.yield_json_and_json_lines(inp)
    Yield JSON and JSON lines.
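The reader above is jf-specific and its implementation is not shown here. As a rough illustration of what distinguishing whole-document JSON from JSON-lines input involves, the hypothetical helper below (not part of jf) first tries to parse the input as a single document and falls back to line-by-line parsing:

```python
import json

def yield_json_and_json_lines_sketch(lines):
    """Hypothetical, simplified stand-in for
    jf.input.yield_json_and_json_lines.

    Try to parse the whole input as one JSON document; if that fails,
    parse each non-empty line separately (JSON-lines style)."""
    text = "".join(lines)
    try:
        yield json.loads(text)
    except json.JSONDecodeError:
        for line in lines:
            line = line.strip()
            if line:
                yield json.loads(line)
```

For example, `['{"a": 1}\n', '{"a": 2}\n']` fails to parse as one document, so each line is parsed on its own.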
jf.meta module

class jf.meta.JFTransformation(*args, fn=None, **kwargs)
    Bases: object

    Base class for JF transformations.

    fit(X, y=None)

    transform(X, y=None, gen=False, **kwargs)

class jf.meta.Struct(**entries)
    Bases: object

    Class representation of a dict.

    dict()
        Convert the item to a dict.

    hide(dct)
        Mark an item attribute as hidden.

    update(dct)
        Update the item with key/values from a dict.
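Struct's actual implementation is not reproduced here; a minimal sketch of such a dict-to-attribute wrapper (the class name and details are assumptions, not jf's real code) might look like this:

```python
class StructSketch:
    """Hypothetical minimal version of jf.meta.Struct: exposes the
    keys of a dict as instance attributes."""

    def __init__(self, **entries):
        # Store every key/value pair as an instance attribute.
        self.__dict__.update(entries)

    def dict(self):
        # Convert the item back to a plain dict.
        return dict(self.__dict__)

    def update(self, dct):
        # Merge key/values from another dict into this item.
        self.__dict__.update(dct)
```

With this sketch, `StructSketch(a=1).a` returns `1`, and `update({"c": 3})` adds a `c` attribute.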
class jf.meta.StructEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
    Bases: json.encoder.JSONEncoder

    Try to convert everything to JSON.

    default(obj)
        Implement this method in a subclass such that it returns a
        serializable object for obj, or call the base implementation
        (to raise a TypeError).

        For example, to support arbitrary iterators, you could
        implement default like this:

        def default(self, o):
            try:
                iterable = iter(o)
            except TypeError:
                pass
            else:
                return list(iterable)
            # Let the base class default method raise the TypeError
            return JSONEncoder.default(self, o)

jf.meta.to_struct(val)
    Convert val to a class representing it.

jf.meta.to_struct_gen(arr, ordered_dict=False)
    Convert all items in arr to structs.
jf.ml module

class jf.ml.ColumnSelector(column, default=['unk'])
    Bases: object

    fit(X, y=None)

    transform(X, y=None)

class jf.ml.importResolver
    Bases: object

class jf.ml.model_loader(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.ml.persistent_trainer(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.ml.persistent_transformation(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.ml.trainer(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.ml.transform(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation
jf.output module

JF python JSON/YAML query engine.

class jf.output.browser(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.output.csv(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.output.excel(*args, **kwargs)
    Bases: jf.output.pandas_writer

    Convert input to an Excel file.

    >>> list(excel("/tmp/test.xlsx").transform([{'a': 1}, {'a': 3}]))
    ['data written to /tmp/test.xlsx']

class jf.output.ipy(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.output.md(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.output.pandas_writer(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.output.parquet(*args, **kwargs)
    Bases: jf.output.pandas_writer

    Convert input to parquet.

    >>> list(parquet("/tmp/test.parq").transform([{'a': 1}, {'a': 3}]))
    ['data written to /tmp/test.parq']

jf.output.peek(data, count=100)
    Slice and memoize the head of the data stream.
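Reading the head of a stream without losing it can be sketched with itertools. The stand-in below is an assumption about what "slice and memoize" means here, not jf's actual implementation:

```python
from itertools import chain, islice

def peek_sketch(data, count=100):
    """Hypothetical stand-in for jf.output.peek: return the first
    `count` items plus an iterator that replays the whole stream."""
    it = iter(data)
    head = list(islice(it, count))  # memoize the head
    # Replay the memoized head first, then continue with the rest.
    return head, chain(head, it)
```

This lets a caller inspect the first items (e.g. to infer columns) and still consume the full stream afterwards.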
jf.output.print_results(data, args)
    Print results.

class jf.output.profile(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

jf.output.result_cleaner(val)
    Clean up the result.

    >>> result_cleaner({'a': 1})
    {'a': 1}
jf.process module

JF python JSON/YAML query engine.

class jf.process.Col(k=None)
    Bases: object

    Object representing a column.

    This object is used to define column selection operations. For
    example, to select the 'id' field from your data:

    >>> x = Col()
    >>> x.id({"id": 235})
    235
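Mechanically, this style of column selector can be built with `__getattr__` returning a getter function. A simplified sketch (not jf's actual Col code, which also takes a `k` parameter):

```python
class ColSketch:
    """Hypothetical minimal column selector: attribute access returns
    a function that extracts that key from a dict."""

    def __getattr__(self, name):
        # x.id produces a callable that looks up item["id"].
        return lambda item: item[name]
```

The same idea generalizes to comparison operators (as in `Filter(x.id > 100)` below), where the comparison would return a predicate function instead of a value.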
class jf.process.Filter(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Filter input data based on a column value.

    >>> x = Col()
    >>> Filter(x.id > 100).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
    [{'id': 199, 'a': 2}]

class jf.process.First(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Show only the first (N) value(s).

    >>> First().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
    [{'id': 99, 'a': 1}]

class jf.process.Firstnlast(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Show the first and last (N) items.

    >>> Firstnlast(2).transform([1,2,3,4,5])
    [[1, 2], [4, 5]]

class jf.process.Flatten(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Flatten an array.

    Parameters: args – array to flatten
    Returns: array of flattened items

    >>> from pprint import pprint
    >>> pprint(list(Flatten().transform([{'a': 1, 'b':{'c': 2}}])))
    [{'a': 1, 'b.c': 2}]

class jf.process.FlattenItem(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Make an item flat.

    Parameters:
        - it – item
        - root – root node
    Returns: flattened version of the item

    >>> FlattenItem().transform("foo")
    'foo'
    >>> FlattenItem().transform({"a": 1})
    {'a': 1}
    >>> from pprint import pprint
    >>> pprint(FlattenItem().transform({"a": 1, "b":{"c":2}}))
    {'a': 1, 'b.c': 2}
    >>> list(sorted(FlattenItem().transform({"a": 1, "b":{"c":2}}).items()))
    [('a', 1), ('b.c', 2)]
    >>> list(sorted(FlattenItem().transform({"a": 1, "b":[1,2]}).items()))
    [('a', 1), ('b.0', 1), ('b.1', 2)]

jf.process.Fn(fn)
    Wrapper to convert a function to work with the column selector.

    This is used internally to enable nice syntax in the command line tool.

    >>> Fn(len)("123")
    3
    >>> x = Col()
    >>> Fn(len)(x.id)({"id": "123"})
    3

class jf.process.GenProcessor(igen, filters)
    Bases: object

    Make a generator pipeline.

    add_filter(fun)
        Add a filter to the pipeline.

    process()
        Process items.
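A generator pipeline of this shape can be sketched as follows (an assumed structure, not jf's implementation; here each "filter" is any callable that maps one iterable to another):

```python
class GenProcessorSketch:
    """Hypothetical stand-in for jf.process.GenProcessor: holds an
    input generator and a list of transformations to apply to it."""

    def __init__(self, igen, filters):
        self._igen = igen
        self._filters = list(filters)

    def add_filter(self, fun):
        # Append one more transformation to the pipeline.
        self._filters.append(fun)

    def process(self):
        # Feed the generator through each transformation in order.
        data = self._igen
        for fun in self._filters:
            data = fun(data)
        return data
```

Because each stage wraps a generator rather than materializing a list, items flow through the whole pipeline lazily.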
class jf.process.GroupBy(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Group items by value.

    >>> arr = [{'item': '1', 'v': 2},{'item': '2', 'v': 3},{'item': '1', 'v': 3}]
    >>> x = Col()
    >>> list(sorted(map(lambda x: len(x['items']), GroupBy(x.item).transform(arr))))
    [1, 2]

class jf.process.Hide(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Hide elements from items.

    >>> Hide("a").transform([{"a": 1, "id": 1}, {"a": 2, "id": 3}])
    [{'id': 1}, {'id': 3}]

class jf.process.Identity(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.process.Jfislice(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    jf wrapper for itertools.islice.

class jf.process.Last(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Show only the last (N) value(s).

    >>> Last().transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
    [{'id': 199, 'a': 2}]

jf.process.Len(it)

class jf.process.Map(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Apply a simple map transformation to the input data.

    >>> x = Col()
    >>> list(Map(x.a).transform([{"a": 1}]))
    [1]

class jf.process.Pipeline(transformations)
    Bases: object

    Make a pipeline from the transformations.

    A pipeline in this context is a list of transformations that are
    applied, in order, to the input data stream.

    transform(data, **kwargs)
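Given the JFTransformation interface above (objects exposing `transform()`), applying a list of transformations in order can be sketched like this (a simplified stand-in, not the actual Pipeline code):

```python
class PipelineSketch:
    """Hypothetical stand-in for jf.process.Pipeline: chain each
    transformation's transform() over the data stream, in order."""

    def __init__(self, transformations):
        self.transformations = transformations

    def transform(self, data, **kwargs):
        # The output of one transformation is the input of the next.
        for t in self.transformations:
            data = t.transform(data, **kwargs)
        return data
```

This mirrors how a parsed query such as "map, then filter, then first" would be executed stage by stage.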
class jf.process.Print(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Print (n) values.

    This prints n values to stderr, but passes the data through unchanged.

    >>> Print().transform([1, 2, 3, 4])
    [1, 2, 3, 4]

class jf.process.ReduceList(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.process.Sorted(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Sort items based on a column value.

    >>> x = Col()
    >>> Sorted(x.a, reverse=True).transform([{"id": 99, "a": 1}, {"id": 199, "a": 2}])
    [{'id': 199, 'a': 2}, {'id': 99, 'a': 1}]

jf.process.Str(it)

jf.process.TitleCase(it)

class jf.process.Transpose(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Transpose the input.

    >>> arr = [{'a': 1, 'b': 2}, {'a': 2, 'b': 3}]
    >>> list(sorted(map(lambda x: list(x.items()), Transpose().transform(arr)), key=lambda x: x[0][1]))
    [[(0, 1), (1, 2)], [(0, 2), (1, 3)]]

class jf.process.Unique(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Calculate unique items according to a function.

    >>> data = [{"a": 5, "b": 123}, {"a": 4, "b": 120}, {"a": 2, "b": 120}]
    >>> x = Col()
    >>> len(list(Unique(x.b).transform(data)))
    2

class jf.process.Update(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

class jf.process.YieldAll(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation

    Yield all subitems of all items.

    >>> list(YieldAll(Col().data).transform([{"data": [1,2,3]}]))
    [1, 2, 3]

jf.process.age(datecol)
    Try to guess the age of a date string.

    >>> x = Col()
    >>> isinstance(age(x.datetime)({"datetime": "2011-04-01T12:12"}), timedelta)
    True

jf.process.evaluate_col(col, x)

jf.process.fn_mod(mod)

jf.process.parse_value(val)
    Parse a value into complex types.
jf.query_parser module

JF query parser.

This module contains tools for parsing the input query when using the JF command line tool.
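The actual jf query grammar is defined by this module and not reproduced here. As a loose illustration of the kind of work parse_query does, the sketch below splits a pipeline-style query into top-level parts while leaving commas inside parentheses intact (a hypothetical simplified grammar, not jf's real one):

```python
def split_pipeline_sketch(query):
    """Hypothetical illustration only: split a query string on
    top-level commas, respecting parenthesis nesting."""
    parts, depth, current = [], 0, []
    for ch in query:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            # Top-level comma: close the current pipeline stage.
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return parts
```

A real parser must additionally handle quoted strings, lambdas, and keyword tagging, which the functions below deal with.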
jf.query_parser.filter_tree(node)
    Filter interesting nodes from a parse tree.

jf.query_parser.flatten(tree)
    Flatten a tree.

jf.query_parser.join_tokens(arr)
    Join tokens if the joined tokens contain the same instructions.

jf.query_parser.make_param_list(part)
    Make a parameter list from tokens.

jf.query_parser.maxdepth(tree)
    Calculate tree depth.

jf.query_parser.merge_lambdas(arr)
    Merge jf lambdas into mappers and filters.

jf.query_parser.merge_not(arr, char=', ')
    Merge items until the character is detected before yielding them.

jf.query_parser.parse_part(function)
    Parse a part of a pipeline definition.

jf.query_parser.parse_query(string)
    Parse a query string and convert it to an evaluatable pipeline argument.

jf.query_parser.tag_keywords(val)
    Tag keywords.
jf.service module

class jf.service.RESTful(*args, fn=None, **kwargs)
    Bases: jf.meta.JFTransformation
jf.sklearn_import module

jf.sklearn_import.import_from(obj_name, module_name)

jf.sklearn_import.import_from_sklearn(obj_name)

jf.sklearn_import.load_sklearn_modules()
Module contents

JF python JSON/YAML query engine.

This module contains the main functions used by the JF command line query tool.

jf.colorize(ex)
    Colorize a syntax error.

jf.query_convert(query)
    Convert a query for evaluation.

jf.run_query(query, data, imports=None, import_from=None, ordered_dict=False)
    Run a query against the given data.