Processor(tokenizer, max_seq_len, train_filename, dev_filename, test_filename, dev_split, data_dir, tasks={}, proxies=None, multithreading_rust=True)

Parameters
    tokenizer – Used to split a sentence (str) into tokens.
    max_seq_len (int) – Samples are truncated after this many tokens.
    train_filename (str) – The name of the file containing the training data.
    dev_filename (str or None) – The name of the file containing the dev data. If None and 0.0 < dev_split < 1.0, the dev set will be sliced from the train set.
    test_filename (str) – The name of the file containing the test data.
    dev_split (float) – The proportion of the train set that will be sliced off as the dev set. Only works if dev_filename is set to None.
    data_dir (str) – The directory in which the train, test and perhaps dev files can be found.
    tasks (dict) – Tasks for which the processor shall extract labels from the input data. Usually this includes a single, default task, e.g. text classification. In a multitask setting this includes multiple tasks. The task name will be used to connect with the related PredictionHead.
    proxies (dict) – Proxy configuration to allow downloads of remote datasets.
    multithreading_rust (bool) – Whether to allow multithreading in Rust, e.g. for the tokenizer. Note: enabling multithreading in Rust AND multiprocessing in Python might cause deadlocks.

classmethod load(processor_name, data_dir, tokenizer, max_seq_len, train_filename, dev_filename, test_filename, dev_split, **kwargs)

Loads the class of processor specified by processor name.

Parameters
    processor_name (str) – The class of processor to be loaded.
    data_dir (str) – Directory where the data files are located.
    max_seq_len (int) – Sequences longer than this will be truncated.
    dev_filename (str or None) – The name of the file containing the dev data. If None and 0.0 < dev_split < 1.0, the dev set will be sliced from the train set.
    dev_split (float) – The proportion of the train set that will be sliced off as the dev set. Only works if dev_filename is set to None.
    kwargs (object) – Placeholder for passing generic parameters.

Returns
    An instance of the specified processor.

classmethod load_from_dir(load_dir)

Infers the specific type of Processor from a config file (e.g. GNADProcessor) and loads an instance of it.

Parameters
    load_dir (str) – Directory that contains a ‘processor_config.json’.

Returns
    An instance of a Processor subclass (e.g. GNADProcessor).

classmethod convert_from_transformers(tokenizer_name_or_path, task_type, max_seq_len, doc_stride, revision=None, tokenizer_class=None, tokenizer_args=None, use_fast=True)

save(save_dir)

Saves the vocabulary to file and also creates a json file containing all the information needed to load the same processor.

Parameters
    save_dir (str) – Directory where the files are to be saved.

generate_config()

Generates a config file from class and instance attributes (only for sensible config parameters).

add_task(name, metric, label_list, label_column_name=None, label_name=None, task_type=None, text_column_name=None)

abstract file_to_dicts(file: str)

static log_problematic(problematic_sample_ids)

dataset_from_dicts(dicts, indices=None, return_baskets=False)

Contains all the functionality to turn a list of dict objects into a PyTorch Dataset and a list of tensor names.

Parameters
    dicts (list of dicts) – List of dictionaries where each contains the data of one input sample.

Returns
    A PyTorch dataset and a list of tensor names.
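The interplay between dev_filename and dev_split described above can be sketched in a few lines. This is a library-free illustration of the documented rule (a dev set is sliced off the train set only when no dev file is given, and only when 0.0 < dev_split < 1.0), not the library's actual implementation; the function name split_train_dev is hypothetical.

```python
def split_train_dev(train_samples, dev_filename=None, dev_split=0.0):
    """Return (train, dev) following the documented dev_filename/dev_split rules."""
    if dev_filename is not None:
        # Dev data comes from its own file; dev_split only works
        # when dev_filename is set to None, so it is ignored here.
        return train_samples, None
    if not 0.0 < dev_split < 1.0:
        # No valid split requested: keep the train set whole.
        return train_samples, []
    n_dev = int(len(train_samples) * dev_split)
    if n_dev == 0:
        # Too few samples for the requested proportion.
        return train_samples, []
    # Slice the dev set off the end of the train set.
    return train_samples[:-n_dev], train_samples[-n_dev:]
```

For example, with ten samples and dev_split=0.2, the last two samples become the dev set; with a dev_filename set, the train set is returned untouched.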
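The save() / load_from_dir() pair described above works by writing the processor's class name into processor_config.json, then using that name at load time to pick the right subclass. Below is a minimal, stdlib-only sketch of that mechanism under stated assumptions: the file name processor_config.json comes from the docs above, but the MiniProcessor classes and their registry are hypothetical stand-ins, not the real library's code.

```python
import json
import os


class MiniProcessor:
    subclasses = {}

    def __init_subclass__(cls, **kwargs):
        # Register every subclass by name so load_from_dir can infer
        # the concrete type from a stored config.
        super().__init_subclass__(**kwargs)
        MiniProcessor.subclasses[cls.__name__] = cls

    def __init__(self, max_seq_len):
        self.max_seq_len = max_seq_len

    def generate_config(self):
        # Only sensible, reconstructible attributes go into the config.
        return {"processor": type(self).__name__, "max_seq_len": self.max_seq_len}

    def save(self, save_dir):
        os.makedirs(save_dir, exist_ok=True)
        with open(os.path.join(save_dir, "processor_config.json"), "w") as f:
            json.dump(self.generate_config(), f)

    @classmethod
    def load_from_dir(cls, load_dir):
        # Read the config, look up the concrete subclass, and rebuild it.
        with open(os.path.join(load_dir, "processor_config.json")) as f:
            config = json.load(f)
        subclass = cls.subclasses[config.pop("processor")]
        return subclass(**config)


class MiniGNADProcessor(MiniProcessor):
    pass
```

Saving a MiniGNADProcessor to a directory and calling MiniProcessor.load_from_dir on that directory returns a MiniGNADProcessor again, without the caller naming the subclass; this is the "infers the specific type of Processor from a config file" behaviour in miniature.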