Dataset refactor (#807)

* add main entry in ms_dataset

* update func get_data_patterns import

* modify return_config_only

* modify return_config_only to dataset_info_only

* update version for test

* del get_logger(__name__)

* fix py script loading

* fix loading with and without a py script

* add subset support

* add hf_datasets_util; refine list_repo_tree_ms; fix private datasets loading issue

* update version to rc5

* fix and support preview for dataset_info_only mode

* fix urlencode

* update to rc7

* 1. loading of dataset_infos.json is deprecated; 2. add some unit tests

* update version

* add escapechar for read_csv and to_csv

* add params: Source=SDK

* add create_dataset func

* overwrite _get_paths_info

* update & version

* update list_repo_tree name

* add get_module_with_script, fix download imports

* fix py script loading issue in dataset_module_factory

* fix create dataset

* update log info in api
Xingjun.Wang
2024-03-22 17:30:34 +08:00
committed by GitHub
parent 9d2c2708ff
commit 1a66f069c4
10 changed files with 1873 additions and 25 deletions

@@ -4,6 +4,7 @@ datasets>=2.14.5
einops
filelock>=3.3.0
gast>=0.2.2
huggingface_hub
numpy
oss2
pandas