47 Commits

Author SHA1 Message Date
Xingjun.Wang
134fe72f06 hotfix for datasets 3.0.2 (#1046) 2024-10-23 16:25:53 +08:00
Xingjun.Wang
2c4505e13a Adapt new datasets (#1002)
* update ds==3.0

* update

* add http_get_ms func

* del unused code

* fix pr issue and update requirements
2024-09-30 16:46:00 +08:00
tastelikefeet
4c518db424 patch hf hub (#987) 2024-09-14 12:24:19 +08:00
liuyhwangyh
92ce1aa41d fix dataset page bug, framework add transformers dependency (#927)
* fix dataset page bug, framework add transformers dependency
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-07-26 19:03:39 +08:00
Xingjun.Wang
210ab40c54 Upgrade datasets (#921)
* del _datasets_server import in hf_dataset_util

* fix streaming for youku-mplug and adopt latest datasets

* fix download config copy

* update ut

* add youku in test_general_datasets

* update UT for general dataset

* adapt to datasets version: 2.19.0 or later

* add assert for youku data UT

* fix disable_tqdm in some functions for 2.19.0 or later

* update get_module_with_script

* set trust_remote_code is True in load_dataset_with_ctx

* update print info

* update requirements for datasets version restriction

* fix _dataset_info

* add pillow

* update comments

* update comment

* reuse _download function in DataDownloadManager

* remove unused code

* update test_run_modelhub in Human3DAnimationTest

* set datasets>=2.18.0
2024-07-23 22:26:12 +08:00
liuyhwangyh
4e2555c5a3 Release/1.16 merge to master (#922)
* update version

* merge master

* fix fairseq issue

* fix pip compatible issue

* fix bug

* version to 1.16.1

* xformers<0.0.27

* fix matplotlib.cm compatible issue

* remove transformers from framework

* vllm to 0.5.1

* not upgrade pip

* add Pillow to framework

* Update version.py

---------

Co-authored-by: xingjun.wang <xingjun.wxj@alibaba-inc.com>
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-07-22 20:58:32 +08:00
liuyhwangyh
f7a32e48e3 add dataset download (#906)
* add dataset download

* fix cr issue

* fix cv matplotlib issue

* refactor code

* fix ut issue

* remove debug code

* remove unused import

* fix import issue

* sleep 65s before start docker avoid kill and run failed

---------

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-07-12 19:06:44 +08:00
liuyhwangyh
ff77845772 add command line usage, optimize command line log, optimize packaging… (#888)
* add command line usage, optimize command line log, optimize packaging version compatible

* Update command.md

* fix logging dup

* Refactor install dependencies (#889)

* refactor install dependencies; default to only the dependencies of hub and datasets

* move pandas import to function

* update hub deps

* test

* remove generate ast file

* remove gast dependency

* replace gast with ast

* add dependency version

* remove lap for compile error

* fix comments issue

* add install ollama

---------

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>

---------

Co-authored-by: mulin.lyh <mulin.lyh@taobao.com>
2024-06-30 20:15:03 +08:00
xingjun.wang
09779f1735 set datasets <2.19.0 2024-04-19 19:40:00 +08:00
xingjun.wang
7b8e10123b fix dataset hf utils 2024-04-18 23:13:26 +08:00
Xingjun.Wang
1a66f069c4 Dataset refactor (#807)
* add main entry in ms_dataset

* update func get_data_patterns import

* modify return_config_only

* modify return_config_only to dataset_info_only

* update version for test

* del get_logger(__name__)

* fix py script loading

* fix loading py and without py

* add subset support

* add hf_datasets_util; refine list_repo_tree_ms; fix private datasets loading issue

* update version to rc5

* fix and support preview for dataset_info_only mode

* fix urlencode

* update to rc7

* 1. loading of dataset_infos.json is deprecated; 2. add some ut

* update version

* add escapechar for read_csv and to_csv

* add params: Source=SDK

* add create_dataset func

* overwrite _get_paths_info

* update & version

* update list_repo_tree name

* add get_module_with_script, fix download imports

* fix py script loading issue in dataset_module_factory

* fix create dataset

* update log info in api
2024-03-22 17:30:34 +08:00
xingjun.wxj
2a991a5c6b update datasets version
Update datasets version.
compatibility check: 2.14.5, 2.14.6, 2.15.0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14916111
2023-12-06 16:25:20 +08:00
xingjun.wxj
a8e9e0a48f set datasets==2.14.6 2023-12-01 17:33:07 +08:00
xingjun.wxj
6833bdabfc set datasets==2.14.6
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14593950
2023-11-17 10:46:58 +08:00
mulin.lyh
23f1f474bf Merge branch 'master-github' into master-merge-github925
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14164566
2023-09-26 21:15:41 +08:00
tastelikefeet
f2e9ea4199 Replace code with swift wheel (#467) 2023-08-29 17:27:18 +08:00
wenmeng zhou
e3abfe0564 Update framework.txt (#453)
remove version restrict for numpy and pandas
2023-08-09 18:00:44 +08:00
Xingjun.Wang
41c2c95083 dataset fix version (#413)
* modify datasets version
2023-07-25 22:18:25 +08:00
mulin.lyh
3683a4386f [to #50607174]feat: support parallel download large model file
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13113235
* [to #50607174]feat: support parallel download large model file
2023-06-30 15:30:00 +08:00
xingjun.wxj
0af8752024 Unfreeze datasets version setting
1. Unfreeze datasets version setting
2. Fix some bugs in class NativeIterableDataset
3. Remove auto-converting (image/audio) object in col:FILE
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13080190
2023-06-28 15:50:57 +08:00
mulin.lyh
25a7398aea docker file py38 and py37 compatible merge
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12722824
* debug

* add missing deps

* add stanza

* fix numpy issue

* numpy version

* tf case

* numpy <= 1.22.0

* fix tf 1.15.5 case

* add python38 and python37 compatible docker file

* cv add fvcore

* add easycv

* remove debug code

* fix style issue

* fix python3.7 python3.8 compatible issue

* comment

* modify run config

* scipy<=1.7.3

* revert scipy<=1.7.3

* fix compatible bug

* fix compatible bug

* add easycv

* update version: 1.6.1rc0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
fix python3.7 and python3.8 compatible issue

* numpy version change

* numpy version

* numpy version

* fix numpy version

* fix numpy version

* update version: 1.6.1rc0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
fix python3.7 and python3.8 compatible issue

* restore setup.py

* restore setup.py

* add build base image

* fix style

* numpy version fix

* optimize build image, split base and modelscope image

* fix style issue

* modify example path

* modify examples folder

* add SETUPTOOLS_USE_DISTUTILS=stdlib comments
2023-06-08 10:01:58 +08:00
xingjun.wxj
5b99bb24fe update version: 1.6.1rc0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
2023-05-25 20:28:36 +08:00
mulin.lyh
6abdce929d fix py38 issue
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12660340
2023-05-17 22:11:23 +08:00
hemu.zp
1d28c5b730 mv mmdet to cv
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12614719
2023-05-15 11:48:29 +08:00
hemu.zp
ca85447363 remove pai-easycv requirement
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12564234
2023-05-11 14:31:18 +08:00
mulin.lyh
d772d96cf5 [to #49147498]feat: support python3.8 2023-05-06 18:01:02 +08:00
hemu.zp
b3e7eca817 limit mmdet version
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12339472
2023-04-15 17:08:55 +08:00
yuze.zyz
2650d37094 Fix the log level and return value of push_to_hub
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12338218
2023-04-15 16:42:35 +08:00
hemu.zp
a4bafd8b25 Add requirements for jsonplus
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12331254
2023-04-14 14:05:36 +08:00
hemu.zp
4e78f611e6 Add sortedcontainers to requirements
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12310786
2023-04-12 22:26:56 +08:00
jiangnana.jnn
6f9d59f469 fix jsonplus, support ndarray
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12056076

* fix jsonplus

* add unittest

* fix json dump np float
2023-03-21 18:10:10 +08:00
yuze.zyz
4a0cb67e66 fix numpy dependency
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11978446
2023-03-14 13:31:22 +08:00
mulin.lyh
e54694690f [to #46993990]feat: run ci cases based on code diff to reduce ci test time 2023-02-06 08:00:19 +00:00
Yingda Chen
af62b3e9ad limit pyarrow version 2023-01-31 11:35:09 +08:00
yingda.chen
2e029a1eda update requirements and fix some typos
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11479462
2023-01-31 01:23:56 +00:00
xingjun.wxj
5a01eca834 [to #42322933] fix datasets version incompatible issue
The recently released datasets==2.9.0 is incompatible with older versions (<=2.8.0) in classes and methods such as IterableDataset; update requirements to restrict the version to 2.7.0~2.8.0

Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11476999

* fix datasets==2.8.0 and datasets==2.9.0 incompatible issue
2023-01-29 11:13:09 +00:00
xingjun.wxj
43edddd31f [to #42322933] msdataset module refactor and add 1230 features
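1. Optimize the local dataset loading path
2. Decouple local from remote so the SDK can also be used without network access
3. Upgrade hf datasets and its related dependencies to the latest version (2.7.0+)
4. Fix the issue where metadata does not detect changes to data files
5. Layered system design
6. Local cache management issues
7. Improve error log output
8. Support streaming load
* a. Support streaming for data files in zip format
* b. Support iter over datasets in formats such as Image/Text/Audio/Biodata
* c. Stay compatible with streaming load for legacy datasets whose training data is in meta
* d. Support streaming load for data files in folder format

9. Further standardize the chaining of finetune tasks
* a. Avoid usages such as to_hf_dataset; wrap the commonly used tf-related funcs
* b. Removed some logic that was mixed with hf and wrapped it uniformly inside MsDataset

10. Optimizations for very large datasets
* a. list oss objects: pull the csv mapping from meta directly, with no list_oss_objects api call needed (implemented in an earlier commit)
* b. Improve loading when the sts has expired (implemented in an earlier commit)

11. Support dataset_name input in the form namespace/dataset_name

Reference Aone link: https://aone.alibaba-inc.com/v2/project/1162242/task/46262894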
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11264406
2023-01-10 07:01:34 +08:00
wenmeng.zwm
070ec00720 remove opencv-python from framework requirements and remove easynlp from nlp default requirements
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11119626
2022-12-16 14:27:23 +08:00
wenmeng.zwm
4dd99b8f6e Revert "move opencv dependency from framework to cv"
This reverts commit e970a6eb43.
2022-11-30 18:29:03 +08:00
yingda.chen
e970a6eb43 move opencv dependency from framework to cv
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10708368
2022-11-12 20:09:48 +08:00
班扬
79c44a6810 add event tracking 2022-11-01 15:41:01 +08:00
班扬
84032f90e3 add event tracking 2022-11-01 15:34:58 +08:00
Yingda Chen
3464324f6b [to #42322933] limit datasets version for now 2022-10-31 22:15:37 +08:00
mulin.lyh
7257f6c6fb [to #45631658]feat support eas deploy
Server-side documentation link (login may be required):
https://test.modelscope.cn/api/v1/deployer/docs
        Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10478609
2022-10-24 15:12:48 +08:00
wenmeng.zwm
275f8b4323 Revert "[to #45071449] fix setup error "
This reverts commit a26e6e3869.
2022-10-14 13:55:09 +08:00
wenmeng.zwm
a26e6e3869 [to #45071449] fix setup error
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10196007
2022-10-12 17:33:03 +08:00
wenmeng.zwm
ba74cdf97e [to #43878347] Rename runtime.txt to framework.txt
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10000642

    * rename runtime.txt  to framework.txt
2022-09-03 12:10:16 +08:00