Xingjun.Wang
71a54c1a51
support latest datasets version ( #1163 )
2025-01-02 17:06:46 +08:00
Xingjun.Wang
134fe72f06
hotfix for datasets 3.0.2 ( #1046 )
2024-10-23 16:25:53 +08:00
Xingjun.Wang
2c4505e13a
Adapt new datasets ( #1002 )
...
* update ds==3.0
* update
* add http_get_ms func
* del unused code
* fix pr issue and update requirements
2024-09-30 16:46:00 +08:00
tastelikefeet
4c518db424
patch hf hub ( #987 )
2024-09-14 12:24:19 +08:00
liuyhwangyh
92ce1aa41d
fix dataset page bug, framework add transformers dependency ( #927 )
...
* fix dataset page bug, framework add transformers dependency
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com >
2024-07-26 19:03:39 +08:00
Xingjun.Wang
210ab40c54
Upgrade datasets ( #921 )
...
* del _datasets_server import in hf_dataset_util
* fix streaming for youku-mplug and adopt latest datasets
* fix download config copy
* update ut
* add youku in test_general_datasets
* update UT for general dataset
* adapt to datasets version: 2.19.0 or later
* add assert for youku data UT
* fix disable_tqdm in some functions for 2.19.0 or later
* update get_module_with_script
* set trust_remote_code is True in load_dataset_with_ctx
* update print info
* update requirements for datasets version restriction
* fix _dataset_info
* add pillow
* update comments
* update comment
* reuse _download function in DataDownloadManager
* remove unused code
* update test_run_modelhub in Human3DAnimationTest
* set datasets>=2.18.0
2024-07-23 22:26:12 +08:00
liuyhwangyh
4e2555c5a3
Release/1.16 merge to master ( #922 )
...
* update version
* merge master
* fix fairseq issue
* fix pip compatible issue
* fix bug
* version to 1.16.1
* xformers<0.0.27
* fix matplotlib.cm compatible issue
* remove transformers from framework
* vllm to 0.5.1
* not upgrade pip
* add Pillow to framework
* Update version.py
---------
Co-authored-by: xingjun.wang <xingjun.wxj@alibaba-inc.com >
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com >
2024-07-22 20:58:32 +08:00
liuyhwangyh
f7a32e48e3
add dataset download ( #906 )
...
* add dataset download
* fix cr issue
* fix cv matplotlib issue
* refactor code
* fix ut issue
* remove debug code
* remove unused import
* fix import issue
* sleep 65s before start docker avoid kill and run failed
---------
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com >
2024-07-12 19:06:44 +08:00
liuyhwangyh
ff77845772
add command line usage, optimize command line log, optimize packaging… ( #888 )
...
* add command line usage, optimize command line log, optimize packaging version compatible
* Update command.md
* fix logging dup
* Refactor install dependencies (#889 )
* refactor install dependencies, default only dependencies of hub and datasets
* move pandas import to function
* update hub deps
* test
* remove generate ast file
* remove gast dependency
* replace gast with ast
* add dependency version
* remove lap for compile error
* fix comments issue
* add install ollama
---------
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com >
---------
Co-authored-by: mulin.lyh <mulin.lyh@taobao.com >
2024-06-30 20:15:03 +08:00
xingjun.wang
09779f1735
set datasets <2.19.0
2024-04-19 19:40:00 +08:00
xingjun.wang
7b8e10123b
fix dataset hf utils
2024-04-18 23:13:26 +08:00
Xingjun.Wang
1a66f069c4
Dataset refactor ( #807 )
...
* add main entry in ms_dataset
* update func get_data_patterns import
* modify return_config_only
* modify return_config_only to dataset_info_only
* update version for test
* del get_logger(__name__)
* fix py script loading
* fix loading py and without py
* add subset support
* add hf_datasets_util; refine list_repo_tree_ms; fix private datasets loading issue
* update version to rc5
* fix and support preview for dataset_info_only mode
* fix urlencode
* update to rc7
* loading of dataset_infos.json is deprecated; 2. add some ut
* update version
* add escapechar for read_csv and to_csv
* add params: Source=SDK
* add create_dataset func
* overwrite _get_paths_info
* update & version
* update list_repo_tree name
* add get_module_with_script, fix download imports
* fix py script loading issue in dataset_module_factory
* fix create dataset
* update log info in api
2024-03-22 17:30:34 +08:00
xingjun.wxj
2a991a5c6b
update datasets version
...
Update datasets version.
compatibility check: 2.14.5, 2.14.6, 2.15.0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14916111
2023-12-06 16:25:20 +08:00
xingjun.wxj
a8e9e0a48f
set datasets==2.14.6
2023-12-01 17:33:07 +08:00
xingjun.wxj
6833bdabfc
set datasets==2.14.6
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14593950
2023-11-17 10:46:58 +08:00
mulin.lyh
23f1f474bf
Merge branch 'master-github' into master-merge-github925
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/14164566
2023-09-26 21:15:41 +08:00
tastelikefeet
f2e9ea4199
Replace code with swift wheel ( #467 )
2023-08-29 17:27:18 +08:00
wenmeng zhou
e3abfe0564
Update framework.txt ( #453 )
...
remove version restrict for numpy and pandas
2023-08-09 18:00:44 +08:00
Xingjun.Wang
41c2c95083
dataset fix version ( #413 )
...
* modify datasets version
2023-07-25 22:18:25 +08:00
mulin.lyh
3683a4386f
[to #50607174 ]feat: support parallel download large model file
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13113235
* [to #50607174 ]feat: support parallel download large model file
2023-06-30 15:30:00 +08:00
xingjun.wxj
0af8752024
Unfreeze datasets version setting
...
1. Unfreeze datasets version setting
2. Fix some bugs in class NativeIterableDataset
3. Remove auto-converting (image/audio) object in col:FILE
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/13080190
2023-06-28 15:50:57 +08:00
mulin.lyh
25a7398aea
docker file py38 and py37 compatible merge
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12722824
* debug
* add missing deps
* add stanza
* fix numpy issue
* numpy version
* tf case
* numpy <= 1.22.0
* fix tf 1.15.5 case
* add python38 and python37 compatible docker file
* cv add fvcore
* add easycv
* remove debug code
* fix style issue
* fix python3.7 python3.8 compatible issue
* comment
* modify run config
* scipy<=1.7.3
* revert scipy<=1.7.3
* fix compatible bug
* fix compatible bug
* add easycv
* update version: 1.6.1rc0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
fix python3.7 and python3.8 compatible issue
* numpy version change
* numpy version
* numpy version
* fix numpy version
* fix numpy version
* update version: 1.6.1rc0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
fix python3.7 and python3.8 compatible issue
* restore setup.py
* restore setup.py
* add build base image
* fix style
* numpy version fix
* optimize build image, split base and modelscope image
* fix style issue
* modify example path
* modify examples folder
* add SETUPTOOLS_USE_DISTUTILS=stdlib comments
2023-06-08 10:01:58 +08:00
xingjun.wxj
5b99bb24fe
update version: 1.6.1rc0
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12754942
2023-05-25 20:28:36 +08:00
mulin.lyh
6abdce929d
fix py38 issue
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12660340
2023-05-17 22:11:23 +08:00
hemu.zp
1d28c5b730
mv mmdet to cv
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12614719
2023-05-15 11:48:29 +08:00
hemu.zp
ca85447363
remove pai-easycv requirement
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12564234
2023-05-11 14:31:18 +08:00
mulin.lyh
d772d96cf5
[to #49147498 ]feat: support python3.8
2023-05-06 18:01:02 +08:00
hemu.zp
b3e7eca817
limit mmdet version
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12339472
2023-04-15 17:08:55 +08:00
yuze.zyz
2650d37094
Fix the log level and return value of push_to_hub
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12338218
2023-04-15 16:42:35 +08:00
hemu.zp
a4bafd8b25
Add requirements for jsonplus
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12331254
2023-04-14 14:05:36 +08:00
hemu.zp
4e78f611e6
Add sortedcontainers to requirements
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12310786
2023-04-12 22:26:56 +08:00
jiangnana.jnn
6f9d59f469
fix jsonplus, support ndarray
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/12056076
* fix jsonplus
* add unittest
* fix json dump np float
2023-03-21 18:10:10 +08:00
yuze.zyz
4a0cb67e66
fix numpy dependency
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11978446
2023-03-14 13:31:22 +08:00
mulin.lyh
e54694690f
[to #46993990 ]feat: run ci cases based on code diff to reduce ci test time
2023-02-06 08:00:19 +00:00
Yingda Chen
af62b3e9ad
limit pyarrow version
2023-01-31 11:35:09 +08:00
yingda.chen
2e029a1eda
update requirement and fix some typos
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11479462
2023-01-31 01:23:56 +00:00
xingjun.wxj
5a01eca834
[to #42322933 ] fix datasets version incompatible issue
...
The recently released datasets==2.9.0 is incompatible with older versions (<=2.8.0) in classes and methods such as IterableDataset; modify requirements to restrict its version to 2.7.0~2.8.0
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11476999
* fix datasets==2.8.0 and datasets==2.9.0 incompatible issue
2023-01-29 11:13:09 +00:00
xingjun.wxj
43edddd31f
[to #42322933 ] msdataset module refactor and add 1230 features
...
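1. Optimize the local dataset loading path
2. Decouple local from remote, so the SDK can also be used without network access
3. Upgrade hf datasets and its related dependencies to the latest version (2.7.0+)
4. Fix the issue where metadata does not detect changes to data files
5. Layered system design
6. Local cache management issues
7. Improve error log output
8. Support streaming load
* a. Support streaming for data files in zip format
* b. Support iter over datasets in formats such as Image/Text/Audio/Biodata
* c. Support streaming load for legacy datasets whose training data is stored in meta
* d. Support streaming load for data files in folder format
9. Further standardize the chaining of finetune tasks
* a. Avoid usages like to_hf_dataset; wrap the commonly used tf-related funcs
* b. Removed some logic that mixed in hf usage; wrapped uniformly into MsDataset
10. Optimizations for very large datasets
* a. list oss objects: pull the csv mapping from meta directly, with no list_oss_objects API call needed (implemented in an earlier commit)
* b. Fix loading issues caused by STS expiration (implemented in an earlier commit)
11. Support dataset_name input in the form namespace/dataset_name
Reference Aone link: https://aone.alibaba-inc.com/v2/project/1162242/task/46262894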
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11264406
2023-01-10 07:01:34 +08:00
wenmeng.zwm
070ec00720
remove opencv-python from framework requirements and remove easynlp from nlp default requirements
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/11119626
2022-12-16 14:27:23 +08:00
wenmeng.zwm
4dd99b8f6e
Revert "move opencv dependency from framework to cv "
...
This reverts commit e970a6eb43 .
2022-11-30 18:29:03 +08:00
yingda.chen
e970a6eb43
move opencv dependency from framework to cv
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10708368
2022-11-12 20:09:48 +08:00
班扬
79c44a6810
add event tracking
2022-11-01 15:41:01 +08:00
班扬
84032f90e3
add event tracking
2022-11-01 15:34:58 +08:00
Yingda Chen
3464324f6b
[to #42322933 ] limit datasets version for now
2022-10-31 22:15:37 +08:00
mulin.lyh
7257f6c6fb
[to #45631658 ]feat support eas deploy
...
Server-side documentation link (login may be required):
https://test.modelscope.cn/api/v1/deployer/docs
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10478609
2022-10-24 15:12:48 +08:00
wenmeng.zwm
275f8b4323
Revert "[to #45071449 ] fix setup error "
...
This reverts commit a26e6e3869 .
2022-10-14 13:55:09 +08:00
wenmeng.zwm
a26e6e3869
[to #45071449 ] fix setup error
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10196007
2022-10-12 17:33:03 +08:00
wenmeng.zwm
ba74cdf97e
[to #43878347 ] Rename runtime.txt to framework.txt
...
Link: https://code.alibaba-inc.com/Ali-MaaS/MaaS-lib/codereview/10000642
* rename runtime.txt to framework.txt
2022-09-03 12:10:16 +08:00