Mirror of https://github.com/gaomingqi/Track-Anything.git (synced 2025-12-14 15:37:50 +01:00)
upload tutorials in steps
.gitignore
@@ -9,4 +9,4 @@ debug_images/
*.npy
images/
test_sample/
doc/
result/

@@ -28,6 +28,9 @@
<!-- ![avengers]() -->

## :rocket: Updates

- 2023/05/02: We uploaded step-by-step tutorials :world_map:. Check [HERE](./doc/tutorials.md) for more details.

- 2023/04/29: We improved inpainting by decoupling GPU memory usage from video length. Now Track-Anything can inpaint videos of any length! :smiley_cat: Check [HERE](https://github.com/gaomingqi/Track-Anything/issues/4#issuecomment-1528198165) for our GPU memory requirements.

- 2023/04/25: We are delighted to introduce [Caption-Anything](https://github.com/ttengwang/Caption-Anything) :writing_hand:, an inventive project from our lab that combines the capabilities of Segment Anything, Visual Captioning, and ChatGPT.

@@ -36,7 +39,7 @@

- 2023/04/14: We made Track-Anything public!

## :world_map: Video Tutorials
## :world_map: Video Tutorials ([Try Track-Anything in Steps](./doc/tutorials.md))

https://user-images.githubusercontent.com/30309970/234902447-a4c59718-fcfe-443a-bd18-2f3f775cfc13.mp4

doc/tutorials.md
@@ -0,0 +1,119 @@
## Welcome to Track-Anything Tutorials

Here we illustrate how to use Track-Anything as an interactive tool to segment, track, and inpaint anything in videos.

In the current version, Track-Anything follows a linear procedure: :one: [video selection](#step1), :two: [tracking preparation](#step2), :three: [tracking](#step3), and :four: [inpainting](#step4).
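Because the four stages must be completed in order (for example, inpainting needs tracking results), the procedure behaves like a simple gate. The sketch below models that ordering. `Stage` and `can_enter` are hypothetical names used only for illustration; they are not part of Track-Anything's code.

```python
from enum import IntEnum
from typing import Set


class Stage(IntEnum):
    """Hypothetical labels for the four tutorial stages, in order."""
    VIDEO_SELECTION = 1
    TRACKING_PREPARATION = 2
    TRACKING = 3
    INPAINTING = 4


def can_enter(target: Stage, completed: Set[Stage]) -> bool:
    """A stage is available only after every earlier stage is completed."""
    return all(stage in completed for stage in Stage if stage < target)


done = {Stage.VIDEO_SELECTION, Stage.TRACKING_PREPARATION}
print(can_enter(Stage.TRACKING, done))    # True
print(can_enter(Stage.INPAINTING, done))  # False: tracking has not run yet
```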

### <span id="step1">1 Video Selection</span>
When Track-Anything starts, the panel looks like this:

<div align=center>
<img src="./tutorial_imgs/video-selection.png" width="93%"/>
</div>

**Recommended steps in this stage**:

**1-1**. Select a video from your local files or from the examples.

**1-2**. Click "***Get video info***" to unlock the other controls.

### <span id="step2">2 Tracking Preparation</span>
After video selection, all controls are unlocked and the panel looks like this:

<div align=center>
<img src="./tutorial_imgs/tracking-preparation.png" width="93%"/>
</div>

**Recommended steps in this stage**:

**2-1**. Select the ***Track End Frame*** (the last frame by default) using the sliders (rough selection) and the tuning buttons (precise selection).

**2-2**. Select the ***Track Start Frame*** (***Image Selection***, the first frame by default), on which masks will be added, using the sliders (rough selection) and the tuning buttons (precise selection).

<div align=center>
<img src="./tutorial_imgs/2-1.png" width="69%"/>
</div>

- **Note**: You can also type frame indices directly. After typing, click anywhere on the panel outside the image and video areas to refresh the displayed frame.
- **Note**: Follow the order 2-1, then 2-2, so that the displayed image is the start frame.
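The default start and end frames from steps 2-1 and 2-2 can be summarized with a small helper. This is a sketch under our own assumptions: `resolve_track_range` is a hypothetical function, indices are 0-based and inclusive, and out-of-range values are clamped. The actual interface may handle these cases differently.

```python
from typing import Optional, Tuple


def resolve_track_range(num_frames: int,
                        start: Optional[int] = None,
                        end: Optional[int] = None) -> Tuple[int, int]:
    """Return (start, end) frame indices, 0-based and inclusive.

    Defaults mirror the tutorial: the first frame for the Track Start Frame
    and the last frame for the Track End Frame. Out-of-range indices are
    clamped (an assumption), and start may not come after end.
    """
    if num_frames <= 0:
        raise ValueError("video has no frames")
    last = num_frames - 1
    start = 0 if start is None else min(max(start, 0), last)
    end = last if end is None else min(max(end, 0), last)
    if start > end:
        raise ValueError(f"start frame {start} is after end frame {end}")
    return start, end


print(resolve_track_range(120))           # (0, 119)
print(resolve_track_range(120, 30, 500))  # (30, 119)
```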

**2-3**. Select one object/region on the ***Track Start Frame*** by adding positive/negative points:

- **2-3-1**. Add one POSITIVE point on the target region. A mask then appears:

<div align=center>
<img src="./tutorial_imgs/2-3-1.png" width="99%"/>
</div>

- **2-3-2**. If the mask looks good, go to step 2-3-5. If not, go to step 2-3-3.

- **2-3-3**. If the mask does not fully cover the target region, add one POSITIVE point on the uncovered part. Conversely, if the mask spills onto the background, add one NEGATIVE point on that background area. The mask is updated after each positive/negative point:

<div align=center>
<img src="./tutorial_imgs/2-3-3-1.png" width="99%"/>
</div>

<div align=center>
<img src="./tutorial_imgs/2-3-3-2.png" width="99%"/>
</div>

- **2-3-4**. If the mask looks good, go to step 2-3-5. If not, go back to step 2-3-3.

- **2-3-5**. Click "***Add Mask***".

- **Note**: If the mask still cannot be refined after adding many points, click "***Clear Clicks***" and restart from step 2-3-1.

- **Note**: After each "***Add Mask***", one item appears in the Dropdown List below. More on how this control is used is given in [Tracking](#step3):


<div align=center>
<img src="./tutorial_imgs/2-3-5.png" width="99%"/>
</div>

- **Note**: Click "***Remove Mask***" to remove all masks from the list.

**2-4**. To add another object/region, go back to step 2-2. Otherwise, go to [Tracking](#step3).


**Note**: ALL masks must be added on the ***Track Start Frame*** only.
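The point-and-mask workflow of this stage can be sketched as a small state object. We assume a Segment Anything style prompt convention, where each clicked point carries label 1 (positive) or 0 (negative). `MaskSession`, its methods, and the `mask_001` naming are hypothetical and only mirror the buttons described above. We also assume that "Add Mask" clears the current clicks so that the next object starts fresh.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (x, y) pixel coordinates on the start frame


@dataclass
class MaskSession:
    """Hypothetical bookkeeping for the click/mask steps (not Track-Anything's code)."""
    start_frame: int
    points: List[Point] = field(default_factory=list)
    labels: List[int] = field(default_factory=list)  # 1 = positive, 0 = negative
    masks: List[Dict] = field(default_factory=list)  # stands in for the Dropdown List

    def click(self, x: int, y: int, positive: bool = True) -> None:
        # Each click is accumulated; a segmenter would be re-run on all
        # points and labels so far to refresh the displayed mask.
        self.points.append((x, y))
        self.labels.append(1 if positive else 0)

    def clear_clicks(self) -> None:
        # "Clear Clicks": discard every point and start over.
        self.points.clear()
        self.labels.clear()

    def add_mask(self, frame: int) -> str:
        # "Add Mask": only allowed on the Track Start Frame, and only once
        # at least one positive point has been placed.
        if frame != self.start_frame:
            raise ValueError("masks must be added on the Track Start Frame")
        if 1 not in self.labels:
            raise ValueError("add at least one positive point first")
        name = f"mask_{len(self.masks) + 1:03d}"
        self.masks.append({"name": name,
                           "points": list(self.points),
                           "labels": list(self.labels)})
        self.clear_clicks()  # assumption: the next object starts fresh
        return name

    def remove_masks(self) -> None:
        # "Remove Mask": empty the whole list.
        self.masks.clear()


session = MaskSession(start_frame=0)
session.click(120, 80)                     # positive point on the target
session.click(200, 40, positive=False)     # negative point on the background
print(session.add_mask(frame=0))           # mask_001
print(len(session.masks), session.points)  # 1 []
```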

### <span id="step3">3 Tracking</span>

Track-Anything only tracks the objects shown in the Dropdown List.

**Recommended steps in this stage**:

**3-1**. Confirm the objects on the list.

**3-2**. Click "***Tracking***".

After step 3-2, tracking runs (taking seconds to minutes, depending on video resolution and length), and the results are shown in the video panel on the right:

<div align=center>
<img src="./tutorial_imgs/3-2.png" width="99%"/>
</div>
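Since only the objects in the Dropdown List are tracked, the list effectively defines the tracker's targets. One common way to hand several objects to a multi-object tracker is a single label map, where 0 is background and k marks the k-th object. The sketch below builds such a map from binary masks. `merge_masks` is hypothetical, and we do not claim Track-Anything merges masks exactly this way; in particular, the rule that a later object wins where masks overlap is our assumption.

```python
from typing import List, Sequence

Mask = List[List[int]]  # H x W grid of 0/1 values on the start frame


def merge_masks(masks: Sequence[Mask]) -> List[List[int]]:
    """Merge per-object binary masks into one label map.

    0 is background and k (1-based) means "object k in the list".
    Where masks overlap, the later object in the list wins.
    """
    if not masks:
        raise ValueError("the Dropdown List is empty: nothing to track")
    h, w = len(masks[0]), len(masks[0][0])
    label_map = [[0] * w for _ in range(h)]
    for k, mask in enumerate(masks, start=1):
        if len(mask) != h or any(len(row) != w for row in mask):
            raise ValueError("all masks must share the start frame's size")
        for y in range(h):
            for x in range(w):
                if mask[y][x]:
                    label_map[y][x] = k
    return label_map


a = [[1, 1, 0],
     [0, 0, 0]]
b = [[0, 1, 1],
     [0, 0, 1]]
print(merge_masks([a, b]))  # [[1, 2, 2], [0, 0, 2]]
```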

### <span id="step4">4 Inpainting</span>

Track-Anything only "removes" the tracked objects from the input video.

**Recommended steps in this stage**:

**4-1**. Complete steps 3-1 and 3-2 to get tracking results.

**4-2**. Set the "***Resize Ratio***" to down-scale the video.
- **Why down-scale?** Inpainting consumes much more GPU memory than tracking, and down-scaling effectively avoids out-of-memory (OOM) errors. The estimated GPU memory requirements are:

|Resolution|50 frames|100 frames|1000 frames|
| :--- | :----: | :----: | :----: |
|1920 x 1080|OOM|OOM|OOM|
|1280 x 720|30GB|46GB|46GB|
|720 x 480|13GB|21GB|21GB|
|640 x 480|11GB|19GB|19GB|
|320 x 240|4GB|4.5GB|4.5GB|
|160 x 120|2.5GB|3GB|3GB|
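The table can also guide the choice of ***Resize Ratio*** for a given GPU. The sketch below copies the estimates above into a lookup and returns the largest ratio whose output fits a memory budget. `suggest_resize_ratio` is a hypothetical helper that rests on several assumptions: memory does not increase when both sides shrink, videos longer than 1000 frames cost the same as 1000 frames (the 100- and 1000-frame columns are equal in the table), and the ratio is chosen so that both width and height fit inside a table row.

```python
from typing import Optional

# Estimated GPU memory in GB, copied from the table above; None marks OOM.
FRAME_COLUMNS = (50, 100, 1000)
GPU_MEMORY_GB = {
    (1920, 1080): (None, None, None),
    (1280, 720): (30, 46, 46),
    (720, 480): (13, 21, 21),
    (640, 480): (11, 19, 19),
    (320, 240): (4, 4.5, 4.5),
    (160, 120): (2.5, 3, 3),
}


def suggest_resize_ratio(width: int, height: int, num_frames: int,
                         budget_gb: float) -> Optional[float]:
    """Suggest a down-scale ratio in (0, 1] whose output fits the table's
    estimate for the given GPU memory budget, or None if nothing fits."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Smallest listed frame count that covers the video; beyond 1000 frames,
    # reuse the 1000-frame column (assumption).
    col = next((i for i, n in enumerate(FRAME_COLUMNS) if num_frames <= n),
               len(FRAME_COLUMNS) - 1)
    # Walk the rows from the largest to the smallest resolution.
    rows = sorted(GPU_MEMORY_GB.items(),
                  key=lambda item: item[0][0] * item[0][1], reverse=True)
    for (w, h), costs in rows:
        cost = costs[col]
        if cost is None or cost > budget_gb:
            continue
        # Scale so that both sides fit inside this row; never upscale.
        return min(1.0, w / width, h / height)
    return None


print(suggest_resize_ratio(1920, 1080, 300, budget_gb=24))  # 0.375
print(suggest_resize_ratio(1280, 720, 80, budget_gb=48))    # 1.0
```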

**4-3**. Click "***Inpainting***".

After step 4-3, inpainting runs (taking seconds to minutes, depending on video resolution and length), and the results are shown in the panel below:

<div align=center>
<img src="./tutorial_imgs/4-3.png" width="99%"/>
</div>

@@ -363,7 +363,7 @@ if __name__ == '__main__':
base_inpainter = BaseInpainter(checkpoint, device)
# 3/3: inpainting (frames: numpy array, T, H, W, 3; masks: numpy array, T, H, W)
# ratio: (0, 1], ratio for down sample, default value is 1
inpainted_frames = base_inpainter.inpaint(frames[:1000], masks[:1000], ratio=0.1) # numpy array, T, H, W, 3
inpainted_frames = base_inpainter.inpaint(frames[:300], masks[:300], ratio=0.6) # numpy array, T, H, W, 3

# save
for ti, inpainted_frame in enumerate(inpainted_frames):
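The code comment above states that `ratio` lies in (0, 1] and defaults to 1. The helper below shows what this means for the frame size the inpainter works on, using the two `inpaint` calls in this hunk as inputs. `downscaled_shape` is hypothetical, and rounding to the nearest pixel is our assumption rather than the documented behavior of `BaseInpainter.inpaint`.

```python
from typing import Tuple


def downscaled_shape(height: int, width: int,
                     ratio: float = 1.0) -> Tuple[int, int]:
    """Frame size (H, W) after down-sampling by `ratio` in (0, 1].

    A ratio of 1 keeps the original size. Rounding to the nearest pixel,
    with a minimum of 1, is an assumption of this sketch.
    """
    if not 0 < ratio <= 1:
        raise ValueError(f"ratio must be in (0, 1], got {ratio}")
    return max(1, round(height * ratio)), max(1, round(width * ratio))


# The two ratios from the hunk, applied to a 1920 x 1080 clip:
print(downscaled_shape(1080, 1920, 0.1))  # (108, 192)
print(downscaled_shape(1080, 1920, 0.6))  # (648, 1152)
```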