In this paper, we propose a novel two-stage framework that generates high-quality segmentation results for the video instance segmentation task, which requires simultaneous detection, segmentation, and tracking of instances. To address this multi-task problem efficiently, we first select high-quality detection proposals in each frame. The categories of the proposals are calibrated with the global context of the video. Then, each selected proposal is extended temporally by a bi-directional Instance-Pixel Dual-Tracker (IPDT), which synchronizes tracking at both the instance level and the pixel level. The instance-level module concentrates on distinguishing the target instance from other objects, while the pixel-level module focuses more on the local features of the instance. Our proposed method achieved a competitive result of 45.0% mAP on the YouTube-VOS dataset, ranking 3rd in Track 2 of the 2nd Large-scale Video Object Segmentation Challenge.
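
The bi-directional temporal extension of a selected proposal can be illustrated with a minimal sketch. This is not the IPDT itself: the function `extend_bidirectionally`, the `step` callback, and the convention that `step` returns `None` when the target is lost are all hypothetical names and assumptions introduced only for illustration. The sketch shows the control flow alone, propagating a state from the proposal's frame forward to the end of the video and backward to its start.

```python
from typing import Callable, Dict, Optional, TypeVar

State = TypeVar("State")  # hypothetical per-frame tracker state (e.g. a box and a mask)


def extend_bidirectionally(
    start_frame: int,
    start_state: State,
    num_frames: int,
    step: Callable[[State, int], Optional[State]],
) -> Dict[int, State]:
    """Propagate a proposal from start_frame in both temporal directions.

    `step(prev_state, t)` is an assumed callback that returns the state at
    frame t given the state at the neighbouring frame, or None if the
    target is lost (propagation in that direction then stops).
    """
    track: Dict[int, State] = {start_frame: start_state}

    # Forward pass: start_frame + 1, ..., num_frames - 1
    state: Optional[State] = start_state
    for t in range(start_frame + 1, num_frames):
        state = step(state, t)
        if state is None:
            break
        track[t] = state

    # Backward pass: start_frame - 1, ..., 0
    state = start_state
    for t in range(start_frame - 1, -1, -1):
        state = step(state, t)
        if state is None:
            break
        track[t] = state

    return track


def toy_step(prev_state: int, t: int) -> Optional[int]:
    # Toy stand-in for a real tracker: the target is "lost" at frame 5.
    return None if t == 5 else t * 10


toy_track = extend_bidirectionally(2, 20, 6, toy_step)
```

In the toy usage, the proposal at frame 2 is extended forward through frames 3 and 4 (stopping at frame 5, where the stand-in tracker reports a loss) and backward through frames 1 and 0. In a real system the `step` callback would be where instance-level and pixel-level tracking are combined.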