Vibration-based condition monitoring and fault diagnosis are key to enhancing the reliability, safety and automation level of wind turbine systems. Deep learning approaches have repeatedly achieved state-of-the-art performance in this field. However, because effective training usually requires large sets of high-quality data, practical constraints such as imbalanced fault datasets and low value density of the data prevent these approaches from being widely deployed in real wind turbine systems. To address these problems, focal loss is introduced into the deep learning framework to down-weight the contribution of easy negatives, i.e. examples that are already classified correctly with high confidence. Vibration fault data collected from a wind turbine test rig are used for case studies. The results show that the proposed methodology is feasible and efficient, achieving robust, high performance across data of differing quality.
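To illustrate how focal loss discounts easy negatives, the following is a minimal sketch of the standard binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), set against plain cross-entropy. The function names, the default values gamma = 2 and alpha = 0.25, and the example probabilities are assumptions chosen for illustration; the abstract does not state which variant or hyperparameters the proposed methodology uses.

```python
import math


def cross_entropy(p, y):
    """Binary cross-entropy for one sample.

    p: predicted probability of the positive (fault) class
    y: true label, 0 or 1
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    p_t = p if y == 1 else 1.0 - p   # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one sample (illustrative defaults, not from the paper).

    gamma: focusing parameter; larger values shrink the loss of
           well-classified samples more strongly
    alpha: class-balancing weight applied to the positive class
    """
    eps = 1e-12
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma is close to 0 when the sample is already
    # classified correctly with high confidence, i.e. an "easy" sample.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    # Hypothetical predictions for two healthy (negative) samples.
    easy_negative = 0.05  # confidently and correctly predicted as healthy
    hard_negative = 0.90  # wrongly predicted as faulty
    for name, p in [("easy", easy_negative), ("hard", hard_negative)]:
        print(name, cross_entropy(p, 0), focal_loss(p, 0))
```

With gamma = 0 the modulating factor equals 1 and the expression reduces to alpha-weighted cross-entropy, so gamma alone controls how strongly easy samples are suppressed relative to hard ones.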