Alvaro Bueno
09/13/2017
Trying to Estimate the value of a target variable with the help of the current variables.
Manual QC is then done to check for issues easily detected with the human eye. Depending on the content, selected points of the asset or the entire duration is checked. % of assets that fail this is small.
Looking at the data on manual QC failures, certain factors affected the likelihood of an asset failing QC.
A key goal of the model is to identify all defective assets even if this results in extra manual checks. Hence, we tuned the model for low false-negative rate (i.e. fewer uncaught defects) at the cost of increased false-positive rate.
we have a lot more data on “pass” assets than “fail” assets. We tackled this by using cost-sensitive training that heavily penalizes misclassification of the minority class.
Video assets from episodes within the same season of a show are mostly defective or mostly non-defective. It’s likely that assets in a batch were created or packaged around the same time and/or with the same equipment, and hence with similar defects.
To fine tune, offline validation of the model was performed by passively making predictions on incoming assets and comparing with actual results from manual QC.