Model-assisted labeling – for better or for worse – Dan Rose AI
https://images.squarespace-cdn.com/content/v1/5ee8617eedf4d13dcedda79e/1600160516398-ZT0REEFL3LTY9B0FHCD1/Screen+Shot+2020-09-15+at+11.01.30.png?format=1500w
Data collection is, for many projects, without a doubt the most expensive part of the project. Labeling data such as images and text passages is difficult and tedious work that does not scale well. If a project requires fresh, constantly updated data, labeling can be such a high cost that it challenges the entire business case of an otherwise great project.
There are, however, some strategies to reduce data labeling costs. I have previously written about active learning: a data collection strategy that focuses on labeling the most important data first, namely the data the model is least confident about. This is a great strategy, but in most cases you still need to label a lot of data.
To speed up the labeling process itself, the model-assisted labeling strategy has emerged. The idea is simply that you train a model in parallel with the labeling, and as the model begins to see patterns in the data, it suggests labels to the labeler. That way, the labeler can in many cases simply approve the suggested label.
Model-assisted labeling can be done by training a model solely for the purpose of labeling, but it can also be done by placing your current production model in the labeling loop and letting it suggest the labels.
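To make the idea concrete, here is a minimal sketch of such a loop, assuming a scikit-learn text classifier; `unlabeled_texts` and `ask_labeler()` are hypothetical stand-ins for your data pool and labeling UI:

```python
# A minimal sketch of a model-assisted labeling loop.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

unlabeled_texts = ["first document", "second document", "third document"]  # your pool

def ask_labeler(text, suggestion=None):
    # Hypothetical UI stand-in: in a real tool this is one click to
    # approve the suggestion or pick a different label.
    answer = input(f"{text!r} [suggested: {suggestion}] label> ").strip()
    return answer or suggestion

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
texts, tags = [], []
model_ready = False

for text in unlabeled_texts:
    suggestion = model.predict([text])[0] if model_ready else None
    tags.append(ask_labeler(text, suggestion))  # human approves or corrects
    texts.append(text)
    # Retrain periodically so suggestions improve as labels accumulate.
    if len(texts) % 50 == 0 and len(set(tags)) > 1:
        model.fit(texts, tags)
        model_ready = True
```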
But is model-assisted labeling just a safe way to get data labeled faster? Or are there weaknesses to the strategy? I have worked intensively with model-assisted labeling and know for sure that there is both good and bad to it; if you are not careful, you can end up doing more harm than good with this strategy. If you manage it correctly, it can work wonders and save you a ton of resources.
So let’s take a look at the good and the bad.
Good
The first and foremost advantage is that it is faster for the person doing the labeling to work with pre-labeled data. Approving a label takes a single click in most cases, and only having to select a label manually once in a while is simply faster. Especially when working with large documents or with models that have many possible labels, the speedup can be significant.
Another really useful benefit of model-assisted labeling is that you get an early idea of the model's weaknesses. You get a proper sense of which cases are difficult for the model to understand and are usually mislabeled. This reflects the results you should expect in production, and it gives you the chance to improve or work around these weak points. Weak points in the model often suggest a lack of data volume or quality in those areas, so it also provides an overview of what kind of data you should ask to have labeled the most.
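As a rough illustration (my own sketch, not a prescribed method), you can turn the labeler's corrections into exactly this kind of overview by logging each suggested label together with the label the human finally chose:

```python
# For each class the model suggested, how often did the human overrule it?
from collections import Counter

def correction_rates(pairs):
    """pairs: iterable of (suggested_label, final_label) tuples
    logged during labeling."""
    suggested = Counter(s for s, _ in pairs)
    corrected = Counter(s for s, f in pairs if s != f)
    return {label: corrected[label] / total for label, total in suggested.items()}

pairs = [("cat", "cat"), ("cat", "dog"), ("dog", "dog"), ("bird", "dog")]
for label, rate in sorted(correction_rates(pairs).items(), key=lambda kv: -kv[1]):
    print(f"{label}: overruled {rate:.0%} of the time")
```

Classes where the suggestions are overruled most often are where you should ask for more or better data.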
Bad
Now for the bad. As I mentioned, the bad can be pretty bad. The biggest issue with model-assisted labeling is that you risk reducing the quality of your data. So even though you get more data labeled, the lower quality can leave you with a model that performs worse than if you had not used model-assisted labeling at all.
So how can model-assisted labeling reduce the quality of the data? It is actually very simple: people tend to prefer defaults. The second the labeler goes on autopilot, they will begin to make mistakes by being more likely to accept the pre-selected or suggested label. I've seen this time and time again. The largest source of errors in labeling tends to be accepting wrong suggestions. So you have to be very careful when you suggest labels.
Another weakness can be that the quality of the pre-labeling is simply so low that it takes the labeler more time to correct it than it would have taken to start from an empty answer. So you have to be careful not to enable pre-labeling too early.
Some tips for model-assisted labeling
I have a few tips for being more successful with model-assisted labeling.
The first piece of advice is to set a target for data quality. You will never get 100% accurate data anyway, so you will have to accept a certain number of wrong labels. If you can set a target that is still acceptable for training the model, you can monitor whether model-assisted labeling is beginning to do more harm than good. This also works very well as a way of aligning expectations with your team in general.
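One way to make that monitoring concrete is a periodic audit of a random sample of accepted labels; in this sketch, `review_fn` is a hypothetical second-pass reviewer and the target is an example number:

```python
# A minimal sketch of monitoring label quality against a target.
import random

QUALITY_TARGET = 0.95  # example: accept at most 5% wrong labels

def audit_accuracy(accepted, review_fn, sample_size=100):
    """accepted: list of (item, label) pairs the labelers produced;
    review_fn: second-pass reviewer returning the true label."""
    sample = random.sample(accepted, min(sample_size, len(accepted)))
    correct = sum(1 for item, label in sample if review_fn(item) == label)
    return correct / len(sample)

# If the audit drops below target, the suggestions may be doing harm:
# pause pre-labeling and retrain before continuing.
```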
I would also suggest sampling cases without model assistance, to measure whether there is a difference between the results you get with and without the suggested labels. You simply do this by turning off the helper model for, say, one in ten cases. It is easy to do and will tell you many truths.
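A minimal sketch of that holdout could look like this, with the rate and the tagging scheme as assumptions:

```python
# Randomly turn off the helper model so each case is tagged "assisted"
# or "holdout", letting you compare audited error rates between the
# two groups later.
import random

HOLDOUT_RATE = 0.1  # roughly one in ten cases gets no suggestion

def maybe_suggest(model, item):
    if random.random() < HOLDOUT_RATE:
        return None, "holdout"  # labeler starts from an empty answer
    return model.predict([item])[0], "assisted"
```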
Finally, I will suggest one of my favorites: probabilistic models are very useful for model-assisted labeling. Probabilistic models are Bayesian and therefore provide uncertainty as a distribution instead of a scalar (a single number), which makes it much easier to know whether the pre-labels are likely to be correct or not.
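Since a full Bayesian model can be heavy to set up, here is a simpler sketch of the same idea under stated assumptions: approximate the uncertainty with the agreement of an ensemble trained on different subsamples, and withhold the pre-label when agreement is low.

```python
# Uncertainty-gated suggestions via ensemble agreement (a stand-in for
# a proper Bayesian posterior over labels).
from collections import Counter

def gated_suggestion(models, item, min_agreement=0.8):
    """models: a list of fitted classifiers; returns a label or None."""
    votes = Counter(m.predict([item])[0] for m in models)
    label, count = votes.most_common(1)[0]
    if count / len(models) >= min_agreement:
        return label
    return None  # too uncertain: show the labeler an empty answer instead
```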