Autonomous Driving (Case Study)

Mar 22, 2021


You are employed by a startup building self-driving cars. You are in charge of detecting road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. As an example, the above image contains a pedestrian crossing sign and red traffic lights.

Your 100,000 labeled images are taken using the front-facing camera of your car. This is also the distribution of data you care most about doing well on. You think you might be able to get a much larger dataset from the internet, which could be helpful for training even if the distribution of internet data is not the same.

You are just getting started on this project. What is the first thing you do? Assume each of the steps below would take about an equal amount of time (a few days).

Answer: Spend a few days training a basic model and see what mistakes it makes.

-> It is more efficient to build and train an initial model quickly, then repeatedly look for its mistakes and improve on them.

Your goal is to detect road signs (stop sign, pedestrian crossing sign, construction ahead sign) and traffic signals (red and green lights) in images. The goal is to recognize which of these objects appear in each image. You plan to use a deep neural network with ReLU units in the hidden layers.

A softmax activation would be a good choice for the output layer because this is a multi-task learning problem. (True/False?)

Answer: False

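-> Softmax is used when exactly one of several classes is the correct answer. Here an image can contain several of the objects at once, so each output needs its own sigmoid unit.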
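To make the difference concrete, here is a minimal sketch comparing the two activations on one set of output-layer logits. The logit values and the label order (stop sign, pedestrian crossing, construction ahead, red light, green light) are assumptions chosen for illustration, not values from the quiz.

```python
import math

def softmax(logits):
    """Softmax: outputs compete with each other and sum to 1."""
    shifted = [z - max(logits) for z in logits]  # subtract max for numerical stability
    exps = [math.exp(z) for z in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(logits):
    """Element-wise sigmoid: each output is an independent probability."""
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

# Hypothetical logits for one image, in the assumed label order:
# [stop sign, pedestrian crossing, construction ahead, red light, green light]
logits = [3.0, -2.0, -4.0, 2.5, -3.0]

softmax_probs = softmax(logits)
sigmoid_probs = sigmoid(logits)

# Thresholding the sigmoid outputs can switch on several labels at once.
predicted_labels = [1 if p > 0.5 else 0 for p in sigmoid_probs]
```

With softmax, the stop sign and the red light have to share a total probability of 1, whereas the sigmoid outputs can both be close to 1 for the same image.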

You are carrying out error analysis and counting up what errors the algorithm makes. Which of these datasets do you think you should manually go through and carefully examine, one image at a time?

Answer: 500 images on which the algorithm made a mistake

-> Error analysis is carried out on the examples the algorithm got wrong, tallying how often each cause of error occurs.
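The tallying step could look roughly like the sketch below. The tags and the five reviewed images are hypothetical placeholders; in practice the list would hold one entry per misclassified image that was inspected by hand.

```python
from collections import Counter

# Hypothetical notes from a manual review: one list of tags per misclassified image.
review_notes = [
    ["foggy"],
    ["raindrops"],
    ["foggy", "mislabeled"],
    ["other"],
    ["foggy"],
]

def tally(notes):
    """Return, for each tag, the fraction of reviewed images that carry it."""
    counts = Counter(tag for tags in notes for tag in tags)
    n_images = len(notes)
    return {tag: count / n_images for tag, count in counts.items()}

shares = tally(review_notes)
```

Because one image can carry several tags, the fractions do not have to add up to 1.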

After working on the data for several weeks, your team ends up with the following data:
- 100,000 labeled images taken using the front-facing camera of your car.
- 900,000 labeled images of roads downloaded from the internet.
- Each image’s labels precisely indicate the presence of any specific road signs and traffic signals or combinations of them. For example, $y^{(i)} = [1\ 0\ 0\ 1\ 0]^\top$ means the image contains a stop sign and a red traffic light.

Because this is a multi-task learning problem, you need to have all your $y^{(i)}$ vectors fully labeled. If one example is equal to $[0\ ?\ 1\ 1\ ?]^\top$, then the learning algorithm will not be able to use that example. (True/False?)

Answer: False

-> In multi-task learning, the loss is summed only over the labels that are present, so training still works when some entries of a label vector are missing.
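A minimal sketch of such a loss, assuming a per-label binary cross-entropy and using `None` to mark an unlabeled entry (both are choices made for this illustration; the predicted probabilities are made up):

```python
import math

def masked_multitask_loss(y_true, y_pred):
    """Sum binary cross-entropy over labeled entries only; None means unlabeled."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        if y is None:
            continue  # unlabeled entry contributes nothing to the loss
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total

# The partially labeled example from the question: [0 ? 1 1 ?]
y_true = [0, None, 1, 1, None]
y_pred = [0.1, 0.7, 0.8, 0.9, 0.4]  # hypothetical sigmoid outputs

loss = masked_multitask_loss(y_true, y_pred)

# Changing predictions for the unlabeled entries leaves the loss unchanged.
loss_other = masked_multitask_loss(y_true, [0.1, 0.2, 0.8, 0.9, 0.99])
```

The example still provides a training signal for the three labeled entries, which is why it does not have to be discarded.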

The distribution of data you care about contains images from your car’s front-facing camera, which comes from a different distribution than the images you were able to find and download off the internet. How should you split the dataset into train/dev/test sets?

Answer: Choose the training set to be the 900,000 images from the internet along with 80,000 images from your car’s front-facing camera. The 20,000 remaining images will be split equally in dev and test sets.

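-> The front-facing camera images are what the model is ultimately targeting, so the dev and test sets are drawn from them. The dev and test sets do not need a large amount of data, so everything else goes into the training set to improve the model's performance.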
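The split described in the answer can be sketched as follows. Images are represented by integer IDs only, and the function name, the ID ranges, and the seed are assumptions made for this example.

```python
import random

def split_dataset(car_ids, internet_ids, n_dev=10_000, n_test=10_000, seed=0):
    """Dev and test come only from car images; everything else is training data."""
    rng = random.Random(seed)
    car = list(car_ids)
    rng.shuffle(car)
    dev = car[:n_dev]
    test = car[n_dev:n_dev + n_test]
    # Remaining car images join all internet images in the training set.
    train = list(internet_ids) + car[n_dev + n_test:]
    rng.shuffle(train)
    return train, dev, test

# Hypothetical IDs: 0..99,999 are car images, 100,000..999,999 are internet images.
car_ids = range(100_000)
internet_ids = range(100_000, 1_000_000)

train, dev, test = split_dataset(car_ids, internet_ids)
```

This yields 980,000 training images (900,000 internet plus 80,000 car) and 10,000 car images each for the dev and test sets.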

Assume you’ve finally chosen the following split of the data:

You also know that human-level error on the road sign and traffic signals classification task is around 0.5%. Which of the following are True? (Check all that apply).

Answer:

- You have a large avoidable-bias problem because your training error is quite a bit higher than the human-level error.
- You have a large data-mismatch problem because your model does a lot better on the training-dev set than on the dev set.

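-> Human-level error (0.5%) serves as a proxy for Bayes error, and the training error is far above it, so there is avoidable bias. The error also rises sharply from the training-dev set to the dev/test sets, which indicates data mismatch.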
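The reasoning amounts to comparing three gaps between consecutive error rates. The sketch below uses the 0.5% human-level error from the question; the other three error rates are hypothetical stand-ins, not the values from the quiz table.

```python
def diagnose(human_level, train_err, train_dev_err, dev_err):
    """Break the dev error down into gaps between consecutive error rates (in %)."""
    return {
        "avoidable_bias": train_err - human_level,   # training vs. human-level
        "variance": train_dev_err - train_err,       # training-dev vs. training
        "data_mismatch": dev_err - train_dev_err,    # dev vs. training-dev
    }

# human_level comes from the question; the remaining numbers are hypothetical.
gaps = diagnose(human_level=0.5, train_err=9.0, train_dev_err=9.5, dev_err=15.0)

largest_problem = max(gaps, key=gaps.get)
```

With these stand-in numbers, avoidable bias and data mismatch are both large while variance is small, which is the same pattern the two selected answers describe.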

Based on the table from the previous question, a friend thinks that the training data distribution is much easier than the dev/test distribution. What do you think?

Answer: There’s insufficient information to tell if your friend is right or wrong.

-> The large gap between the train-dev and dev/test errors shows that the two distributions differ, but that alone does not reveal which distribution is easier.

You decide to focus on the dev set and check by hand what the errors are due to. Here is a table summarizing your discoveries:

In this table, 4.1%, 8.0%, etc. are a fraction of the total dev set (not just examples your algorithm mislabeled). For example, about 8.0/15.3 = 52% of your errors are due to foggy pictures.

The results from this analysis imply that the team’s highest priority should be to bring more foggy pictures into the training set so as to address the 8.0% of errors in that category. True/False?

Additional Note: there are subtle concepts to consider with this question, and you may find arguments for why some answers are also correct or incorrect. We recommend that you spend time reading the feedback for this quiz, to understand which issues you will want to consider when you are building your own machine learning project.

Answer: False because it depends on how easy it is to add foggy data. If foggy data is very hard and costly to collect, it might not be worth the team’s effort.

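-> Fixing the foggy-image category would indeed remove the largest share of errors, but if that improvement is hard to achieve in practice, pursuing it would hurt the team's productivity.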

You can buy a specially designed windshield wiper that helps wipe off some of the raindrops on the front-facing camera. Based on the table from the previous question, which of the following statements do you agree with?

Answer: 2.2% would be a reasonable estimate of the maximum amount this windshield wiper could improve performance.

-> Explanation omitted.
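The "maximum improvement" idea is simple arithmetic on the error-analysis numbers. The sketch below uses the 15.3% total dev error and the 8.0% foggy share quoted above, and assumes that the 2.2% in the answer is the raindrop category's share of the dev set.

```python
total_dev_error = 15.3  # % of the dev set that is misclassified (from the text)

# % of the dev set misclassified for each cause; the raindrops entry is an
# assumption based on the 2.2% figure in the answer.
category_error = {"foggy": 8.0, "raindrops": 2.2}

def ceiling(category):
    """Share of all errors in this category, and dev error if it were fully fixed."""
    share_of_errors = category_error[category] / total_dev_error
    best_case_dev_error = total_dev_error - category_error[category]
    return share_of_errors, best_case_dev_error

foggy_share, foggy_best = ceiling("foggy")
rain_share, rain_best = ceiling("raindrops")
```

Even a perfect wiper could lower the dev error by at most 2.2 percentage points, since that is all the error attributed to raindrops.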

You decide to use data augmentation to address foggy images. You find 1,000 pictures of fog off the internet, and “add” them to clean images to synthesize foggy days, like this:

Which of the following statements do you agree with?

Answer: So long as the synthesized fog looks realistic to the human eye, you can be confident that the synthesized data is accurately capturing the distribution of real foggy images (or a subset of it), since human vision is very accurate for the problem you’re solving.
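One simple way to "add" a fog picture to a clean image is alpha blending. The sketch below is only an assumption about how the synthesis might be done; the quiz does not specify the method, and the tiny 2x2 grayscale "images" and the `alpha` value are made up for illustration.

```python
def add_fog(clean, fog, alpha=0.5):
    """Alpha-blend a fog image over a clean image of the same size.

    Both images are nested lists of pixel intensities in [0, 1].
    alpha=0 returns the clean image, alpha=1 returns pure fog.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [
        [(1.0 - alpha) * c + alpha * f for c, f in zip(clean_row, fog_row)]
        for clean_row, fog_row in zip(clean, fog)
    ]

# Hypothetical 2x2 grayscale images.
clean_image = [[0.0, 1.0], [0.5, 0.2]]
fog_image = [[0.8, 0.8], [0.8, 0.8]]

foggy_image = add_fog(clean_image, fog_image, alpha=0.5)
```

Whether the result is usable is judged the way the answer suggests: by checking that the synthesized images look like real fog to a human.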

After working further on the problem, you’ve decided to correct the incorrectly labeled data on the dev set. Which of these statements do you agree with? (Check all that apply).

Answer:

- You should also correct the incorrectly labeled data in the test set, so that the dev and test sets continue to come from the same distribution.
- You do not necessarily need to fix the incorrectly labeled data in the training set, because it's okay for the training set distribution to differ from the dev and test sets. Note that it is important that the dev set and test set have the same distribution.

-> Explanation omitted.

So far your algorithm only recognizes red and green traffic lights. One of your colleagues in the startup is starting to work on recognizing a yellow traffic light. (Some countries call it an orange light rather than a yellow light; we’ll use the US convention of calling it yellow.) Images containing yellow lights are quite rare, and she doesn’t have enough data to build a good model. She hopes you can help her out using transfer learning. What do you tell your colleague?

Answer: She should try using weights pre-trained on your dataset, and fine-tuning further with the yellow-light dataset.

-> The low-level features used to analyze images are shared, and both tasks involve recognizing traffic lights, so transfer learning is applicable.
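Structurally, the suggestion means reusing the learned hidden-layer weights and replacing only the output layer before fine-tuning. The toy sketch below shows just that weight reuse; the layer sizes, the dictionary layout, and the function names are hypothetical, and the "pre-trained" weights are random stand-ins rather than a trained model.

```python
import copy
import random

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one dense layer."""
    return {
        "W": [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)],
        "b": [0.0] * n_out,
    }

rng = random.Random(0)

# Stand-in for the network already trained on the five sign/light labels.
pretrained = {
    "hidden": init_layer(n_in=8, n_out=4, rng=rng),
    "output": init_layer(n_in=4, n_out=5, rng=rng),
}

def build_yellow_light_model(pretrained, rng):
    """Reuse the pre-trained hidden layer; attach a fresh 1-unit output layer."""
    return {
        "hidden": copy.deepcopy(pretrained["hidden"]),
        "output": init_layer(n_in=4, n_out=1, rng=rng),
    }

yellow_model = build_yellow_light_model(pretrained, rng)
```

Fine-tuning would then continue training `yellow_model` on the small yellow-light dataset, starting from the copied hidden weights instead of from scratch.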

Another colleague wants to use microphones placed outside the car to better hear if there are other vehicles around you. For example, if there is a police vehicle behind you, you would be able to hear their siren. However, they don’t have much data to train this audio system. How can you help?

Answer: Neither transfer learning nor multi-task learning seems promising.

-> The audio task is unrelated to the image task you have been working on, so neither technique applies.

To recognize red and green lights, you have been using this approach:

(A) Input an image (x) to a neural network and have it directly learn a mapping to make a prediction as to whether there’s a red light and/or green light (y).

A teammate proposes a different, two-step approach:

(B) In this two-step approach, you would first (i) detect the traffic light in the image (if any), then (ii) determine the color of the illuminated lamp in the traffic light.

Between these two, Approach B is more of an end-to-end approach because it has distinct steps for the input end and the output end. True/False?

Answer: False

-> Approach A is closer to end-to-end deep learning.

Approach A (in the question above) tends to be more promising than approach B if you have a __ (fill in the blank).

Answer: Large Training set

PyojunCode
