How you can detect cracks in concrete bridges using deep learning

Using deep learning methods, we tried to tackle the problem of detecting concrete cracks in bridges. The results exceeded our expectations.

Deevid De Meyer - 21/2/2022

Most bridges in Belgium were built in the sixties and seventies. As a result, many of them are reaching the end of their service life or need renovation. Inspecting concrete bridges for cracks and other failures is a complex and time-consuming task. Cracks often occur in places that are difficult to reach, and some bridges are in dangerous locations, such as over or near a motorway or train tracks.

That is why more and more road authorities are looking at autonomous bridge inspections carried out by drones. These drones fly around the concrete structures and send a live feed to an AI algorithm, which inspects the images, classifies them into crack and non-crack regions, and exports the results to the user.

This AI algorithm is the subject of my internship at Brainjar. They gave me the opportunity to learn in detail about deep learning methods for object detection, to improve my AI skills, and to apply them to real-life problems.

Looking at the data

The first step in developing a satisfactory solution is analysing the data from the client. We received 300 images of different bridge components with one or more cracks. As you can see, there is a large variability between the images, and the cracks are not always very clear. This makes training a neural network difficult, especially with a dataset as small as this one.

That is why I also looked at existing crack databases, to obtain more and simpler data. The database of Çağlar Fırat Özgenel (source: has two classes with 20,000 RGB images of 227x227 pixels each. With this dataset I can start training my neural network.

I tried two deep learning methods for solving the crack detection problem: classification and segmentation.


Classification with a sliding window

By dividing the large client images into smaller (and more uniform) image segments, you can classify each subimage as “crack” or “no crack”. This way you obtain a rough location of the cracks.
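The sliding-window idea can be sketched in a few lines of NumPy. This is only an illustration, not the code from the project: the patch size, the stride, and the `dummy_classifier` function (a stand-in for the trained network) are assumptions, and the sketch assumes the image is at least one patch in size and ignores any leftover pixels at the right and bottom edges.

```python
import numpy as np

def sliding_window_classify(image, classify_patch, patch_size=128, stride=128):
    """Classify each patch and return a coarse grid of crack flags."""
    height, width = image.shape[:2]
    rows = (height - patch_size) // stride + 1
    cols = (width - patch_size) // stride + 1
    grid = np.zeros((rows, cols), dtype=bool)
    for r in range(rows):
        for c in range(cols):
            top, left = r * stride, c * stride
            patch = image[top:top + patch_size, left:left + patch_size]
            grid[r, c] = classify_patch(patch)
    return grid

# Hypothetical stand-in for the trained network: flags dark patches.
def dummy_classifier(patch):
    return patch.mean() < 100

# Synthetic light-grey image with one dark patch in the bottom-left corner.
image = np.full((256, 384, 3), 200, dtype=np.uint8)
image[128:256, 0:128] = 20

crack_grid = sliding_window_classify(image, dummy_classifier)
print(crack_grid)  # 2x3 grid, only the bottom-left cell is True
```

A smaller stride gives overlapping windows and a finer location estimate, at the cost of more predictions per image.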

I use a small custom-built convolutional neural network implemented in Keras, with batch normalisation and a ReLU activation after each convolutional layer. The final layer is a softmax prediction layer, to allow a future extension to multiple kinds of surface damage.
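A network of this kind could look roughly like the sketch below. The number of convolutional blocks, the filter counts, the pooling layers, and the optimizer are assumptions chosen for illustration; only the Conv, batch normalisation, ReLU pattern, the softmax output, and the 128x128x3 input come from this article.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_crack_classifier(input_shape=(128, 128, 3), num_classes=2):
    """Small CNN: Conv -> BatchNorm -> ReLU blocks, then a softmax layer."""
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (16, 32, 64):  # assumed filter counts
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D(2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Softmax keeps the door open for more than two damage classes later.
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)

model = build_crack_classifier()
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

Extending the classifier to more kinds of surface damage would then only require a larger `num_classes` and matching labels.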

The network is trained on the larger existing database, with a network input size of 128x128x3. As expected for such simple images, the performance metrics on this dataset are very high, with a cross-entropy loss of 0.001 and an accuracy of 99%. The real quality test is classifying the more complicated client images.

Unfortunately, the results on these images are poor. Everything that is darker than the rest of the image, or has a lot of edges, is also detected as a crack. This is to be expected, as the training images are much simpler and much more uniform. Hopefully the segmentation network will give better results…



An example of an encoder-decoder segmentation neural network (Source:

A segmentation neural network classifies every pixel of an image. The output of the network is a label map the size of the original image, in which each pixel is labeled as “crack” or “no crack”.
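As a small illustration of what such a label map is, the sketch below turns per-pixel crack probabilities into labels. The probability values and the 0.5 threshold are assumptions made up for the example.

```python
import numpy as np

def to_label_map(crack_probability, threshold=0.5):
    """Label each pixel 1 ("crack") or 0 ("no crack")."""
    return (np.asarray(crack_probability) >= threshold).astype(np.uint8)

# Hypothetical network output for a tiny 2x3 image.
probabilities = np.array([[0.10, 0.90, 0.20],
                          [0.05, 0.80, 0.70]])

label_map = to_label_map(probabilities)
print(label_map)        # [[0 1 0]
                        #  [0 1 1]]
print(label_map.shape)  # (2, 3), same size as the input
```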

Most of the time, encoder-decoder structures (shown in the figure) are used to obtain these maps, and that is also what I will be using for my network. The encoder repeatedly applies convolution layers combined with pooling layers to obtain lower-dimensional images. Afterwards, the decoder uses upsampling or deconvolution layers to restore the original dimensionality of the images.
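To make the change in dimensionality concrete, here is a minimal NumPy sketch that leaves out the convolutions and only follows the image size through pooling and upsampling. The four pooling steps and the nearest-neighbour upsampling are assumptions for illustration, not a description of the network used in the project.

```python
import numpy as np

def max_pool_2x2(x):
    """Halve height and width by taking the maximum of each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample_2x(x):
    """Double height and width by repeating each pixel (nearest neighbour)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

feature_map = np.random.rand(512, 512)

encoded = feature_map
for _ in range(4):  # encoder: four pooling steps (assumed depth)
    encoded = max_pool_2x2(encoded)

decoded = encoded
for _ in range(4):  # decoder: four upsampling steps
    decoded = upsample_2x(decoded)

print(feature_map.shape, encoded.shape, decoded.shape)
# (512, 512) (32, 32) (512, 512)
```

In a real network, the convolution layers between these steps learn which features to keep, so the decoder restores more than just the size.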

I chose to adapt U-Net, originally used for segmenting biological cells. As input, I resize the client images to square images of 512x512 pixels, which distorts them.

A more suitable metric for segmentation is the Dice coefficient, which can also be used as the loss function for training. The Dice coefficient is a similarity measure between two images, in our case the ground-truth labels and the predicted labels.
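For two binary masks A and B, the Dice coefficient is 2|A ∩ B| / (|A| + |B|): it is 1 for a perfect match and 0 when the masks do not overlap. The NumPy sketch below shows one common way to compute it. The `smooth` term, which avoids division by zero on empty masks, and the "1 minus Dice" loss are common conventions that I assume here, not necessarily the exact variant used in the project.

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice similarity between ground-truth and predicted masks."""
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)

def dice_loss(y_true, y_pred):
    """Loss that decreases as the overlap improves."""
    return 1.0 - dice_coefficient(y_true, y_pred)

truth = np.array([[0, 1],
                  [1, 1]])
prediction = np.array([[0, 1],
                       [0, 1]])

# Overlap of 2 pixels, 3 + 2 labeled pixels in total: 2 * 2 / 5 = 0.8
print(dice_coefficient(truth, prediction, smooth=0.0))  # 0.8
```

Unlike plain accuracy, this measure is not dominated by the many background pixels, which matters when cracks cover only a small part of the image.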

The main advantage of using segmentation instead of a standard classification network is the testing time. Because you input a full image instead of predicting hundreds of smaller segments, the computation time is reduced immensely, from one hour per full-size image to a few seconds. This allows a future integration with drones that need real-time test results.

Now for the results: the segmentation network exceeded everyone’s expectations. Not only are the cracks localised very accurately, almost all the background noise is ignored! Even random images found on Google, or images that were too difficult for us to annotate, are segmented without any problem!


At the start of the project, I had absolutely no experience with deep learning and neural networks. I knew the basics, but had never implemented a network myself. After many failures, months of coding and lots of bad dreams about cracked bridges, the results were finally there. The qualitative performance of segmentation for detecting cracks in concrete has proven to be as good as or better than state-of-the-art crack detection methods. Now the drones can be released…