A friend of mine works in the video game industry, specifically in sound design. One day we were discussing the massive library of sound effects that had to be organized and searched through for every door creak or sword impact in a game. The conversation turned to how this process might be automated: shouldn't the sounds be added automatically when the level is scripted? That discussion can wait for a different page, though. Given my interests at the time, what stuck with me was his hoard of sound effects and how they might be used in a deep learning scenario. The idea of exploring the latent space between two or more sound effects was intriguing. I asked, and he soon provided me with a few different types of impact sounds (e.g., sword vs. sword, fist vs. wood, stone vs. wood). We agreed on impact sounds because they were typically short (quicker to process and train with) and fairly uniform in length (between 2 and 3 seconds each, rather than widely variable durations).