“A turning point for audio forensics”: An interview with Fabio Cervi of Earshot

In late March 2025, a mass grave was discovered in Tel al‑Sultan, in the Gaza Strip. It contained the bodies of fifteen rescue workers, as well as ambulances and a fire truck that had been crushed and then buried. The discovery drew media attention and revealed the gravity of what had happened a week earlier, during the night of 23 March.

Despite survivors’ testimonies, no precise account of the events managed to take hold in the public sphere. The Israeli army admitted to firing on these vehicles, while describing them as “suspicious” and referring to an “exchange of fire.” It announced the opening of an internal investigation, which concluded a few weeks later that a series of “professional errors” had been made.

It is in this context of destroyed evidence, conflicting narratives and discredited survivor accounts that Earshot and Forensic Architecture were commissioned to analyse three recordings from that night: one video and two phone calls. Their investigation, Israeli Executions of Palestinian Aid Workers, published on 24 February 2026, reconstructs the sequence of events minute by minute. It reveals the scale of the massacre and confirms that executions at point‑blank range did indeed take place.

The investigation is also distinctive for the central role that audio plays in the 3D reconstruction — a turning point for audio‑ballistics, according to Fabio Cervi of Earshot. He is the one we chose to interview in this first instalment of “Resolution”, the new monthly publication on digital investigation edited by Index.

Published on 05.03.2026

To begin, can you briefly introduce yourself and explain what Earshot is?

My name is Fabio Cervi. I am an audio investigator at Earshot, an NGO specialised in sound‑based investigations. I joined the project in 2020, and the organisation was officially created in 2023. Earshot was founded by Lawrence Abu Hamdan, director, and Caline Matar, deputy director.

I am a musician and an architect by training. I have taken specialised coursework in audio forensics and synthetic audio creation. At Earshot, my role is to develop investigative methods through audio analysis and through the visualisation of audio in 3D modelling and simulation software.

Earshot is an investigative agency that analyses cases of corporate violence and state violence through sound. Our mandate is to work directly with affected communities, and we reconstruct cases from recordings and testimonies that are centred on the sonic experience of events. Since our founding in 2023, we have carried out more than 40 investigations – in Palestine, France, the United States, the United Kingdom, India and Cameroon – and we have worked with organisations such as Al Jazeera, the BBC, Middle East Eye, B’Tselem, Index and Forbidden Stories.

You have just published an investigation with Forensic Architecture about the massacre carried out by the Israeli army in Tel al‑Sultan, in the Gaza Strip, in March 2025. Can you explain what it reveals and why audio was such a decisive source in this case?

In Tel al‑Sultan, the Israeli army targeted medical personnel: paramedics were killed and the vehicles were buried, so much of the material evidence was literally erased. This case was important for many reasons, but our contribution was crucial because the attack took place at night.

We worked from three recordings: a video filmed by paramedic Refaat Radwan, who was driving one of the vehicles in the convoy, and two phone calls made to the Palestine Red Crescent Society (PRCS) headquarters, one by Ashraf Abu Libda and the other by Asaad al‑Nasasra. Refaat’s video contains more than five minutes of continuous gunfire; this was the most important source. One of the calls takes place immediately after Refaat’s video; the other is more than two hours after the beginning of the attack. Because everything happens at night, the image is almost completely black: we see Refaat getting out of the ambulance, throwing himself to the ground and lying there to take cover, and then we see almost nothing. From that point on, audio becomes the only accessible source of evidence.

We also interviewed the only two survivors of the attack, Asaad al‑Nasasra and Munther Abed, who gave us a very detailed account of that night. Audio analysis was decisive in corroborating their statements and in showing that their testimony provides an accurate and reliable description of what happened.

Earshot is distinctive in that you specialise in sound, whereas at Index we mostly work from video and still images. In our investigations, the question of visual reliability is central: for instance, we take image resolution into account in order to calculate a margin of error in the 3D models we build from those images. What kind of methodological precautions do you take with sound – which is often perceived as more subjective – to guarantee the same level of rigour and reliability?

Part of our work exists precisely to dispel the notion that audio is a more subjective form of evidence and, by implication, that it should take a back seat in legal claims. It is true that sometimes our range of methods – things like audio verification or audio enhancement, especially when we work with language – reaches its limits: at some point you have to interpret what is coming through.

You mentioned the pixelated nature of visual evidence. Audio has the same kind of limitation, but it brings a totally different set of conditions compared to the pixels of an image. Audio has a sample rate: it can only record sound – and by sound I mean compressions and rarefactions of air – at particular intervals in time. That sample rate can go up to 48,000 hertz, which means that every second a microphone is capturing 48,000 little moments of compression and rarefaction of air which, put together, are what we then interpret as sound. So within one second, compared to the 25 frames per second of a video frame rate, you can already see that sound, quantitatively, is loaded with forensic material.
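To make that comparison concrete, here is a small arithmetic sketch. The 48,000 hertz sample rate and the 25 frames per second come from the answer above; the speed of sound (343 m/s, dry air at about 20 °C) is an assumption added for illustration, not a figure from the investigation.

```python
SAMPLE_RATE_HZ = 48_000       # audio samples per second (figure quoted above)
FRAME_RATE_FPS = 25           # video frames per second (figure quoted above)
SPEED_OF_SOUND_M_S = 343.0    # assumption: dry air at roughly 20 °C

# How many audio samples fit inside the duration of one video frame.
samples_per_frame = SAMPLE_RATE_HZ // FRAME_RATE_FPS

# Time covered by one sample and by one frame, in milliseconds.
sample_period_ms = 1000.0 / SAMPLE_RATE_HZ
frame_period_ms = 1000.0 / FRAME_RATE_FPS

# Distance sound travels during a single sample: a rough idea of the
# finest path-length difference the sampling grid can distinguish.
metres_per_sample = SPEED_OF_SOUND_M_S / SAMPLE_RATE_HZ

print(f"audio samples per video frame: {samples_per_frame}")
print(f"one sample lasts {sample_period_ms:.4f} ms, one frame {frame_period_ms:.0f} ms")
print(f"sound travels about {1000 * metres_per_sample:.1f} mm per sample")
```

Under these assumptions, a single video frame spans well over a thousand audio samples, which is why timing differences of a few milliseconds remain measurable in the audio track.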

Earshot’s mission is to show that this density of information opens up a very wide methodological field, which allows us to expand what is considered admissible in judicial settings, and to challenge our categories of evidence. In the Tel al‑Sultan investigation, this took the form of advanced audio ballistics: we analysed more than 910 gunshots over a period of more than two hours.

We really went into detail on every single one of these gunshots and analysed the sounds. By not just looking at the seconds but going into the milliseconds, we could understand that each gunshot is actually constituted by multiple sounds. There is the sound of a supersonic bullet, followed by the actual sound of the gun, which is the sound of the hot gas expanding from its muzzle – what we would colloquially call a gunshot. By entering this temporal scale, we could break down all the constituent sounds that make up a gunshot and understand how these sounds contain spatial information: the whereabouts of the shooter and the direction in which the shooter is firing.
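Why the gap between the two sounds carries spatial information can be illustrated with a deliberately simplified sketch. It assumes a shot fired straight at the microphone, a constant bullet speed of 900 m/s and a speed of sound of 343 m/s. These values and the function name are illustrative assumptions, not figures or methods taken from the investigation.

```python
SPEED_OF_SOUND_M_S = 343.0   # assumption: dry air at roughly 20 °C
BULLET_SPEED_M_S = 900.0     # assumption: constant supersonic speed, no drag


def crack_to_blast_delay_s(distance_m: float,
                           bullet_speed: float = BULLET_SPEED_M_S,
                           sound_speed: float = SPEED_OF_SOUND_M_S) -> float:
    """Gap between the bullet's shock wave and the muzzle blast at the microphone.

    Simplification: the shot is fired straight at the microphone, so the shock
    wave arrives with the bullet (distance / bullet_speed) while the muzzle
    blast travels at the speed of sound (distance / sound_speed).
    """
    if bullet_speed <= sound_speed:
        raise ValueError("a subsonic bullet produces no shock wave")
    return distance_m / sound_speed - distance_m / bullet_speed


for distance in (5, 50, 200):
    delay_ms = 1000 * crack_to_blast_delay_s(distance)
    print(f"{distance:>4} m -> {delay_ms:6.1f} ms between shock wave and muzzle blast")
```

In this simplified picture the gap grows with the distance to the shooter, so it only becomes visible once the recording is read at the scale of milliseconds.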

Annotation of the 844 gunshots heard in Refaat’s video over 5 minutes and 30 seconds of audible gunfire. 93% of these gunshots (789 gunshots) were fired towards Refaat’s camera. Credit: Earshot

By examining these 900-plus gunshots, we could build a pattern of positions and movement of the Israeli soldiers relative to the cameras that were recording them. Part of how we were able to do this was by identifying the sound of echoes of gunshots following the sound of the muzzle blast. Between half a second and up to a second after the muzzle blast, we were hearing an echo of the muzzle blast, and we understood that the gunshot was being reflected from a surface near where the massacre was happening.

Together with Forensic Architecture, we were able to identify the coordinates and positions of all the paramedics and recording devices, as well as the few structures still standing in the area, such as concrete walls. By identifying this echo after many of the gunshots, we were able to understand which surface was causing the echo. This particular area in Gaza was subjected to massive clearance and destruction, but in spite of this, and actually because of it, we were able to clearly understand which surface was causing these echoes and, as a result, how a change in the echoes meant a particular pattern of movement away from or towards that surface and away from or towards the recording device.

In other words, you can literally hear the soldiers moving. We found that within the five minutes of gunfire captured in Refaat’s video, for the first four minutes the soldiers were firing from a static position about 50 metres south‑east of Refaat, and that they were firing over 93 percent of the gunshots towards the ambulances and towards Refaat. The pattern then shows a shift after these first four minutes: the echoes begin to change, arriving later and later after the gunshot. This means that the soldiers are moving away from the surface that is reflecting the gunshot and towards the camera, lengthening the path of the reflected sound and therefore the time it takes to arrive at the camera. What we are hearing is literally the movement of the soldiers approaching the paramedics. These elements match Asaad’s testimony point by point: he describes the soldiers first firing from an elevated sand berm and then coming down towards the paramedics, shooting as they advance until they kill them at short range.
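The relationship between the shooter's movement and the echo delay can be sketched with a mirror-image model of a single reflecting wall. Everything here is hypothetical: the coordinates, the flat infinite wall and the 343 m/s speed of sound are assumptions chosen for illustration, not the geometry used in the investigation.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def echo_delay_s(shooter, mic, wall_x=0.0, c=SPEED_OF_SOUND_M_S):
    """Delay between the direct muzzle blast and its echo off a wall.

    The wall is modelled as the infinite vertical plane x = wall_x, with the
    shooter and the microphone on the same side of it. The echo path has the
    same length as the straight line from the shooter's mirror image
    (reflected across the wall) to the microphone.
    """
    sx, sy = shooter
    mirrored = (2 * wall_x - sx, sy)           # image source behind the wall
    direct = math.dist(shooter, mic)
    reflected = math.dist(mirrored, mic)
    return (reflected - direct) / c


mic = (150.0, 0.0)                             # hypothetical camera position
# Hypothetical shooter track: away from the wall, towards the camera.
track = [(100.0, 10.0), (110.0, 8.0), (120.0, 6.0), (130.0, 4.0), (140.0, 2.0)]
delays = [echo_delay_s(position, mic) for position in track]

for position, delay in zip(track, delays):
    print(f"shooter at {position}: echo {1000 * delay:.0f} ms after the muzzle blast")
```

With these made-up positions the echo arrives later at every step, which is the qualitative pattern described above: the direct path shortens while the reflected path lengthens.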

Simulation of the sound of a gunshot fired from a position increasingly close to Refaat. Credit: Earshot

Could you explain more concretely how you were able, technically, to locate the soldiers at point‑blank range from the paramedics using only audio?

After Refaat’s video ends, Ashraf’s phone call continues for around fifty seconds and records another series of gunshots. Some of these shots produce very distinctive echoes, which come back with a slight delay and tell us which surfaces they are reflecting from. By analysing these echoes, we were able to show that they were reflecting off surfaces very close to Ashraf’s phone. Within this immediate perimeter, the only such surfaces were the emergency vehicles, which very strongly indicates that the shooters were positioned between the ambulances, just a few metres away from the medics.

The layout and orientation of the vehicles created a kind of architecture around them: by measuring the intervals between the different echoes, we could determine which ambulance the shots were bouncing off and, from there, the most likely position of the soldier at the moment of firing. For one of these shots, we were able to place the shooter just a few metres from Ashraf. These gunshots coincide with the moment when his voice disappears from the recording, which makes them, in all likelihood, the shots that killed him.
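The idea of working back from echo intervals to a shooter position can be sketched as a small grid search. This is a toy model under loud assumptions: each vehicle is reduced to a single reflecting point, the layout is invented, each echo is assumed to be already matched to its vehicle, and the "measurements" are generated from a known position rather than taken from any recording. It is not Earshot's method.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at roughly 20 °C


def echo_delays_s(shooter, mic, reflectors, c=SPEED_OF_SOUND_M_S):
    """Delay of each echo after the direct muzzle blast, one per reflector.

    Simplification: every vehicle is a single reflecting point, so the echo
    path is shooter -> reflector -> microphone.
    """
    direct = math.dist(shooter, mic)
    return [(math.dist(shooter, r) + math.dist(r, mic) - direct) / c
            for r in reflectors]


def locate_shooter(observed, mic, reflectors, half_width_m=8.0, step_m=0.5):
    """Grid search for the position whose predicted echo delays fit best."""
    steps = int(round(2 * half_width_m / step_m))
    best, best_err = None, math.inf
    for i in range(steps + 1):
        for j in range(steps + 1):
            candidate = (mic[0] - half_width_m + i * step_m,
                         mic[1] - half_width_m + j * step_m)
            predicted = echo_delays_s(candidate, mic, reflectors)
            err = sum((p - o) ** 2 for p, o in zip(predicted, observed))
            if err < best_err:
                best, best_err = candidate, err
    return best, best_err


# Hypothetical layout in metres: phone at the origin, three parked vehicles.
mic = (0.0, 0.0)
vehicles = [(3.0, 1.0), (-2.0, 4.0), (1.0, -3.0)]
true_shooter = (2.5, -1.0)      # used only to generate synthetic "measurements"
observed = echo_delays_s(true_shooter, mic, vehicles)

estimate, error = locate_shooter(observed, mic, vehicles)
print("echo delays (ms):", [round(1000 * d, 2) for d in observed])
print("estimated shooter position:", estimate)
```

With real recordings the delays are noisy, so the best fit would be a region rather than a single point, and deciding which echo belongs to which vehicle is itself part of the analysis. The sketch sidesteps both.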

Simulation of the sound of gunshots fired from in-between the emergency vehicles, revealing the position of Israeli soldiers as close as 1 to 4 metres to the aid workers. Credit: Earshot

In Gaza, journalists are prevented from working, and many of the primary sources we have are testimonies like the ones you collected. How do you situate your work in that context?

Since the beginning of the genocide in Gaza, we have worked on many different cases. What we are seeing is a shift in who performs the work of documentation. Because of the limits on access and the refusal to recognise the credibility of local journalists – who are themselves targeted – medical personnel increasingly take on a documentation role, in the way journalists normally would.

You can see this very clearly in Refaat’s video. He is filming, and one of his colleagues asks him why he is doing it. He replies: “If something happens, this is the evidence – I have to do it.” This case is part of a broader series in which we analyse audio evidence that exists precisely because the usual channels – journalists, international organisations – are blocked or discredited.

Today, images are easily manipulated – even 3D reconstructions have been used by the Israeli army as tools of propaganda. Can sound be seen as a medium that helps us get past these manipulations?

I want to answer both yes and no. Yes, because sound often becomes a last resort for documentation: practically speaking, you can record for a long time, with little battery and storage, and in a more discreet way than video. In extreme conditions, it is simply a more accessible tool for collecting evidence.

But at the same time, we are also seeing sound being used for propaganda purposes. One example: we analysed a piece of propaganda circulated by Israel that shows a woman presenting herself as a doctor in Gaza, accusing Hamas of having bombed the hospital where she is. While she is speaking, you hear explosions in the background. By analysing the resonance of these sounds, we were able to show that the explosions had been added in post‑production: their reverberation did not match the reverberation of her voice in the room where she was filmed.

In that case, sound allowed us to demonstrate the fabricated nature of the video and to expose the limits of this kind of propaganda. My Arabic‑speaking colleagues also pointed out that her accent did not match that of a local resident. So sound analysis, combined with linguistic expertise, allows us to redefine what we consider evidence, beyond what we can immediately see or understand in the image.

Do you see this investigation as a turning point for audio forensics? And how has it been received publicly, especially in the United Kingdom?

Yes, we do see this investigation as a turning point for audio forensics and audio ballistics. Before our work, media coverage brought in audio ballistics experts, but their analyses remained superficial and limited: they concluded that a little over one hundred shots had been fired. After a year spent analysing these recordings with Forensic Architecture and with the survivors, we were able to count more than nine hundred shots, and, crucially, to produce statistics. We showed that 93 percent of these shots – more than 780 rounds – were fired directly at the paramedics and their vehicles. None of these figures or dynamics were present in the public debate before our work.

In addition, our analysis allowed us to reconstruct, minute by minute, the positions and movements of the Israeli soldiers, up until the moment when they move in between the ambulances and the medics. This is how we were able to place one soldier less than a metre away from Ashraf Abu Libda and to identify the most likely moment of his execution.

The investigation will be presented in the UK Parliament on March 24th. Bringing the work into that space is part of a broader effort to re-establish audio as a central source of documentation and proof. Our investigation is closely tied to the testimony of witnesses, especially Asaad al‑Nasasra and Munther Abed. Even very fine‑grained details – such as the sound of a vehicle manoeuvring in an attempt to escape – line up exactly with what the survivors describe at specific points in the recordings. This work also helps to validate the witnesses themselves as primary, credible sources, in a context where their accounts are so often discredited.

