Federated analytics is a novel approach to data analysis that is designed to preserve data privacy and confidentiality. It differs from traditional centralized databasing, where data is brought into one location, through its use of ‘parametrized querying’. This approach allows information from distributed datasets to be combined without gathering the data itself in one location. In this way, federated analytics enables data scientists to learn from everything and everyone without learning about anyone.
In parametrized querying, only the parameters of the analysis method, and not the underlying records, are exchanged between data-hosting sites. This avoids the need to bring data into one location, which makes the logistics of otherwise difficult data exchanges much simpler.
One of the simplest ways to understand federated analytics is through the example of computing averages. Say you want to calculate the average salary of employees at a certain level of seniority across several countries. In a traditional analytics setup, you’d first have every employee's data, including their salary, fed from every local node (or country server) to a centralized data server. You'd then need someone to connect to a database linked to the central server, pull out all employees' data, filter the dataset for the employees needed and finally compute the average of their salary values. The process is cumbersome and, needless to say, mired in organizational bureaucracy.
In a federated analytics model, you’d do none of that. Instead, you would have a distributed computing setup that allows each node in the network (each country server) to perform calculations on the data it contains. The results of those calculations, rather than the data itself, are then sent by the local servers to a central server for further calculation, aggregation, visualization or reporting.
The querying node, which could be any node and not just the central one, need only ask the other local nodes for their local results: here, their averages, together with the employee counts needed to weight those averages correctly. No raw data leaves any of the local servers, so many of the questions about data privacy, handling, storage and access that centralization raises do not arise. Along with the legitimate concerns and reputational risks associated with these questions, the significant costs of these non-core activities also fall away. Because query logic, such as filtering, computing and returning local averages, is machine-run, the downstream risk of human error in performing these steps manually is greatly reduced as well.
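The salary example can be sketched in a few lines of Python. This is a minimal illustration rather than a real federated system: the nodes are plain in-memory lists, the names `Employee`, `LocalSummary`, `local_summary` and `federated_average` are hypothetical, and the salary figures are made up. Each node returns a sum and a count rather than a bare average, because nodes may hold different numbers of matching employees and a plain average of local averages would then be wrong.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Employee:
    seniority: str
    salary: float


@dataclass(frozen=True)
class LocalSummary:
    """The only thing a node sends back: aggregates, never rows."""
    total: float
    count: int


def local_summary(records, seniority):
    """Runs on a local node: filter locally, return aggregates only."""
    salaries = [r.salary for r in records if r.seniority == seniority]
    return LocalSummary(total=sum(salaries), count=len(salaries))


def federated_average(summaries):
    """Runs on the querying node: combine the per-node aggregates."""
    total = sum(s.total for s in summaries)
    count = sum(s.count for s in summaries)
    if count == 0:
        return None  # no matching employees on any node
    return total / count


# Hypothetical country servers with made-up data (assumption for illustration).
nodes = {
    "country_a": [
        Employee("senior", 100.0),
        Employee("senior", 200.0),
        Employee("junior", 50.0),
    ],
    "country_b": [
        Employee("senior", 400.0),
        Employee("junior", 60.0),
    ],
}

# The query parameter ("senior") travels to each node; only summaries come back.
summaries = [local_summary(records, "senior") for records in nodes.values()]
result = federated_average(summaries)
print(result)
```

In this toy data the local senior averages are 150 and 400, so averaging them directly would give 275, whereas the true average over all three senior employees is 700 / 3. Exchanging sums and counts is one simple way to keep the federated result identical to the centralized one.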
Federated analytics makes it practical to analyze data that may otherwise be too large, confidential, or complex to share. Traditional centralized databasing increasingly runs into daunting costs, risks, and constraints around compatibility and control. These obstacles stymie decision-making, which more and more needs to draw on deeper insight from a cross-section of organizational data. Moving from centralized to federated analytics is an investment that removes much of this burden.
In the move from centralized to federated analytics, collaboration and computational complexity merit foremost consideration. Participating institutions need to agree on data and computational standards, as well as the best distributed processing strategy. Additionally, data administrators need to invest in the technology, infrastructure, and standards that will enable a distributed computing paradigm to manage the heterogeneity of their nodes.
Done right, federated analytics brings more efficiency to data processing and puts power back in the hands of data owners. In doing so, it helps organizations unlock the power of combined data analytics while avoiding the costs and politics of risk, compatibility and control, so that businesses can get the most value out of their organizational and people data.