COVID-19 data privacy

The San Francisco COVID-19 data sharing policy balances the need for both transparency and privacy.

The City is committed to transparency and keeping the public informed.

The City is also committed to keeping residents' private health information secure.

We must balance these commitments, so we ensure that none of the COVID-19 data we publish puts resident privacy at risk.

This page summarizes our full privacy and publishing guidelines.

Transparency

Sharing data with the public is an important part of the City’s response to the COVID-19 pandemic.

We share COVID-19 data so everyone has access to high-quality, current information. We strive to be one of the most transparent jurisdictions in the country. This is why we publish live, publicly available datasets that update daily.

Privacy

Your name and health information are private. The City follows federal and state healthcare privacy laws. We never share protected health information with the public.

Before releasing any data, we complete a thorough analysis to consider the risks. The City only shares information in ways that protect resident privacy.

One of the main risks of sharing data is that someone could use it to identify a specific person.

To prevent this, we consider:

Population size
Counts of cases or tests
Linking datasets to each other

Population size
The underlying population for any data must be large enough that no one can be identified. Releasing data for the entire City is the best way to ensure this. With over 880,000 residents, it is highly unlikely that citywide data could be used to identify any person. If we must share data on smaller populations, the population of the subgroup must be 1,000 residents or more.
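
As a rough illustration of the population rule above, the check can be sketched in a few lines of Python. The 1,000-resident minimum comes from this page; the function name, the subgroup labels, and the sample population figures are hypothetical and are not taken from any City system or dataset.

```python
# Minimum subgroup population stated in the text above.
MIN_SUBGROUP_POPULATION = 1_000


def meets_population_threshold(population: int,
                               minimum: int = MIN_SUBGROUP_POPULATION) -> bool:
    """Return True when a subgroup has enough residents to publish data about it."""
    return population >= minimum


# Invented example figures, for illustration only.
sample_subgroups = {"Subgroup A": 15_200, "Subgroup B": 640}

# Map each subgroup to whether it passes the population test.
publishable = {
    name: meets_population_threshold(size)
    for name, size in sample_subgroups.items()
}
```

In this sketch, "Subgroup B" would be held back because it has fewer than 1,000 residents. The actual review is broader than a single comparison, so treat this as a simplification.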

Counts of cases or tests
The count of cases or tests (or other data of interest) must be high enough to protect privacy. For example, we report on cases by gender identity once a category has at least 5 cases. This ensures that privacy is not threatened by small numbers.
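
One common way to apply a minimum-count rule like this is to withhold any category that falls below the threshold. The sketch below assumes that approach. The minimum of 5 comes from the example above; the function name, the category labels, and the counts are hypothetical.

```python
from typing import Optional

# Minimum category count taken from the gender identity example above.
MIN_CATEGORY_COUNT = 5


def suppress_small_counts(counts: dict[str, int],
                          minimum: int = MIN_CATEGORY_COUNT) -> dict[str, Optional[int]]:
    """Keep counts at or above the minimum; replace smaller counts with None."""
    return {
        category: (count if count >= minimum else None)
        for category, count in counts.items()
    }


# Invented counts, for illustration only.
sample_counts = {"Category A": 120, "Category B": 5, "Category C": 3}
reported = suppress_small_counts(sample_counts)
```

Here "Category B" is reported because it has exactly 5 cases, while "Category C" is withheld. Whether a withheld category is hidden, merged with another category, or shown later is a policy choice this sketch does not address.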

Dataset links
The data cannot be linked to other publicly accessible data in a way that identifies a case.

We first assess how a dataset could be linked to other public datasets. We then assess these linked datasets together. We want to ensure that no one could combine data from many datasets to identify an individual.

Examples of datasets that pass privacy tests

Case data by neighborhood

We analyzed releasing COVID-19 case data by neighborhood over time. We did not want to risk resident privacy.

We assessed:

whether there are enough residents in each neighborhood
how many other neighborhood datasets could be linked to this dataset

We determined that the risk that any one individual could be identified in the data is low.

Cases by age group

Our analysis found that publishing this data for the entire City does not put resident privacy at risk. Each age group contains over 15,000 residents. There are no other datasets that could be linked to this data.

Examples of datasets that do not pass privacy tests

We do not publish new cases by neighborhood combined with other cross sections. For example, we do not publish new cases by neighborhood and age.

In this case, the risk of identifying a particular person is too high. These small cross sections are riskier and could put resident privacy in jeopardy.
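
A small worked example shows why cross sections raise the risk: a neighborhood can be large enough on its own, yet splitting it by age can leave groups smaller than the 1,000-resident minimum described earlier on this page. The neighborhood name, age bands, and population figures below are invented for illustration, and the function name is hypothetical.

```python
# Minimum subgroup population stated earlier on this page.
MIN_SUBGROUP_POPULATION = 1_000


def small_cells(populations: dict[tuple[str, str], int],
                minimum: int = MIN_SUBGROUP_POPULATION) -> list[tuple[str, str]]:
    """List the (neighborhood, age group) cells that fall below the minimum."""
    return sorted(cell for cell, size in populations.items() if size < minimum)


# Invented population figures for one hypothetical neighborhood.
sample_populations = {
    ("Neighborhood X", "0-17"): 850,
    ("Neighborhood X", "18-64"): 3_400,
    ("Neighborhood X", "65+"): 720,
}

# The neighborhood as a whole has 4,970 residents and passes the test.
neighborhood_total = sum(sample_populations.values())

# Two of its three age cells do not.
too_small = small_cells(sample_populations)
```

In this sketch the neighborhood total clears the threshold, but the "0-17" and "65+" cells do not, so data published at the neighborhood-and-age level would describe populations that are too small.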

We do not release this type of data. Instead, we share as much raw data as we can to ensure residents and journalists are informed without risking anyone’s privacy.

We may revisit our privacy policy when total case numbers fall, since a drop in cases may present new privacy questions. Read our full privacy policy for more information.

Other ways to learn about smaller populations

The San Francisco Department of Public Health (SFDPH) releases research reports on sub-populations. A report is different from the raw data sharing we do on the COVID tracker. A report shares analysis results without publishing the underlying raw data. This lets the City share key findings on smaller populations and protect resident privacy.

Learn more about SFDPH reports.