Data journalism glossary

Media Helping Media, 24 March 2025
https://mediahelpingmedia.org/advanced/data-journalism-glossary/

The post Data journalism glossary first appeared on Media Helping Media.

Image: a network interface card, created with Gemini Imagen 3 AI by Media Helping Media

The following words and terms are commonly used in data journalism. Data journalists might want to familiarise themselves with them.

Often used words and phrases

  • Algorithm:
    • A set of rules or instructions that a computer follows to solve a problem or perform a task. In data journalism, algorithms can be used to sort and filter records, match names across datasets, or flag unusual values for further checking. Link: Algorithm
  • API (Application Programming Interface):
    • A set of rules that lets one piece of software request data directly from another, such as a website or database. Journalists often use APIs to pull regularly updated datasets without downloading files by hand. Link: API
  • Choropleth map:
    • A map shaded in different colours to show how a number or rate changes by area (e.g., COVID-19 cases by county). Link: Choropleth map
  • Computational thinking:
    • The process of breaking down complex problems into smaller, manageable parts, and then creating algorithms to solve them. Link: Computational thinking
  • Correlation:
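    • A relationship between two variables in which they tend to change together, either both rising or one rising as the other falls. Note that correlation does not mean causation: a third factor may be driving both. Link: Correlation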
  • CSV (Comma-Separated Values):
    • A common, simple file format for datasets. It is essentially a spreadsheet saved as plain text, with each row on its own line and the values in each row separated by commas. Link: CSV
  • Data analysis:
    • Examining data to identify trends, patterns, and relationships. Link: Data analysis
  • Data bias:
    • Data that is skewed or incomplete in a way that misrepresents what it claims to measure. Journalists need to be alert to this to avoid misleading the audience. Link: Data bias
  • Data cleansing (or wrangling):
    • The process of fixing messy data: correcting errors, filling in missing information, and formatting it so it is ready for analysis. Link: Data cleansing
  • Data ethics:
    • Principles and guidelines for the responsible collection, analysis, and dissemination of data, with a focus on privacy, security, and fairness. Link: Data ethics
  • Data journalism:
    • The practice of using data to find, create, and tell news stories. It involves collecting, analysing, and visualising data to inform the public. Link: Data journalism
  • Data leak (or breach):
    • The release of private or sensitive data, whether intentional or accidental. Newsrooms often investigate these. Link: Data leak or breach
  • Data literacy:
    • The ability to understand, interpret, and communicate data effectively. This includes critical thinking, statistical reasoning, and the ability to identify biases. Link: Data literacy
  • Data mining:
    • The process of extracting valuable information and patterns from large datasets. Link: Data mining
  • Data scraping:
    • Data scraping is the automated process of extracting data from websites or other sources and saving it into a structured format. Link: Data scraping
  • Data transparency:
    • Being open about how the data was handled, what assumptions were made, and what might be missing.
  • Data visualisation:
    • Representing data visually through charts, graphs, maps, and other graphical formats. Link: Data visualisation
  • Dataset:
    • A collection of related data (also written data-set), such as a spreadsheet or table. It is often the starting point for a data story. Link: Dataset
  • Deduplication:
    • Removing repeated entries in a dataset to avoid counting the same thing twice. Link: Data deduplication
  • Descriptive statistics:
    • Simple summaries of data, such as averages, medians, and percentages, that help explain your findings. Link: Descriptive statistics
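  • FOIA (Freedom of Information Act) request:
    • A formal request to a public body for records or information it holds, made under freedom of information law. Journalists use these requests to obtain data that has not been published.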
  • Geospatial data:
    • Data that includes location information which is essential for making maps or analysing patterns by area. Link: Geospatial data
  • Heat map:
    • A graphic that uses colour intensity to show concentrations of activity or numbers. Link: Heat map
  • Interactive graphics:
    • Visuals that let readers explore data, such as maps they can zoom in on or filters for comparing regions.
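  • Interactive visualisation:
    • A chart, graph, or map that responds to the reader's actions, for example by showing details on hover, filtering by category, or zooming in, so readers can explore the data for themselves.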
  • JSON (JavaScript Object Notation):
    • A format often used by websites and APIs to structure data. Journalists may need to convert this into tables. Link: JSON
  • Machine learning:
    • Computer systems that learn to find patterns in data rather than relying only on rules written by hand. It is used in investigative journalism for tasks such as identifying fake accounts. Link: Machine learning
  • Margin of error:
    • A measure of how much uncertainty there is in survey results. This is particularly important when reporting on political opinion polls. Link: Margin of error
  • Natural Language Processing (NLP):
    • A branch of computing that lets computers process human language, making it possible to analyse large amounts of text automatically, for example by searching thousands of documents for recurring themes. Link: NLP
  • Normalisation:
    • Adjusting numbers so that fair comparisons can be made, for example by calculating rates per 100,000 people instead of using raw numbers. Link: Normalisation
  • Open data:
    • Data published by governments, organisations, or researchers that’s free for anyone to use in their reporting. Link: Open data
  • Outlier:
    • A data point that sticks out because it’s much higher or lower than the rest. Sometimes these lead to important news stories. Link: Outlier
  • Parsing:
    • Breaking down complex information (such as addresses or dates) into standardised parts for easier analysis. Link: Parsing
  • Regression analysis:
    • A more advanced statistical method for estimating how one variable changes in relation to one or more others. It is sometimes used in in-depth journalistic investigations. Link: Regression analysis
  • Sampling bias:
    • Bias that occurs when the group surveyed or studied does not represent the larger population. This can distort results and conclusions. Link: Sampling bias
  • SQL (Structured Query Language):
    • A language for querying, filtering, and summarising data held in databases. It is helpful for investigative projects that involve large datasets. Link: SQL
  • Spreadsheet:
    • A basic tool such as Excel or Google Sheets that most journalists use to store, sort, and analyse data. Link: Spreadsheet
  • Statistical analysis:
    • Using statistical methods to analyse data, such as calculating the mean, median, mode, and standard deviation. Link: Statistical and data analysis
  • Structured data:
    • Data organised in rows and columns (such as Excel spreadsheets) that’s easy to sort and analyse. Link: Structured data analysis
  • Time series data:
    • Data collected over time. This is useful for spotting trends, such as changes in crime rates or housing prices. Link: Time series database
  • Tooltip:
    • A small pop-up box in a graphic that appears when readers hover over a data point to reveal details. Link: Tooltip
  • Unstructured data:
    • Data that doesn’t come in neat tables, such as PDFs, social media posts, or interview transcripts. Link: Unstructured data
  • Web scraping:
    • The process of automatically extracting data from websites. Link: Web scraping
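
The CSV entry in the glossary can be made concrete with Python's built-in `csv` module. This sketch reads a small made-up CSV held in a string, so it runs without a file; the column names and figures are hypothetical.

```python
import csv
import io

# Hypothetical CSV content; with a real file you would use
# open("data.csv", newline="", encoding="utf-8") instead of io.StringIO
raw = """council,population,complaints
Northside,120000,340
Southside,85000,290
"""

# DictReader treats the first line as column names and yields one dict per row
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    # CSV has no data types: every value is a string until converted
    print(row["council"], int(row["complaints"]))
```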
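
As a sketch of the Data cleansing entry in the glossary, the hypothetical helpers below standardise names and turn currency strings into numbers, leaving missing values as `None` rather than guessing. The rules used here (which strings count as missing, stripping the £ sign and thousands separators) are assumptions for this example; real datasets need rules chosen after inspecting the data.

```python
def clean_amount(value):
    """Turn strings such as ' £1,200 ' or 'n/a' into a float, or None if missing."""
    text = value.strip().replace("£", "").replace(",", "")
    if text.lower() in {"", "n/a", "na", "-"}:
        return None  # record as missing rather than inventing a value
    return float(text)


def clean_name(value):
    """Collapse repeated spaces and standardise capitalisation."""
    return " ".join(value.split()).title()


# Hypothetical messy rows: (council name, amount)
messy = [("  northside  COUNCIL", " £1,200 "), ("Southside council", "n/a")]
cleaned = [(clean_name(name), clean_amount(amount)) for name, amount in messy]
print(cleaned)  # [('Northside Council', 1200.0), ('Southside Council', None)]
```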
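
For the Deduplication entry in the glossary, a minimal Python sketch follows. The records are invented, and treating entries as duplicates when their normalised name and date match is an assumption; deciding what counts as a duplicate is an editorial judgement, since two different people can share a name.

```python
# Hypothetical records; the second is a near-copy of the first
records = [
    {"name": "Ann Lee", "date": "2024-01-05"},
    {"name": "ann lee ", "date": "2024-01-05"},
    {"name": "Bo Chan", "date": "2024-02-11"},
]


def dedupe(rows):
    """Keep the first occurrence of each (normalised name, date) pair."""
    seen = set()
    unique = []
    for row in rows:
        # Normalise case and spacing so near-identical entries match
        key = (" ".join(row["name"].lower().split()), row["date"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


unique = dedupe(records)
print(len(records), "->", len(unique))  # prints: 3 -> 2
```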
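
The Descriptive statistics entry in the glossary can be illustrated with Python's `statistics` module. The response times below are made up; notice how the single large value pulls the mean well above the median, which is why the median is often the fairer figure to report for skewed data.

```python
import statistics

# Hypothetical emergency response times, in minutes
times = [7, 8, 8, 9, 11, 12, 45]

mean = statistics.mean(times)
median = statistics.median(times)
# Percentage of responses that took longer than 10 minutes
share_over_10 = 100 * sum(t > 10 for t in times) / len(times)

print(f"mean {mean:.1f}, median {median}, over 10 min: {share_over_10:.0f}%")
# prints: mean 14.3, median 9, over 10 min: 43%
```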
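
The JSON entry in the glossary notes that journalists may need to convert API output into tables. This Python sketch parses a hypothetical response and writes it out as CSV using only the standard library; the `results`, `station`, and `reading` keys are invented, since every API defines its own structure.

```python
import csv
import io
import json

# Hypothetical API response text
response_text = """
{"results": [
  {"station": "A1", "reading": 41.5},
  {"station": "B2", "reading": 38.0}
]}
"""

# json.loads turns the text into Python dicts and lists
records = json.loads(response_text)["results"]

# Write one CSV row per record; a real file would replace io.StringIO
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["station", "reading"])
writer.writeheader()
writer.writerows(records)
print(out.getvalue())
```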
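
The Margin of error entry in the glossary can be worked through with the standard approximation for a proportion p from a sample of n people: 1.96 * sqrt(p * (1 - p) / n) at the 95% confidence level. This sketch assumes a simple random sample; published polls usually apply weighting, so their stated margins can differ.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for proportion p from a simple random sample of n.

    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)


moe = margin_of_error(0.5, 1000)
print(f"±{moe * 100:.1f} percentage points")  # prints: ±3.1 percentage points
```

For a poll of 1,000 people with a 50% result, this gives about ±3.1 percentage points, so a two-point gap between two parties in such a poll is within the margin of error and should not be reported as a clear lead.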
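
A worked example for the Normalisation entry in the glossary, using invented figures: the area with fewer raw cases can still have the higher rate once population is taken into account.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    return count / population * 100_000


# Hypothetical figures: (cases, population)
areas = {"Northside": (340, 120_000), "Southside": (290, 85_000)}

for name, (count, population) in areas.items():
    print(name, round(rate_per_100k(count, population), 1))
# prints Northside 283.3, then Southside 341.2
```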
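
One common rule of thumb for the Outlier entry in the glossary flags values lying more than 1.5 times the interquartile range beyond the lower or upper quartile (Tukey's fences). This is only one of several conventions, and the data below are hypothetical; a flagged value is a prompt to check for a data-entry error or a possible story, not a conclusion in itself.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    spread = q3 - q1  # interquartile range
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical response times, in minutes
times = [7, 8, 8, 9, 11, 12, 45]
print(iqr_outliers(times))  # prints: [45]
```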
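
For the Parsing entry in the glossary, a minimal sketch that standardises dates written in different styles into a single date type. The list of accepted formats is an assumption for this example; note that 05/03/2024 is read day-first here (the UK convention), and ambiguous inputs like this should be checked against the source.

```python
from datetime import datetime

# Assumed input styles: 05/03/2024, 2024-03-05, 5 March 2024
FORMATS = ["%d/%m/%Y", "%Y-%m-%d", "%d %B %Y"]


def parse_date(text):
    """Try each known format in turn; return a date, or None if none match."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None  # leave unrecognised values for manual checking


print([parse_date(d) for d in ["05/03/2024", "2024-03-05", "5 March 2024", "March-ish"]])
```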
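
The SQL entry in the glossary can be tried without installing a database server: Python's built-in `sqlite3` module runs standard SQL queries against an in-memory database. The table, columns, and supplier names below are hypothetical.

```python
import sqlite3

# Throwaway database that exists only while the program runs
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contracts (supplier TEXT, value REAL)")
conn.executemany(
    "INSERT INTO contracts VALUES (?, ?)",
    [("Acme Ltd", 50000), ("Beta plc", 12000), ("Acme Ltd", 30000)],
)

# How many contracts each supplier won, and their total value
query = """
    SELECT supplier, COUNT(*) AS n, SUM(value) AS total
    FROM contracts
    GROUP BY supplier
    ORDER BY total DESC
"""
for row in conn.execute(query):
    print(row)
```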
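
A common first step with the Time series data entry in the glossary is calculating the percentage change from one period to the next. This sketch uses invented annual counts.

```python
def pct_changes(series):
    """Percentage change from each year to the next, keyed by the later year."""
    years = sorted(series)
    return {
        curr: (series[curr] - series[prev]) / series[prev] * 100
        for prev, curr in zip(years, years[1:])
    }


# Hypothetical annual counts
yearly = {2020: 1200, 2021: 1320, 2022: 1254}

for year, change in pct_changes(yearly).items():
    print(f"{year}: {change:+.1f}%")
# prints 2021: +10.0%, then 2022: -5.0%
```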
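
As a sketch of the Web scraping entry in the glossary, the parser below pulls the cell text out of an HTML table using Python's built-in `html.parser`. The HTML is a hypothetical snippet held in a string; a real scraper would first download the page (for example with `urllib.request.urlopen`) and should respect the site's terms of use and robots.txt. Dedicated scraping libraries cope better with messy pages; this only shows the idea.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of <td> cells, one list per <tr> row.

    Header cells (<th>) are ignored, so header-only rows are skipped.
    """

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_cell = [], None, False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


# Hypothetical page fragment
html = (
    "<table><tr><th>Area</th><th>Cases</th></tr>"
    "<tr><td>North</td><td>12</td></tr>"
    "<tr><td>South</td><td>7</td></tr></table>"
)
parser = TableCellParser()
parser.feed(html)
print(parser.rows)  # prints: [['North', '12'], ['South', '7']]
```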

Related articles

Data journalism – resources and tools

What is data journalism?

Good journalism has always been about data
