Data and Variables

Lecture 2

2024-05-16

Some logistics

Reminders

  • Drop/add ends tomorrow
  • Class does not meet Fridays–next meeting is Monday
  • The course syllabus is online.

Slides

Data and Variables

Sociological data

Sociological data: Qualitative

  • Interpretations, stories, narratives, observations

  • Uses:

    • deep understanding of particular cases, theory building, where measurement doesn’t work
  • Methods:

    • interviews, ethnographies, content analysis, image analysis, etc

Sociological data: Quantitative (this class)

  • Measured quantities. Counts, closed-ended surveys, proportions, outcomes.

  • Uses:

    • Describing measurable things about the world, theory testing
  • Methods/sources:

    • Surveys, administrative data, digital trace data, etc

EADA data (from last time)

Step 1: Data -> data frame

Observations

  • Each instance that you observe data about (often 1 observation = 1 person, but not always)
  • Rows in your data frame

Variables

  • Aspects that can be different for different cases
  • Columns in your data frame
  • Values the variable can take are called response options (especially in the context of surveys)

Types of variables

Categorical

  • Response options have names, not numbers
  • Mathematical operations (addition, subtraction, division) do not make sense
  • The book treats these subtypes as the same–we will too for math’s sake, but sometimes the difference does play into interpretations

Categorical: Nominal

  • Simple categories
  • Can’t say one category is “bigger” or “smaller” than another
  • Examples: gender identity, race, ethnicity, economic sector

Categorical: Ordinal

  • You can say that one is “bigger” or “more” than another
  • Examples: education (no hs diploma, hs diploma, some college, 2 year degree, 4 year degree, advanced degree), income (low, middle, high)

Categorical: Binary

  • Has just two levels: yes/no; present/not present
  • Examples: has college degree/does not have college degree; not ever married/married

Numeric

  • Response options are numbers where the values are meaningful (ie, not arbitrary)
  • Mathematical operations (addition, subtraction, etc) make sense

Numeric: Continuous

  • Any value within a range is possible–decimals are fine
  • Examples: Temperature, height, elevation, stock price

Numeric: Discrete

  • Only certain values (usually round numbers) “make sense” and are used
  • Examples: Age, years of education

Exercise: EADA data variable types

  • Determine the type of each variable in this data set.

For next time:

  • Read the syllabus, if you haven’t already
  • Email me your exercise answers