AP Computer Science Principles
Unit 2.2 – Data Compression and Storage
1. What Is Data Compression?
Data compression is the process of reducing the number of bits used to represent information.
We compress data to:
- Save storage space
- Transmit data faster over networks
- Reduce file size for images, audio, video, and documents
Compression is essential because all digital data (text, audio, video, and images) takes up space, and smaller files move faster across the internet.
2. Types of Compression
There are two main types of compression: lossless and lossy.
2.1 Lossless Compression
Definition: Compression that reduces file size without losing any data. When decompressed, the file is restored perfectly.
Key Features:
- No data lost
- Reversible
- Used when accuracy is critical
Examples:
- ZIP files
- PNG images
- GIF images
- FLAC audio
- Compressed text files
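A quick way to see lossless compression in action is Python's built-in `zlib` module. It implements DEFLATE, the same method most ZIP archives use, although it does not write a `.zip` file itself. The sample text is an assumption chosen to be highly repetitive so the size drop is easy to see. The exact compressed size depends on the zlib version, so no number is claimed here.

```python
import zlib

# Hypothetical sample: repetitive text compresses very well.
original = ("AP CSP " * 200).encode("utf-8")   # 1,400 bytes
compressed = zlib.compress(original)            # DEFLATE-compressed bytes
restored = zlib.decompress(compressed)          # undo the compression

print(len(original), "bytes before,", len(compressed), "bytes after")
print(restored == original)  # True: lossless means an exact copy comes back
```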
Common Techniques:
- Run-Length Encoding (RLE): Replaces each run of a repeated character with the character and a count
Example: AAAAABB → A5B2
- Huffman Coding: Assigns shorter binary codes to frequently used characters and longer codes to rare ones
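The two techniques above can be sketched in a few lines each. This is a teaching sketch, not a production codec: the function names `rle_encode`, `rle_decode`, and `huffman_codes` are hypothetical, the RLE format assumes the input contains no digits (so counts are unambiguous), and the Huffman sketch returns only the code table without packing bits into bytes.

```python
import heapq
from collections import Counter
from itertools import groupby


def rle_encode(text: str) -> str:
    """Run-length encode: each run of a repeated character becomes char + count."""
    # Assumes the text itself contains no digits, so counts stay unambiguous.
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Undo rle_encode: read one character, then its (possibly multi-digit) count."""
    out, i = [], 0
    while i < len(encoded):
        ch = encoded[i]
        i += 1
        start = i
        while i < len(encoded) and encoded[i].isdigit():
            i += 1
        out.append(ch * int(encoded[start:i]))
    return "".join(out)


def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code: frequent characters end up with shorter bit strings."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct symbol still needs 1 bit
        return {next(iter(freq)): "0"}
    # Heap entries: (combined frequency, tie-breaker, {symbol: code built so far})
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups; prefix their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]


print(rle_encode("AAAAABB"))   # A5B2
print(rle_decode("A5B2"))      # AAAAABB
codes = huffman_codes("AAAAABBC")
print(codes)                   # {'C': '00', 'B': '01', 'A': '1'}
bits = "".join(codes[ch] for ch in "AAAAABBC")
print(len(bits), "bits vs", 8 * 8, "bits at 8 bits per character")  # 11 bits vs 64 bits at 8 bits per character
```

The most common letter, A, gets a 1-bit code, which is where the savings come from. In a real file the code table must also be stored so the receiver can decode, which adds some overhead for short inputs.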
2.2 Lossy Compression
Definition: Compression that permanently removes some data to greatly reduce file size. Restored versions are not identical.
Key Features:
- Some information is lost
- Irreversible
- Usually much smaller file sizes than lossless
- Works best when perfect accuracy isn't necessary
Examples:
- JPEG images
- MP3 audio
- YouTube and streaming video formats
- WebP images (lossy mode; WebP also offers a lossless mode)
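Why Lossy Works Well: it discards details people are unlikely to notice, such as:
- Tiny color differences between neighboring pixels
- Quiet background sounds masked by louder ones
- Repeated detail between video frames (mostly only the changes from frame to frame are stored)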
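The core idea of "throwing away small differences" can be illustrated with quantization: rounding values into coarser buckets. This is a simplified stand-in, assuming hypothetical 8-bit brightness values and a bucket size of 16; real formats such as JPEG and MP3 apply quantization after more sophisticated transforms. The function names are hypothetical.

```python
def quantize(values: list[int], step: int) -> list[int]:
    """Lossy step: keep only which bucket each value falls in (value // step)."""
    return [v // step for v in values]


def dequantize(buckets: list[int], step: int) -> list[int]:
    """Rebuild an approximation: the middle value of each bucket."""
    return [b * step + step // 2 for b in buckets]


pixels = [200, 201, 203, 198, 50]        # hypothetical 8-bit brightness values
buckets = quantize(pixels, step=16)      # 0..255 -> 0..15, so 4 bits each instead of 8
restored = dequantize(buckets, step=16)

print(buckets)             # [12, 12, 12, 12, 3]
print(restored)            # [200, 200, 200, 200, 56]
print(restored == pixels)  # False: the small differences are gone for good
```

Two effects show up here: each value now needs half as many bits, and the nearly equal pixels became identical, which a lossless step such as RLE or Huffman coding can then shrink further. JPEG, for example, applies Huffman-style entropy coding after its quantization step.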
3. When to Use Lossless vs. Lossy Compression
| Situation | Use | Reason |
|---|---|---|
| Text files | Lossless | Every character must be preserved exactly |
| Medical/scientific images | Lossless | Accuracy and precision required |
| ZIP archives | Lossless | Must restore original files |
| Online photos | Lossy | Smaller size more important than perfect quality |
| Streaming video/music | Lossy | Faster transmission and lower bandwidth |
| Editing images/audio | Lossless first, lossy at the end | Preserves quality during editing |
4. How Data Is Stored
All digital data—images, text, sound, video—is stored as binary (0s and 1s).
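To make "stored as binary" concrete, the sketch below shows the bits behind a short piece of text. It assumes UTF-8 encoding, where plain English letters take one byte (8 bits) each; `text_to_bits` is a hypothetical helper name.

```python
def text_to_bits(text: str) -> str:
    """Show each UTF-8 byte of the text as an 8-bit binary group."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


print(text_to_bits("Hi"))  # 01001000 01101001  ('H' = 72, 'i' = 105)
```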
Storage Units:
- Bit = 0 or 1
- Byte = 8 bits
- Kilobyte (KB) ≈ 1,000 bytes (some systems use 1,024)
- Megabyte (MB) ≈ 1,000 KB
- Gigabyte (GB) ≈ 1,000 MB
- Terabyte (TB) ≈ 1,000 GB
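The unit ladder above can be turned into a small converter. This sketch assumes decimal prefixes (each step is ×1,000, matching the approximations in the list); tools that use 1,024 per step will report slightly different numbers. The helper names are hypothetical.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest decimal unit that keeps it under 1,000."""
    size = float(num_bytes)
    for unit in UNITS[:-1]:
        if size < 1000:
            return f"{size:.1f} {unit}"
        size /= 1000
    return f"{size:.1f} {UNITS[-1]}"


def bytes_to_bits(num_bytes: int) -> int:
    """1 byte = 8 bits."""
    return num_bytes * 8


print(human_readable(3_500_000))  # 3.5 MB
print(bytes_to_bits(4_000))       # 32000
```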
Typical File Sizes:
- Text file (1 page): ~4 KB
- Smartphone photo: 2–4 MB
- MP3 song: 3–6 MB
- HD movie: 2–4 GB
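The typical sizes above make capacity estimates simple division. The 64 GB device here is a hypothetical example, sizes use the upper ends of the listed ranges, and the estimate ignores space taken by the operating system and apps.

```python
CAPACITY_MB = 64 * 1000  # hypothetical 64 GB device, decimal units


def files_that_fit(capacity_mb: int, file_mb: int) -> int:
    """Whole files of a given size that fit in the capacity."""
    return capacity_mb // file_mb


typical_sizes_mb = {"smartphone photo": 4, "MP3 song": 6, "HD movie": 4 * 1000}
for item, size_mb in typical_sizes_mb.items():
    print(f"{item}: about {files_that_fit(CAPACITY_MB, size_mb):,} fit")
# smartphone photo: about 16,000 fit
# MP3 song: about 10,666 fit
# HD movie: about 16 fit
```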
5. How Data Is Transmitted
Data is transmitted across networks in small, manageable chunks called packets.
Each Packet Contains:
- A chunk of the data being sent
- Destination address
- Sender’s address
- Error-checking information
- Packet number (for reassembly)
Key Principles:
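- Packets may travel different routes to reach the same destination.
- The receiving device uses the packet numbers to put the packets back in their original order.
- Missing or corrupted packets are detected (using the error-checking information) and sent again.
- Because packets can be rerouted and resent, data still arrives intact even when parts of the network fail.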
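The packet fields and principles above can be simulated in a short sketch. Everything here is hypothetical and simplified: the `Packet` class, its `total` field (how many packets make up the message), the helper functions, and the example addresses are illustrations, not a real network protocol. CRC-32 from Python's `zlib` module stands in for the error-checking information.

```python
import random
import zlib
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int        # packet number, used for reassembly
    total: int      # hypothetical field: number of packets in the whole message
    src: str        # sender's address
    dst: str        # destination address
    payload: bytes  # a chunk of the data being sent
    checksum: int   # error-checking information (CRC-32 of the payload)


def split_into_packets(data: bytes, src: str, dst: str, size: int) -> list[Packet]:
    """Cut data into fixed-size chunks and wrap each one in a numbered packet."""
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    return [Packet(seq, len(chunks), src, dst, chunk, zlib.crc32(chunk))
            for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> tuple[bytes, list[int]]:
    """Return the data rebuilt in order, plus packet numbers that must be resent."""
    # Keep only packets whose checksum still matches (drops corrupted ones).
    good = {p.seq: p.payload for p in packets if zlib.crc32(p.payload) == p.checksum}
    total = packets[0].total if packets else 0
    missing = [seq for seq in range(total) if seq not in good]
    data = b"".join(good[seq] for seq in sorted(good))
    return data, missing


message = b"Hello, AP CSP!"
packets = split_into_packets(message, "10.0.0.1", "10.0.0.2", size=4)
random.shuffle(packets)              # packets may arrive in any order
data, missing = reassemble(packets)
print(data == message, missing)      # True []

damaged = next(p for p in packets if p.seq == 1)
damaged.payload = b"????"            # simulate damage in transit
data, missing = reassemble(packets)
print(missing)                       # [1]
```

Sorting by packet number is what makes arrival order irrelevant, and the list of missing numbers is what a receiver would use to get those packets sent again.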
6. Why Compression Helps Transmission
Compression reduces file size, which:
- Makes uploads and downloads faster
- Reduces streaming lag and buffering
- Uses less bandwidth
- Lowers data usage on Wi-Fi and cellular networks
Example: On the same connection, a 10 MB photo compressed to 2 MB transfers about five times faster, because transfer time is roughly proportional to file size.
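The photo example is simple arithmetic: convert megabytes to megabits (×8) and divide by the link speed. This idealized sketch assumes a hypothetical 8 Mbps connection and ignores protocol overhead, latency, and congestion, so real transfers take somewhat longer.

```python
def transfer_seconds(size_mb: float, speed_mbps: float) -> float:
    """Idealized transfer time: megabytes -> megabits (x8), divided by link speed."""
    return size_mb * 8 / speed_mbps


before = transfer_seconds(10, 8)      # 10 MB photo on a hypothetical 8 Mbps link
after = transfer_seconds(2, 8)        # the same photo compressed to 2 MB
print(before, after, before / after)  # 10.0 2.0 5.0
```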
7. Big Takeaways for AP CSP
- Lossless = perfect restoration (ZIP, PNG, text)
- Lossy = some data lost, smaller file size (JPEG, MP3)
- All data stored as binary: bits and bytes
- Data travels as packets through the internet
- Compression improves speed, storage, and bandwidth efficiency