Data Compression Quiz 1-3

Lossless Compression: TIFF (Tagged Image File Format) files are typically lossless, meaning the image is stored without discarding any quality or information (although the format does offer compression options). This allows for very high-quality images but also larger file sizes.

The 'lassen volcano' image will use lossless compression because it is a complicated image: it takes more time to render and uses many different colors, compared to the green and blue squares, which are small and just one color each.

Lossy Compression: Lossy compression refers to compression in which some of the data from the original file is lost. The process is irreversible: once you convert to a lossy format, you can't go back, and the more you compress, the more degradation occurs. JPEG is a lossy image format. GIF compresses its pixel data losslessly, but it is limited to a 256-color palette, so saving a full-color image as a GIF can still discard color information.

The 'green square' and the 'blue square' will use lossy compression due to the less complicated nature of the images. Each is just a single color, which requires less rendering.

Q1 Advantage of lossless over lossy compression

Skill : 1.D

Unit 2.2 Daily Video 1

Which of the following is an advantage of a lossless compression algorithm over a lossy compression algorithm?

(A) A lossless compression algorithm can guarantee that compressed information is kept secure, while a lossy compression algorithm cannot.

(B) A lossless compression algorithm can guarantee reconstruction of original data, while a lossy compression algorithm cannot.

(C) A lossless compression algorithm typically allows for faster transmission speeds than does a lossy compression algorithm.

(D) A lossless compression algorithm typically provides a greater reduction in the number of bits stored or transmitted than does a lossy compression algorithm.

(A) Wrong - Incorrect. The ability to keep data secure is not a primary function of a compression algorithm.

(B) Correct - Correct. Lossless compression algorithms are guaranteed to be able to reconstruct the original data, while lossy compression algorithms are not.

(C) Wrong - Incorrect. In situations where transmission time is maximally important, lossy compression algorithms are typically chosen, as lossy compression typically provides a greater reduction in file size.

(D) Wrong - Incorrect. Lossless compression algorithms usually achieve less reduction in the number of bits stored or transmitted than do lossy compression algorithms.
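
The difference in answer (B) can be seen in a minimal Python sketch. It uses the standard-library `zlib` module for the lossless case. The "lossy" step here is a made-up toy (keep every other byte, then guess the rest), not how JPEG or any real format works, and the sample data is invented for illustration.

```python
import zlib

# Made-up sample data: a short repeating pattern, 600 bytes in total.
original = b"AAAAABBBBBCCCCCAAAAABBBBBCCCCC" * 20

# Lossless: compress, then decompress, and get back exactly the same bytes.
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Toy lossy scheme (an assumption for illustration only): keep every other
# byte, then rebuild by repeating each kept byte twice.
kept = original[::2]
approximation = bytes(b for byte in kept for b in (byte, byte))

print("lossless round trip exact:", restored == original)
print("lossy round trip exact:   ", approximation == original)
```

The lossless round trip reproduces the original exactly, while the toy lossy version can only produce an approximation of the same length.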

Q2 Compression algorithm for storing a data file

Skill : 1.D

Unit 2.2 Daily Video 1

A user wants to save a data file on an online storage site. The user wants to reduce the size of the file, if possible, and wants to be able to completely restore the file to its original version. Which of the following actions best supports the user’s needs?

(A) Compressing the file using a lossless compression algorithm before uploading it

(B) Compressing the file using a lossy compression algorithm before uploading it

(C) Compressing the file using both lossy and lossless compression algorithms before uploading it

(D) Uploading the original file without using any compression algorithm

(A) Correct - Correct. Lossless compression algorithms allow for complete reconstruction of the original data and typically reduce the size of the data.

(B) Wrong - Incorrect. While a lossy compression algorithm will reduce the size of the data, it does not allow for complete reconstruction of the original data.

(C) Wrong - Incorrect. Applying lossy compression to the file will prevent the user from restoring it to its original version.

(D) Wrong - Incorrect. Uploading the original file allows complete reconstruction of the original data but does not reduce the size of the file.

Q3 True statement about compression

Skill : 1.D

Unit 2.2 Daily Video 1

A programmer is developing software for a social media platform. The programmer is planning to use compression when users send attachments to other users. Which of the following is a true statement about the use of compression?

(A) Lossless compression of video files will generally save more space than lossy compression of video files.

(B) Lossless compression of an image file will generally result in a file that is equal in size to the original file.

(C) Lossy compression of an image file generally provides a greater reduction in transmission time than lossless compression does.

(D) Sound clips compressed with lossy compression for storage on the platform can be restored to their original quality when they are played.

(A) Wrong - Incorrect. Lossy data compression algorithms can usually provide a greater reduction in the space required than lossless compression algorithms can.

(B) Wrong - Incorrect. Lossless compression usually results in a file that is smaller in size than the original file.

(C) Correct - Correct. Lossy compression typically reduces the size of an image file more than lossless compression does, and a smaller file takes less time to transmit.

(D) Wrong - Incorrect. Lossy compression algorithms allow only an approximation of the original data to be reconstructed.
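
As a rough illustration of why lossy compression tends to produce smaller files, the sketch below compresses the same made-up "pixel" values twice with `zlib`: once as they are, and once after a toy lossy step that rounds each value down to a multiple of 16. The data, the seed, and the rounding rule are all assumptions chosen for the example, not a real image codec.

```python
import random
import zlib

rng = random.Random(0)  # fixed seed so the run is repeatable

# Stand-in for image pixels: a slow brightness gradient plus a little noise.
pixels = bytes(i // 50 + rng.randrange(8) for i in range(10_000))

# Lossless: compress the pixel values exactly as they are.
lossless_size = len(zlib.compress(pixels))

# Toy lossy step: round each value down to a multiple of 16, which throws
# away fine detail, then compress the simplified values.
simplified = bytes(p & 0xF0 for p in pixels)
lossy_size = len(zlib.compress(simplified))

print("original bytes:", len(pixels))
print("lossless bytes:", lossless_size)
print("lossy bytes:   ", lossy_size)
```

Fewer bytes to send means less transmission time, at the cost of not being able to recover the discarded detail.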

Extracting Information from Data Quiz 1-6

Q1 Challenge due to lack of unique ID

Skill : 5.D

Unit 2.3 Daily Video 1-2

A researcher is analyzing data about students in a school district to determine whether there is a relationship between grade point average and number of absences. The researcher plans on compiling data from several sources to create a record for each student.

The researcher has access to a database with the following information about each student.

Last name
First name
Grade level (9, 10, 11, or 12)
Grade point average (on a 0.0 to 4.0 scale)

The researcher also has access to another database with the following information about each student.

First name
Last name
Number of absences from school
Number of late arrivals to school

Upon compiling the data, the researcher identifies a problem due to the fact that neither data source uses a unique ID number for each student. Which of the following best describes the problem caused by the lack of unique ID numbers?

(A) Students who have the same name may be confused with each other.

(B) Students who have the same grade point average may be confused with each other.

(C) Students who have the same grade level may be confused with each other.

(D) Students who have the same number of absences may be confused with each other.

(A) Correct - Correct. A unique identifier would be required in order to distinguish between two students with the same first and last names.

(B) Wrong - Incorrect. It is expected that many students in the school district have the same grade point average as each other. These students can be distinguished from each other using their first and last name, except in cases where two students have the same first and last name.

(C) Wrong - Incorrect. It is expected that many students in the school district have the same grade level as each other. These students can be distinguished from each other using their first and last names, except in cases where two students have the same first and last name.

(D) Wrong - Incorrect. It is expected that many students in the school district have the same number of absences as each other. These students can be distinguished from each other using their first and last name, except in cases where two students have the same first and last name.
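
The name collision can be sketched in Python. The records below are invented for illustration, and the tuple layouts are an assumption that simply mirrors the two field lists in the question.

```python
from collections import defaultdict

# Hypothetical rows: (last name, first name, grade level, GPA)
grades = [
    ("Lee", "Sam", 10, 3.6),
    ("Lee", "Sam", 12, 2.9),   # a different student with the same name
    ("Ortiz", "Ana", 11, 3.8),
]
# Hypothetical rows: (first name, last name, absences, late arrivals)
absences = [
    ("Sam", "Lee", 2, 0),
    ("Sam", "Lee", 14, 5),
    ("Ana", "Ortiz", 1, 1),
]

# Group GPAs by (first, last), the only fields the two sources share.
gpas_by_name = defaultdict(list)
for last, first, _level, gpa in grades:
    gpas_by_name[(first, last)].append(gpa)

# An absence row is ambiguous when its name matches more than one GPA row.
ambiguous = sorted({
    (first, last)
    for first, last, _days, _late in absences
    if len(gpas_by_name[(first, last)]) > 1
})
print(ambiguous)
```

With only names to match on, there is no way to tell which Sam Lee has 14 absences. A unique student ID in both sources would remove the ambiguity.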

Q2 Challenge in analyzing data from many counties

Skill : 5.D

Unit 2.3 Daily Video 1-2

A team of researchers wants to create a program to analyze the amount of pollution reported in roughly 3,000 counties across the United States. The program is intended to combine county data sets and then process the data. Which of the following is most likely to be a challenge in creating the program?

(A) A computer program cannot combine data from different files.

(B) Different counties may organize data in different ways.

(C) The number of counties is too large for the program to process.

(D) The total number of rows of data is too large for the program to process.

(A) Wrong - Incorrect. Computer programs can accept and process multiple data files as input.

(B) Correct - Correct. It will be a challenge to clean the data from the different counties to make the data uniform. The way pollution data is captured and organized may vary significantly from county to county.

(C) Wrong - Incorrect. Even if the number of data sets is large, they can all be processed with a computer program.

(D) Wrong - Incorrect. Even if the data sets are large, they can be processed with a computer program.
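
A small Python sketch of the cleaning step described in (B). The county names, column names, and values are hypothetical. The point is only that each source has to be mapped onto one shared layout before the data can be combined.

```python
# Hypothetical rows from two counties that report the same measurement
# under different column names and value types.
county_a = [{"county": "Alpha", "pm25": 12.5}]
county_b = [{"County Name": "BETA ", "PM2.5 (ug/m3)": "9.0"}]

# Map each source's column names onto one shared schema.
COLUMN_ALIASES = {
    "county": "county",
    "County Name": "county",
    "pm25": "pm25",
    "PM2.5 (ug/m3)": "pm25",
}

def normalize(row):
    """Return a row with shared column names, tidy text, and numeric values."""
    clean = {COLUMN_ALIASES[key]: value for key, value in row.items()}
    clean["county"] = clean["county"].strip().title()
    clean["pm25"] = float(clean["pm25"])
    return clean

combined = [normalize(row) for row in county_a + county_b]
print(combined)
```

With roughly 3,000 counties, the hard part is discovering all the variations, not the number of rows.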

Q3 Challenges with city data entered by users

Skill : 5.D

Unit 2.3 Daily Video 1-2

A student is creating a Web site that is intended to display information about a city based on a city name that a user enters in a text field. Which of the following are likely to be challenges associated with processing city names that users might provide as input?

Select two answers.

(A) Users might attempt to use the Web site to search for multiple cities.

(B) Users might enter abbreviations for the names of cities.

(C) Users might misspell the name of the city.

(D) Users might be slow at typing a city name in the text field.

(A) Wrong - Incorrect. A user entering data into the Web site to search for multiple cities does not directly affect the quality of the data. If the Web site is working as intended, users should be able to use it as many times as they want.

(B) Correct - Correct. Different users may abbreviate city names differently. This may require the student to clean the data to make it uniform before it can be processed.

(C) Correct - Correct. Misspelled city names will not be an exact match to information stored by the Web site. This may require the student to clean the data to make it uniform before it can be processed.

(D) Wrong - Incorrect. A user’s typing speed does not directly affect the quality of the data. Until a city name is entered, the Web site cannot search for information.
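
One possible way to handle abbreviations and misspellings, sketched in Python with the standard-library `difflib` module. The city list, the abbreviation table, and the 0.8 similarity cutoff are assumptions for illustration. A real site would need a much larger list and some tuning.

```python
import difflib

# Hypothetical data the Web site might store.
KNOWN_CITIES = ["new york", "los angeles", "chicago", "houston"]
ABBREVIATIONS = {"nyc": "new york", "la": "los angeles"}

def clean_city(user_input):
    """Return the best matching known city, or None if nothing is close."""
    text = user_input.strip().lower()
    # Expand a known abbreviation, otherwise keep the text as typed.
    text = ABBREVIATIONS.get(text, text)
    # Fuzzy match to tolerate small spelling mistakes.
    matches = difflib.get_close_matches(text, KNOWN_CITIES, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(clean_city("  NYC "))
print(clean_city("Chcago"))
print(clean_city("xyz"))
```

Returning `None` for input that matches nothing lets the site ask the user to try again instead of guessing.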

Q4 Determine artist with the most concert attendees

Skill : 5.B

Unit 2.3 Daily Video 1-2

A database of information about shows at a concert venue contains the following information.

Name of artist performing at the show
Date of show
Total dollar amount of all tickets sold

Which of the following additional pieces of information would be most useful in determining the artist with the greatest attendance during a particular month?

(A) Average ticket price

(B) Length of the show in minutes

(C) Start time of the show

(D) Total dollar amount of food and drinks sold during the show

(A) Correct - Correct. The attendance for a particular show can be calculated by dividing the total dollar amount of all tickets sold by the average ticket price.

(B) Wrong - Incorrect. The length of the show is not useful for determining attendance at a show.

(C) Wrong - Incorrect. The start time of the show is not useful for determining attendance at a show.

(D) Wrong - Incorrect. The total dollar amount of food and drinks sold during a show may be correlated with the attendance at the show, but cannot be used to determine the exact number of attendees.
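
The calculation in (A) can be sketched in Python. The show records, artist names, and date format are made up for the example.

```python
from collections import defaultdict

# Hypothetical shows: (artist, date as YYYY-MM-DD, ticket revenue, average ticket price)
shows = [
    ("Artist A", "2024-05-03", 5000.0, 50.0),
    ("Artist B", "2024-05-10", 6000.0, 100.0),
    ("Artist A", "2024-05-17", 2500.0, 50.0),
    ("Artist B", "2024-06-01", 9000.0, 100.0),
]

def top_artist(shows, month):
    """Return (artist, attendance) with the greatest attendance in a month."""
    attendance = defaultdict(float)
    for artist, date, revenue, avg_price in shows:
        if date.startswith(month):
            # attendance = total ticket dollars / average ticket price
            attendance[artist] += revenue / avg_price
    return max(attendance.items(), key=lambda item: item[1])

print(top_artist(shows, "2024-05"))
```

Note that Artist B earns more ticket revenue in May than Artist A does in any single show, but has lower attendance, which is why revenue alone is not enough.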

Q5 Information determined using dashboard metadata

Skill : 5.B

Unit 2.3 Daily Video 1-2

A camera mounted on the dashboard of a car captures an image of the view from the driver’s seat every second. Each image is stored as data. Along with each image, the camera also captures and stores the car’s speed, the date and time, and the car’s GPS location as metadata. Which of the following can best be determined using only the data and none of the metadata?

(A) The average number of hours per day that the car is in use

(B) The car’s average speed on a particular day

(C) The distance the car traveled on a particular day

(D) The number of bicycles the car passed on a particular day

(A) Wrong - Incorrect. The average number of hours per day of use would be based on speed, date, and time, which are part of the metadata.

(B) Wrong - Incorrect. The calculation of average speed would be based on speed and time, which are part of the metadata.

(C) Wrong - Incorrect. The calculation of distance traveled on a particular day would be based on GPS location, date, and time, which are part of the metadata.

(D) Correct - Correct. Determining the number of bicycles the car encountered would require the use of image recognition software to examine the images collected by the camera. The images are the data collected and no metadata would be required.
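
To make the data versus metadata split concrete, here is a small Python sketch. The `Frame` class, its field names, and the sample values are hypothetical. The only point is which fields each calculation touches.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    image: bytes       # the data: what the camera saw
    speed_mph: float   # metadata
    timestamp: str     # metadata
    gps: tuple         # metadata: (latitude, longitude)

# Made-up frames; the image bytes are placeholders, not real pictures.
frames = [
    Frame(b"\x00\x01", 30.0, "2024-05-03T08:00:00", (40.50, -121.50)),
    Frame(b"\x00\x02", 40.0, "2024-05-03T08:00:01", (40.50, -121.51)),
    Frame(b"\x00\x03", 50.0, "2024-05-03T08:00:02", (40.50, -121.52)),
]

def average_speed(frames):
    """Uses only metadata (speed_mph); the images are never read."""
    return sum(f.speed_mph for f in frames) / len(frames)

print(average_speed(frames))
```

Counting bicycles is the opposite case: it would need image recognition applied to the `image` field and none of the other fields, which is not shown here.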

Q6 Information from student work habit survey

Skill : 5.B

Unit 2.3 Daily Video 1-2

A teacher sends students an anonymous survey in order to learn more about the students’ work habits. The survey contains the following questions.

On average, how long does homework take you each night (in minutes)?
On average, how long do you study for each test (in minutes)?
Do you enjoy the subject material of this class (yes or no)?

Which of the following questions about the students who responded to the survey can the teacher answer by analyzing the survey results?

I. Do students who enjoy the subject material tend to spend more time on homework each night than the other students do?
II. Do students who spend more time on homework each night tend to spend less time studying for tests than the other students do?
III. Do students who spend more time studying for tests tend to earn higher grades in the class than the other students do?

(A) I only

(B) III only

(C) I and II

(D) I and III

(A) Wrong - Incorrect. Question II can be answered because the teacher can detect a correlation between responses to questions 1 and 2 on the survey.

(B) Wrong - Incorrect. Question I can be answered because the teacher can detect a correlation between responses to questions 1 and 3 on the survey. Question II can be answered because the teacher can detect a correlation between responses to questions 1 and 2 on the survey. Question III cannot be answered because the survey is anonymous and the teacher cannot compare student grades with the responses to the survey questions.

(C) Correct - Correct. Question I can be answered because the teacher can detect a correlation between responses to questions 1 and 3 on the survey. Question II can be answered because the teacher can detect a correlation between responses to questions 1 and 2 on the survey. Question III cannot be answered because the survey is anonymous and the teacher cannot compare student grades with the responses to the survey questions.

(D) Wrong - Incorrect. Question II can be answered because the teacher can detect a correlation between responses to questions 1 and 2 on the survey. Question III cannot be answered because the survey is anonymous and the teacher cannot compare student grades with the responses to the survey questions.
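
A Python sketch of how questions I and II could be checked from the survey responses alone. The responses are invented, and the hand-written Pearson correlation is just one reasonable way to measure the relationship. Question III has no counterpart here because the survey collects no grades.

```python
from statistics import mean

# Hypothetical anonymous responses: (homework minutes, study minutes, enjoys class)
responses = [
    (60, 90, True),
    (45, 120, True),
    (30, 60, False),
    (20, 45, False),
]

def mean_homework(responses, enjoys):
    """Question I: average homework time for one group of students."""
    return mean(hw for hw, _study, likes in responses if likes == enjoys)

def pearson(xs, ys):
    """Question II: correlation between two lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

homework = [hw for hw, _study, _likes in responses]
study = [st for _hw, st, _likes in responses]

print(mean_homework(responses, True), mean_homework(responses, False))
print(pearson(homework, study))
```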
