Analysis for Prices of Food Items, Hygiene Items, Fuel and Winter Items

Tarek Dib

Date: March 19, 2014

Introduction

The data was provided by CARE International Lebanon. This is a comprehensive analysis of the prices of food, hygiene, fuel and winterization items. The main purpose of the study is to compare prices of these items among shops. Visualization is used extensively in the analysis to illustrate the power of visualization that can be used in R to tell a story. This is a small data set with only 11 shops, mainly from the Shouf District in Mount Lebanon. However, the analysis can be extended to thousands of shops with certain changes. This study would help decision makers to select specific shops to encourage the Syrian refugees to buy their items from. Shop names were not listed in this study for privacy issues. However a list of shop names were created for illustration. I hope you enjoy the journey of this analysis as much as I did!

Data Manipulation and Structure

# Read data set into R
y <- read.csv("careViz.csv")
str(y)

## 'data.frame':    11 obs. of  52 variables:
##  $ start_time               : Factor w/ 11 levels "2014-03-12T09:32:04.366+02",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ end_time                 : Factor w/ 11 levels "2014-03-12T09:50:14.280+02",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ today                    : Factor w/ 1 level "2014-03-12": 1 1 1 1 1 1 1 1 1 1 ...
##  $ imei                     : num  3.58e+14 3.58e+14 3.58e+14 3.58e+14 3.58e+14 ...
##  $ phonenumber              : logi  NA NA NA NA NA NA ...
##  $ section1.governorate     : Factor w/ 1 level "Mount Lebanon": 1 1 1 1 1 1 1 1 1 1 ...
##  $ section1.district        : Factor w/ 1 level "Chouf": 1 1 1 1 1 1 1 1 1 1 ...
##  $ section1.partner         : Factor w/ 2 levels "no","yes": 1 1 1 1 2 1 1 1 1 1 ...
##  $ section1.shop_size       : Factor w/ 2 levels "large","medium": 2 2 2 2 2 2 1 2 2 1 ...
##  $ section1.survey_date     : Factor w/ 1 level "2014-03-12": 1 1 1 1 1 1 1 1 1 1 ...
##  $ section2.lemon           : int  0 0 750 1000 0 0 0 1000 1000 750 ...
##  $ section2.lettuce         : int  0 0 1000 1500 0 0 0 1250 1000 1500 ...
##  $ section2.cabbage         : int  0 0 500 750 0 0 0 750 750 750 ...
##  $ section2.tomato          : int  0 0 1000 1250 0 0 0 1500 1500 1500 ...
##  $ section2.cucumber        : int  0 0 1500 1500 0 0 0 2000 2500 1500 ...
##  $ section2.eggs            : int  5500 6500 0 7000 6750 6500 6500 7000 7000 6500 ...
##  $ section2.bread           : int  1250 1250 0 1250 1500 1500 1500 1500 1500 1500 ...
##  $ section2.milk            : int  6250 6500 0 6500 6750 6500 6900 7000 7000 6500 ...
##  $ section2.cheese          : int  1750 2000 0 1750 2000 2000 2000 2000 2000 2000 ...
##  $ section2.oil             : int  12000 11000 0 12000 12500 6500 12500 9500 12500 13000 ...
##  $ section2.rice            : int  1750 2000 0 1750 2000 20007 1750 1750 2000 2000 ...
##  $ section2.wheat           : int  1250 1500 0 2000 1750 1500 1650 1750 1750 1750 ...
##  $ section2.pasta           : int  500 500 0 500 7500 750 450 750 2250 1250 ...
##  $ section2.beans           : int  4500 0 0 4750 5000 2500 4900 0 4750 5500 ...
##  $ section2.lentils         : int  2000 2500 0 2250 2500 2500 2500 2000 2500 3000 ...
##  $ section2.beef            : int  1500 1750 0 2000 1900 2000 1750 1750 1750 2000 ...
##  $ section2.tuna            : int  2000 2000 0 2500 2500 1750 2200 2250 2750 1750 ...
##  $ section2.sardine         : int  1000 1000 0 1250 1250 1250 1200 1250 1500 1500 ...
##  $ section2.tomato_paste    : int  3000 3000 0 3750 4500 5000 3900 2250 2500 2750 ...
##  $ section2.sugar           : int  1250 0 0 1250 1200 1200 1000 1200 1000 1250 ...
##  $ section2.salt            : int  250 250 0 500 500 500 350 500 500 500 ...
##  $ section3.toilet_paper    : int  1000 1000 0 1000 1000 1000 875 1000 1250 1000 ...
##  $ section3.toothpaste      : int  1250 2000 0 2500 1500 1000 1300 1250 1500 1250 ...
##  $ section3.detergent_soap  : int  6000 0 0 2750 2000 0 2600 3000 0 0 ...
##  $ section3.dishes_detergent: int  1000 2000 0 2000 2000 2000 1990 3000 2000 2000 ...
##  $ section3.san_napkins     : int  1750 0 0 0 0 2000 3350 0 3500 0 ...
##  $ section3.soap_pieces     : int  1250 5000 0 3500 2500 2500 4000 0 2500 2500 ...
##  $ section3.soap_bar        : int  1000 1250 0 1500 1500 0 2150 1500 1750 1500 ...
##  $ section3.disinfectant    : int  3500 1500 0 1000 6000 0 1750 4000 2500 5500 ...
##  $ section3.shampoo         : int  3500 5000 0 5000 3500 4000 3400 3500 5000 3500 ...
##  $ section3.diapers         : int  11000 20000 0 0 11000 0 22500 15000 0 0 ...
##  $ section4.gas             : int  1900 2000 0 2000 0 0 0 0 0 0 ...
##  $ section4.petrol          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ section5.blanket         : int  0 0 0 0 0 0 35000 0 0 0 ...
##  $ section5.stove           : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meta.instanceID          : Factor w/ 11 levels "uuid:050ca6aa-c3a2-4eaa-b015-2e51cd398001",..: 7 10 5 4 1 6 9 3 8 2 ...
##  $ X_id                     : int  2829823 2830070 2830071 2830134 2830427 2831109 2831503 2832135 2832787 2833104 ...
##  $ X_uuid                   : Factor w/ 11 levels "050ca6aa-c3a2-4eaa-b015-2e51cd398001",..: 7 10 5 4 1 6 9 3 8 2 ...
##  $ X_submission_time        : Factor w/ 11 levels "2014-03-12T07:50:48",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ X_index                  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ X_parent_table_name      : logi  NA NA NA NA NA NA ...
##  $ X_parent_index           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...

# Fictional shop names just for illustration.
shops <- c("s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9", "s10", "s11")
# Merge shops into the data frame
y <- cbind(shops, y)
# indices of the Shops where lemon is availble
idx1 = which(y[, 12] != 0)
# Lemon prices at the shops where lemon is available
lemonPrices = y[idx1, 12]
# Shops where lemon is available
shops.lemon = y$shops[idx1]

# A subset of the food items only
y.food <- y[, 12:32]
# Change column names of the food items
colnames(y.food) = c("Lemon", "Lettuce", "Cabbage", "Tomato", "Cucumber", "Eggs", 
    "Bread", "Milk", "Cheese", "Oil", "Rice", "Wheat", "Pasta", "Beans", "Lentils", 
    "Beef", "Tuna", "Sardine", "Tomato Paste", "Sugar", "Salt")
# A subset of hygiene items
y.hygiene <- y[, 33:42]
# Change hygiene item names
colnames(y.hygiene) = c("toilet paper", "toothpaste", "detergent soap", "dishes detergent", 
    "san_napkins", "soap pieces", "soap bar", "disinfectant", "shampoo", "diapers")
# A subset of fuel items
y.fuel <- y[, 43:44]
# Change fuel item names
colnames(y.fuel) <- c("gas", "petrol")
# A subset of winterization items
y.winterization <- y[45:46]
# Change winterization item names
colnames(y.winterization) = c("blanket", "stove")
# Merge the columns of each of the category items into one column
foodItemPrices <- unlist(y.food, use.names = F)
hygieneItemPrices <- unlist(y.hygiene, use.names = F)
fuelItemPrices <- unlist(y.fuel, use.names = F)
winterizationItemPrices <- unlist(y.winterization, use.names = F)
# Items
foodItems = rep(names(y.food), each = 11)
hygieneItems = rep(names(y.hygiene), each = 11)
fuelItems = rep(names(y.fuel), each = 11)
winterizationItems = rep(names(y.winterization), each = 11)

# Shops Create a replicated shop vector
Shop = rep(shops, ncol(y[, 12:46]))
# Select shops to be used in the food item only data frame
Shop.food = Shop[1:length(foodItems)]
# Food data frame (Only shops where these items are available)
df.food = data.frame(Shop.food, foodItems, foodItemPrices)
Shop.foodAvail <- df.food$Shop.food[which(df.food$foodItemPrices != 0)]
foodItemAvail <- df.food$foodItems[which(df.food$foodItemPrices != 0)]
food.Price <- df.food$foodItemPrices[which(df.food$foodItemPrices != 0)]
df.food.Avail <- data.frame(Shop.foodAvail, foodItemAvail, food.Price)

# Select shops to be used in the hygiene item only data frame
Shop.hygiene = Shop[(length(foodItems) + 1):(length(foodItems) + length(hygieneItems))]
# Hygiene Data Frame (Only shops where these items are available)
df.hygiene = data.frame(Shop.hygiene, hygieneItems, hygieneItemPrices)
Shop.hygieneAvail <- df.hygiene$Shop.hygiene[which(df.hygiene$hygieneItemPrices != 
    0)]
hygieneItemAvail <- df.hygiene$hygieneItems[which(df.hygiene$hygieneItemPrices != 
    0)]
hygiene.Price <- df.hygiene$hygieneItemPrices[which(df.hygiene$hygieneItemPrices != 
    0)]
df.hygiene.Avail <- data.frame(Shop.hygieneAvail, hygieneItemAvail, hygiene.Price)


# Select shops to be used in the fuel item only data frame
Shop.fuel = Shop[(length(hygieneItems) + 1):(length(hygieneItems) + length(fuelItems))]
# Data Frame
df.fuel = data.frame(Shop.fuel, fuelItems, fuelItemPrices)

# Table of Shops with available fuel items
Shop.fuelAvail <- df.fuel$Shop.fuel[which(df.fuel$fuelItemPrices != 0)]
fuelItemsAvail <- df.fuel$fuelItems[which(df.fuel$fuelItemPrices != 0)]
fuel.Price <- df.fuel$fuelItemPrices[which(df.fuel$fuelItemPrices != 0)]
data.frame(Shop.fuelAvail, fuelItemsAvail, fuel.Price)

##   Shop.fuelAvail fuelItemsAvail fuel.Price
## 1             s1            gas       1900
## 2             s2            gas       2000
## 3             s4            gas       2000


# Select shops to be used in the winterization item only data frame
Shop.winterization = Shop[(length(fuelItems) + 1):(length(fuelItems) + length(winterizationItems))]
# Data Frame
df.winterization = data.frame(Shop.winterization, winterizationItems, winterizationItemPrices)
# Table of Shops with available winterization items
Shop.winterizationAvail <- df.winterization$Shop.winterization[which(df.winterization$winterizationItemPrices != 
    0)]
winterizationItemsAvail <- df.winterization$winterizationItems[which(df.winterization$winterizationItemPrices != 
    0)]
winterization.Price <- df.winterization$winterizationItemPrices[which(df.winterization$winterizationItemPrices != 
    0)]
data.frame(Shop.winterizationAvail, winterizationItemsAvail, winterization.Price)

##   Shop.winterizationAvail winterizationItemsAvail winterization.Price
## 1                      s7                 blanket               35000


# data frame of food grouped by shop size
ShopSize <- rep(y$section1.shop_size, 21)
food <- data.frame(ShopSize, df.food)

A list of shops s1 through s11 were created for illustrating the distribution of item prices of the different categories. These shops are fictional and do not exist; however, it is easier to illustrate the message and to discuss the information carried within these data.

From the above two tables, gas was only found in Shops s1, s2 and s4. The price of 1Kg of cooking gas ranges from LBP1900 in shop s1 to LBP2000 in shops s2 and s4. Thus, there is a very samll variability in the price of 1Kg of cooking gas among these shops. On the other hand, blankets are only found in shop s7 at a price of LBP35,000.

Exploratory Data Analysis and Distribution Analysis

Scatter plot of Item Prices

par(mfrow = c(2, 2), oma = c(3, 0, 1, 0), cex = 1.2)
plot(foodItemPrices, pch = 19, col = "green", ylab = "Food Item Prices (LBP)")
plot(hygieneItemPrices, pch = 19, col = "blue", ylab = "Hygiene Item Prices (LBP)")
plot(fuelItemPrices, pch = 19, col = "orange", ylab = "Fuel Item Prices (LBP)")
plot(winterizationItemPrices, pch = 19, ylab = "Winterization Item Prices (LBP)")
title("Item Prices", outer = TRUE)

plot of chunk unnamed-chunk-2

# mtext(side=1, 'Centered Subtitle', outer=TRUE)

Boxplots to study distibution of Food Item prices within and among Shops

par(mar = c(7, 6, 5, 5), mfrow = c(2, 1))
boxplot(df.food.Avail$food.Price ~ foodItemAvail, las = 2, col = "blue", main = "Distribution of Food Item Prices among Shops where food Items are Available", 
    xlab = "", ylab = "")
mtext(c("Food Items", "Food Prices"), side = c(1, 2), line = c(5, 4), cex = 1.4)
boxplot(df.food.Avail$food.Price ~ Shop.foodAvail, col = "blue", main = "Distribution of Available Food Item Prices within Each Shop", 
    xlab = "", ylab = "", las = 1)
mtext(c("Shops", "Food Prices"), side = c(1, 2), line = c(4, 4), cex = 1.4)

plot of chunk unnamed-chunk-3

The first figure shown above illustrates the distribution of available food items among shops. There does not seem to be a high variability in prices among shops, with the exception of oil, tomato paste and rice prices. The table below shows that the price of oil ranges from LBP6,500 to LBP13,000, the price of tomato paste ranges from LBP2,250 to LBP5,000 per 1.4Kg of canned tomato paste, and the price of rice ranges from 1,750 to 20,007 LBP. A price of 20,007 per 1 Kg of rice seems to be mistakenly entered. Therefor, the 20,007 is an outlier and may need to removed from the data set. From the second figure, we may conclude that the 20,007 was entered for Shop s6. Moreover, of all the food items, oil seems to be the most expensive. A summary statistics table for the prices of these items is shown below.

The second figure shown above illustrates the distribution of food prices within each of the 11 shops. As expected, the variability of prices in food items is somehow high, with the exception of Shop s3. Shop s3 carries only vegatable food items (lemon, lettuce, tomato, cucumber and cabbage) as shown in the table below.

Analyis of Oil, Tomato Paste and Rice Prices

oilPrices <- df.food.Avail$food.Price[df.food.Avail$foodItemAvail == "Oil"]
tomatoPastePrices <- df.food.Avail$food.Price[df.food.Avail$foodItemAvail == 
    "Tomato Paste"]
RicePrices <- df.food.Avail$food.Price[df.food.Avail$foodItemAvail == "Rice"]
OTPRPrices <- data.frame(oilPrices, tomatoPastePrices, RicePrices)
summary(OTPRPrices)

##    oilPrices     tomatoPastePrices   RicePrices   
##  Min.   : 6500   Min.   :2250      Min.   : 1750  
##  1st Qu.:11250   1st Qu.:2812      1st Qu.: 1750  
##  Median :12250   Median :3375      Median : 2000  
##  Mean   :11450   Mean   :3515      Mean   : 3701  
##  3rd Qu.:12500   3rd Qu.:4350      3rd Qu.: 2000  
##  Max.   :13000   Max.   :5000      Max.   :20007

levels(factor(df.food.Avail$foodItemAvail[df.food.Avail$Shop.foodAvail == "s3"]))

## [1] "Cabbage"  "Cucumber" "Lemon"    "Lettuce"  "Tomato"

Boxplots to study the distribution of Hygiene Item Prices within and among Shops

par(mar = c(7, 6, 5, 5), mfrow = c(2, 1))
boxplot(df.hygiene.Avail$hygiene.Price ~ df.hygiene.Avail$hygieneItemAvail, 
    las = 2, col = "blue", main = "Distribution of Hygiene Item Prices among Shops where these Items are Available", 
    xlab = "", ylab = "")
mtext(c("Hygiene Items", "Hygiene Prices"), side = c(1, 2), line = c(7, 4), 
    cex = 1.4)
boxplot(df.hygiene.Avail$hygiene.Price ~ df.hygiene.Avail$Shop.hygieneAvail, 
    col = "blue", main = "Distribution of Available Hygiene Item Prices within each Shop", 
    xlab = "", ylab = "", las = 1)
mtext(c("Shops", "Hygiene Prices"), side = c(1, 2), line = c(4, 4), cex = 1.4)

plot of chunk unnamed-chunk-5

The first figure shown above illustrates the distribution of available hygiene items among shops. There does not seem to be high variability among the prices of hygiene items with the exception of diapers prices, and which seems to be the most expensive. The price of diapers range from 11,000 to LBP22,500 as shown in the table below.

The second figure shown above illustrates the distribution of hygiene prices within each of the 11 shops. Shop s3 does not carry any hygiene items. Shops s1, s2, s5, s7, s8 and s11 carries diapers. This what explains the outliers shown in the figure. The table below shows the prices of diapers in each of the mentioned 6 shops.

Analysis of Diaper Prices among Shops

diapersPrices = df.hygiene.Avail$hygiene.Price[df.hygiene.Avail$hygieneItemAvail == 
    "diapers"]
diapersShops <- df.hygiene.Avail$Shop.hygieneAvail[which(df.hygiene.Avail$hygieneItemAvail == 
    "diapers")]
data.frame(diapersShops, diapersPrices)

##   diapersShops diapersPrices
## 1           s1         11000
## 2           s2         20000
## 3           s5         11000
## 4           s7         22500
## 5           s8         15000
## 6          s11         15000

Overall Food Item Prices Grouped by Shop Size

ShopSizeAvail <- food$ShopSize[which(food$foodItemPrices != 0)]
food.Avail <- data.frame(ShopSizeAvail, df.food.Avail)
boxplot(food.Avail$food.Price ~ food.Avail$ShopSize, main = "Food Prices grouped by Shop Size", 
    col = rainbow(2), ylab = "Prices", xlab = "Shop Size")

plot of chunk unnamed-chunk-7

# Confidence Interval
mean(food.Avail$food.Price) + c(-1, 1) * qt(0.975, length(food.Avail$food.Price) - 
    1) * sd(food.Avail$food.Price)/sqrt(length(food.Avail$food.Price))

## [1] 2502 3404

The above figure suggests that there is no significant differnce in food prices between medium and large sized shops. Moreover, the 95% confidence interval is relatively narrow (2502-3404 LBP).

Example of a Pie Chart

The pie chart below is just for illustration!

lbls <- shops.lemon
lePrices <- lemonPrices
lbls <- paste(lbls, ": ", sep = "", lePrices)
pie(lemonPrices, labels = lbls, col = rainbow(length(lbls)), main = "Shops with Lemons Available", 
    radius = 0.6)

plot of chunk unnamed-chunk-8