(I asked this question before on Stack Overflow, here.)

I have the following data frame (dcity):

    City          Income_Group  Number_of_Neigbourhoods
    Metropolis    High           7
    Metropolis    Mid            6
    Metropolis    Low           12
    Central City  High           9
    Central City  Mid           14
    Central City  Low           18
    Star City     High           2
    Star City     Mid            7
    Star City     Low           21
    Gotham        High           9
    Gotham        Mid           11
    Gotham        Low           19
    ...

City: my data frame has about 100 different cities

Income_Group: only three income groups: High, Mid and Low

Number_of_Neigbourhoods: these are numeric values

I am using the following code to plot my data:

    library(ggplot2)
    ggplot(dcity, aes(Income_Group, Number_of_Neigbourhoods)) + geom_bar(stat = "identity")

[Figure: the resulting bar chart]

Is a bar plot the right way to present these data?

If possible, is there a way to include the names of the cities? I tried this:

    ggplot(dcity, aes(Income_Group, Number_of_Neigbourhoods, fill = City)) + geom_bar(stat = "identity")

[Figure: the resulting bar chart, with bars filled by City]

It produced the same graph as before, but with each bar split into rainbow-coloured segments and a legend of city names: not exactly useful.

As you have abandoned your SO question, please delete it, as otherwise you are cross-posting and have pre-empted migration. "Please note, however, that cross-posting is not encouraged on SE sites. Choose one best location to post your question. Later, if it proves better suited on another site, it can be migrated." stats.stackexchange.com/help/on-topic (Commented Oct 13, 2015 at 9:19)

1 Answer

I take your use of R to be incidental here, as the question is about which graphics would help. In any case, I am no kind of R expert and cannot offer advice on R code.

I think you are right that the rainbow graph does not help. As reproduced, I cannot even read the names of the cities, so at best it is "data art".

Without more detail on your precise goals in analysing these data, we are somewhat in the dark. Similarly, the data are manifestly fictional, which rules out insights from people who know your area well. Showing your real data would let you and us see what really works well (or poorly). With those caveats, my suggestions are as follows.

  1. In comparing cities of (presumably) different sizes, the fractions of low, middle and high (proportions on a 0-1 scale, or percents on a 0-100 scale) are much more interesting than the raw counts.

  2. Showing all the names of the cities is a natural impulse, but 100 or so names is a challenge. An interactive graphic in which names pop up on mouse-over is well within the state of the art. Labelling only the cities with extreme values is another option. My own rule of thumb is that about 30 names often work reasonably well if all of them are to be readable. Hence, if there is a natural way to split the data into about 3 groups, that might work reasonably well. The split might be a regional division, or it might be best to divide the cities into poorest, middle and richest thirds according to some precise criterion.

  3. Many people will want to suggest a stacked (divided) bar chart here. Another possibility is a Cleveland dot chart. I also suggest a variant on that, which I call a two-way bar chart. Here is an example made using Stata. It needs more work: e.g. alphabetical order by city is a poor choice unless looking up particular cities is an important role of your graph. This design works best for up to about 20 cases.

[Figure: example two-way bar chart, made using Stata]
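Since the question uses R, here is a minimal base-R sketch of suggestions 1 to 3. It uses base graphics rather than ggplot2 and is not the Stata code behind the figure above. It assumes the long layout of `dcity` shown in the question and uses only the four example cities as hypothetical stand-in data; the criterion for the thirds (fraction of low-income neighbourhoods) is likewise just an illustrative assumption.

```r
# Hypothetical stand-in data: the four example cities shown in the question
dcity <- data.frame(
  City = rep(c("Metropolis", "Central City", "Star City", "Gotham"), each = 3),
  Income_Group = rep(c("High", "Mid", "Low"), times = 4),
  Number_of_Neigbourhoods = c(7, 6, 12, 9, 14, 18, 2, 7, 21, 9, 11, 19)
)

# Suggestion 1: counts -> fractions (one row per city, one column per group)
counts <- with(dcity, tapply(Number_of_Neigbourhoods, list(City, Income_Group), sum))
counts <- counts[, c("Low", "Mid", "High")]   # natural order of the groups
fractions <- prop.table(counts, margin = 1)   # each row now sums to 1

# Suggestion 2: split the cities into thirds by one precise criterion
# (assumed here: the fraction of low-income neighbourhoods)
n <- nrow(fractions)
third <- ceiling(3 * rank(fractions[, "Low"], ties.method = "first") / n)
third <- factor(third, levels = 1:3,
                labels = c("bottom third", "middle third", "top third"))

# Suggestion 3: Cleveland dot chart, one panel per income group,
# cities sorted by fraction low rather than alphabetically
ord <- order(fractions[, "Low"])
dotchart(fractions[ord, ], xlim = c(0, 1),
         xlab = "fraction of neighbourhoods", main = "Income mix by city")
```

With about 100 real cities, each third could be drawn on its own, e.g. `dotchart(fractions[third == "top third", ])`, keeping roughly 30 names per graph in line with the rule of thumb in suggestion 2.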

  4. As the fractions of low, middle and high necessarily add to 1, they carry just two pieces of information: for example, low + middle = 1 - high, so any two known fractions imply the third. Hence a triangular (ternary, trilinear, triaxial; many other names exist) graph is a possibility. See, e.g., this Wikipedia entry as a start. A more unusual graph would be to plot cities on a graph with

    (fraction high - fraction low) vs fraction middle

    or any other difference between two fractions versus the third fraction. This graph (suggested in a talk I gave, although reading the materials requires Stata, so they are for Stata users only) shares with a triangular graph the property that it preserves all the information in the three fractions. You could not comfortably show all the names on a non-interactive graph, so you might need to label only selected interesting cities.
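A base-R sketch of both coordinate systems in suggestion 4 follows, again on the hypothetical four-city sample. Because high + mid + low = 1, the pair (mid, d) with d = high - low can be inverted as high = (1 - mid + d)/2 and low = (1 - mid - d)/2, which is why the difference plot loses no information; the code checks this numerically. Placing the Low, Mid and High corners of the triangle at (0, 0), (1, 0) and (1/2, sqrt(3)/2) is one common convention, assumed here.

```r
# Hypothetical stand-in data: the four example cities from the question,
# already cross-tabulated (rows = cities, columns = income groups)
counts <- rbind(
  "Metropolis"   = c(Low = 12, Mid = 6,  High = 7),
  "Central City" = c(Low = 18, Mid = 14, High = 9),
  "Star City"    = c(Low = 21, Mid = 7,  High = 2),
  "Gotham"       = c(Low = 19, Mid = 11, High = 9)
)
f <- prop.table(counts, margin = 1)   # rows now sum to 1
low  <- f[, "Low"]
mid  <- f[, "Mid"]
high <- f[, "High"]

par(mfrow = c(1, 2))

# Triangular (ternary) plot; assumed corners: Low (0, 0), Mid (1, 0), High (1/2, sqrt(3)/2)
tern_x <- mid + high / 2
tern_y <- high * sqrt(3) / 2
plot(tern_x, tern_y, asp = 1, xlim = c(0, 1), ylim = c(0, sqrt(3) / 2),
     axes = FALSE, xlab = "", ylab = "", main = "Ternary")
polygon(c(0, 1, 0.5), c(0, 0, sqrt(3) / 2))
text(tern_x, tern_y, labels = rownames(f), pos = 3, cex = 0.7)

# Difference plot: (fraction high - fraction low) versus fraction middle
d <- high - low
plot(mid, d, xlim = c(0, 1), ylim = c(-1, 1),
     xlab = "fraction middle", ylab = "fraction high - fraction low",
     main = "Difference")
polygon(c(0, 1, 0), c(-1, 0, 1), border = "grey")  # feasible region: |d| <= 1 - mid
text(mid, d, labels = rownames(f), pos = 3, cex = 0.7)

# No information is lost: high and low are recovered exactly from (mid, d)
high_back <- (1 - mid + d) / 2
low_back  <- (1 - mid - d) / 2
stopifnot(isTRUE(all.equal(high_back, high)), isTRUE(all.equal(low_back, low)))
```

The grey triangle in the difference plot marks where points can possibly fall, which plays the same role as the outline of the triangular plot.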

Note October 2015: I intend to write up this last idea in the next 12 months.
